CyLab: Advanced Analytics Environments to Protect Critical Infrastructure
(Upbeat Music) [Chase Garwood] Welcome everybody. Thank you for joining our discussion on advanced analytics environments to protect critical infrastructure. My name is Chase Garwood. I'm an RD&I portfolio manager for CISA, the Cybersecurity and Infrastructure Security Agency, in the Department of Homeland Security Science and Technology Directorate, S&T. And I'll be our moderator for our conversation today. Our talk today is part of series three of the DHS Whole-of-Government R&D Showcase, mitigating evolving threats and understanding the convergence of emerging technologies.
Before we get started, just a few thoughts to provide context for our discussion today. There've been a lot of high-profile cyber security attacks in the press and in the public domain, but for DHS and CISA, that's our daily mission. To meet that daily challenge now and into the future, our discussion today will focus on how DHS is thinking differently to mitigate risk and enhance resilience for our nation's critical infrastructure and our federal systems and data. To meet a world of always evolving threats, S&T and CISA are partnering to build new capabilities known as CyLab, a multi-cloud environment where new tools and software can be researched, tested, and operationalized to counter existing and emerging threats. In CyLab, CISA and S&T are already underway to research and test applied machine learning capabilities, next-generation cyber threat hunting and a host of other tools needed to ensure CISA and our public and private sectors have the capability and new capacity to defend not only our federal systems and networks, but also our nation's critical infrastructure. So let's get going.
Let me introduce our esteemed panel. I'll begin with Alexandra, or Alex, Phounsavath, Director of the Data Analytics Technology Center here at S&T; Dr. Gary Jones, the Associate Chief of Strategic Technology at CISA; and Preston Werntz, the Assistant Chief Data Officer, also in the CTO's office in CISA. So let's start our brief discussion for this morning.
Question number one, focus in on the overall vision, purpose and need. I'd like to kick off with understanding, exactly what is CyLab? Gary, let's start with you. What is CISA strategically working to achieve in partnering with S&T on CyLab as a major research, development and innovation effort? [Garfield "Gary" Jones] Thanks Chase. Great question there.
You know, CyLab is basically that environment that really promotes collaboration between CISA, its partner organizations, public and private organizations in that partnership. It's going to be a physical and a virtual space that really supports improving CISA analytics, improving CISA architecture by leveraging different cloud vendors, and testing vendor solutions within that space, and it's gonna enable realistic tests and seamless transition of solutions to production.
So it's also going to provide that warehouse, the logical data warehouse with CISA authorities, and then it's going to enable those lower technology readiness level analytics and smaller research areas to really get ready to deploy into the production space. A little bit about the users: on the CISA side and the DHS side, they are really going to be the DHS staff and the contractors. But as we build out CyLab and it gets fully realized, I think we'll start to have more downstream impact with other partners in other organizations. [Chase Garwood] Great, thanks Gary, for that initial, just kind of strategic vision and where we're going with all this.
So Preston, as the Chief Data Officer, following up on Gary's opening remarks, any other strategic and visionary perspectives you want to share and add? [Preston Werntz] Very quickly, I'd just kind of echo what Gary said, obviously, and you know, so much of what we're trying to do with CISA is take, in a lot of ways, the data we've got and get that out to our stakeholders and partners and customers in forms that make sense to them, either maybe a machine-readable cyber threat indicator or a more intensive, long-form risk analysis kind of piece for critical infrastructure owners and operators. In order for us to be very nimble and fast and do that, we need an environment like CyLab, where we can take data we've got, we can take new data sets. We can kind of look at new algorithms, look at new tools, turn that around in a lab environment. So we can kind of prove out the value of it, work with our legal staff, work with our privacy partners and transition these capabilities.
These new datasets, these new algorithms, these new tools, get them into our production environments and kind of, get that stuff back out faster and in forms that help our customers, do what they do better. [Chase Garwood] Great, thanks Preston. That really adds some different contexts to it.
So Alex, we just heard the CISA vision and strategic intent for this research involvement and into the operational environments. Could you give us an overview of the S&T CyLab research plan that's going to actually do this? [Alex Phounsavath] Absolutely. So obviously I get the fun part of delivering. So our research plan has three areas and the first is really about the ecosystem. It's going to be a multicloud environment where we can look at different capabilities provided by different cloud providers.
We're also going to get to understand, through this effort, inter-cloud computation and really understand cost models. And do we need to forklift data from one cloud to another, or can a computation be done across clouds? And it's going to be important for the future when, as you heard Gary talk about the different end users, in the future, I may be in a Google cloud, you may be in an Azure cloud, but we'll still need to be able to collaborate. So also in this environment, we're looking at high performance computing capabilities; it's relevant immediately for training AI algorithms. We're also looking at privacy-enhancing information sharing tools, so tools that allow you to share information with other parties, but yet keep certain parts of your data private.
So that's the environment piece of it. Moving over to the second part of the research plan, it's really about the tools that go into this environment. So the AI/ML pipeline that we're building out, the auto-wrangling tools, the wrangling tools that help people prepare data, the tools that catalog data, the model building tools. There are tools out there now where, instead of spending months building models, these tools are actually gonna spit out many, many models in a very short time, days and weeks. And of course, we're looking at NLP tools, natural language processing tools, in this environment as well. So that second piece is really about tools.
The third area is, it's a stretch goal. It's the least technical area, but it's actually the hardest. We are actually looking at how to bring expertise in from academia to support CISA missions in a collaborative space and building this problem solving community and integrating them into the whole CyLab vision. So, where is this space? What data sets go in there? What do you do with folks who may not be fully cleared? And you can imagine the benefit of that, right? There is perhaps another Colonial Pipeline incident, and everybody kind of gets together.
And there's a lot of activity around that, but you know, one of the questions that we're asking is, how do you sustain that? [Chase Garwood] Great, thanks Alex. I know we just did the snippet there, but we all know that research plan and those three parts are a lot of work going on right now and a lot of work to come. So Gary, speaking to that, we know this doesn't happen overnight. These are very hard problems.
This is not just regular data, we're talking machine learning and very large data sets and complex problems that CISA and others are trying to solve. And we are still in the RD&I phase, but can you give us a brief timeline and horizon of what's coming over the life cycle of this effort and into CISA operations? [Gary Jones] Sure thing, Chase. So the great news is that we're building it now, right.
We're working on it. It's being built, we're looking at 2024 for initial operating capability, but as with any capability or environment, as, you know, Alex has really mentioned, there's a lot, especially with NLP and some of the other analytical tools that she's mentioned, there's a maintenance factor with it. So it has to be maintained and sustained to keep it going. And so, I mean, these things are not, it's not going to be something that's going to be done overnight for sure, but especially not only implemented overnight, but it's going to take a while to really build out these capabilities. I would say anywhere between 2025 and 2026, we're looking at probably more capabilities added onto it.
So the initial operating capability is right around 2024, but there will be additional capabilities put onto it. [Chase Garwood] Great, thanks. And just one follow-on question to that. Initial operating capability, we know, is a little further out. In between, as we are developing and spinning things up, do you see capability being explored and utilized for mission use cases, even in rough beta versions and whatnot along the way? It's not just kind of a big bang.
[Gary Jones] Yeah, absolutely. You know, this is an incremental development. We are working towards a minimum viable product where we're going to have kind of the basics of machine learning, for example, an IE based type machine learning capability in there just to get things started and see how it works within that environment. So there will be an incremental DevOps-type development, so to speak.
But we plan to have something, you know, 2021, and on until we get to the initial operating capability. [Chase Garwood] Fantastic, thanks. It always seems much further out and then it's right there with us. So moving on to my next question. So Preston, we're talking a lot about the machine learning and the automated tools and whatnot, you know, moving from human-centric, analysts doing a lot of work to more automated tools for data analytics, but the key is still always data.
How is CISA evolving this data ecosystem now and in the future to improve its mission capabilities and take advantage of the new capabilities coming online? [Preston Werntz] Yep totally, it's a great point, Chase. And certainly I'm going to steal this from someone else, I've heard people talk about these AI, and these models, these ML tools being like your fancy kitchen appliances, right. Everyone wants to go buy the fanciest toaster and stove with all the buttons and settings, but really it's the ingredients you put in your food that you go bake in there.
And that's what our data is. It's those basic ingredients, the basic building blocks, and within CISA then we're really trying to make sure that data we've got is going to be in the best shape possible so that we can move it into CyLab and use it for these more advanced purposes, for doing new things, being much more flexible and adaptable to the changing situations. Part of that, the first part upfront, is always kind of governance and stewardship, really understanding what data do we have, where does it come from? What can we do with it? You know, CISA has a tremendous amount of data. We have a lot of traditional, what people consider big data, a lot of cyber data, a lot of very heavily structured data.
We also have what I've heard called wide data, smaller data sets, maybe not as structured, but we want to bring those things together. And CyLab is the place we're looking to do that. But in order for me to use that data properly, I've got to, once again, understand: is it something that CISA generated ourselves? Did we buy it commercially? Was it shared to us from one of our partners? All those different data sets, even at the unclassified level, have certain sensitivities, maybe privacy sensitive, maybe critical infrastructure sensitive. So that governance and stewardship is so important, to kind of have a handle on that. And once we've got our, you know, as we improve our governance and stewardship, this is where we want to start to layer on different ways to understand and use that data.
So this comes with the work we're doing with our enterprise conceptual data model to map all our data to these different concepts and classes, increasing the amount of metadata we capture, because obviously, the more metadata we have, the more we can figure out which data sets are appropriate to use in which model. And a lot of these models will get complex, and there are certain things that just naturally happen with models, like drift in a model. Models will always change over time, as real-world data changes. We want to make sure we're feeding the best data we can into these models, to help minimize that.
And keeping on top of changes to our data. If a database schema changes, if something changes from someone we're purchasing data from, we need to know that upfront because that's going to impact some of the model parameters and the different feature sets we use. So those are the two things, you know, we're really focused on, both today to get the data we've got today in CISA ready to be used in CyLab, and then putting kind of the policy in place. So that from this day going forward, we know every piece of data we capture today will end up in CyLab, will end up in some machine learning models, will end up in new things that we are sharing out to our stakeholders.
Either in machine readable content for other cyber tools in industry, or other content that we're sharing out to executives or other critical infrastructure owners and operators. [Chase Garwood] Great, fantastic. Thanks Preston.
So Alex, as a data scientist, and you know what Preston was just going over about the breadth and depth of CISA data, CyLab is initially focused on cybersecurity data, but there is potential for other areas. Could you touch upon kind of what CyLab has the potential to do for CISA? [Alex Phounsavath] Absolutely, so hats off to Preston for all the work that he's doing with data governance and improving the quality of data. It fits very nicely into CyLab because, as you said, we're starting out focused: our first group of end-users is going to be the cybersecurity data scientists, right? And we mentioned some of the tools that may be relevant, including natural language processing tools, but with all the data sets and variety of data sets like image, video, live streaming data that CISA is collecting, all that kind of fits in, because this environment, the CyLab environment, is extensible to those other use cases. So that means that we can bring in, besides NLP, computer vision capabilities; HPC may be applied to other scenarios, maybe edge computing. So I think it just means that the CyLab environment that we're building today is extensible to support many missions across CISA, beyond cyber security. Over to you, Chase. [Chase Garwood] Great, thanks Alex.
And one last question back to Preston, and you mentioned it a little bit, but also for the audience who may not be as aware of CISA's mission, with that breadth and depth of data that you're evolving and modernizing and the capabilities out of CyLab, how will that help improve or enhance the products and the services that CISA provides, not only cybersecurity, but your other missions and across the breadth and depth of the mission space? [Preston Werntz] Yeah, it's really interesting. So within CISA, obviously, if we look at ourselves as kind of the nation's risk advisor, so much of what we do is risk, long-term systemic risk to critical infrastructure sectors and our national critical functions that we talk about. But we have a lot of tactical cyber data, a lot of log data, a lot of indicators. We can use that data and we can look at trends and summaries and aggregates to see how that might impact a particular sector over time. So really so much of what we do is thinking about, like the cyber data we've got, some of the infrastructure data we've got, how do I take that to both meet the needs of the stakeholders in that vertical? I want to get more cyber data out to network defenders, to managed security service providers.
At the same time, I want to take that data and find the right way to use it, to improve what we do from a long-term risk analysis standpoint to help folks really think about mitigating long-term risk. There's, let's put out the fire today, but long-term, what should we be doing to be planned? And this is where, once again, I think the CyLab environment, as we look at some of these new tools, how do we bring this data together? How do we start building models that are not only solving some of the near-term cyber problems of the day, but can also be thinking about some of the risk models that are going to help folks better prepare themselves to be protected six months, a year or two years down the road? [Chase Garwood] Great, fantastic. I'm only grinning a little bit because this is really exciting to be part of. I'm happy that we're partnered with CISA on this really important area.
All right. So we're running out of time. We only have a few minutes left in our session. So I have one final question, so we'll keep it brief. So when thinking of the value of RD&I, research and development and innovation, in addressing cyber security threats and the other threats and risks in critical infrastructure and why this work really matters, what's your key takeaway for our audience? What's the one thing they should take home about our conversation today? So I'll bring it back up to Gary.
Let's go with you first. [Gary Jones] Thanks Chase. So I really want to leave the audience with this: at CISA, we're working on the right things, right? Going after the next great buzzword is not something we want to be doing.
We're addressing the evolving and complex threat that we're dealing with. And we're developing these adaptive and flexible environments that can use, as we're all saying in this video, data. We're using that data to develop those analytical tools to really help, not only the federal side, right? Remember, we're talking CISA, we're talking about the downstream impact where we can address our mission as the nation's risk advisor with state and local officials and organizations. So we're really trying to develop those analytical tools to mitigate tomorrow's threat for everyone. [Chase Garwood] Great, thanks Gary. All right, Preston, your last thoughts? [Preston Werntz] This is where it's tough to follow Gary, right? Cause he pretty much hit everything I was going to say.
And the only thing maybe I'll add to that a little bit is certainly, in order to avoid, as Gary said, chasing the latest buzzword, the latest shiny object, it's really being methodical, thinking through and doing that upfront design and collection and thinking about what you need at your organization, what you're trying to get, what you're trying to achieve, and work backwards from there. Always think about the impact, work backward, be methodical, do your upfront design, really think through it. And that will help you avoid chasing down wrong paths and stay focused. [Chase Garwood] Great, fantastic. Thanks Preston. Okay, Alex, let's bring this full circle with your last thoughts for the session.
[Alex Phounsavath] Thanks Chase. So CyLab isn't one and done. It's going to be an enduring capability for CISA's missions to benefit from innovation, and in the field of analytics, the players, the landscape of players and products, changes in months, not years.
And so we're going to be creating an environment where, although the threats are changing and evolving, so will the capabilities that CISA has to address them. [Chase Garwood] Great, thanks Alex. Well, thank you all for your time today and your thoughts. And hopefully we didn't go too technical. We could spend days, weeks on this topic.
I do want to also stress again, this is S&T RD&I, but the importance of partnering with our mission space, our mission component with CISA, we couldn't do this as quickly or effectively without both teams, both organizations. And for a lot of folks that don't understand the importance of having the operational mission folks side by side, this is a case study for how it should be done. So again, thank you. I wish we had more time. Again, we could spend days on this topic, and in our fellow organizations, we often do.
So there will be a follow-up slide for our audience to learn a little bit more about S&T cyber and analytics research investments in our DHS showcase programming. So on behalf of Gary, Preston, and Alex, and on behalf of CISA and S&T, thank you for joining our DHS Whole-of-Government R&D Showcase. And thank you for participating. Thanks everybody. (Upbeat Music)