Where Does Your Data Live? Repatriating Data and Data Security | Intel Technology

(slow relaxing music) - [Narrator] You are watching InTechnology, a videocast where you can get smarter about cybersecurity, sustainability, and technology. Here are your hosts, Tom Garrison and Camille Morhardt. - Hi, and welcome to the InTechnology Podcast. I'm your host, Tom Garrison.

With me, as always, is my co-host, Camille Morhardt. And today our guest is Chris Royles. He's field CTO in Europe at Cloudera, with experience in complex systems, data and analytics, and extensive domain knowledge in information management, governance, security, machine learning, and cloud services. Today, we'll be talking with Chris about artificial intelligence, and why companies are repatriating their data from cloud to on-prem. So welcome to the podcast, Chris.

- Thanks, Tom, really appreciate the invitation. Looking forward to the conversation. - We're talking about companies that are kind of going against what everyone was doing 10 years ago or something, which was moving everything to the cloud. What's driving people to want to have their data on-prem? - Well, it's a great question. There's quite a wide range of reasons that come up. I'm very fortunate in my role; I cover the whole of EMEA, and as such, I speak with organizations in different industries.

So one day I might be talking to a global bank, another day I might be talking to a telecoms operator, and each organization will have maybe different reasons for the choices they make. A few years ago, many organizations were driven top down with an imperative to move to cloud, and a lot of organizations have since then matured in the way they think. So if we sort of generalize it, one of the foundational reasons, if you like, why organizations are repatriating workloads back from the public cloud into their own data center is they have a bit more control of what happens in their data center, and certain workloads that are predictable and stable sometimes run better in the data center.

And so that's one of the reasons organizations are repatriating. When you then explore that, and start to dig into it, there are probably two other, really quite important aspects to it. One is, if you like, the protection of the data that is associated with that workload. So organizations are being more focused on the telemetry around the workload, and how it actually operates, and then thinking about where the data that the workload runs against should be sitting. And bringing it back into their data center gives them more control of that, if you like. The other one is really cost.

A lot of organizations we speak to, when they moved to cloud, originally thought of it as "this might save us money." That hasn't always played out, especially for those workloads that run in a consistent and stable, reliable way already in their data center. So some organizations are making that choice. I think the more important aspect is having the flexibility, if you like, to repatriate if you need to. - Are you seeing it for certain kinds of workloads, or is this just across the board, like the pendulum swung too far one direction and now it's coming back? - I think organizations, as I said, they've matured, so they've had the opportunity to sort of experience what the cloud can do for them, and for certain workloads, if you take a customer service example where you've got lots of customers maybe coming onto a website, and placing load on that website, and making many inquiries, different seasonal events can change that.

By having a sale, say in a retail context, you can drive more demand to a website, and so being able to sustain that website, that's a very variable workload. Other types of workload, think something like regulatory reporting, for example, we've got a known number of accounts, or we've got a known number of assets we need to report against. That's a very fixed workload in many cases, and so again, running that in a reliable way in your data center, it's not gonna change its profile, let's say. That's the type of workload we do see coming back into the data center. The other one is when there's a sensitivity around the data itself.

So, managing large volumes of citizen data, for example. As you know, certain organizations manage a lot of citizen-related data. I use citizen instead of customer, because it's kind of a countrywide set of information, rather than cross-regional. That citizen-centric view is very interesting for some customers. I worked with an automotive customer, they did automotive insurance, as an example, and they worked across different regions. What they actually found was that they were creating copies of their environment in multiple regions, in simple terms, and because of that it was getting expensive, and so what they did was they brought some of those workloads back into data centers in each region. So they could be managed at a regional level rather than managed in a shared cloud.

- Yeah, so I'd like to explore that a little bit, because we've heard a lot in the news about policies and regulations about where data has to reside, and certainly GDPR in Europe with citizen data and so forth, or personally identifiable information. It's easy to say. But I wonder just more generally, not just in Europe, but across the world, what is the trend? Is the trend to have data not leave a particular country, or a particular region, so that it needs to stay, and then how do you manage that if you have a company that's dealing with customers all over the world? - Well, to my example, some organizations are approaching it differently. So, I was working with a European bank recently, and they've made the very clear decision to build a private cloud. They wanna build in their own data center, and use their own assets. But when I asked them why they were doing that instead of moving to cloud, their response was quite interesting.

They said, "Well, we've gotta continue running a bank in simple terms, and our security team are always evaluating our ability to move to a cloud environment, but for now, we've got to carry on building. We can't just stop our business from operating." And so they'd already approached it several times, and found that there were reasons not to move to cloud, and so were building their own private capability in their data center. So I think what's more interesting is there's always change, and their response was at some point, our security processes might say it's okay, and we'll be able to make that transition. And the thing with regulation is it's changing. GDPR comes into force, organizations then have to align with it.

There are new regulations around resilience, such as DORA, coming in around Europe. In the likes of Asia-Pacific, there are new regulations being formed as well in terms of citizen privacy, and how data is moved out of region or across regions. And I think what's interesting with most organizations is the regulatory landscape will change, it won't stay static, and it will change for the right reasons. It's to protect citizens and data, and really consider the needs of the citizen.

So an organization really needs to think in those terms and recognize that things could change outside of their control. I'll give you an example. We had Schrems II, which was a recognition, if you like, that certain social media platforms might be moving data between regions. And the question really was, it is not just about the data moving, it's who has access to that data, and would the citizen ever be notified of that access request? So, there were unknown unknowns, if you like, about what was happening with the data. And that meant a number of organizations we were working with paused many of their initiatives, and stopped to rethink.

And the focus really was on flexibility. As in, well, if we want to move some workloads to public cloud, why can't we do that? And then if we need to retain some workloads in the data center, why can't we do that? So really asking those challenge questions about where do we need the workload to sit and run, and what data is it gonna sit against, and what are the protections we need to put around that data? - I think one of the important emerging use cases in AI is this notion that collaboration can occur even across competitors for potential mutual benefit, and even including, like you say, citizens, right? So if we're going to do something like share insights to detect a health anomaly early, it's helpful if we can have bigger sets of data. So, how are you recommending people structure that kind of collaboration? - So a good example is with the federated sort of models, the machine learning, in the sense that a hospital can retain their data, and a subset of the process can run against that data within the hospital, and nothing needs to leave the boundaries of the hospital other than the model weights, and they can then be aggregated into a central model that can then be shared back with the hospital. And what that means is that model in many cases can outperform individual models.
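The federated approach described here, where each hospital trains locally and only model weights leave the site, can be illustrated with a minimal sketch of federated averaging. Everything in it is an assumption made for illustration and does not come from the conversation: the tiny linear model, the function names (`local_update`, `federated_average`, `federated_round`), the learning rate, and the made-up hospital data. It is not a description of any specific product.

```python
from typing import List, Tuple

Weights = Tuple[float, float]        # (slope, intercept) of a toy linear model
Dataset = List[Tuple[float, float]]  # (x, y) records that never leave the site


def local_update(weights: Weights, data: Dataset,
                 lr: float = 0.05, epochs: int = 50) -> Weights:
    """Train on one site's own data with plain gradient descent (hypothetical helper)."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(updates: List[Weights], sizes: List[int]) -> Weights:
    """Combine site weights into a central model, weighted by each site's record count."""
    total = sum(sizes)
    w = sum(u[0] * s for u, s in zip(updates, sizes)) / total
    b = sum(u[1] * s for u, s in zip(updates, sizes)) / total
    return (w, b)


def federated_round(global_weights: Weights, sites: List[Dataset]) -> Weights:
    """One round: every site trains locally, then only the weights are aggregated."""
    updates = [local_update(global_weights, data) for data in sites]
    sizes = [len(data) for data in sites]
    return federated_average(updates, sizes)


# Made-up data for two hospitals; both happen to follow y = 2x + 1.
hospital_a: Dataset = [(0.0, 1.0), (0.5, 2.0)]
hospital_b: Dataset = [(0.5, 2.0), (1.0, 3.0), (0.25, 1.5)]

model: Weights = (0.0, 0.0)
for _ in range(20):
    model = federated_round(model, [hospital_a, hospital_b])
```

In this sketch the raw `(x, y)` records are only ever read inside `local_update`; the central step sees two numbers per site. Real deployments typically add protections on the weights themselves, which this example does not attempt.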

So, in some cases, by doing federated learning, you can create models that outperform separate processes, if you like. So, in some cases, federated learning is an approach you could take, and can generate better results, because you're able to learn from different sets of information, and create a better model. - I wanna try to simplify just a little bit, just for my feeble brain, and try to bring it back to something that is definitely in the news a lot right now.

And that is this whole dust-up that's going on around TikTok and the government, the US government's concern about personal information that a particular application might or might not be gathering. And I'd like to try to keep it, I mean obviously we're not in the know about this, so this is just what we're hearing in the news. But from your perspective, Chris, in the industry about information, and keeping it local in certain regions and so forth, how should we be thinking about this as a citizen? How concerned should we be? Is this a good fight for people to have their governments trying to keep their personal information local? You also mentioned the idea of not just where does the data reside, but who has access to the data? So, in that context, can you just bring it home for us with TikTok as an example? - A platform like TikTok, or other platforms that are available globally, have to in many terms establish a level of trust with their users, because at the end of the day, those platforms exist because the users are putting content and material in. The platform itself, the organization that manages that platform, might be quite small in terms of content. It's reliant on the users of the platform to put the material and the content in, and much of the value of the platform is derived from that particular data that's put in. The question then is where's that value going to? Who's using that material for benefit? So that definition between state, media, and business is the question that sort of arises around that.

And do we actually know, to your point, we're not on the inside of that, but what's the transparency around things like the data processes, and who has access, and even that notification as to who's accessed your data for what purpose? Did you give permission for that access? Good example: my mobile phone alerts me if an application's using my geographic location, and my geographic location could be very useful to some organizations to understand where I live, where I spend my time, which coffee shops I go to, where my office is, but also the trips, and transport, and routes that I take every day. So I have the choice to say whether my geolocation is shared with another application. And that is because the mobile phone I've chosen as my device, if you like, gives me those prompts. I'm aware to some degree of what information's being collected. That's not always going to be the case. Certain devices and certain applications won't necessarily give you those prompts and alert you to that information collection.

And so it's down to the application provider to be very transparent in what data they're collecting for which purpose. And I'm not sure that that transparency is there in all cases. - We often trade sort of convenience for privacy. I mean, I definitely want my Maps application to know where I am because I'm asking it where I'm going. So it has to be able to track me, and I'm aware of that. But when it comes to something like a social media platform or a platform where I'm posting videos about myself, I'm disclosing the information, and I'm just wondering, have we kind of, speaking of pendulums, have we swung so far to this onus of being very, very careful about everybody's privacy, and putting such a burden on software developers and infrastructure providers and trying to, you know, regulate that as opposed to saying, "look everybody, if you're participating online, all of your information," right? Unless we're talking about a very specific private, I'm trying to play devil's advocate here, and understand how the other side kind of might look at this.

- I certainly think there's a degree of education. I've got young children myself, and they seem to be able to activate certain applications very easily. Social media applications, for example, are typically provided free. And so to block access to app stores for free apps is actually quite challenging. As a parent, you'd be surprised.

And so just education at a young age I think is quite important about how applications operate, and how valuable our own data is. And then what I notice as well from a personal perspective is it's the operating system and the device that notifies me that an application is doing something, it's not the application. Another good example, I went into a personal voice assistant recently, which I've got at home, and saw the number of skills that have been added automatically in my household. And when I explored if there was a way to disable skills being added, there isn't. Again, it comes down to transparency. And as consumers, are we given the right information to make good decisions? - And I know that when we prepared for our talk today, we sort of talked through an example too, that wasn't a social media example.

It was more of a, I would say, traditional business example. When you think about airlines, they have the standard business, they need to do reservations, they need to take information about people, maybe their passport information, all of that kind of information. And then the passengers inherently fly, and they're gonna fly in this example outside the country, and some information needs to travel with the passengers. Some information doesn't need to. And there's just that complexity of doing business multinationally like that. And then, there are different situations, like an airline disaster, or something like that, where now all of a sudden more information needs to flow.

I just wonder if you could kind of talk through that a bit for our listeners again. So this is not just a hypothetical social media app kind of question. It's a day-to-day business concern and complexity that we all need to think about. - It's a good example. If you take an airline, they have to collect quite sensitive information about passengers.

There's gonna be maybe three things to consider. One is the aircraft itself is a valuable asset. You are then gonna have the pilots that are gonna fly that aircraft, and the citizens that are gonna fly on that aircraft, who might come from different geographies in their own right. There's certain data that you want to be able to exchange, things like flight plans and things of that nature, as well as your personal identifiers. Things like your passport information might need validation and that might need to be shared.

And so there's certain data that you as a passenger would want to be passed across international boundaries. And I always think in terms of there being a good path when everything goes right, everything works, and everybody's come to agreement that this is a good thing. You also need to take into account what happens if things don't go to plan.

You know, if the aircraft doesn't arrive at its destination for whatever reason, what then happens? And what happens with an aircraft is you'll have maybe disaster investigation. You'll have government entities getting involved in that process, and you'll have people from all over the world that might be passengers on that aircraft. And you'd have to take all of that into account, and there are policies in place on how information might get exchanged, but thinking about what happens when things go wrong is also as important as when things go right.

- You know, everybody was going toward cloud, right? It seemed that was sort of "We're going to public cloud, this is the great move. How can we get everything over there?" And now you're saying there's, you know, people are taking a closer look, and it's matured, and they're trying to understand what workload is doing what for what purpose and what are the potential threats. And then deciding which workloads might migrate back, which ones might stay in the cloud, which ones might move to distributed learning techniques. Are we now all kind of with the pendulum in the middle just trying to figure out how to do it? - I think things have certainly settled, I'd say there's a maturity. We would always guide toward a data-driven approach in the choices you make.

So, if you want to repatriate a workload back to the data center, use workload analytics, use observability around your workloads and data to make the right choices. Always take a data-driven approach to the problem. There are particular industries where the move to cloud has been very consistent. A lot of organizations are moving that way. I'll give you an example, financial services very much are on that journey to the cloud.

They have very changeable workloads in terms of those workloads can vary quite significantly depending on seasons, depending on new offers they're putting out, depending on customer demand. But also there's a question around resilience. Then, as a lot of organizations start to move to cloud, you start seeing concerns around aggregation, cloud aggregation. What if all of our financial services institutions were all in the same cloud, in the same regions? What would that mean? And so, regulators are looking at cloud in that way as well. Not just which organizations are moving to cloud, but are they all doing it at the same time? Are they all moving in the same way? That could bring in unnecessary risks.

And so, it depends on your perspective. Different industries are responding differently to the availability of cloud. And again, it comes down to the workloads that they run and the shape of those workloads that they run.
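The idea of taking a data-driven look at the "shape" of a workload, as discussed above, can be sketched in a few lines. This is only an illustration under stated assumptions: the metric (coefficient of variation of load samples), the `0.25` threshold, the function names, and the sample numbers are all hypothetical and are not drawn from the conversation or from any vendor's tooling.

```python
from statistics import mean, pstdev
from typing import Sequence


def coefficient_of_variation(samples: Sequence[float]) -> float:
    """Spread of the load relative to its average; 0.0 means perfectly flat."""
    avg = mean(samples)
    if avg == 0:
        return 0.0
    return pstdev(samples) / avg


def classify_workload(samples: Sequence[float], threshold: float = 0.25) -> str:
    """Label a workload 'steady' or 'variable' using an assumed threshold."""
    if coefficient_of_variation(samples) > threshold:
        return "variable"
    return "steady"


# Made-up hourly load figures for two workloads.
regulatory_reporting = [100, 102, 98, 101, 99]   # known, fixed volume
retail_website = [20, 25, 400, 30, 22]           # spike during a sale

reporting_shape = classify_workload(regulatory_reporting)
website_shape = classify_workload(retail_website)
```

A steady result would only be one input to a placement decision; cost, data sensitivity, and regulation, all raised in the conversation, are not modeled here.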

- Well, Chris, I think we're at the end of our time here, but this has been a good conversation. It's a very complex topic, obviously, but I thank you for coming in, and talking to us about this whole data movement, and data privacy, and a lot of the complexities that companies need to manage. It was a good topic. - Thanks, Tom. Really appreciate the conversation. - Never miss an episode of InTechnology by following us here on YouTube, or wherever you get your audio podcasts. - [Narrator] The views and opinions expressed are those of the guests and author and do not necessarily reflect the official policy or position of Intel Corporation.

2023-06-04