Creating a modern data platform: Data modernization and driving value


I'm really excited to have you here. Hopefully there's not a post-lunch slump. If anyone falls asleep, I will totally call you out, so I'm just going to warn you right now: this is interactive. I appreciate you being here, and I'm really excited to talk about what it takes to create a modern data platform.

My name is Danielle Behringer. That's really awkward with a huge picture up there, so I'm just going to go to the next slide. I am a managing director at KPMG in our Data Management and Engineering practice. Prior to KPMG, I was the chief data officer at a major automotive company for six years. That's like dog years; that's a really long time to be a chief data officer. But my background is as a data engineer and software engineer across industry, and really all that means is I'm a huge data nerd, and I'm really passionate about how the market is changing and the ways that we can reshape our future for people who want to do more with data.

I want to start with a definition of data modernization. This is a definition that I believe is a huge change in how we think about the way we use data for value: it's a strategic approach for innovative use, accessibility, governance, operations, and doing more in a modern cloud architecture through people, technology, and process transformation.

First things first: by show of hands, how many of you are ready today for some data therapy? Yeah, I know, it hurts. Unfortunately, a lot of people aren't even self-aware that they need it. So, because no presentation at this conference would be complete without saying the words "generative AI", I want to start with the prerequisites. We won't get there if we don't create a modern data ecosystem and go back to some of the basics, the unsexy topics like data quality and data governance. Oh, I know it's not a great term, but it really matters. So this journey that I'm showing here, Horizon one, two, and three, is about starting with some self-reflection, which is why I call it data therapy. How do you
assess your environment, your portfolio, the data maturity of the folks that are working every day with data, potentially in spreadsheets, with no idea that there's a better way? You're trying to build cloud infrastructure, deploy application workloads, and modernize your data warehouses and data lakes, but without doing it in partnership between the business and IT, it often fails miserably. So if we return to the heart of the matter as we prepare to do generative AI and to get more business value out of our data, I want you to think of this as a prerequisite.

The topic of data modernization challenges: everyone's got them. I work for a professional services firm; if people didn't have challenges, we would have some problems, right? So if we look at the top challenges we're seeing, in many cases it's technology proliferation. And the challenges for data modernization are never about the technology, at least not most of the time. They're about the people and the process. They're about the adoption. I like to say that the presence of technology does not equal success, but adoption does, and it has to be sustainable. So when we look at things like organizational resistance, data verification, incorrect integration (some people call this the integration hairball; it's a real thing, right?), how do we move past that? How do we start to take people on the journey with us as we're modernizing, as they're maybe going through a really long, painful ERP modernization or a finance transformation? How do we find those opportunities with data to give them something satiating while they're waiting for a really long road map that takes a lot of time and is very complex?

So there are six things here that we see as success criteria when we work with our clients, when we see them changing their organizations: creating a data culture, creating data literacy, but most importantly having some very hard, painful conversations about data minimization and technology simplification. We do not see
organizations who have greenfield, nothing. They have a lot. They have business intelligence tools, visual analytics tools, data wrangling tools, data science tools, open source, multiple clouds, and a lack of integration, consistency, and certainly a lack of operationally sound, sustained data ecosystems.

So, though I'm mentioning these just in short form: business vision for data value, right? Do the business stakeholders believe that the data ecosystem can support their vision for what they want to accomplish in the business? Whether it's a sales and marketing division that wants to do more and maybe insource some things that they've given to agencies and partners, or maybe it's a supply chain team that has zero visibility to data in their manufacturing plant, or anything that's actually ingress and egress out of that plant, to be successful with supply chain.

A modern layered architecture can be extremely intimidating to people that are not in the immediate perimeter of IT. So how do we show it to them? Show them a portfolio that is based on business value, not just technological definition or a bill of materials of 52 cloud services that they have to assemble in order to make a widget, right?

Data governance and literacy, of course, is extremely important. I actually think data quality and data governance are having a renaissance, because when people went through GDPR and CCPA and they realized that 72 copies of their customer database lived around the company, and Joe every Thursday was sending the customer database in an Excel spreadsheet to a group of people, they were horrified. But what it did was uncover an opportunity to spend more time on metadata management, and more time on a data catalog, definitions, a data dictionary, and certainly some data minimization.

So now here we are. We want to pursue generative AI. Everyone is super comfortable with everything prompted. The expectation for clients is absolutely changing: they expect it to be easier; they expect to use natural language to
accomplish a task that previously required a lot of different steps in data wrangling. So envision, in a modern data architecture, some of the things that are on the right, but fundamentally changing how people work. I like to say: if they were spending 30 hours a week data wrangling and cobbling things together, what would it do if that became two hours? A lot of people worry that they would lose their job, that they would be repurposed. I like to think that they get 28 hours back to work on things they wouldn't have gotten to for two years. They are pulling ahead all of those valuable activities, all that value-driven consumption of data, because of optimization of the environment, a deeper understanding of the data, and an infrastructure and services that make it easy and self-service for them but also protect privacy and risk and security.

Part of the data therapy, again, is a self-awareness exercise. It goes from left to right. The baseline activities on the left, I am absolutely certain most of you have in place today in your organizations: traditional data architecture, logical and physical data modeling, standards, reference data. These are the table stakes to be doing data and analytics.

Now, how many of you have ever played Super Mario Brothers, or have children or family that... okay, good, yay. Okay, this is just like Super Mario Brothers: you don't get to go to level five, all the way on the right, until you pass through the levels prior. It's very important. So most people find themselves somewhere in the middle. They have visual analytics tools. They maybe have a manually derived data catalog. They are starting to build a RACI. They have a cloud center of excellence and maybe a data center of excellence. Maybe they have a hybrid model where IT is providing certain data services, data engineering, and data science is normally federated and found in the lines of business, and everyone is starting to work together. But to get to the items on the right really takes a complete
transformation in the way that people are working together, their understanding of data as a product (and I'll talk a little bit about that today), and then also taking full advantage of the cloud-native technologies. You shouldn't really be building anything that you can do with the amazing, changing environment of the providers, certainly Google and others, that have capability that you can use. You need to really focus on making it prescriptive for your industry, for your domain, and then you can really lean into data product life cycle, data monetization, privacy engineering, and considering interesting ways to use generative AI, like synthetic data generation, right? Those things require a foundation in the things that are on the left.

So this is the first part; this is the assessment. On the bottom I've given just some very simple examples from organizations we've encountered where they have push and pull between a chief technology officer, a chief data officer, etc. I was fortunate to have enterprise architecture and cloud at the same time that I was doing my role as a data officer, and so, in looking at that span, it all came down to the people: business stakeholders, pilots who were willing to go first into the cloud environment, who were willing to go first into doing data science and data exploration; having a way and a venue for them to do that, in a physical environment, in a cloud environment; and starting to change some of the philosophy about IT, or how an organization would make data accessible to people.

Now, that is often difficult if you have an IT organization that is driving everything around technology and never connecting it to business value. The business stakeholders say, "Well, what's in it for me? You're replacing my ERP system, you're making me use new tools, but I'm still stuck with the same data that I'm wrangling in an inefficient way." So we start the conversation now and say: this is an example of 10 business value levers. I would
advocate that for every single data program, project, or product that you pursue, you pick a primary business value lever and a secondary one, and you directly tie it to, and have a conversation with, business stakeholders and say: "Will you be a pilot? I believe we can help you support regulatory compliance. I believe we can help you have faster speed to market. Can we partner up together and do this initiative: seek funding, look at the portfolio, determine what accelerators, what code, what assets we have to actually make this a successful program together?" Instead of IT trying to entirely drive it, or the business with shadow IT, right?
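
One way to make the "primary and secondary lever" rule concrete is to record it with each initiative. The sketch below is a minimal illustration in Python, not something shown in the talk. The names `ValueLever`, `DataInitiative`, and `levers_covered` are hypothetical, only some levers are named by the speaker (the slide listing all ten is not in the transcript), and the rule that the two levers must differ is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class ValueLever(Enum):
    # The first three levers are mentioned in the talk; the others are
    # hypothetical placeholders, since the full list of ten is not given.
    REGULATORY_COMPLIANCE = "regulatory compliance"
    SPEED_TO_MARKET = "speed to market"
    COST_TAKEOUT = "cost takeout"
    REVENUE_GROWTH = "revenue growth"                  # hypothetical
    OPERATIONAL_EFFICIENCY = "operational efficiency"  # hypothetical


@dataclass(frozen=True)
class DataInitiative:
    name: str
    business_sponsor: str  # the stakeholder who agreed to be a pilot
    primary_lever: ValueLever
    secondary_lever: ValueLever

    def __post_init__(self):
        # Assumption: an initiative without a business sponsor is not accepted.
        if not self.business_sponsor.strip():
            raise ValueError("every initiative needs a business sponsor")
        # Assumption: the secondary lever must add something new.
        if self.primary_lever == self.secondary_lever:
            raise ValueError("primary and secondary levers must differ")


def levers_covered(portfolio):
    """Return the set of levers that at least one initiative targets."""
    covered = set()
    for initiative in portfolio:
        covered.add(initiative.primary_lever)
        covered.add(initiative.secondary_lever)
    return covered
```

Comparing `levers_covered(portfolio)` against the full set of levers gives a quick view of which kinds of business value no current initiative is tied to.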

So the prompts on the right are intended to give people a chance to really think about the maturity of the organizational construct. I know you're all tired of hearing "people, process, technology". These slides today and my message are: it's not just about the technology. The other two are actually what makes it sustainable: the process and the people.

This is also about making difficult choices to sunset. We have an obligation to be stewards in our organizations for the financial rigor of our portfolio. Cloud costs often are very underestimated, so unless you have a FinOps program, or a budding FinOps program, or a cloud center of excellence that's managing that, and you're tagging your resources, it can be completely out of control. So when you look at it as a cost takeout, which is, again, part of this value lever: how do you tie the return on investment for a data program and also at the same time say, "Hey, we're going to sunset these six things by putting in this new cloud infrastructure, some new services: self-service, APIs, microservices, data as a product, a data marketplace"? You have to tie the two together; they can't be independent.

And then I would say, lastly, persona-led design. There is an explosion in the many types of personas in an organization. You have data analysts, data scientists, data engineers, software engineers, DevOps; I mean, there's many. The biggest thing that matters is when you think about the capability matching to the personas. So if you want to implement a new set of sandboxes and data exploration, you have to really think about who's going to interact with this. What is a reasonable expectation of skill, readiness, risk, complexity for a data analyst in a business division to log in, start using the sandbox, start doing data wrangling with production data? Now, I don't know about you, but there are some security concerns. You can't just be copying data; there has to be some structure. And so when you start to make it persona-led design, you
can be much more prescriptive. You can think about role-based access controls and the things that are necessary to convince your executive leadership, to convince your security and risk teams: we will be fully compliant. Giving people better access to data to do their job has to be governed and stewarded and shepherded through a process where you completely rethink how people operate around data itself.

So in talking about the business and IT coming together, we did this illustration to say: where is this business and technology meeting point? We call this the bridge. Now, this was originally sketched on a cocktail napkin. I'm really excited at how lovely it looks right now. Maybe for fun I should include it in my presentations from now on so you guys can see it. But the point is the data therapy extends to business and IT not always seeing eye to eye, not always having a shared understanding of the value of a project, and certainly the business has far deeper context for the way data should be used.

So I want to talk about data utility for a second. I want you to think of a grocery store. You're in the grocery store, you're going shopping, and you can take anything off the shelf, and you can look at the ingredients, you can look at where it was manufactured, and you know exactly how it's to be used, right? You're not going to go down the dog food aisle if you're hungry and take a snack and open the dog food. That's poor utility, right? You know exactly what that product's for and how it's supposed to be used. So I want you to think of a data product that way. You have to know the provenance and the lineage: where did that data product come from, what are its contents, how is it supposed to be used, who is supposed to consume it?

Then I want you to think about the pharmacy. You can't just walk up to the pharmacy and take something off the shelves; you have to have a prescription. And so a data product that is privacy enabled, that is security enabled, that is
governed is like your own personal prescription: it's for specific individuals with a specific utility and need, right? So again, when we talk about this business and technology meeting point, there has to be a contextual understanding for the data itself, the data utility, and all of the things that go with data management that I showed in the data management spectrum.

The other thing is around time to value and metrics. We see many organizations that have great success in setting up multi-cloud environments and starting to move applications, but they often set aside a few things to come back to later. One is mainframe (that comes in second many times), and then the other is their data and analytics ecosystem, data warehouses, right? And part of the reason that they have a challenge with immediately doing that in the same prioritized list is because data warehouses start as a source for people to secure data that they need to do their job: to use some business intelligence tools, to do some visual analytics, to use that data to train machine learning models. But inevitably what happens is someone has an idea, and it's not a bad one, to start tethering RESTful services to that data warehouse. And it starts with one: "We're just going to have one service; it's just going to use a little bit of that data warehouse." And then you turn around and there's 500.
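
The build-up of services around a warehouse can be made visible before a migration with a simple dependency inventory. The following is a minimal sketch under stated assumptions: the asset names and the `depends_on` mapping are made up for illustration, and in practice the mapping would have to come from a catalog or lineage tool rather than a hand-written dictionary.

```python
from collections import deque


def downstream_dependents(depends_on, asset):
    """Return every asset that directly or indirectly depends on `asset`.

    `depends_on` maps each asset name to the list of assets it reads from.
    """
    # Invert the map: for each asset, which assets consume it directly?
    consumers = {}
    for consumer, sources in depends_on.items():
        for source in sources:
            consumers.setdefault(source, set()).add(consumer)

    # Breadth-first walk from the asset through its consumers.
    found = set()
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for consumer in consumers.get(current, ()):
            if consumer not in found:
                found.add(consumer)
                queue.append(consumer)
    return found


# Hypothetical inventory; every name here is invented for the example.
inventory = {
    "orders_api": ["warehouse"],
    "pricing_api": ["warehouse"],
    "dealer_portal": ["orders_api"],
    "sales_dashboard": ["warehouse"],
    "hr_app": ["hr_db"],
}

must_plan_for = downstream_dependents(inventory, "warehouse")
```

Here `must_plan_for` contains the two APIs, the dashboard, and the portal that sits behind one of the APIs, which is the kind of second-order consumer that tends to surprise business stakeholders.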
So you want to move your data warehouse to the cloud, and you're like, "Oh, we have operationally valid, live production web services that now depend on that." So then you've got a whole second set of items that you need to migrate. Now, business stakeholders, when they hear this, are not pleased. They're like, "What do you mean? Why? It's the data warehouse; migrate the data warehouse." But then they realize that they are actually the consumers of the web services. Again, it's transparent to them: they might just have a web interface, they're doing their job, and now they're like, "Oh, so we have some technical debt." And so when I say there needs to be an agreement between the business and IT, it is having full transparency of the portfolio of assets: everything, application code, everything in the data ecosystem, data products that are really important for defining how business should be run and modernizing. And that's when we see that "project portfolio, data assets, business context"; that's exactly what I mean.

The items in pink and on the right: this is all the fun stuff that everyone wants to do. We want to do DataOps; we want to just have everything on autopilot so it's easier. But DataOps requires humans; you all know that. Now, the good news is, with data observability and metadata, there is a proliferation of new services that do make it easier, but it still requires resources to be dedicated to change management and to long-term care and feeding.

DataOps, MLOps, FinOps: I'm going to make an analogy. How many of you have dogs or cats? Dogs, cats, birds, anything? Okay. I'm going to make an analogy that DataOps, MLOps, FinOps is like pet ownership: it's forever. You get the puppy, so cute, you get the food and the cage. It's forever. It's not "I'm going to play with it for a day and then it's just going to figure it out." So, anything Ops, please remember. And it's not just that. Let's say you do have the resources: "Great, we have 15 people; we're going to immediately make them our CloudOps, DataOps, MLOps team." But there's upskilling,
there's obviously a challenge to find talent, and it has to be something that changes over time. There are new services every day, new integrations every day, new data products every day, so the care and feeding of that is pretty significant. So just like the puppy who grows to be an adult needs more food, there is this process that has to occur, and your executive leadership needs to know it, because they may say to you, "Well, if it's Ops and it's automated, don't you need fewer people?" Well, no, we don't, because as you're automating there's just more work: there's more resources, there's more assets. And you have to make sure that the business stakeholders and executive leadership are all in for a road map that is not just innovation but also contains the change management and the operations.

So I'm going to get a little bit more into the technical part of the content. I'm going to start with, overly simplified, what we consider the four pillars of a modern data platform. It's very simple: cloud native, self-service, data as a product, API integration. I'm very certain that the majority of you already have these pillars in some semblance within your environments. And the example that I'm going to use here is: everything that you do should tie back to the business value; everything that you do should tie to the personas that will interact in that particular pillar or across the environment; and you should always think back to the bridge. Am I doing something that is technologically important, but I haven't told any business stakeholders of the impact, right? There is always that dance between the business and IT. That is what sustains a modern data platform, because the only way that your technology teams are going to be able to grow and flourish is if they are delivering business value. And it's a very disconnected thing that we see in most organizations, and it's not for lack of interest; it's just not stitching it together and creating a cohesive community that meets in the
middle of that bridge. The bridge is made of data: metadata, the data itself, an understanding of your systems, an understanding of the portfolio, all of those things. It becomes a self-fulfilling prophecy when you can show the value. I had this challenge as I was building my cloud and my EA team and our data engineering and data science functions: I had to prove the value we were delivering. So we created a business value assessment where you're measuring before, during, and after any data program, any cloud program, the actual business value delivered, and making it quantitative. It's not fluffy; it has to be numbers. Use the data to tell the data story to justify why you are, again, wanting to grow, wanting to innovate, and have more control and more opportunity.

So this is overly simplified, but I think important: what do we consider to be part of a data modernization ecosystem? You're going to notice right off the bat everything is a service. I'm not going to get into a dissertation about data fabric versus data mesh. It's an "and", not an "or". So data fabric, obviously, is deeply rooted in use of metadata and knowledge graphs; data mesh being federated data governance. I'm giving you the simplest definition; I'm sure there's many more, and you all know that. But when we look at it, all of the people in the middle of this lovely little picture, it's really around people all going in the same direction and agreeing on the priorities for their organization, the enterprise organization offering things as a service when it's appropriate, and upskilling the workforce through digital fluency and data literacy.

So behind this there's a reference architecture. I'm sure you'll recognize many of the things that are in here, obviously everything from connectivity to data sources all the way through ingestion, storage, processing, insights, and actions. This is prescriptive to your organization. You may not need all of the things shown here in this reference architecture, but it is a framework for
understanding when and where you want to create an offering. Imagine a service catalog, okay? Think of a shared services organization that has all of these things in the toolkit. We often use this as a heat map, and we give a maturity score to some of the individual components here in the reference architecture and say, "Until you get to a certain maturity point, we probably don't recommend you do X or Y." And so this can be a gut check for an individual business area that's maybe very opportunistic and excited to do data science, but you say to them, "Until you start to certify your data and have profiling and understand the quality of your data, you really shouldn't start to advance into some of the other things." You also have to have agreement: there is only so much funding, there is only so much time and people.

And the modernization ecosystem is always best suited if you have a few pilots, so I'm a big advocate for that. If you have areas that are ready to go, and they want to lean in, and they say, "Yes, we want to invest our time and energy and be part of this modernization effort," they become your biggest fans, hands down. People don't want to hear IT talking about how great the new cloud environment is. But you get two business people that had a successful experience standing up a new cloud environment, and they have access to data and analytics in a sandbox and APIs; there is not enough money in the world to get something that good. Because the business stakeholders, they get the momentum inside the organization to tell the story about how this investment was worth it, why it's adding value. And then you're giving them a way, with those 10 value levers, to quantify it: "Yes, we did sell X number of widgets. Yes, we were able to improve efficiency by a certain percentage."

And then what it comes down to (and you're going to recognize all the logos on here) is having something that is the best-in-class cloud-native service catalog.
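
The maturity "gut check" described in this part of the talk can be expressed as a small gating rule. This is a minimal sketch, not the speaker's actual scoring model: the component names, the 1 to 5 scale, and the thresholds in `PREREQUISITES` are all hypothetical.

```python
# Hypothetical prerequisite rules: a capability is recommended only once each
# listed component reaches the given maturity score (assumed scale: 1 to 5).
PREREQUISITES = {
    "data_science": {"data_quality_profiling": 3, "data_certification": 3},
    "data_marketplace": {"data_catalog": 3, "data_certification": 4},
}


def readiness_gaps(capability, scores):
    """Return {component: (current, required)} for each unmet prerequisite."""
    gaps = {}
    for component, required in PREREQUISITES[capability].items():
        current = scores.get(component, 0)  # an unscored component counts as 0
        if current < required:
            gaps[component] = (current, required)
    return gaps


# Example: a business area that is eager to start data science.
area_scores = {"data_quality_profiling": 2, "data_certification": 3}
gaps = readiness_gaps("data_science", area_scores)
```

An empty result means the area meets the assumed bar; otherwise the gaps give the conversation with that business area something specific to point at.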
Now, this also means interweaving into it, and I use a very specific example here: MLflow and accelerators. You see KPMG Ignite; that's our AI/ML. We're making and eating our own dog food here, right? Insert your accelerators, your code, your scripts, the things that make you you in your organization, your assets. Lay those on top of your cloud deployment plane, put a wrapper around them, have a team dedicated to launching those. You don't need to reinvent the wheel when it comes to the base services, because they're best in class. Regardless of the portfolio of services you chose, your enterprise architecture, the vendors you're working with, your partners: leverage those relationships, do co-development. It's going to get you to market much, much faster.

And there are also some things here that are really important, like having a certification process: raw, clean, processing, trusted, whatever you want to call it; medallion architecture, gold, silver, bronze. Everybody's got a different name for it. A tiered architecture around curating your data products is absolutely essential, but there has to be agreement: what does it take to certify? What are the criteria that have to be hit? If you think about a data product, and we go back to the grocery store example, would you put something on the shelf that isn't top quality? Think about the ingredients, think about the provenance of it. Will people understand it? Will they consume it properly?

And then, lastly, the consumption. I've given just a couple of examples. This is not a comprehensive architectural diagram here; this is just an example of: use cloud-native services, and determine where you're missing anything specific. Maybe it's a policy-as-code tool, maybe it's a generative AI synthetic data generation tool, perhaps it's some other industry-specific service that you require for doing business. Insert those things, make sure that everyone in the enterprise knows how to use those services, and make sure that the integration includes
all of the same criteria you would have for your own data products and your own assets. Is there data observability? Is there monitoring? Is there alerting? Is it sustainable? Is it scalable? Is it cost prohibitive, right? All of those things should be true whether it's your internal solution, your internal code base, whether it's cloud native, or whether it comes through a system integrator, a vendor, etc. Data's a team sport, so it's also around extending that community that we showed with the bridge. It's business, IT, and then there's like an asterisk on IT: it's all your partners and all your vendors and those that you're interacting with.

Just to return to the data product life cycle, there's one item I want to call out here, and I do want to reserve a few minutes for questions. There is an entirely new category of data products; I call them derived data products. They are anything that is the output of this certification process. Now, this process will be very familiar and comfortable to anyone that does the software development life cycle. However, for business stakeholders, knowledge workers, partners, and others, if they're not actively in the software development life cycle, this may feel very uncomfortable to them. They're like, "What do you mean I have to start doing proactive and reactive data governance, etc.? What does that even mean? I have to do a data quality scorecard? I have to check it into the data marketplace?" These are things that they won't be familiar with, so show them grace and help people understand that with the data product life cycle, or the data development life cycle, it's going to be okay. And just like they are accustomed to seeing applications development done and software delivered to them, show that overlap. Show them and say, "We're going to follow a similar process." Whether you are waterfall, agile, agile-something-else, whatever the familiarity is for the methodology that you're following for delivering applications and services, start to intermingle those data product life cycle components. And then you get to the point that you have a derived data product, and that is two data products being used together, or the output of an AI/ML model, right? A new data product that didn't exist before. Maybe for the first time ever you're giving a business division access to data that they were never able to have before, and they take that and do some amazingly creative things with it. Well, that derived data product can be certified, and it can be put in a data marketplace, and it can be
shared. The level of innovation and creativity is unlimited if you look at data products the right way. And really getting into that curation process, you're also empowering data owners and data stewards to finally understand that there's a definition for these jobs that they've been doing; they didn't even realize how important it was to the curation of data for the enterprise.

And then lastly, because again we've got to talk about AI a little bit, I want to share another example of something that's a cornerstone for how you operate in the future around a modern data ecosystem. This is our responsible AI framework. I think everyone should have one. Hopefully you are thinking about and forming the technology, people, and process imperatives in your organizations to do responsible AI. I've listed some of the major themes; I hope this will be useful to you. But I've also listed at the bottom our accelerator categories. Again, your own code and your people are your greatest asset. They have the knowledge to create accelerators, whether it be code, frameworks, best practices, runbooks. Leverage those people; turn some of those things into assets that are reusable, so you don't have one data science team completely reinventing the wheel when there's four other data science teams in the company that are producing exactly the same assets. So it's really getting that portfolio concept, the service catalog concept, intact. Don't just think of your service catalog as your cloud-native service catalog; think of it more comprehensively. Open the aperture on what you would consider to be important and start to socialize that.

One of the last things I'll mention is: do a technology conference in-house. So many people don't ever get to experience this. Business stakeholders, they've never been to a tech conference; they've never had conversations about exciting topics, about how you can renovate and transform. Bring it to them. Do an in-house expo, do a data expo, where you invite everybody in the company
to come, virtually or physically, and you open up the portfolio of all your data products and all your assets, and then you invite your partners and vendors to come in and do training, and you teach people the fundamentals of what a modern data ecosystem looks like and how we all participate in the team sport. And then it really helps you, because you're going to have advocates that are going to help you lobby when you want to prioritize certain initiatives, applications development, whatever it may be, because then you're going to have a business person that says, "I vote for that too. I'm ready to move my workloads to the cloud. I'm ready to get rid of this old legacy data warehouse. I want a data lake. I want to start using services that help me understand my data better." Metadata is for everyone; it's not just for IT people, right? Metadata is itself a data product that should be shared, leveraged, consumed.

And so, in closing (we're going to have about 10 minutes for questions), these are the things that I think are top of mind. Like I said, we didn't have the luxury of going down the rabbit hole on every single Google Cloud service; I'm happy to speak with any of you about them. I think the goal that I wanted to communicate was: please consider the people and process part of the modern data ecosystem, because you can build the services easily, you can hook them up; the data deployment plane is very clear. If you walk into the expo, there are hundreds of providers that have tooling. But you have to return it to: what is the business value? Does this simplify my technology portfolio? Does this help me leapfrog some capability in the environment that delivers business value? Does it help me reduce my costs? Does it simplify, or provide something really satiating, while my company is going through a much longer transformation, fill in the blank of whatever kind?

But it also has to be driven on trusted metrics and KPIs. If you don't have a data scorecard that has metrics and KPIs for how
well you're delivering data, whether it's being adopted and consumed; if you don't have metrics on what is the average time to value, what's the average time for people to acquire a data set, what is the average time to insights, how many reports we have that have cobwebs on them and skeletons hanging off the side because nobody's touched the report in 17 years, right? If you're not pruning, and you're not keeping yourselves accountable with data metrics and KPIs on a monthly basis, I would suggest that there's an opportunity for improvement there.

And then, certainly, organizational alignment. HR organizations are starting to partner much more closely with IT to redefine people's job descriptions. Citizen data scientists, citizen developers, low code, no code, open source: all of these things have dramatically changed the types of technology roles, which are not just relegated to IT. They exist in the business; they exist with partners and vendors. And you want to make sure that you're retaining your people by actually adding to their job description the types of activities they're doing within this data modernization ecosystem.

And then, obviously, scalable infrastructure and services. You wouldn't be here if you weren't interested in making the most of the latest and greatest offering, but again, it's one facet. I think it's incredible; I'm so excited about some of the announcements that were made this morning.

And I have one last question before we take questions from the audience, and that is: is your data spurring people to action? And if it's not, I would really think about it; go deep into that data therapy. What's missing? Why aren't people using all of these data products? Why aren't they using the new system we implemented? Are you spurring them to action? And if you're not, make it so. That's it for me.
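
Two of the scorecard metrics named in the closing, average time to acquire a data set and the count of untouched reports, are simple to compute once the underlying dates are recorded. The sketch below is a minimal illustration; the function names, the input shapes, and the one-year idle threshold are assumptions rather than anything specified in the talk.

```python
from datetime import date
from statistics import mean


def average_days_to_acquire(requests):
    """Mean days between a data set being requested and delivered.

    `requests` is an iterable of (requested_on, delivered_on) date pairs.
    """
    return mean((delivered - requested).days for requested, delivered in requests)


def stale_reports(last_viewed, today, max_idle_days=365):
    """Names of reports that nobody has opened within `max_idle_days`.

    `last_viewed` maps a report name to the date it was last opened.
    The one-year default is an assumed threshold, not a recommendation.
    """
    return sorted(
        name
        for name, viewed in last_viewed.items()
        if (today - viewed).days > max_idle_days
    )
```

Run monthly, `stale_reports` gives the pruning list and `average_days_to_acquire` gives a trend line that can be reported alongside adoption figures.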

2023-12-24
