How Azure and Databricks Enabled a Personalized Experience for Customers and Patients at CVS Health
Hey everyone, and welcome to Spark Summit. My name is Michelle Un, and today we're going to be talking about our journey of bringing personalization to customers and patients at CVS Health. I've been with CVS just under five years and currently lead the data engineering team for the pharmacy personalization initiative, where my team supports the productionization of machine learning models to personalize and improve the experience for retail pharmacy patients. Prior to working at CVS, I worked in the public policy and nonprofit sectors, but always with a passion for using data to solve challenging problems. I'm here today with Raghu, who will also introduce himself.
Hello everyone, I'm very excited to be here. My name is Raghu Nakka. I lead the data engineering team for the CVS Health retail front store business. I have been with CVS Health for the past seven years, and before that I worked as a consultant for large healthcare and financial organizations. Working for CVS has been a very satisfying experience for me personally. I like the fact that I can do what I love, which is working on innovative technological products using cutting-edge technologies, to achieve our company's healthcare mission, which is helping people on their path to better health. At CVS Health, I build data platforms, cloud infrastructure, and machine learning at scale. My typical day involves solving the challenges of handling big data in the healthcare space.
Today we're going to cover CVS's personalization journey, the lessons we learned during the process, and the challenges posed by constant growth and an ever-changing technological space. Finally, we're going to give a sneak peek at how we are going to tackle these challenges in the future, using new tools we are currently exploring to strengthen our data architecture.
Great. So before we jump into personalization, I just want to give everyone a little bit of background on CVS Health as an enterprise. CVS has a diverse set of assets, all focused on driving our mission of becoming a healthcare innovation company, with the goal of making quality care more affordable, accessible, and simple for customers and patients. At the core of that, we have advanced analytics to help drive and improve the customer experience across the enterprise, whether that's from CVS Pharmacy, the PBM Caremark business, MinuteClinic, or Aetna. Today's focus for the presentation will be on the front store and pharmacy businesses, which Raghu and I support. Across the pharmacy and front store businesses we have about 10,000 retail locations across the country, so our focus will be on the journey to drive personalization within those areas.
All of those business lines that Michelle mentioned pose a unique set of problems for a machine learning project of any kind. For any personalization project, understanding customer behavior is really the key, and understanding CVS customer behavior is a unique problem. First, we have a unique set of customers. We have a large number of micro-segments, which makes it challenging to understand the overall behavior of the customer, because, like Michelle mentioned, we have about 10,000 stores across the United States, and pharmacy patients' health needs are varied and sometimes unpredictable. All these challenges make our customer really unique.
Our data is unique too. We are not dealing with a typical grocery store customer. It is really hard to predict the behavior of a convenience shopper, because a customer could go to a convenience store just because they forgot milk, or they could go because they have to pick up a prescription and then stop by to grab candy. So the behavior can be unpredictable, which is why our data sparsity and dimensionality lead to overfitting issues for any kind of machine learning model.
The third unique thing I would like to point out is the situations that we typically deal with in a retail front store. For example, the COVID situation has instantly made our static machine learning models either outdated or invalid, because customer foot traffic has significantly decreased or purchase patterns have significantly changed. So all of the historical behavior that we use to model customer behavior is completely useless. These are the three unique things that we would like to point out.
Before we jump into the personalization goals of CVS, we thought it was really important to understand the CVS customer behavior. When it comes to the personalization goals of CVS, personalization is delivering the right individual experience in the right channel at the right time. We have to provide the right experience to the customer or patient, whether it is a pharmacy product, a reward or coupon that we offer, a MinuteClinic service or HealthHUB, or whether they are enrolled in a loyalty program like the ExtraCare card. So it's all about providing the right experience.
It is also important for us to understand which channel the customer would respond to better, whether it's a text message, a phone call, or an in-store offer. So optimizing our process to come up with the right channel that a customer or patient would respond to is also really important for us. The same goes for the timing of the offer. A patient could be up for a prescription renewal, and that could be the right time to send an offer, rather than when the customer has already filled a three-month prescription and has no reason to make the trip. That is one example of the significance of the right timing for the customer. The tailored message to the customer is another important goal, because a customer's preferred language could be different from the one we are interacting in. So we define all these goals, and we use a test and experimentation framework, which we're going to cover in a later slide, to come up with these various attributes, which help us define that 360-degree view of a customer.
In order to support those goals, here is a quick overview of our existing tech stack, which we have been building for the past year and a half. Like any traditional company, we have traditional relational data warehouses, and we have several disparate files coming over from POS, sales, and several other places. We have used cloud technology to ingest all of the data into the cloud, and in this case everything is in Microsoft Azure. Our entire workbench runs on Databricks, which of course is on Apache Spark. We use Airflow and GitLab for our orchestration and DevOps, and we use Tableau for reporting on operational, financial, and adherence metrics.
As you can see in the top right corner, our processing layers are pretty typical of other machine learning projects. We create a unique data layer that feeds into a process where we create our features. Then we create opportunities out of those, all of which can be fed into the machine learning models, and then we send these offers to the POS weekly or daily, based on the cadence.
So, as Raghu mentioned, our personalization journey started in mid-2018, when initial use case development started. We were able to build our first pilot in an on-prem Hadoop environment within just a couple of months of that first ideation, and launched our first personalized offers to one percent of customers a few months later. We quickly hit some roadblocks when we tried to scale up from there, from one percent to five percent. There was an actual constraint on being able to build additional hardware to support the scale that we wanted to get to. At that point we made the decision to transition to a cloud-based environment using Azure Databricks, as Raghu had mentioned. We made that transition in early 2019, and from there we were able to expand the number of use cases that we could support, double the number of personalized products, and then eventually scale to offer personalized offers to patients by the end of 2020.
In parallel, the focus was specifically on building this test and experimentation framework, so that we could rapidly iterate and test lots of different iterations of giving different experiences to patients based on the patient segment, and incrementally make improvements to the experience we're providing to patients and customers.
As you can see, this red line shows the size of the team. We started off with a very small team of just a handful of data engineers and data scientists. As the use cases and the number of products that we were offering through the personalization application grew, so did our team size. Luckily, cloud-based infrastructure is able to support us and help us scale up, and we'll talk a little bit more later in the presentation about what that looked like.
To give a little bit more context on what personalization really means at CVS, I'm going to give a quick overview of some of the use cases and the ways that we approach personalization.
Then we'll dive into two specific use cases, one on the pharmacy side and one on the front store side. A couple of examples here, which I think Raghu mentioned earlier: we have many clinical products, just as we have many coupon offers that we can offer to ExtraCare patients. So it's really a scarce resource that we want to prioritize, and what better way to prioritize than using machine learning to determine the propensity of a patient to accept a certain offer? To start with, though, the offers that we were sending out were very rules-driven, so everyone got the same message at the same time through the same channel, and there wasn't a lot of data at our disposal to make those predictions. So we started off with randomization, testing different channel assignments and different timings, so that we could collect information about patients' preferences and behaviors.
From there we moved to supporting this both from an experimentation perspective and from a machine learning perspective. We can use the data that we've collected to implement different experiences or experiments: segmenting off different populations, for example by age and gender, or by age and engagement with a channel, and then customizing the experience within those segments. That's the concept of experimentation that we're trying to drive. And as we're able to collect more and more information, we can use more advanced analytics, like machine learning propensity models.
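As a rough illustration of the randomization step, the sketch below assigns each patient to a channel and timing combination by hashing the patient ID together with an experiment name, so the assignment is effectively random across patients but stable for any one patient. The channel and timing values, the experiment name, and the `assign_experience` and `arm_counts` helpers are all hypothetical; the talk does not describe how the assignment was implemented.

```python
import hashlib
from collections import Counter
from itertools import product
from typing import Iterable, Tuple

# Hypothetical options for illustration. The talk mentions text messages,
# phone calls and in-store offers as channels; the timings are assumed.
CHANNELS = ["text", "phone_call", "in_store"]
TIMINGS = ["morning", "evening"]
ARMS = list(product(CHANNELS, TIMINGS))  # every channel/timing combination


def assign_experience(patient_id: str, experiment: str) -> Tuple[str, str]:
    """Map a patient to one (channel, timing) arm.

    Hashing experiment + patient ID spreads patients evenly over the arms
    while keeping each patient's assignment the same on every run.
    """
    digest = hashlib.sha256(f"{experiment}:{patient_id}".encode()).hexdigest()
    return ARMS[int(digest, 16) % len(ARMS)]


def arm_counts(patient_ids: Iterable[str], experiment: str) -> Counter:
    """Count how many patients land in each arm, to check the split."""
    return Counter(assign_experience(p, experiment) for p in patient_ids)
```

Responses collected per arm can then be compared within segments such as age band or channel engagement before any propensity model is trained.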
We look for opportunities where we can provide an offer that the patient is likely to engage with. The outcomes that we're trying to drive are really centered around healthcare and, at least on the pharmacy side, around medication adherence, which I'll speak to in the next slide. The goal is to increase engagement with the products that we have, but also to create a better experience for patients through both the front store and the pharmacy.
To go into a little more depth on one example use case within the pharmacy space: as I mentioned, one of the problems that the pharmacy is well situated to solve, and to support patients with, is medication non-adherence. CVS has different products and services that we can offer to patients.
These help with barriers that patients might face to medication adherence. Barriers might include forgetfulness, cost, or access, especially now with COVID. Examples of those services could be reminding a patient to fill their medication or to pick up their medication, providing counseling when we see an indication that the patient is having side effects and has dropped off therapy, and proactively offering to refill or to get new prescriptions when a patient has run out of fills on their prescription. In times like right now with COVID, offering delivery, or even free delivery, of medication is also a great service for patients who might have some kind of access barrier.
The goal is first to understand the patient's adherence profile. Are they truly a non-adherent patient? Or is it possible that they no longer need the medication, or that their prescriber gave them a lower dosage so they don't need to come to the pharmacy as often? Those are all conditions in the data that we can leverage to better understand the timing of when patients need to come to the pharmacy, so that we can prompt them in the way that's most meaningful and relevant for the patient. That tells us the kind of service or intervention the patient might need.
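To make the timing idea concrete, here is a minimal sketch that estimates when a patient's supply runs out from the fill date and days supply, and picks an outreach type from that. The `Fill` fields, the seven-day lead time, and the action names are assumptions for illustration only and are not CVS's actual rules.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Fill:
    """Most recent fill of one prescription (hypothetical record layout)."""
    fill_date: date
    days_supply: int        # e.g. 30 for a one-month fill, 90 for three months
    refills_remaining: int  # fills left before a new prescription is needed


def expected_runout(fill: Fill) -> date:
    """Date on which the dispensed supply is expected to be used up."""
    return fill.fill_date + timedelta(days=fill.days_supply)


def next_action(fill: Fill, today: date, lead_days: int = 7) -> str:
    """Pick an outreach type based on how close the patient is to running out."""
    days_left = (expected_runout(fill) - today).days
    if days_left > lead_days:
        return "no_outreach"           # patient still has plenty of supply
    if fill.refills_remaining == 0:
        return "offer_renewal"         # out of fills, a new prescription is needed
    if days_left >= 0:
        return "refill_reminder"       # due soon and refills are available
    return "late_refill_followup"      # supply already ran out without a refill
```

A patient who just filled a three-month prescription falls into `no_outreach`, which matches the point made earlier about not sending an offer when there is no reason for the trip.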
We also want to package the service or intervention in a way that makes the most sense to the patient: things like choosing the best channel to reach the patient, like Raghu mentioned, customizing the messaging so that we're reaching the patient in the language that they prefer, and using content that's going to speak to the patient and really help them engage. A lot of this work has also been about creating a longer-term approach for streamlining and offering a more omni-channel experience for patients. We partnered very closely with the IT systems at CVS, as well as the delivery channels, to ensure a more streamlined delivery system and to increase the personalized content through these channels.
Here is an example of a typical retail front store use case. We send an offer to the customer through the various channels that we have covered before.
That offer influences the next trip to the store. The customer presents the offer, redeems the coupon, and also expands the basket size during that trip. This is just a typical retail customer journey.
The problem statement that we have is this: growing the engagement of our most valuable customers is really important for any retail organization. Re-engaging lapsed customers and engaging all active shoppers is really the core of the problem that we would like to solve using the personalization we have built. The solution we came up with was to personalize the communication and provide the most relevant and exciting offers, using the data from the customer profile, the 360-degree view of the customer that I talked about earlier.
So how do we do this? We first track the customer's behavior for the past year, and based on the transactions we understand their purchase behavior in terms of the brands and categories that the customer already purchases. We figure out the customer's affinities to various other brands and categories, and identify their recent purchase pattern on those affinity brands. We evaluate which offers match or correlate with the customer behavior that we predicted in the earlier steps. Finally, we take all of the data and use it to identify the customer's probability to buy a particular product or to redeem a coupon, using various machine learning algorithms. That's really the core of our personalization engine: predicting the behavior of the customer.
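A simple way to picture the affinity step is as each category's share of a customer's purchases over the lookback window. The sketch below assumes transactions arrive as hypothetical `(customer_id, category, purchase_date)` tuples and uses purchase share as a stand-in for affinity; the talk does not specify the actual affinity calculation.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta


def category_affinity(transactions, as_of: date, lookback_days: int = 365):
    """Return {customer_id: {category: share of purchases}}.

    transactions: iterable of (customer_id, category, purchase_date).
    Only purchases within `lookback_days` before `as_of` are counted.
    """
    cutoff = as_of - timedelta(days=lookback_days)
    counts = defaultdict(Counter)
    for customer_id, category, purchased_on in transactions:
        if cutoff <= purchased_on <= as_of:
            counts[customer_id][category] += 1
    # Normalize counts so each customer's shares sum to 1.
    return {
        customer: {cat: n / sum(cats.values()) for cat, n in cats.items()}
        for customer, cats in counts.items()
    }
```

Shares like these could serve as input features for the propensity models discussed in the talk, alongside a shorter window for the recent purchase pattern.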
That lets us make our offers more relevant and more personable. We started by using linear models like logistic regression, but due to performance issues, which we're going to cover in the next slides, we switched to XGBoost models, which are non-linear and are actually performing really well. Finally, we take all of these outputs and optimize based on the constraints we have. We have several budgetary constraints, and obviously we cannot give out unlimited coupons, so we optimize accordingly and then send those offers out to the customer.
At CVS, we typically don't measure our performance using sales metrics. As a healthcare-focused company, we always try to measure our performance using healthcare-related metrics. We saw a huge uptick of 1.6 percent in overall adherence, based on various analyses that our measurement team has performed.
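The optimization step can be sketched as a greedy selection: rank candidate offers by predicted redemption probability and accept them until the coupon budget runs out, with a cap on offers per customer. This is a simple heuristic chosen for illustration, not necessarily the optimizer the team uses, and the tuple layout, the `select_offers` name, and the cost values are hypothetical.

```python
def select_offers(scored_offers, budget: float, max_per_customer: int = 1):
    """Greedily pick offers under a total budget.

    scored_offers: list of (customer_id, offer_id, propensity, cost), where
    propensity is a model's predicted probability of redemption.
    Returns (list of chosen (customer_id, offer_id), total cost spent).
    """
    chosen = []
    spent = 0.0
    per_customer = {}
    # Consider the most likely redemptions first.
    ranked = sorted(scored_offers, key=lambda row: row[2], reverse=True)
    for customer_id, offer_id, _propensity, cost in ranked:
        if per_customer.get(customer_id, 0) >= max_per_customer:
            continue  # this customer already has their quota of offers
        if spent + cost > budget:
            continue  # this offer would exceed the budget, try a cheaper one
        chosen.append((customer_id, offer_id))
        spent += cost
        per_customer[customer_id] = per_customer.get(customer_id, 0) + 1
    return chosen, spent
```

A greedy pass is easy to reason about but is not guaranteed to be optimal; ranking by expected value instead of raw propensity, or using a proper solver, are natural variations.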
So we always evaluate our performance based on healthcare-related metrics rather than any sales goals.
Now that we've given you an in-depth look at personalization at CVS, we'll use the next couple of minutes to talk about the growth that was enabled by the transition to a cloud-based environment, as well as some of the challenges that we've faced along the way.
First, as I mentioned in the personalization journey, we made the transition to a cloud-based environment in 2019. That transition definitely helped us both get to market quickly and expand the application as the use cases grew. Through Azure Databricks we have the flexibility to spin up clusters that meet the unique needs we have in supporting various business use cases. We're also not restricted by the physical hardware constraints that I mentioned, and we don't need to spend as much time worrying about tuning as we did when we were very constrained.
Another thing to call out is ease of use. Databricks in general centralizes all the assets that developers need to make their jobs easy. Putting interactive notebooks, cluster management, performance monitoring, and the data metastore in one place makes it easy to onboard and quickly grow the team, and makes supporting growing use cases that much easier. There is also less of a focus on infrastructure support and more focus on the work itself.
Obviously, with the growth that we have experienced over the past couple of years, we've certainly encountered many challenges along the way, many of which we continue to tackle to this day. Even though we have transitioned to a cloud-based environment, there are always going to be challenges with handling big data. The team has explored different optimizations, like Delta and partition sizing, and really focusing on optimal cluster usage, and we'll continue to push forward in that regard.
The second area that has been a core focus of the team is cost management. As the team has grown quickly, as you saw in that graph, so have our cloud costs. There are a lot of different things that we can be doing on our side, and Azure and Databricks help to enable some of those things. We can spin up different types of clusters for different types of jobs: your ETL jobs can use a different cluster and a different amount of compute capacity than your model training and feature creation jobs. There's also a lot that we can do from a developer standpoint. How do we promote best practices within our development team, in terms of code optimization, leveraging the sample data environment that we've created for the team to use, and, since we have the capability to autoscale our clusters, really trying to max out cluster usage to the best of our ability? Then there are also some great features provided by Databricks and continually added, things like cluster policies and pools, and different types of clusters, jobs versus interactive, that we want to continue to explore to make sure that we're making the most of the environment and features that are available to us.
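To illustrate sizing clusters by workload, the sketch below writes three cluster specifications as Python dictionaries using field names from the Databricks Clusters API (`spark_version`, `node_type_id`, `autoscale`, `autotermination_minutes`). The runtime versions, VM sizes, worker counts, cluster name, and the `worker_range` helper are assumed values for illustration and do not reflect the team's actual configuration.

```python
# Smaller autoscaling job cluster for ETL work (assumed sizes).
ETL_JOB_CLUSTER = {
    "spark_version": "7.3.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    "autoscale": {"min_workers": 2, "max_workers": 8},
}

# Larger, memory-heavier job cluster for model training and feature creation.
TRAINING_JOB_CLUSTER = {
    "spark_version": "7.3.x-cpu-ml-scala2.12",
    "node_type_id": "Standard_E8s_v3",
    "autoscale": {"min_workers": 4, "max_workers": 16},
}

# Small interactive cluster for development against sample data,
# shut down automatically when idle to limit cost.
INTERACTIVE_CLUSTER = {
    "cluster_name": "dev-sample-data",  # hypothetical name
    "spark_version": "7.3.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    "autoscale": {"min_workers": 1, "max_workers": 4},
    "autotermination_minutes": 30,
}


def worker_range(spec):
    """Return (min_workers, max_workers) for an autoscaling or fixed-size spec."""
    if "autoscale" in spec:
        scale = spec["autoscale"]
        return scale["min_workers"], scale["max_workers"]
    return spec["num_workers"], spec["num_workers"]
```

Keeping the maximum worker count modest on interactive clusters and reserving larger ranges for scheduled jobs is one way to apply the cost practices described above.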
Another area that has been a challenge is the evolving nature of the technology itself. While cloud-based technology has been around for many years, it's still relatively new within the healthcare industry, and at CVS specifically. So while we always want to push the boundaries of innovation within CVS, we still have to take into consideration security, compliance, and healthcare regulation, all of which are critical guidelines that we have to meet. We want to test and try new services, but we have to make sure that we're doing that within the constraints of the environment that we're in.
One final challenge area, broadly, before Raghu jumps into some of the challenges specific to the machine learning journey itself, has been finding talent and continuing to develop the talent on the team. With the technology being relatively new and constantly changing, it's difficult to find people to join the team, especially when we're growing so quickly, who have that cross-functional skill set of data science and data engineering, but also an understanding of DevOps best practices. So we've been working both to grow the training opportunities that we present to the team, and to give more informal opportunities to play around with new technology, test it, and share across the team. For example, as Raghu's team focuses on new services within Azure or other types of tools and technology, sharing that information across the teams helps to cross-train and skill everybody up.
Thank you, Michelle. While we had our own challenges in terms of growth and performance, we also had our own set of unique challenges during our machine learning journey, which I'm going to cover quickly in this slide. I would like to highlight four areas where we had challenges, and how we overcame them, so that it is helpful for other organizations who are starting their machine learning journey.
When it comes to feature engineering, our production pipeline is very feature-engineering-heavy, and complex feature engineering requires a lot of computing power. Having centralized features that can be leveraged across many use cases is really helpful. Otherwise you end up rebuilding the same features, or using more features than needed, which can lead to overfitting issues.
In the model training area, like I mentioned, we have a lot of micro-segmentation, and these disparate use cases require many models to be trained and implemented in parallel. We cannot generalize across these use cases, so a robust production pipeline is required as well. We were using Databricks Jobs pipelines as our production pipeline.
When it comes to the model selection process, we initially went with, for example, k-means and logistic regression.
However, we quickly realized that logistic regression doesn't do well with sparse data and leads to overfitting issues.
With the k-means algorithm specifically, clustering is not suited to solving a high-dimensionality problem. So, like I mentioned earlier, we quickly switched to non-linear models like XGBoost. In terms of implementation, tuning, and pipeline integration, there were various challenges involved that we were able to solve. Also in terms of implementation, we initially went with a very manual process, and we quickly started exploring MLflow and Kubeflow to be able to manage our machine learning life cycle. Implementation is really key as we grow as a team.
When it comes to collaboration, one of the things that we quickly identified is that people always think of data scientists and data engineers, but there is another role in between, which is the MLOps role. We quickly identified that and increased cross-training to fill that gap as well, which has clearly helped us scale much more quickly and bridge the gap between data engineers and data scientists.
With those challenges in mind, I would like to give you a quick sneak peek at how we are going to solve the challenges I have just explained in the future. After reviewing our challenges and the way our pipeline was performing during our growth, we quickly realized that we need to use the right tool for the right job. We have a lot of tools that we are exploring, but I would like to point out one huge change in thinking: we're going to switch from CPU-based machine learning to GPU-based machine learning.
We're going to use GPUs as our compute resource for both training and inference, and we're exploring RAPIDS. Another area that we're exploring is Kubernetes. With Kubernetes orchestration, we could integrate multiple tools into our production pipeline without being boxed into one single tool.
From the pharmacy side, as the use cases continue to evolve, there's an increasing interest in enabling more real-time use cases. We've been exploring things like Azure Event Hubs, Kafka, Azure Stream Analytics, and Azure Functions to try to enable a more real-time personalized approach for patients.
With that, we hope this presentation was helpful in providing a bit of context on how we think about personalization through the healthcare and retail lens at CVS Health. We're happy to take any questions. Thank you, everyone, for joining.