Think Like a Technologist: Data Analytics and Decision-Focused Strategies for Global Health
Hi everyone, I'm Pablo Sarmiento, the Director of Engineering at Zenysis, where we build data analytics platforms for global health and development. In this talk I'll be going over the principles that differentiate a data visualization platform from a data analytics platform, and providing strategies that you can follow to extract the most use from your data analytics tools. And if you're someone who builds these types of tools, such as a software engineer, a data integration expert, or a data manager, this talk will also cover the best ways to design your software and data sources to optimize for effective decision making.

So first I want to cover what makes this such a relevant problem to us at Zenysis. Our technology is all about delivering effective data analytics solutions to let our users confront the greatest challenges facing humanity. To do so, there's two core technical solutions we provide. First, we build a data integration pipeline to harmonize data from multiple different data sources. In simpler terms, this just means taking really messy data that exists in lots of different places and creating a single cleaned-up view that can now be explored. The second solution is delivering a front-end analytical platform that's usable by health analysts of varying technical levels. These two solutions result in a product that we call the Zenysis platform. It's accessible via the web, with an intuitive user interface for people to explore their data.

Our experience building the software has shown us many times over that data-driven decision-making is integral to global health. Data-driven decisions are essential to responding to emergencies and natural disasters, distributing vaccines and other health commodities equitably, planning health interventions that target communities most in need, and ultimately saving lives. Throughout our work there's a critical realization that's driven our approach to building software, and that's that software that uses or
displays data doesn't inherently enable data-driven decisions. Or, to put it another way, data does not automatically lead to data-driven decisions. To explore why, let's dig into this buzzword that we all know and love: data-driven decisions. In broad strokes, the problem is that all too often people think that the leap from data to decisions is quite a small one. They picture something like this, where it's just a small hop from data to decisions, basically thinking that data is all we need and once it's available we're not too far off from being able to make decisions. In reality it's more like this: data needs to be turned into information, and then into knowledge, and then that knowledge has to power your decision making.

So the question then becomes, what differentiates data from information from knowledge? Data is just raw numbers or text. It usually comes in such large quantities that it's difficult to make sense of. Information is what we get when data is processed and presented in a way that we can make sense of. It's a statement of fact, something like "this district had 100 cases of COVID today." Knowledge is the result of connecting information together into a cohesive premise and conclusion. This is often an in-depth, multi-step, and exploratory process. It's not a static or routine transaction. It can look something like "the leading cause of maternal mortality is underfunded resources in our highest-risk provinces, causing our most populated districts to have stockouts of key antibiotics." There isn't a single piece of information that led us to this conclusion. Instead, we had to gather lots of different pieces of information, synthesize them, and come up with a brand new piece of knowledge that can be used in decision making.

In addition, there's a certain permanence to knowledge. It doesn't go away when the data changes. It stays in people's minds and it stays in your organization. So this leads to a compounding effect, because knowledge continues to build on itself.
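As a rough sketch of the first step up the pyramid described above, the snippet below turns a handful of raw case records (data) into statements of fact (information). The record fields, district names, and both helper functions are hypothetical, invented purely for illustration; they are not taken from any real platform.

```python
from collections import Counter
from datetime import date

# Hypothetical raw data: one row per reported case (field names are made up).
RAW_RECORDS = [
    {"district": "District A", "disease": "COVID", "reported_on": date(2021, 8, 30)},
    {"district": "District A", "disease": "COVID", "reported_on": date(2021, 8, 30)},
    {"district": "District B", "disease": "COVID", "reported_on": date(2021, 8, 30)},
    {"district": "District A", "disease": "malaria", "reported_on": date(2021, 8, 30)},
    {"district": "District B", "disease": "COVID", "reported_on": date(2021, 8, 29)},
]


def count_cases(records, disease, day):
    """Process raw records into case counts per district for one disease and day."""
    counts = Counter(
        r["district"]
        for r in records
        if r["disease"] == disease and r["reported_on"] == day
    )
    return dict(counts)


def as_statements(counts, disease, day):
    """Present the counts as plain statements of fact (the 'information' layer)."""
    return [
        f"{district}: {n} case(s) of {disease} on {day.isoformat()}"
        for district, n in sorted(counts.items())
    ]


if __name__ == "__main__":
    day = date(2021, 8, 30)
    for line in as_statements(count_cases(RAW_RECORDS, "COVID", day), "COVID", day):
        print(line)
```

Note that this only reaches the information layer. Nothing here connects several such statements into a conclusion, which is the step the talk calls knowledge.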
And in the end, knowledge is what allows us to make the best decisions. So when you think of software that you plan on using, or software that you plan on building, ask yourself where in the pyramid it falls.

When looking at software solutions that promise data-driven decision-making, the most common problem is that they usually only look to display or illustrate your data. Data visualization is incredibly powerful, but it ultimately misses the mark. Data visualization is only displaying the data as it is. It definitely has its strengths: it helps you understand the status quo better, and it gives you visibility on the current situation. But it doesn't on its own help you understand how to change the situation, and making decisions in health is ultimately rooted in a desire to change the current situation. This means that software needs to help you connect the dots between multiple different scenarios in order to turn your data, and then your information, into knowledge, so you can then determine what actions need to be taken. The ability to generate and manage knowledge is the differentiating factor between a data analytics platform and a data visualization platform.

So if your software can turn data into knowledge, you're already much better equipped to make good decisions. But you somehow still have to take that last leap from knowledge to actionable decisions, and that leap is not guaranteed just by virtue of being data analysis software. The way software is both built and used must take into account the decisions to be made at every step of the design and analysis process. Without a focus on decision making, your tools will keep you in the world of data-driven information without taking that leap to powering data-driven decisions.

So if we want software that enables data-driven decisions, there's two things we need to figure out, and we're going to be exploring them in this order: first, how do we build software that turns data into knowledge, and second, how do we then design
and use software to turn that knowledge into actionable decisions?

So what are the key features of a data analytics platform that lets you turn your data into knowledge? What should you look out for? I'll go through the most important concepts and show you how this takes us up that pyramid from data to knowledge, and after that we'll be in a good place to take that final leap from knowledge to decision making. If you're a software developer, these are the key features that you should look to build in your platforms, and if you're a consumer, these are the key features that you should be evaluating your data analytics solutions with.

First, there's flexible queries. This is all about the data layer of the pyramid: we need to have ways to get the data first. Analytical tools should limit the ways you can explore your data as little as possible. When I say query, a query is just how we ask questions about our data, and a good analytical tool should not tell you what questions you can or can't ask. It should give you the tools to ask any question, which means that you should be allowed to run any query. You should be allowed to select any kind of data, filter across any dimensions or metrics, aggregate across any geographies, limit it across any date ranges, etc. This shouldn't be misunderstood as necessarily providing you with a way to write SQL or a way to explore databases. The ability to write SQL queries is great. If your organization or team consists of highly technical people, then that's probably perfect for you; they can just write SQL queries. But if your team has analysts of varying levels of technical expertise, then you need to look for analytical platforms that can abstract away the concept of databases or query languages like SQL, and instead allow analysts to think in terms that they understand, such as indicators, calculations, and visualizations.

Now we're moving up that pyramid. We've moved up to information, so we're going to talk about the second principle, which is having a
suite of dynamic visualizations. If your analytical platform lets you query your data in flexible ways, it should also let you visualize it in flexible ways. An analytical platform that only presents data in tables is no different than just a database viewer. Being able to transform that data into a comprehensive suite of visualizations is what lets us turn data into information. Now, the key thing to look out for as a consumer is that your data visualizations should not be static. Static images are fine; just leave those for the printer. As long as you're in a web browser, your data visualizations should be seamlessly interactive. Users should be able to customize them, highlight things, click on different areas, and drill down on specific areas of interest. This is what turns a data visualization from a static illustration of data into an exploratory tool for an investigation.

Which takes us to the point of investigation and exploration. So now we've finally entered that last layer of the pyramid: we're in the world of knowledge. Knowledge is the result of connecting the dots between multiple pieces of information. At the end of the day, the questions that decision makers really have are rarely as simple as "what district has the most cases of malaria?" The big questions that they have are usually closer to "if I have one million cholera vaccines to distribute, how many should I distribute to each health facility?" There's no single query, no single visualization, that gives you that answer. So instead we have to investigate.

Now, when I say investigation, an investigation is just a sequence of questions and answers that leads to a conclusion. So taking that example of a cholera epidemic and distributing cholera vaccines, we might first have to query for all health facilities per neighborhood, plot them on a map, then get the populations of each district, get the number of people with cholera per neighborhood over time, then calculate incidence rates, and then finally determine which health
facilities have the greatest vaccine needs. This requires a combination of bar graphs, maps, and line graphs, as well as the ability to overlay data and run on-the-fly calculations for incidence rates. A strong data analytics platform will allow you to do all that. It gives you the ability to ask multiple questions where the answer of one flows into the other.

Keep in mind that investigations don't require visualizations. They certainly help, but anything that fits the category of questions and answers counts as investigating. For example, think of case management products or contact tracing solutions. They still allow investigations to be carried out, but they don't necessarily have maps and bar graphs; they're usually tables and lists. But the strength of those products still depends on how seamlessly they allow a user to ask several questions and synthesize them together. So their strength is still about how well they let you investigate.

Up until this point we've covered only the generation of knowledge. The last two points I'll cover, this one and the next, are about how to manage that knowledge. This is about moving towards that sense of permanence that I talked about. Raw data may change over time, but knowledge should persist. It should still exist within your peers and within your organization even as data changes. Ideally, a platform should let a user recall the sequence of steps that they took to reach an answer. This sequence of steps should be able to be stored, retrieved, and, when necessary, collaborated on. The reason for this is that in an investigation, the process that you took, the sequence of steps, is as important as the final output. People need to understand the logical steps that you took to go from point A to point B. If knowledge is about connecting the dots between pieces of information, then remembering how those dots were connected, and giving people a way to contribute, is a crucial step to solidifying knowledge within peers and within
organizations. The most common form that we see investigations stored as is dashboards, where different parts of a dashboard represent different questions being asked and answered. Dashboards aren't the only way to store investigations, but they're certainly the most common, so that's what we'll use as our example. Imagine you're investigating COVID cases nationally to find high-risk areas. One section of a dashboard might show aggregated COVID counts for the country, another section might break this down by province and district, and a different section might highlight the most at-risk areas on a map. A strong analytics platform should treat dashboards as first-class documents. Think of how, when you have a Google Doc, you can store it, you can retrieve it, and you can collaborate on it. Dashboards should allow the same thing: a dashboard is a document that you can store, retrieve, and collaborate on.

Now we're at the pinnacle, that holy grail of data analytics platforms. This is about sharing and dissemination. At this point you've already investigated and you've already stored your investigation, but now you want to share your results. A robust data analytics platform should have capabilities to allow the dissemination of your results. There's lots of ways of doing this. We could have a shareable URL to a dashboard, which is the most common way for any web application, or, ideally, we can allow sharing directly to key stakeholders as emails. More robust platforms include report-building capabilities, so they allow you to even further customize the final output of your investigation so that it's presentable and consumable as a report fit for even wider distribution through formal channels. All too often, data analytics platforms stop short of this step. They don't get to the sharing and dissemination part, and they don't include these built-in capabilities for the distribution of results. Knowledge requires a form of permanence, a shared existence across people and organizations, and the distribution of
knowledge is what unlocks that power.

So we've covered how we went up that pyramid. We now have a knowledge-generating and knowledge-managing data analytics platform. But how do we take that final leap to decision making? Making evidence-based decisions requires having that access to knowledge, not just information, but having access to that knowledge does not guarantee that good decisions will be made. So how do we bridge that final gap? More importantly, whose responsibility is it to make sure that analysis is extracted and leads to decisions and action? Is it the software developer's job, or is it the user's job? Surprisingly, it's actually both. Software developers need to design software that guides users to answer their questions, solve their problems, and make smart decisions. It's about having that flow to guide users through. Users, on the other hand, need to know what they're looking for, and they need to realize when they've found the answers that they need. And then they have the additional responsibility of communicating that to their organization and to their peers. What this means for both software developers and users is that decision making must always be at the heart of the entire design and analysis process.

Just for clarity, at this point when I say software developers, I'm broadly referring to all the people involved in the construction of an analytical product. This includes, and isn't limited to, software engineers, designers, project and product managers, and data managers.

So let's start with the case where you're part of the software development team. You're a software engineer and you're building a new analytical product. The most common question a software development team asks, and rightfully so, is "what problem are we trying to solve?" This is a great question to ask, but in the context of a tool whose ultimate goal is to drive decision making, it's not specific enough. What we should ask instead is "what question is our user trying to answer?" This might sound like just a
simple reformulation, but the added specificity drastically changes how you might design a product. So let's take COVID contact tracing as an example. If we asked what problem we are trying to solve, it's very easy to just get lost in the enormity of the problem. We might say we're trying to stop a pandemic. We might say we're trying to find all of a contact's most recent secondary contacts. We might say that we're trying to determine how long a contact needs to quarantine. These are all perfectly valid answers to the question of what problem we are trying to solve in the context of a contact tracing product. But the first answer is far too broad; it doesn't get us any closer to designing a software solution. Number two is a more constrained problem, and the solution might involve a table view with a list of all secondary contacts, filterable by different criteria. Number three is even more constrained, and the answer is just a single numerical value: the number of days needed for quarantine.

But what if we asked our users instead what the exact question they ask on the job is? Now we've added specificity. We're asking about their actual workflow. We're asking about the decisions that they'll need to make to do their job. We might hear something like "given someone's name, I need to know exactly who I need to call." Now we've broken down the problem of contact tracing into a very clear, discrete next-action decision. They're just saying "get me a list of phone numbers to call." So suddenly our software design is a lot clearer. All we need is an input box where a name or an ID can be typed, and the output is a list of names and phone numbers. We can then expose more information if we want through more advanced controls, such as allowing the user to mark off who has been contacted, letting them add notes, share pages with other users, etc. But the core solution is just a search box and a list of results, because these are the immediate things that the user needed to make a decision. Nothing else. Behind the
scenes, our solution is still a whole data analytics platform. We still query databases, we display results, we allow the users to store notes, track their progress, share with others, and collaborate. But we still optimize the entire design of this software solution around one thing only: what decision needs to be made.

Now, if you're a data manager and not a software developer or designer, you might be more focused instead on the data collection practices. Your job is still the same. The principle hasn't changed: structure your data around what decisions need to be made. For example, let's look at a cholera epidemic. Broadly speaking, there's two ways we can tackle this problem: vaccine distribution or WASH interventions. We have to ask the users, the public health analysts on the job, what the exact questions they're asking are. And there's two perfectly valid questions they might ask: "which health facilities should I give vaccines to, and how many?" or "which neighborhoods should I target for WASH interventions to chlorinate water sources?" These two questions pertain to the same problem, but the data needs could not be more different. For the first case, we need to aggregate cholera rates per district and get the locations of all health facilities in order to plan a vaccine distribution strategy. For the second question, targeting WASH interventions, our data needs are a lot more granular. We'd probably need the addresses of cholera patients, again get locations, but now of water sources, and then triangulate potentially contaminated water sources based on a patient's address. Our context hasn't changed. We're still fighting a cholera epidemic. The problem is the same, but our data is now designed entirely around what decision needs to be made. If we had collected aggregate cholera data from the start, but later found out that the real decisions to be made were about water chlorination, we would have been totally unprepared to make those decisions, because our data collection never went to a level
granular enough to look at addresses and water source locations.

Now let's say that you're neither the data manager nor the software developer. You're the user of these tools, maybe a program manager or an analyst. Ideally the software has already been designed with your needs in mind, but even if it has, you still have a responsibility to use these tools as effectively as possible. Your next steps are still the same as those of the software developers. The principle hasn't changed. You should ask yourself: what are the questions that you need to ask your platform? What decisions do you need to make? All too often as users we jump into an analytical platform but don't know what we're looking for, so we get lost in all of the features that are available. We need to first understand the decisions we need to make with the data, and from there we can determine what questions we need to ask. Once you've determined the questions to ask, you can structure each one as a query that you can run on your data analytics platform, and then you can take all the steps that I outlined previously to go up from data to knowledge. You run your query, you explore your data and visualizations, you investigate different pieces of information until you find the connections that really answer your questions, you store these investigations, and then you share your results.

Now, as an analyst, your unique role doesn't end there. You still have to turn this process into an organizational behavior. Utilizing data needs to become a routine in your workplace. Bring other analysts into the room while you use your software. Convene multiple stakeholders. Understand the decisions that people need to make, and from there you can all break those down into the exact questions that you need to ask the data. Your goal here is to create a circular feedback loop between data and decisions. Your data drives the decisions you need to make, but the decisions that you need to make should drive how data gets used in the first place.
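The cholera investigation described earlier in the talk (cases per neighborhood, populations, incidence rates, then vaccine needs) is a sequence of questions where each answer feeds the next. A minimal sketch of that chain follows. It assumes invented neighborhood names and numbers, works per neighborhood rather than per health facility to stay short, and uses a simple proportional allocation rule that the talk does not prescribe.

```python
def incidence_per_1000(cases, population):
    """Question 1: how many cases per 1,000 residents in each neighborhood?"""
    return {name: 1000 * cases[name] / population[name] for name in cases}


def rank_by_incidence(incidence):
    """Question 2: which neighborhoods have the highest incidence?"""
    return sorted(incidence, key=incidence.get, reverse=True)


def allocate_doses(cases, total_doses):
    """Question 3 (assumed rule): split doses in proportion to case counts.

    Integer division rounds down, so a few doses may stay unallocated.
    """
    total_cases = sum(cases.values())
    if total_cases == 0:
        return {name: 0 for name in cases}
    return {name: total_doses * n // total_cases for name, n in cases.items()}


# Hypothetical inputs, invented for illustration only.
CASES = {"Neighborhood A": 30, "Neighborhood B": 10, "Neighborhood C": 60}
POPULATION = {
    "Neighborhood A": 10_000,
    "Neighborhood B": 20_000,
    "Neighborhood C": 15_000,
}

if __name__ == "__main__":
    rates = incidence_per_1000(CASES, POPULATION)
    print(rank_by_incidence(rates))
    print(allocate_doses(CASES, total_doses=1_000))
```

A real allocation would weigh more than case counts (facility capacity, cold chain, access), which is exactly why the decision to be made has to shape the questions and the data collected.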
Keep in mind that this process doesn't end once a decision has been made. As with all interventions, monitoring and evaluation must remain a part of your processes. The decisions that you made need to be tracked, monitored, and evaluated at the right intervals. How do you determine if your decisions were good? How do you determine if an intervention was successful? How do you determine what changes you might need to make to improve the intervention? Your monitoring and evaluation processes need to go through the same decision-focused design process that we just talked through. Otherwise, if the data collection wasn't planned for the questions that you're going to have later, the data that you collect at the monitoring stage won't allow you to make meaningful evaluations that can improve your programs.

So if there's one thing to take out of this talk, it's really that data-driven decision-making is hard, and it's even harder when you don't have the right tools for the job. What I've attempted in this talk is to give you a framework to evaluate whether you have the right strategies and tools for the job, or whether the tools that you hope to build are the right ones. What I've done is broken down data analytics software into two stages. First, how do we turn data into knowledge? How do we go up that pyramid? Second, how do we design software and strategies from the ground up so knowledge can power decisions? How do we take that last leap from knowledge to action?

Data analysis software needs to go beyond just data visualizations. It needs to offer comprehensive data analytics capabilities. At its core, data analysis software should be all about turning data into knowledge, because without knowledge we can't make good data-driven decisions. A strong data analytics platform should cover all the stages necessary to turn data into permanent, distributed knowledge across an organization. And ultimately, we need to stop seeing the relationship between data and decisions as unidirectional. We need to stop thinking
that data only powers decisions. Data analytics platforms come in all shapes and sizes, but to truly power data-driven decisions, the decisions to be made must be a part of the platform's design from the start. This means that the data must be structured in a way that supports the key questions to be asked, and the software needs to be designed in a way that facilitates a user's ability to make decisions. As users, we can take this same decision-focused approach to how we use our software. The questions we ask our data should always be guided by what decisions we need to make. This way we can encourage stronger data use cultures at an organizational level to better monitor and evaluate the interventions we make. Ultimately, decisions are not the final output. They're the heart of the entire process. So from the inception of a project, the decisions that need to be made should guide every stage of a data analysis solution and strategy, every single step of the way. Without this focus, there really is no amount of software that can ever solve our problems. Thank you, everyone, and I'm happy to answer your questions.

All right, great job, Juan Pablo. That was pretty interesting. It doesn't seem that we have too many questions that folks have written in yet, but to kick off the questions, I have one. Right at the start I was thinking about what you had said. Do you have any advice for organizations that have been thinking about using an analytics platform like yours, but they're not sure if they need it? What would you recommend to an organization to help them make the jump to real data-driven decisions, rather than more traditional tools?

Yeah, absolutely, thanks for that. Really, at the end of the day it goes back to what I said: if you're the user of these platforms, there should be decisions and questions that you have in mind. So when using an analytics platform like the one we have, it's to consider what are those decisions, investigate
what are the product offerings that we have. There's a lot of different tools available. Do any of those fit the bill of what you're trying to do? If you're trying to decide about, for example, program evaluation, and you need to make decisions there, then potentially dashboards and our query tools might be the best ones for the job. If you are trying to, for example, do something on contact tracing or epidemic response, maybe something closer to case management offerings, which are products that we also offer, would be closer. So when deciding how to use this, because data analytics platforms are so vast, first think about your use case. What are the decisions to be made? What are the interventions that you're planning? That will tell you which are the exact tools for the job. And then from there you can evaluate: okay, does this platform, whether it be Zenysis or any other, offer those tools that I might need, whether it be querying, dashboards, case management, data science, machine learning, or anything like that? So yeah, it always comes down to the decisions and the use cases.

That was muted. Classic. Okay, I've got another one, and it was towards the end of what you were saying. I've always found that the biggest challenge is not in technology or software necessarily. It can be, but if you hit all those marks and develop the perfect software, for example, you still have the challenge of getting people to use it. So how does Zenysis encourage users to ask the questions and interact with the software that then allows them to get to that decision-making, whether that be training or whatever?

Yeah, absolutely, I couldn't agree more. In my case as a software engineer, very early on in my career, the first thing I learned was: oh, I've worked so hard on building the software, but the results aren't coming out. There's still no decisions, not much is changing. And yeah,
at the end, the human aspect of the problem will always be the most important. At Zenysis there's a lot of effort that we put into the training portion of things. We have project managers and user engagement specialists all around the world that work directly with our users, training them on the platform. But it's not just training on the nitty-gritty of "this is what this feature does and this is what that feature does." When discussing with our partners, we also evaluate what their analytics capabilities are in general. Do they have health analysts in the first place? If not, then we also focus on that process of capacity building as far as analytics goes, having workshops and training around general data analysis principles. So it goes on both of those sides: on the more foundational level of data analysis and training people on that level, and then also on investing the time and training people on how to actually use the platform and get the most out of it, so that they can actually make the decisions. And then, after they make those decisions, how do you track them and how do you share them with the world?

Excellent, that sounds awesome. So we've got another question, from Virginia at Zenysis. She says: give us a real-world example of when you had to redirect or reverse course to optimize for a new decision. She's thinking of your work after the cyclones in Mozambique.

Yeah, thanks, Virginia. So this was an example I gave earlier in the talk, on a cholera epidemic, and this is something we experienced in Mozambique. I was there in person having to build the software, and so that's where the example came from. Immediately with an epidemic, when there's hundreds of cases of cholera, the immediate data need is just understanding the situation, how many people have cholera per neighborhood, and organizing the vaccine drives. And so that's what our
entire data integration solution was based around, and that's everything we built. It was successful, and we were super proud of that. But then once you get down from hundreds of cases to only, like, five cases of cholera, the baseline for cholera should still be zero, so we still have to figure out how to do those last steps. At that point the focus starts to shift to much more targeted interventions: having to actually track down the patients, having to plan out WASH interventions and chlorinate water sources. And that's where we realized our entire initial efforts as software engineers had been around this more aggregate-level data. We had a lot of visualizations, a lot of tools to power that. And two weeks before the planned WASH interventions were going to start, we realized that we hadn't planned effectively around that. Luckily, we're a very nimble and small group of engineers, so we could very quickly build a whole new tool around case management. We could build that tool for a cholera epidemic, where they could now individually see each health facility, they could call the doctors, gather information about patients, and then dispatch rapid investigation teams and plan those chlorination efforts. But it was a huge undertaking to do that, which was a result of not having asked, from the very start, those key questions: what are the future questions to ask? Will this become a much more granular operation? And so that was a huge moment for us, to learn that planning for the existing questions of vaccinations, but also the future ones, is just as important, because you'll have to plan your software around that.

Simple. Very good. All right, I don't see any more questions in the queue now. I'm sure that folks will watch the recording and probably reach out to you directly. For those online or for those that
will watch the recording afterwards, you can always check out Juan Pablo's bio down below. Sorry, my screen is messed up here. Okay, I just want to check the agenda here, and it looks like there's a 15-minute break before the fireside chat. So unless there's any more questions, I'll go ahead and conclude this session. But feel free, like I said, to reach out to Juan Pablo directly, for those on the call now and for those that will watch the recording later. If you want, you can explore other parts of the Hopin app. You can go to networking and meet with others. But for now I think that's all we've got. Thank you, Juan Pablo.
2021-08-30 22:51