Navigating the Cancer Data Journey: A Fusion of MongoDB and Microsoft Technologies
uh good afternoon everyone um as you can see from this slide it took mobile phone 16 years internet 7 years and Facebook 4 and a half years but to reach 100 million users but chat GPD just took two months isn't that incredible and again I wanted to add some animations here so I could have asked you how many months it took but unfortunately the changes were done last minute and you know you got the answer ready made right so I think uh just want to share that again the important thing here is that this we have experienced a diffused explosion in chat GPT right so earlier the slope was a gentle curve but as you can see with chat GP it's almost like a vertical line right now so my name is D wagle I'm partner Cloud Solutions architect um uh with Microsoft and work very closely with mongod DB and Paul and I am going to show how Ai and jna come together with mongodb Atlas on the cancer uh use case but I'll ask Paul to introduce himself first hello yes my name is Paul L I'm a Senior Solutions architect MB on the partner team and I'm exclusively um involved in working with the Azure partnership Azure partnership and um and work very closely with devans and to help out and assist and enable on those uh migration Technologies awesome thanks Paul so let's jump into to um the presentation what we want to go over is that we want to show you the an underlying Azure Ai and mongodb services involved before we go to the demo so it's not been an uh uh overnight Journey right history of AI dates back to 1950 and as you can see that uh Microsoft has been involved for many many years in this AI Journey so we have uh heavily invested in AI with our partnership with open AI back in 29 and uh we understand that right now Genera generation AI is at a turning point where you know this is like every minute things are changing at a fast pace right and you have to keep track of these so it very it gets very very daunting so what we have done is like we have come up with uh um you know tools and services that help customers uh digest all this data quickly so let me go to the next slide now again this is a very busy slide but what I want to highlight here is that as your AI service that you see here is a service that is designed to make it easy for developers to deploy AI enabled applications without the expertise being in AI right they don't need to be an expert in AI so um one thing I would like to quick here which we are going to talk in our demo is our Azure openi service so what Azure openi service does is like it enables developers to deploy uh scale uh AI uh enabled applications with the Azure capabilities of scalability security and compliance um on the left hand side you can see mongodb Atlas uh which in this case is depicted as a transactional workload and on the right hand side you can see Microsoft Fabric and Azure open AI so for for those of you who don't know what Microsoft fabric is is Microsoft fabric is our unified analytics platform that gets your data VAR data State together and you can run analytics on top of that and generate insights in seconds it is based on a one L which is a data Lake as a service available to you and the key takeaway here is that you see your transactional systems on the left and the analytic systems on the right and they need to work synergically so that for the needs of the modern applications they can exchange data in near real time so let's double click on what Microsoft fabric is right so like I said it's a complete analytics platform it is based on one Lake which is uh open at every layer and again you get this built-in uh system of a data Lake available in your Microsoft tenant it is based on open format of Delta par and uh there is just one copy of it right so you have all this all analytical servic I'll show on the next slide um which you can utilize and the data sits in just one Lake and whether you are spark user your SQL users you can uh connect to that data and run queries on top of that uh fabric also integrates with AI co-pilot so you can accelerate your productivity and generate insights in seconds um again and fabric also has the same Enterprise grade security that we have with Microsoft aure so that you can uh get governance and security uh all in one place so again this might look little complicated slide so let me just break down uh break it down for you so like I said imagine Microsoft U Office 365 uh and fabric is very similar that Microsoft fabric is what Office 365 is for data so similar to you have Microsoft Word PowerPoint and Excel in Office 365 Microsoft fabric has this workloads of powerbi synapse data Factory and data activator which lets you run analytics on data sitting in one lake so you can have data from different data legs all converge and sitting in your one L and then whether you are a user of spark SQL analytic surveys you can all query that data in one go all this data is stored natively in one Lake in Delta Park format and it's open so that you know you can access this data uh with apis uh one uh thing to call out here is that it provides real separation of compute and storage and what I mean is that your compute layer is on top of that that can scale separately and you can manage your storage layer and your cost accordingly because they are decoupled they are truly decoupled in this stage so I'm going to ask Paul to walk us through the cancer demo that I know you guys are uh eagerly waiting for so I'll pass it on to Paul so this will be um this application we're going to do it in two parts and the reason why is we're going to do as Devon mentioned earlier an AI right so we're going to be leveraging Ai and then in the second part we're going to be Levering gen Ai and they're very different um there's a lot of conflation occurring between the two but they're very different and we purposely did that that separation so that we can clearly see the differences in where they can actually synergistically work together also so what I'll be doing is leafy hospital is going to be essentially patients that are uh awaiting screening so the cancer screening is for breast cancer so there's going to be a list of patients and what we're going to do is we're going to take a patient through a screening and leverage Ai and gen AI technology to be able to diagnose and and and work through that patient's uh you know data uh cancer data Journey so we're going to put in an image request for mamography right an x-ray and then that request will then be sent to an AI model that that we wrote on fabric we leverage Fabric's data science to be able to write a model that can an AI model that can then do a prediction a score prediction on it's called a birad score so it's a score on whether or not there could be some cause to be able to look deeper into um into that patient's perhaps a biopsy or followup and so then the information will also then be stored for future retrieval that image will be stored in a Lakehouse in fact Fabric and in fabric with a Lakehouse you can store table data you can store files store the format that you want and then we're going to take those analysis results of the images bring them back to the application and then that information that reporting information will be saved to mongodb and then we're also going to take some of that patient data we're going to do a very very simple report and quickly illustrate how mongodb data can be actually used for a report in fabric Cloud which which is which is powerbi cloud um so there's a p powerbi desktop that we're very familiar with but there is a cloud uh version that we'd like to introduce so here we have an application so there's our leafy Hospital application and you can see that is there's a list of patients and it's a screening list as we mentioned earlier and and they're in various States you know they have meetings that appointments that have been cancelled some of that have been fulfilled and this is a bampus so we have it filtered by bampus and in fact we're using Atlas search for this filtering and we can choose whatever radiologist and What patients they're going to be seeing and there's our filtering that we're doing so we're going to look at today so appointments for today and then specifically we're going to want to filter on appointments that have been booked but not fulfilled so we're just going to go ahead here and uh click the booked uh feature and now we filtered and now we're going to select a patient we're going to select Lu Lucia and we're going to walk her through a screening process and during that screening process you'll see us leverage AI technology to go ahead and analyze those individual images in as I mentioned an AI model in fabric using uh it's a private preview feature in fabric which is an endpoint that you can expose your AI model so I have my uh images here uh ready to go and I went ahead and clicked on my very first image to go request it so I'm going to take that image that image will be taken will be brought back into the system and uh it's a little faint on the screens there but you can see some various stages we're going to take that image we're going to retrieve some metadata from that image we're then as I mentioned upload it for some storage for future uses and then we're going to then score it and do that birad score so that's the phases that that image is going to go through in our system and so once that Returns what we'll also do is we'll go on on to the next image and then we'll go behind the scenes to um to um to see it go further sorry think I think we're stuck here so the video has stopped yeah can you hit it oh there it is okay sorry we'll have to catch up to the uh to the recording okay so our images have come in so you see that there's that BYOD score that has executed and then to the left uh you see some information that has come in so there was some of that metadata that we extracted that we collected and um and then uh we also have uh that BYOD score that we're going to use uh in the future here for uh The Next Step so what we're going to do is and there's that metadata so we'll go on and take the next image and the other images but we'll pop over to fabric specifically data science area of fabric and here you'll see um a notebook that we used to write and essentially program our AI model and train our a model so that it can it can produce that by Rod score and you'll see a variation of sample images that we used to train the model so that you know a by Ride 3 what does a by Ride 3 look like a by ride one and you can see the various syntax that we used and the process that we went through we won't to go into detail into this but um you'll see here that when we ran a test after we we ran a sample test and we had a bu Rod score so you see they're all negatives uh 0 uh negative 09 14 and 17 one is our highest so it's propensity to be a by ride five in this Test example so if we go back to the application see all our images now have completed and we have various birad scores so clinically what we would do the radiologist next steps would be during the process here that would probably want to um invoke a biopsy go ahead and do the biopsy route so what we can do here is select the biopsy option and here it is so now um essentially that biopsy image or data can be taken brought into the system and then um by clicking here and then now that will also be ran against a classification model a separate AI model that we wrote in fabric to then go ahead and extract this information so it ran against a model and here is our classification so we'll go ahead and select it here in uh Fabric and pull it up so it's got a different uh different type of um program that we wrote there a little different notebook we use about about nine different parameters to calculate that um that classification score and again just quickly scroll through some of the um some of the work that we did in order to produce that AI model and then and then if we pop over to how we published it so this is this private preview feature that's that's going to be introduced in Fabric and that we're leveraging right now and we have different versions of that AI model that we've exposed and we can choose whatever version we choose deploy and you can test different ones in your testing for example and then you can even do comparisons between the different versions and then uh you know our menu is at the top you can actually download your ml model you can apply the version and then you can manage your your endpoints and go see which ones are active which ones are not active so you can see here how it's really powerful you can build those AI models in fabric you can stand up an endpoint to that AI model and then you can call it from from programs uh to be able to leverage them so what have we done here right so we've collected some information we have some mogs we have a biopsy and then naturally the radiologist is going to want to capture their notes and put their notes into the system and this is what's going to eventually the second part of the demo take us to leveraging those notes and geni so we have clinical notes here that we can fill out we have conclusions have snowbed information which like the clinical terminology and but what we're going to do just for this exercise this is is what they wouldn't do for real life necessarily but we're just going to leverage gen to be able to publish some of the data that we want for this demo in order to be able to leverage it further on in our gen because what we're going to do using gen so natural language querying semantic quering natur uh similarity quering different different synonyms for that process we're going to go data mine this information and ask it questions um in a human-like fashion and be able to get those those uh questions answered and uh so this is information so a clinician could actually use gen in the future actually help generate some of that documentation but in this case we just generated it all so we can have some great sample data to leverage for our gen process so before I move on to gen I mentioned that I'm going to do a powerbi report wanted to do something really really quick um so I've got some patient information th that those patients come from different geographical locations so what I'll do is I'm going to look at that data and uh and actually that patient information what we can do is we can flip over to the U mongodb database and show you where it's stored so if I click on that patient table and there's our our collection there's our document and I've got a postal code there so what I'm going to do is I'm going to export I'm going to use not export I'm going to use that patient information from fabric and the way I'm going to be able to tap into that Mong be data is I'm going to use a data flow data flow Gen 2 so that's something that exists in fabric it's a connector into my mongodb database and so I go ahead and instantiate I want to correct the Gen create a Gen 2 data flow and I'm just going to get data from another source as opposed to the ones I had there I'm going to choose mongodb as my source so I click on that and there's my connector so I could fill in this information and that would give me my data flow connector into mongodb I already have one that I've created in the interest of time so would just go ahead and leverage that one so I'll go back into my workspace and uh close this out go into my workspace and go through and actually pull up the data flow that I created I call that data flow patience so there it is and that's just the connection information nothing else was done I'm going to go ahead and execute that data flow that I set up and you can see that it it spins up power query so if you guys are powerbi users and you've used P powerbi desktop you're very familiar with power query so now we're in fabric we have our power query this is where you would massage your data to to be able to have it appropriate for the report and in this case I pointed my report to that data and there you have the geographic information appearing on various areas of the map um very very simple uh powerbi report and uh but just really wanted to demonstrate how you could interoperate you know your analytics your UI based analytics um with mongodb data from fabric so what we'll do is we'll go ahead and move on and to the next part of our demo which is now we have this wonderful clinical information that we've uh that from from the U from the screening information that we've collected and we want to go ahead and and Implement what we call semantic search right natural language search and rag which is retrieval augmentation generation and what that is is the ability to then query your own data and be able to then use a large language model like we have down here so if we look at the flow we have our patient portal we're going to ask a question that question is going to be in free text right just a natural language question like what is their medication what medication are they on but we need to convert that into a vector which is what Mong to be understands or a vector search storage understands and knows how to do this similarity find things that are semantically similar to what I'm looking for in the database like medication based data um I'm going to search aong to be database it's going to come up with with results that are um you know based on on on what I'm looking for it'll bring up some of the better results u based on classification and then those results will then be taken and aggregated brought together into a prompt that I'll then use that data to then send to my large language model like a GPT right we're all used to gp4 we've done some queries at home we're going to generate a prompt send it to Azure open aai and and and Devon will get into more detail where you can you know where you've got a trusted system a closed AI system for patient type information is very important that has a lot of governance and Devon will get into some of those details um in further on in the presentation but we to go to that llm and then return these rich results so always better demonstrated so here we have our patient portal again our uh our provider portal and a radiologist and uh as we saw here we were in the mode that we had them for um booked but now we want to see fulfilled and when we see fulfilled we see Luchia that we had actually gone through the screening process for her so now we're interested in pulling up her results and being able to extract some some in some knowledge out of the results that we we uh we inputed here so we have the the not here we have the conclusions all that data that we want to be able to ask what we call just kind of natural language questions so um and and what we'll do here is to give you a behind the scenes look at that data we'll actually go to mongodb and show you so those are the clinical reports so you have a document stored in a collection and you see all the attributes to that document but specifically you'll see some of that free text there right you've got the the conclusions um you've got the notes over there and got those highlighted there and then you'll see a vector embedding right so that's where we've taken that free text and we've turned that into a vector to represent what a machine can understand in query and we're going to query against that Vector to be able to see the results that you'll see in a minute when I actually start putting together my queries and there's an example what a vector looks like it's a long long array of and you know I encourage you to go study uh it's it's a several hour presentation on its own just to be able to explain what vectors are and how they work uh they're very very powerful but we'll go back to our our UI here and then we'll go ask some questions this is where it all comes together this is where you're going to see all this technology coming together so the first question we'll ask we'll go ahead and paste the question here and you see what is the patient's current condition that is just natural language question as if I would ask a doctor and you know that asking those types of s questions in the system today is very difficult to get accurate response but Vector search Rag you know llm you know gen brings it together and you can see now that I have a nice concise uh response through that process that I showed you earlier going through that process going through those Technologies gives you that polish response that answers that question so for inst I can ask another question that says are they're taking any medications right and we can see that no they're not taking any medications so this is is where mongod comes in and you can add your own data or augment the data so let's say more data has come in in her let's say there's a chart and her chart has added information or more information has come in from a other data source and you now want that included in your chatbot um so here we have just I just created a little sample PDF and I just have a little little data about medication and a little a little um you know additional information about what that medication's about and what we're going to do is we're going to show you how to bring that information into your MB data store and then have it available for this chat bot now to now leverage for further questions so we have this little feature built in of course you wouldn't do that in real life you wouldn't allow this portal to just bring in any document this would be done by administrators but what we're going to do is I have that PDF document um stored on my drive and I'm going to just paste in the path to that file and once I pasted in that path I've load up that file so now we're going to go look and see in the mongod to be database behind the scenes what has happened right so I'm going to go to my rag info table so I'm just going to go highlight that and you see and I kept it simple just one record normally you would have hundreds and thousands of Records there um but I have that one record and that was that text that I pulled out of that PDF and then I vectorize it right I turned it into an array of decimals that this machine can understand and to be able to do these similar comparing and then I can ask these natural language questions uh against this data now that I've now introduced into mongodb so what I'll do now is I will um go over to the um application and move back to that and we'll close this down go back to ask a question so now I'm going to ask a follow-up question and say are they on any medication now and we'll see what the chat bot says and yes there's that medication that we introduced via the PDF so it is now searching the patient information and now searching that that data that I brought in and augmenting that data so that we can increase and we can even build on those questions because now I can ask if there are any side effects of that medication and you see that really nice concise precise answer yes there are and some of the damages and whatnot so this in a nutshell was kind of an illustration kind of the art of the possible of what you can do when you have ai in working in you know synergistically in unison with Gen and the power of these two technologies coming together in this type of scenario so we return to to Devon and I mentioned earlier that they really the importance of you know the different use cases that you can use and then also the governance of that data that you're sending to the llm thank you fall for that wonderful demo as you can see Paul just demonstrated how AI open AI sorry as your open Ai and jna came together to deliver on a cancer use case now what we wanted to highlight on this slide is that hey this was just a use case in healthcare industry but just planting the seeds here that you can apply this to your own use cases right think about Hey What Can J do do for you yourself so uh the typical use cases that we have seen customers use jna is to improve productivity improve customer end experience and automate workflow processes right so again this is a geni journey that is very recent so we are here to help you on this journey if there are any questions that we can help with we'll happy to take them after the session so that we can work together on this journey uh this is not responsible AI is not something new for us uh we have been working on this for a long long time as you can see from the slide about 8 years ago SATA nadela uh wrote an article in slate magazine titled uh the partnership of the future right so this journey has been going on for a long time and Microsoft has been innovating in AI as well as responsible AI for many many years now so rather than going over the timeline in interest of time what I wanted to talk about is that hey this is a continuously evolving space we are adapting as as as the markets are adjusting to the privacy and security concerns that are brought by this uh llm models and new technology and how we can help our customers securely uh share and deliver this information so we just looked at the Microsoft uh responsible AI principles so they are based on six core principles right sorry um so we have four core principles which is fairness reliability and safety privacy and security and inclusiveness and they are underpinned by by two foundational principles which is transparency and accountability right so we believe that the development and deployment of AI should be designed by an ethical framework and this are what we um uh provide with our responsible AI uh model so again in interest of time we do not have to go into the details but I can share more information if you guys are interested in that this is a very important slide what this talks about that when you do use as your open AI your data is secured your data is your data we do not share that information with any customers we do not use that data to enhance our Azure open AI models so this data is protected and they run in Azure vet which is controlled by the customer so depending on the customer's uh classification and privilege on that data of course if it is public data then that can be used to enhance the model but as long as it's classified as private or secured that information will not be used to share uh or or enhance the AI models so last September we announced the customer commitment uh copyright commitment what this does is pro protects our commercial customers from defending against uh copyright claims of using AI co-pilot and uh lastly we also added as your open a service uh to that uh commitment so what it does is like provided you follow some technical measures to make sure that there's no infringement on the output you are covered under this commitment right and this will boost the confidence that customer can develop GNA models um with confidence so that is what we have I know we are running short of time so uh with that we come to the end of our presentation thank you for your time and hope this was useful to you if you have any questions you can reach out on the email uh mentioned on the slide and this is some additional documents so if you reach out to us we can share this deck with you as well so thank you for your time
2024-05-24 07:09