All right, hello everyone, welcome, and thanks for joining our session on day three of re:Invent. I hope you're all doing well. My name is Mike Wilman, I'm a senior data scientist at XL2. You've probably not heard of XL2 until today; it's a joint venture of Capgemini and Audi. I'm joined here by an amazing team, and together we've brought this project to life with a strong focus on innovation and collaboration. Please, go ahead and introduce yourselves. "Hi, my name is Thomas, I'm an IT architect at XL2." "Hi everyone, or as we say in Bavaria, 'Servus zusammen'. My name is Edward, I work at Audi as the project lead and the enabler for AI solutions in the planning department of Audi." "And hi everyone also from my side, I'm Simon, an external PhD student at Audi, researching the use of generative AI for the factory planning domain." Thanks, guys. We are really excited to be here at re:Invent and talk about how we have used gen AI to streamline Audi's tender process. Before we dive into the details, here is a quick outline of what to expect in this session. We will start with a very basic introduction to LLMs, what they are and how they work. After that, Audi will share their vision for using LLMs in their tender process. We will then highlight the customer-centric development approach we chose for making sure that the product we developed met real-world needs. After that, we will give a quick overview of the high-level solution design, and right after that we will dive into the architectural deep dives for all of you who like the tech stuff. Next, we will share what is currently in development and what the future holds for our product. Then it gets pretty exciting: we will share where we stand today through a system demo. Finally, we will wrap things up by giving you some key takeaways to take to your own projects. All right, let's start with an introduction to
LLMs. As I said, it's a very basic introduction, so we are doing quite a lot of simplification here and there, but I think it is nevertheless quite useful for getting a better understanding of the topic of LLMs. Let's start on the left-hand side with some key facts about LLMs. LLM stands for large language model, and "large" can be taken pretty seriously: GPT-4, which I think most of you have used through ChatGPT, is estimated to have about 1.8 trillion parameters. That's huge, and it explains why these models are so expensive to train and even expensive to run. They are language models, which means they are capable of understanding and generating very human-like text. An LLM is also, most of the time, a so-called GPT, and GPT stands for generative pre-trained transformer. Generative again just means that the model is capable of generating new content, like drafting some text for you or answering in a conversation; I think you have all experienced that by now. Pre-trained means they are trained on huge amounts of data, basically internet-scale amounts of data, and by training on so much data, some really interesting capabilities emerge in these models, which you are probably also aware of because you've used such models. Transformer refers to the deep learning architecture used for these models. That architecture was introduced back in 2017 by some very smart people at Google in the paper "Attention Is All You Need", and that was really a tipping point in the NLP area. What makes the transformer architecture so successful, I think, is that it allows you to do the training in a very parallel fashion, and that is what allows training on such huge amounts of data in the first place. Now let's switch to the right-hand side and look at how such a model works, how it generates text. Basically, what an LLM does is take a set of ordered input tokens and generate a probability distribution over all possible output tokens. That might still sound
complicated, but if we make the assumption that one token is one English word, it basically boils down to this: you give it a piece of text and it predicts the most likely next word. But how does it do that? First, the model needs a vocabulary, and the vocabulary is basically the set of tokens known to the model; in our example that means all English words in existence. Each of these tokens, or in our case words, has a learned vector representation in a very high-dimensional space, which is called an embedding. Those embeddings are aligned in that high-dimensional space so that they encode some kind of semantic meaning of the tokens. What then happens during inference is that those vector representations get adapted by the model through two important mechanisms. First, each vector representation gets adapted with position information, and second, and probably much more important, with context information, by attending to every other token in the input. Adapting the representation basically means the vector moves in this high-dimensional space, and by doing that, the last token in the input encodes all the information the model needs to predict the most likely next token. So in our example, "In winter the sky is often ...", the model says with 80% certainty: "gray". To be honest, when I made that example I was in Germany, so now, being here in Las Vegas, I probably should have written "in German winter". But I think you still get the point. What is also important to say here is that you can slightly adapt these models and use the information encoded in the embeddings to do other things. For example, you could use the average embedding over all tokens in the input to encode the semantic meaning of a whole sentence, and by doing so you could do a kind of similarity search across texts. I mention that because it will be important later on, so you'll have heard it before. So that's a very simple introduction to LLMs.
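The similarity-search idea just mentioned, averaging token embeddings into a sentence vector and comparing vectors, can be sketched in a few lines. Note the vectors below are tiny hand-crafted toys for illustration; a real model learns them during pre-training and they have hundreds or thousands of dimensions.

```python
import math

# Toy word embeddings (hand-crafted for illustration only; a real LLM
# learns these vectors, and they are far higher-dimensional).
EMBEDDINGS = {
    "winter":  [0.9, 0.1, 0.0],
    "snow":    [0.8, 0.2, 0.1],
    "gray":    [0.7, 0.3, 0.0],
    "invoice": [0.0, 0.1, 0.9],
    "payment": [0.1, 0.0, 0.8],
}

def sentence_embedding(words):
    """Mean-pool the token embeddings into one vector per sentence."""
    vectors = [EMBEDDINGS[w] for w in words]
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

weather = sentence_embedding(["winter", "snow", "gray"])
billing = sentence_embedding(["invoice", "payment"])
query   = sentence_embedding(["winter"])

# The weather-related sentence is closer to the query than the billing one.
print(cosine_similarity(query, weather) > cosine_similarity(query, billing))  # True
```

The same mean-pool-then-compare pattern, with real model embeddings, is what enables the semantic search used later in the talk.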
I hope it helped a little in understanding things. Great, thanks Mike. Well, let's get a little into the subject matter here. My department at Audi is responsible for the planning and construction of the next generation of our engine production, and for the production of these very complex products we need special machinery. So at a very early stage in the product development process, we need to write down all the requirements for those special machines in order to procure them, and at Audi we call this specification process the tender process. Basically, the tender process can be divided into two core steps: the first one is the description of all the requirements of the machine, and the second one is the matching of potential offers against our tender and making a final statement. The planner describes all his requirements in this tender document, which can be quite complex. Those requirements come from various sources, like product development, the research team, or technology development wanting to implement their newest manufacturing technology in the next generation of assembly lines. So it's not surprising that these tender documents can have more than 400 pages, fully stacked with technical information. Even for our best experts it's hard to keep track of every piece of information in these tender documents. And just to give you an overview: we are talking only about the planning department of Audi here, and we have far more than 1,000 tenders per year. The problem becomes even bigger when potential suppliers respond to our tenders: here we get multiple offers per tender, anywhere from five up to 50 offers per tender, and trust me, matching those offers to our tender is very labor-intensive. To prove that, let's do a little calculation. For simplicity, let's say we have 1,000 tenders per year, though we know it's usually much more than that, and per tender we get an average of
20 offers. That sums up to 20,000 offers per year in the Audi planning department. And just to quote a colleague of mine: we need one week of labor to do a final statement per offer. That sums up to 20,000 well-paid engineering weeks per year, so roughly 800,000 engineering hours for reading, replying, and comparing two documents to each other. There's huge potential here, and we're not even taking into account that our research team, the development guys, or even the whole Volkswagen Group, not only Audi, use a very similar tendering process. We can do better. What if we could use an LLM, which can understand and generate human-like language, to help us compare a tender to an offer? After one year in this project, we can say: yes, AI can help us with that, and we will show you how we reached that goal. Okay, so let me start with a question. I guess you've all heard about "chat with your data" as an approach, and I'm asking: in which organization is there no "chat with your data" initiative going on at the moment? Yeah, no hands, which I also expected. Even though this was the spark that started this whole project, we were very aware when we started that two things were going to happen. First, this technology is very generic and would very quickly become a commodity, because it's so easily adaptable to all kinds of domains; and this actually happened: by the end of 2023 a lot of vendors had it already incorporated into their offering, and it was not really something to stand out with. The second thing we realized very quickly is that the planners don't need something to just chat with the documents, to just ask questions of the documents. So we had to go on and find something of real value for the planners, and the real value of this technology, we found out, is that we can compare pieces of information together: we can compare a requirement from a tender with all the chunks of
data that come in a single offer, and we needed to build something based on that, something that brings much more value to the planners' day-to-day work. So we started the research, and we talked to some planners, and we quickly found out there is no standard way of doing things; everybody had developed their own way. Some people just put comments into a vendor PDF directly; some people really print out a whole offer document, put it on a wall, and go over it with a text marker to piece together the relevant information. It was a truly analog process, and everybody had developed their own solution to the problem of having to deal with such a vast amount of information on both sides, the tender side and the offer side. Our idea was to make this information manageable in some way; we needed to build a system that does that, reducing the overhead of dealing with so much information. So we did some research, and what we found was that a lot of planners, at the very beginning, list the requirements that come with a tender in an Excel sheet, maybe 100 to 200 rows of requirements, and then they go cell by cell for each offer and fill out the document to get a summary of all the offers that come in. This was a starting point for us: okay, here is something we can maybe provide in a much more automated way, and this is also where we put the lever for our system. So we had gained this insight very early, and we started to talk with the planners more. One funny anecdote: at the very beginning, the first planner we talked with started the whole conversation with the sentence, "So you are the people who are building the system that will replace us." We all laughed a little about it, but we knew that at the very core there was a real problem we had to deal with: the fear of the people we were building the system for that it will take
their job at some point. And it's a very real fear, so we had to mitigate it in a way that they would understand: no, we're building a tool for you, not something that will replace your expertise. Because this expertise comes with years and years of dealing with vendors, gaining a lot of experience in how to deal with vendors and with the technical details of the whole process. What we could do was reduce the work that nobody liked to do: matching information together. The easiest way we could achieve this was to build a PoC from the very beginning, a very low-tech PoC with no server-side components at all. It was just one tender and one offer, and what it did was: you could click in one document and it would scroll to and highlight the relevant information in the other document. And this was something the planners could really grasp from the very beginning. They could see the value in it because they could see themselves using that tool, and this was a good tipping point for us, because when we showed it to them, they were really on fire for the tool, they really wanted to help us and wanted to be included in the whole process of building this tool step by step. And that's what we did: we did several iterations with the planners to get to a point where it really matches their process and advances the user experience. So, until now you've gotten a good overview of the use case and our project setup; now I'd like to talk a little bit about how we built our solution for the tender process. So how did we bring AI into the tender process at Audi? As you might already have gathered, this is a really time-consuming and also complex process, so we needed to break it down into smaller steps that are more manageable for us. And the first step here is to make our documents
LLM-ready. Like in almost every AI-related project, we need to prepare our data first. What does that mean? We have several formats of documents: Word files, PDF files, even PowerPoint or Excel files are used for the tenders and the offers. And we also have different modalities inside these documents: there is information in text, in tables, and also in figures and drawings, and it was very important for us to get the information out of these elements in order to have a good evaluation at the end. Furthermore, we also needed to understand the structure of the documents. As my colleague said, these documents are really long, often more than 400 pages, so we need to break them down into smaller, more manageable blocks, which we do by extracting the structure of the document. When this is done, we go to the second step. Here we look only at the tender file at first, and we extract the relevant requirements out of it. We create a checklist that we can use later on for the evaluation, and here it's important for us to know where in the document each point of the checklist was extracted from, so that we can later match these requirements. Another important point at this step is that we include our domain experts. As we said before, we're not building a tool that replaces the planners but one that supports them, and here we need input from them, so that they can feed their own experience and preferences into the tool and benefit most from our evaluation in the end. When this is done, we come to the third step, and finally the offer comes into play. In the third step we want to match the requirements we extracted from our tender file with the relevant parts in the offers. We had tried to get our suppliers to answer our tenders in a specific structure, so that this step could be done pretty fast, but most of them don't follow our instructions, and I guess those of you who are also
dealing with tenders might have had similar experiences. So what we need to do here is look through the whole offer to find the relevant parts that address our requirements, even when they are distributed across the document. Another challenge here is our domain-specific language: we mostly have German documents containing a lot of domain-specific keywords, and most of these are not the best fit when it comes to AI models. In the last step, the evaluation finally takes place. Here we check how well the offer meets our requirements, and the challenge is that we don't have binary classification results, so we cannot simply say this is good and this is bad. We have some fuzzy result categories, and also some results that lie between approval and non-approval, things like "we need further clarification", either internally at our company or in discussion with our supplier. For these classifications we also need to include the expertise of our domain experts. All right, thank you, Simon, for sharing the general solution approach. Now follows a deep dive into the architecture. Basically, you will recognize a lot of the things just mentioned, but with a little more technical depth. Before doing so, I would like to quickly highlight our journey from the pre-Bedrock era into the Bedrock era, because when we started this project over a year ago, Bedrock wasn't generally available. So we started by hosting open-source models on SageMaker using the text generation inference (TGI) container from Hugging Face. And please don't get me wrong here: I love open source, I think it's super important, I'm also a contributor myself, but we really faced some issues here. At that time, open-source models often lacked good documentation, so setting them up correctly was kind of tricky, because you often didn't really know what the correct prompt template was to use, or what the correct setting was for the text generation
inference container to apply. So it really was not hassle-free. Also, managing the scalability of the endpoints was not trivial: since we were in a PoC phase, we went through a lot of requesting big on-demand quotas and then basically scaling back down to zero. It was doable, but, like I said, a bit of a hassle. When Bedrock became available, it basically gave us a very elegant solution to all of those problems. First, it gave us easy access not only to open-source models but also to closed-source models like Anthropic's. It also gives us, and I think this is really one of the greatest features of Bedrock, a unified Converse API, so you can really easily and quickly switch between different models within your application. And honestly, Bedrock just works: it's super easy to use, it scales, at least to our demands, pretty well, and it's reliable. So switching to Bedrock really helped us shift our focus from managing infrastructure to solving business problems. But now, finally, let's have a deeper look into the architecture, starting with the document pre-processing pipeline. Like Simon mentioned, the assets we get for the tenders and offers come in multiple different modalities, from PDF documents to PowerPoints and Excels. All steps, as you can see, are successive AWS Batch jobs orchestrated by a Step Functions state machine. The first step is really about extracting the text out of those different document assets, and we chose an OCR service for doing that: Amazon Textract. Using an OCR service for this gave us some key advantages. The first is that you can basically read text out of every asset that is convertible into an image, and that is basically everything, because it's OCR; so by choosing OCR we didn't really have a problem with all those different document modalities. The second key advantage is that the Textract service does not only provide you the extracted raw text.
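A Textract call with the layout feature enabled might look like the sketch below. The bucket and key are placeholders, and note that synchronous `analyze_document` only handles single-page images or PDFs; large tender documents would need the asynchronous `start_document_analysis` plus polling.

```python
def extract_layout_blocks(bucket, key):
    """Run Textract document analysis with the LAYOUT feature and return
    only the layout blocks. bucket/key are hypothetical placeholders."""
    import boto3  # imported lazily; the call itself needs AWS credentials

    client = boto3.client("textract")
    response = client.analyze_document(
        Document={"S3Object": {"Bucket": bucket, "Name": key}},
        FeatureTypes=["LAYOUT", "TABLES"],
    )
    # Keep only layout blocks (LAYOUT_TITLE, LAYOUT_SECTION_HEADER,
    # LAYOUT_TEXT, LAYOUT_TABLE, LAYOUT_FIGURE, ...); each carries its
    # geometry, i.e. the position information used later for auto-scrolling.
    return [b for b in response["Blocks"] if b["BlockType"].startswith("LAYOUT_")]
```

The returned block types and their bounding boxes are what the rest of the pipeline builds on.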
It also provides you with the document layout information, and that is a huge enabler for several things. First, it allows us to really increase the quality of the text we extract. To give a simple example: maybe in the original document there is a table that crosses multiple page borders; we can just merge that in post-processing so that we have one clean table in the raw text. There are more things like that, but I think that's a good example. The second advantage of having the layout information is that it's a real enabler for some features in the front end, which we will see later, like auto-scrolling, because we have the position information for the text we extract. The third key advantage, and probably the most important one, is that we can use the layout information to chunk our documents in a way that makes the chunks as coherent as possible. The LLM cannot really work on the whole document, so we have to provide it with small chunks, and having coherent chunks makes the whole process much better. We basically use title and section header information to create those chunks, and that really, really increased the quality of our product. Having extracted the raw text, the second step is about extracting the information encoded in images. As Simon mentioned, there is often important information in images. Again, this step is enabled by having the layout information, because we know where in the document there is an image: we can extract the image and feed it, together with the surrounding context, to a multimodal LLM; in this case, as you see, we use Claude 3.5 Sonnet to generate descriptions for those images, so the image information is also included in the raw text. The third step is then about chunking and embedding. Chunking is pretty straightforward, as I just explained: we use the layout information to create the chunks from the document, and then we create embeddings from those chunks.
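The section-header-based chunking just described can be sketched as follows. The dicts below mimic a heavily reduced form of Textract layout blocks (the real response also carries IDs, geometry, and relationships), and the example texts are made up.

```python
# Simplified sketch: group LAYOUT_TEXT blocks under the preceding
# LAYOUT_SECTION_HEADER / LAYOUT_TITLE so each chunk is one coherent section.
def chunk_by_sections(blocks):
    chunks, current = [], None
    for block in blocks:
        if block["type"] in ("LAYOUT_TITLE", "LAYOUT_SECTION_HEADER"):
            if current and current["body"]:
                chunks.append(current)
            current = {"header": block["text"], "body": []}
        elif block["type"] == "LAYOUT_TEXT" and current is not None:
            current["body"].append(block["text"])
    if current and current["body"]:
        chunks.append(current)
    return chunks

blocks = [
    {"type": "LAYOUT_SECTION_HEADER", "text": "3.1 Robot cell"},
    {"type": "LAYOUT_TEXT", "text": "The cell shall contain two 6-axis robots."},
    {"type": "LAYOUT_SECTION_HEADER", "text": "3.2 Safety"},
    {"type": "LAYOUT_TEXT", "text": "Fencing according to ISO 10218."},
]
print(chunk_by_sections(blocks))
```

Each resulting chunk (header plus body) is then embedded as one unit, which is what keeps the retrieved context coherent.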
We will later use those embeddings during retrieval for semantic similarity search: starting from a requirement, we use semantic similarity to retrieve the matching parts of the offer documents. For the embedding model we chose Titan Text Embeddings V2 from Amazon. I think it's a pretty good model, balancing performance and cost. It is also capable of handling multiple languages; as mentioned, German is pretty important for us, and it handles German pretty well. And it can handle quite large contexts, which matters because our chunks still tend to be quite long. The last step is an optional one: it's not done for offers, but for tenders. We iterate over the chunks we extracted, give those chunks to the LLM, and ask the LLM to pre-structure them by extracting for us a list of requirements, each with a title and a good description of what the requirement is about. By doing so, we also get a unified basis for comparing different offers, which is also important. And of course we have to persist the data we generate, so we go for PostgreSQL on Aurora. Since we have some vector data, Aurora can handle that too with the pgvector extension, so there was no need for us to go for a specialized vector database. So, we have covered document pre-processing; now let's switch to how online inference works and how it incorporates the subject matter expert to ensure that we have a reliable and robust process. The first step, as has been mentioned multiple times, is about involving the SME and letting him cross-check the requirements extracted from the document. Basically, we give the SME the option to add a requirement that was missed by the LLM, delete a requirement that is not really relevant, or slightly adapt an extracted requirement. If the SME is then happy with the extracted list of requirements, he can start the inference process with just a click or two in the front end.
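The embedding and retrieval pieces above can be sketched roughly as below. The Titan request/response shape is the documented one; the table and column names (`offer_chunks`, `embedding`) are hypothetical, and in pgvector `<=>` is the cosine-distance operator, so ordering ascending returns the most similar chunks first.

```python
import json

def embed(text):
    """Embed a chunk with Titan Text Embeddings V2 via Bedrock
    (needs AWS credentials, hence the lazy import)."""
    import boto3

    client = boto3.client("bedrock-runtime")
    response = client.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": text, "dimensions": 1024, "normalize": True}),
    )
    return json.loads(response["body"].read())["embedding"]

def similarity_query(top_k=5):
    """SQL for the pgvector lookup; the query vector is bound as the
    single parameter. Table/column names are illustrative."""
    return (
        "SELECT id, chunk_text, embedding <=> %s::vector AS distance "
        "FROM offer_chunks ORDER BY distance LIMIT " + str(top_k)
    )
```

At query time, the requirement text goes through `embed()` and the resulting vector is bound into `similarity_query()` to fetch the top matching offer chunks.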
What then happens is basically what I mentioned before: we do retrieval by semantic similarity search. For a given requirement, we search the different offer documents for matching chunks. When we have retrieved that information, we give the requirement and the retrieved information from the offer documents to the LLM, and we ask the LLM to answer whether the requirement is met by a certain offer. If it is met, we also ask the model to give us citations in its reasoning. This is pretty important, since it helps the SME later on to quickly check whether he can follow the reasoning or not. And that's basically exactly what the last step is about: again involving the SME and letting him check the results of the LLM, and maybe adapt them if necessary. This whole online inference process is enabled by some key AWS services: we use AWS AppSync for the GraphQL layer connecting the client to the backend, Lambda for managing the inference logic, again Bedrock for talking to the LLM and for using the embedding model, and of course the Aurora database for managing the data and the search. By combining automated inference with SME oversight, we really found a good balance: using AI-powered speed while ensuring we incorporate the expertise that is key for Audi's tender process. Okay, before we go to the demo of the front end, I want to tell you a little bit about the current developments. What we built so far is all built with vanilla models: we use vanilla Titan embeddings, we use the vanilla Sonnet LLM. So we thought: would it make sense to fine-tune something, to fine-tune a model to our domain expertise? Thinking this through, we quickly came to the conclusion that the biggest improvement we could get is probably in the retrieval process.
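The requirement check described above boils down to one Converse call per requirement. The sketch below uses the documented Converse API shape; the prompt wording and the exact model ID are our assumptions, not the production prompt.

```python
def build_prompt(requirement, offer_chunks):
    """Combine one requirement with the retrieved offer chunks
    (illustrative prompt wording)."""
    return (
        "Requirement:\n" + requirement + "\n\n"
        "Offer excerpts:\n" + "\n---\n".join(offer_chunks) + "\n\n"
        "Does this offer meet the requirement? Answer with a score from "
        "0 to 10 and cite the excerpts that support your reasoning."
    )

def check_requirement(requirement, offer_chunks):
    """Single Converse call; needs AWS credentials, hence the lazy import."""
    import boto3

    client = boto3.client("bedrock-runtime")
    response = client.converse(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
        messages=[{"role": "user",
                   "content": [{"text": build_prompt(requirement, offer_chunks)}]}],
        inferenceConfig={"maxTokens": 1024, "temperature": 0.0},
    )
    return response["output"]["message"]["content"][0]["text"]
```

Because Converse is model-agnostic, swapping in a different Bedrock model is a one-line change to `modelId`, which is exactly the flexibility described earlier.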
That means we could generate better embeddings, better fitted to our domain. Let me describe this real quick: when you look at the schematic graphic on the right, it shows, for a single query, one requirement compared against all the text chunks from a single offer, and it's actually hard for the model to really distinguish between relevant and irrelevant parts. This is due to the fact that these vanilla models are trained on world knowledge; they are not specialized in any domain, so a huge part of the space an embedding can fill is taken up by things totally irrelevant to our process. The idea is to get a better separation, because as the graphic shows, it can happen that during retrieval some totally irrelevant parts get a better similarity score than parts that might be the key parts you need to do real inference on. So the idea is to train the model on real domain expertise. And how do you get that domain expertise? You ask the subject matter experts, the planners in our case, and have them manually match requirements to the parts of the offers they see as most relevant. Once we have built this golden sample, as we call it, we can go on and build a synthetic data set out of it, and that synthetic data set then has the quantity needed to do an actual fine-tuning step on some open model. We are currently in the process of building this golden sample. And actually, fine-tuning a model itself is pretty cheap compared to how much it costs to train a foundation model: training a foundation model can run into millions or billions of dollars, but fine-tuning a model can be done for some hundreds of dollars. The real challenge is to build a data set that is really representative for you, and we are in that process right now.
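Turning the golden sample into training data can be sketched as below: each SME-matched (requirement, relevant chunk) pair becomes contrastive triplets, with negatives drawn from the other chunks of the same offer. The field contents are made-up examples, and a real pipeline would additionally synthesize paraphrased requirements to reach the needed data volume.

```python
import random

def build_triplets(golden_sample, all_chunks, negatives_per_pair=2, seed=7):
    """Turn SME-matched (requirement, relevant_chunk) pairs into
    (anchor, positive, negative) triplets for contrastive fine-tuning
    of an embedding model. Negatives are sampled from the remaining
    chunks; the sampling is seeded only to keep the sketch reproducible."""
    rng = random.Random(seed)
    triplets = []
    for requirement, positive in golden_sample:
        candidates = [c for c in all_chunks if c != positive]
        k = min(negatives_per_pair, len(candidates))
        for negative in rng.sample(candidates, k=k):
            triplets.append((requirement, positive, negative))
    return triplets

golden = [("The cell shall contain two 6-axis robots.",
           "We offer two KUKA 6-axis robots.")]
chunks = ["We offer two KUKA 6-axis robots.",
          "Delivery within 12 weeks.",
          "Warranty: 24 months."]
print(build_triplets(golden, chunks))
```

Triplets in this shape are the standard input for contrastive objectives (e.g. triplet or multiple-negatives ranking losses) used when fine-tuning open embedding models.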
Even though it already works well with what we have, here we can really make a jump forward, in our opinion. So now I will hand over to Simon. Questions? We will take those at the end, of course, so no worries. Yeah, I think now is the moment you've all been waiting for: now we'll have a look at our demo and see how we included all these functionalities in our front end. Here you can see our web application. Basically there are three columns. On the left side is the structure we extracted from the tender document; here the user can navigate through the document and see which part of the document he is currently working on. On the right-hand side we have the possibility to see our tender file. This is the first step, where we create the checklist: we go through the tender and see how things are written in the document. In the middle of the screen is where the action actually happens: here you can see the checklist that was created by our LLM, and here the domain expert can also include his domain knowledge. It's possible to remove points from the checklist, to add things to the checkpoints, or even to add completely new checkpoints, and all of this is directly used for the evaluation later on. As you can see on the screen, we include a new checkpoint and click on this plus button, which adds it directly to our checklist. Once this is finished and the domain expert thinks, okay, now let's evaluate this chapter, he can click on the button below, where he starts the comparison process. It's important to mention that when we start this process, we can do it chapter by chapter, so we don't have to wait until every colleague has finished the evaluation of his specific chapters. This takes a little while, so let's jump to another chapter that was already evaluated, and here you can now see in the middle the generated results from our pipeline. At this moment I quickly want to stop
our demonstration to show why we profit from using similarity search. The first point in this checklist is about 3D layout planning, which was one of our requirements. When we look into the offer, which is now displayed on the right side, we see that this chapter mentions that a software called SolidWorks, which is actually used for 3D layout planning, will be used; but in this chapter of the offer, nothing is explicitly said about "3D layout planning". Using semantic search, our system is still able to capture this and use this specific part of the offer for the evaluation of our checkpoint about 3D layout planning. So let's continue with our demo. From the layout you can again see that we have three blocks: on the left side is again the structure, and on the right side, as I mentioned, we can now see the offer. It's also possible to pop out several parts of this window and use them on a second screen, which was one of the requirements given by the domain experts. In the middle, as I already introduced, you can see these points I marked: a five means the LLM thinks this specific part of our checklist is met by the offer at a level of five out of ten. All of the checkpoints are evaluated by the LLM, and now our subject matter expert can do his own evaluation, provided with this basis from the LLM. Here he can go through the process checkpoint by checkpoint, and for each checkpoint he can say whether a point is approved, not approved, or needs further clarification. It's also possible to jump to the relevant parts of the offer, which we matched in the step before; this also helps the planner speed up his process if he needs to look something up in the supplier's offer. We also have the possibility to switch the view and see how the original tender file was used, or even, as we see here, we can see the whole
checkpoint which we created before, if you want to have this information on the screen again. I guess now we come to the point where we switch the view, and here we can now see our tender file again. This is maybe the way the planner worked before: he has two documents side by side, and now he also has the possibility to use our tool and see what he wrote in the tender file and what the supplier wrote in the offer. When all of this is done and every chapter is evaluated, or even along the way, we can also download the information, for example as an Excel file, to use it for further evaluation. This is how we do it, this is our tool for the AI tender process, and it is already used by our planners, and they like it. Okay, thank you, Simon, for showing off Audi's tender tool. I'd like to conclude with our key takeaways for you. The first one is pretty obvious: carefully plan costs. We all know gen AI is not cheap; try to find a matching model and matching performance to make a business case here. The second one, and probably the most important for us: you need to understand the user, in our case the technical planner, because their needs and their process are what we have to translate into our gen AI solution. And the last one: even though the responsibility of course still lies with our technical planner, we created a balance between the human and the AI interactions, and thereby created a co-working atmosphere between the AI and the human, benefiting from the synergies. And now I'm giving the word to Mike for the last slide. Yeah, thank you guys for attending our session. There are a lot of things we have in mind to improve the project, so if you have any questions or would like to discuss your ideas with us, please feel free to come forward and visit us. And again, thank you very much for attending this session. [Applause]