So welcome, everybody, to our Agile BioFoundry webinar today. I'm Nathan Hillson, the lead principal investigator of the Agile BioFoundry. I'd first like to thank Stacey Young within DOE for helping get this Zoom meeting and the webinar set up. I'd also like to thank, on the Agile BioFoundry side, Emily Nelson for helping us coordinate the agenda and make sure we all know what we're doing. I'll give the agenda on the next slide so you know what's going to be happening over the next hour, and then I'll transition into providing an overview of the Agile BioFoundry. I'd also like to thank DOE's Office of Energy Efficiency and Renewable Energy and its Bioenergy Technologies Office, and we're really lucky to have with us Gayle Bentley, our technology manager. So thanks to Gayle for supporting all of the work we've done in the past and the work we're going to be doing going forward.

In terms of the agenda for this hour, I'll spend about 10 minutes giving you an overview of the Agile BioFoundry, and then we'll transition into three 15-minute talks: one on biosensors from Taraka, one on machine learning from Hector Garcia Martin, and one on deep learning from Phil. At the end we'll have an open-mic Q&A for all. We've enabled the chat functionality, so everyone should be able to chat, and since we don't have much time for Q&A, I'd suggest doing most of it in the chat; that will give you the best opportunity to get feedback on your questions.

So I'll transition now into the overview of the Agile BioFoundry. The goal of the Agile BioFoundry is to enable biorefineries to achieve 50% reductions in time to bioprocess scale-up compared to the current average of around 10 years. Our objectives and outcomes are the development and deployment of technologies that enable commercially relevant biomanufacturing of a wide range of bioproducts by both new and established industrial hosts, and we'll hear more about some of those hosts in a bit. As a statement of relevance, we are a $20 million (U.S.) per year public infrastructure investment (I'll describe what we mean by that in a few slides too), and we'll be increasing U.S. industrial competitiveness, enabling opportunities for private-sector growth and jobs.

The Agile BioFoundry directly supports the Department of Energy's Office of Energy Efficiency and Renewable Energy mission and objectives in three high-level ways. The first is decarbonizing energy-intensive industries, not just by making current industries more efficient as they operate now, but by going back and completely changing the way things are done, for example starting from sustainable feedstocks as opposed to fossil feedstocks. The second category is sustainable aviation fuels and, more generally, decarbonizing transportation. The third area is diversity in science, technology, engineering, and mathematics, and on that front we're really excited to be about to launch one million dollars of support for a minority-serving research and development consortium in collaboration with us; we're really looking forward to that.

In terms of public infrastructure investment, what I'm showing here on the screen will look pretty familiar: a generic subway map, adapted from
the Bay Area Rapid Transit (BART) system around the San Francisco Bay Area. The dark lines are subway lines and the circles are subway stops, but the destinations people really want to get to (restaurants, businesses, schools) are the little dots that aren't directly serviced by the subway, yet are within a short walk, bike ride, or taxi ride of one of those stops. The public infrastructure here is the subway lines that get people fairly close to where they want to go, but it's really the private sector, taxis for example, that gets people through the last mile. By analogy, that's how we're thinking about things too, except we're talking about chemical or biochemical space. Here the Agile BioFoundry provides those dark subway lines, enabling companies not to have to reinvent the wheel but to get pretty close in chemical space to where they want to go. Our subway stops we call beachheads: intermediate molecules that aren't the final destination but are pretty close, and that provide quick access to a variety of downstream target molecules the private sector might want to go after. In terms of subway lines, we have different host organisms, and different hosts have different metabolic capacities, different bioprocess compatibilities, and the ability to use different feedstocks. As specific examples, protocatechuate would be one beachhead, and muconic acid might be one example, or what we call an exemplar, of a representative target molecule.

Now, we don't just have this abstract map based on the BART system. We've adapted a metabolic map published by Sang Yup Lee in the past few years to our purposes: shown here in green are our beachhead molecules, and in blue some of our exemplar molecules. Some of these beachheads are established and some are prospective, as if you were planning out a new subway stop sometime in the future.

In terms of the way we operate, we collaborate with industry, or potentially other groups including academics. They come in with some idea of where they want to go in chemical space, and we can support those efforts with techno-economic and life cycle assessments (I'll show a slide on that in a bit) and with host onboarding and development. You'll see a slide of the hosts we're working with currently, but if an industry partner has a new host they want us to onboard, we have the capabilities to do that. We have our core engineering cycle around design, build, test, and learn, and importantly, in our test space we have capabilities around process scale-up.

Just a quick slide on techno-economic analyses and life cycle assessments: these really help us and our collaborators understand the economics of a process, as well as, importantly for our mission and the Department of Energy's mission, the greenhouse gas emissions and decarbonization metrics.

In terms of our hosts, we have established within the Agile BioFoundry a tier system reflecting increasing capabilities with each host organism. We have a manuscript about to be submitted to make this
public, so more people can understand what we mean by these tiers and use them for their own purposes if they find them useful. Currently we have 11 different hosts, between bacteria and fungal organisms, at tier one, which basically means we can operate with them in an Agile BioFoundry context, and we have five hosts, between bacteria and fungal organisms, elevated to tier two, which basically unlocks additional, more sophisticated capabilities in our design-build-test-learn infrastructure. And we do go around the full design-build-test-learn cycle: you'll hear in the next talk from Taraka about biosensors, then Hector Garcia Martin will tell you more about predictive biomanufacturing and machine learning, and Phil Laible will tell you about deep learning.

If you want to hear more about our capabilities within the Agile BioFoundry, I'd refer you to our website, in particular the capabilities page. In addition to design, build, test, and learn, we have capabilities listed there around scale-up and host onboarding and development, so please check out that page if you're interested in learning how to work with us and what mechanisms we could use to partner with your company; if you're an academic, we have information there for you too. Just to emphasize: we are a distributed biofoundry across the U.S., with seven different National Labs participating, and again we are supported by the EERE office, the Bioenergy Technologies Office. With that I'll stop sharing my screen; I think we're just at time. Taraka, if you'd like to share your slides and take it away.

Can you hear me? Yes. Hi everyone, thanks for joining us today. Today I'll be talking about increasing the throughput of our design-build-test-learn cycle through biosensors and the approaches we use with them.

The design-build-test-learn cycle actually has the potential to be very high throughput. A number of advances have been made all throughout this cycle that allow us both to move through it relatively quickly in some cases and to parallelize it using combinatorial and library-based approaches. In the design space, there are ways, using combinatorial and computational design, to make very large libraries, very large sets of variations in strains or vectors, for example. Within the build space there are again opportunities to make large libraries, up to a million different variants or more, at least in organisms with high transformation efficiency. In learn, if you're looking at things one by one, learn can be very slow, but as I think you'll hear from Hector and Phil, there are various ways to computationally increase our learn throughput. And in test, it can be really varied: if you're just looking at single flasks, or doing your biochemical or analytical assays in a traditional way, it can actually be pretty slow, and even if you're working in 96-well or even 384-well plates, you're still only getting into the thousands. So we're really interested in enabling higher throughput across this whole cycle, but specifically in improving the throughput of test, focusing on library-based screening where we can look at
libraries of at least 10,000 and up to a million variants in one single tube.

There are a number of ABF capabilities that enable this higher throughput, and they come from a variety of the National Labs associated with the ABF. In the design space, we have expertise in computational protein and library design. In the build space, there are automated build tools being worked on, as well as the development of approaches for building larger libraries, especially in non-model microbes whose transformation efficiencies may not be as high as those of your traditional favorite laboratory microbes such as E. coli. In the test space, which is where I'll spend most of today, we have a lot of expertise in developing fluorescence-based assays and fluorescent proteins for these types of assays, custom biosensor development (which Nathan mentioned briefly), the use of flow cytometry and microfluidics for rapid screening, and even other novel approaches for screening libraries not just for survival but directly for the productivity of your strain. Then on the learn side, which again you'll hear more about today, we have capabilities and tools for identifying enriched sequences in populations, in addition to regular clonal sequencing, that can then ideally be fed into predictive models.

The general scheme of the approach I'll talk about today is shown on this slide. There are a lot of steps, but it's a cartoon, so hopefully it'll be pretty easy to follow. The idea is that we have a library, a population of many different variants, again in one test tube. The variation can be in a single protein or at genome scale; it can be on plasmids or integrated in the genome. Either way, you have a lot of different genetic variations in one spot, in one test tube, and you want to figure out how to parse them into what is interesting and what may be less interesting. The way we do that is to tie phenotype to fluorescence. Most often it's some type of fluorescent reporter, and whether it's straight-up GFP or GFP coupled to a biosensor depends a little on how we set up the experiment. The idea is that your library then shows differing fluorescence intensities for the different phenotypes, which you can see if you put this type of library on something like a flow cytometer: a negative control will have very low fluorescence, a positive control might have very high fluorescence, and your library will ideally show a broad smear of different phenotypes. (In this graphic, which we actually intend to fix, the idea is that the smear should extend even past your best positive control, so that ideally you're pulling out variants even more improved than what you started with.) Nonetheless, once you have lots of different phenotypes, you can use fluorescence-activated cell sorting to pull out subpopulations with those different phenotypes, sequence those subpopulations as pools, look for enrichments in the different sequences to tie phenotype to changes in the genetic content of your cells, and then feed that into machine learning. That's the general approach we're taking for a variety of different efforts.
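To make the enrichment step concrete, here is a minimal sketch of how pooled sequencing counts from a sorted gate can be compared against the unsorted library. This is not ABF code; the variant names, counts, and the simple log2-fold-change-with-pseudocount scoring are illustrative assumptions.

```python
import math

# Hypothetical read counts per variant (barcode -> reads), not real data.
unsorted_counts = {"varA": 5000, "varB": 120, "varC": 800, "varD": 40}
sorted_counts   = {"varA": 300,  "varB": 900, "varC": 700, "varD": 2}

def enrichment_scores(pre, post, pseudo=1.0):
    """log2 fold change of each variant's frequency in the sorted gate
    versus the unsorted library; the pseudocount stabilizes rare variants."""
    pre_total = sum(pre.values())
    post_total = sum(post.values())
    scores = {}
    for variant in pre:
        f_pre = (pre[variant] + pseudo) / (pre_total + pseudo * len(pre))
        f_post = (post.get(variant, 0) + pseudo) / (post_total + pseudo * len(pre))
        scores[variant] = math.log2(f_post / f_pre)
    return scores

# Rank variants by enrichment in the sorted gate (most enriched first).
print(sorted(enrichment_scores(unsorted_counts, sorted_counts).items(),
             key=lambda kv: kv[1], reverse=True))
```

In practice the same scoring is run per sorted bin, so a variant's profile across bins, not just one gate, ties it to a phenotype.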
As an example, we've applied this to develop a synthetic biology tool we call cis-repressors. These are RNA-based regulators that help regulate messenger RNA translation; they're essentially an extension of the ribosomal binding site. They form a hairpin that occludes the ribosomal binding site to varying degrees, either allowing the ribosome in or keeping it out to varying levels, such that you get different levels of protein expression. In this particular example, the hairpins are upstream of superfolder GFP on a vector. In this work we took a library with different variations within the hairpin, which we expected to open or close to different equilibria in the cell, and we were able to show that the library did indeed have a very broad distribution of fluorescence intensities. We then used fluorescence-activated cell sorting to pull out eight different subpopulations; when we regrew those, they did have distinct fluorescence intensities, and when we sequenced those subpopulations we were able to identify sequences that were well represented within particular subpopulations and not so well represented in the others. When we pulled those out and made the individual variants, we showed that we could tune the fluorescence intensity of GFP really well using these different sequences. We also showed that this works not only for GFP but also for chloramphenicol resistance; that it works on plasmids as well as in the genome; and we've now demonstrated it across, I think, five different organisms, two of which are in this paper. We also tied it to muconate production in this particular example and showed that, using these riboregulators, we can tune cell growth as well as muconate productivity. The outcome was that, by creating a library, tying it to fluorescence, and using cell sorting to pull out different variants, we were able to identify a whole suite of this type of tool without having to know a priori which sequence would give the best result for these intermediate phenotypes. This gives us a toolbox for tuning translation across a number of different systems.
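One common way to turn those eight sorted bins into a per-variant expression estimate is a read-weighted mean of bin fluorescence. The sketch below is a simplified stand-in for that kind of sort-seq quantification, with hypothetical bin values and counts; a fuller treatment would also normalize by reads and cells sorted per bin.

```python
import numpy as np

# Hypothetical: a representative (e.g., median) fluorescence per sorted bin,
# and read counts for each hairpin variant across the 8 bins. Not real data.
bin_fluorescence = np.array([1e2, 3e2, 1e3, 3e3, 1e4, 3e4, 1e5, 3e5])  # a.u.
reads = {
    "hairpin_01": np.array([900, 800, 200, 50, 10, 0, 0, 0]),    # mostly closed
    "hairpin_17": np.array([0, 5, 40, 300, 700, 600, 100, 10]),  # intermediate
    "hairpin_42": np.array([0, 0, 0, 5, 30, 200, 700, 900]),     # mostly open
}

def expression_estimate(counts, bin_values):
    """Estimate a variant's expression as the read-weighted mean of bin
    fluorescence; normalizing within the variant corrects for unequal depth."""
    freqs = counts / counts.sum()
    return float(np.sum(freqs * bin_values))

for name, counts in reads.items():
    print(name, f"{expression_estimate(counts, bin_fluorescence):.3g}")
```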
We use the same type of approach when we're developing biosensors. We're focused on transcription-factor-based biosensors. Transcription factors occur in native cells: they bind small molecules in the cell and either repress or turn on gene activity. What we do is harness that, piggyback off of it, and set up an artificial system in which our transcription factor turns on GFP in the presence of a small molecule. If our small molecule is a bioproduct of interest, such as muconate, then we can identify cells with varied or increased concentrations of muconate, because those cells will make more GFP and therefore glow more green. To do this we have to make the biosensor first, and we use the same type of approach, except that now, instead of making a set of variations in an RNA, we're making variations in the transcription factor and in the DNA to which it binds. Those libraries are then screened for fluorescence output; we pull out the brightest cultures, subculture them, and identify the sequence of that biosensor. We've been able to change the specificity of biosensors, and we can tune the sensitivity of a biosensor to different dynamic ranges depending on where a person is in their strain development for a given product: if you're making very little product, maybe you want a very sensitive biosensor, but if you're pretty far along in your metabolic engineering, you may need a less sensitive biosensor so that it isn't saturated by the time you get to studying it. We've also been able to tune these to different hosts, so we've developed essentially a general approach for moving these sensors across organisms and for detecting new molecules. There are a couple of papers, actually just a subset of the papers, where we've described some of this work.
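To put numbers on "sensitivity" and "dynamic range," biosensor dose-response curves are commonly fit with a Hill function. Here is a minimal sketch with a hypothetical muconate titration (the concentrations, readouts, and starting guesses are made up); it assumes SciPy is available.

```python
import numpy as np
from scipy.optimize import curve_fit

def hill(ligand, basal, top, K, n):
    """Activation Hill curve: GFP output vs. inducer concentration.
    basal = leaky output, top = maximal output, K = EC50 (sensitivity),
    n = cooperativity. Dynamic range is roughly top/basal."""
    return basal + (top - basal) * ligand**n / (K**n + ligand**n)

# Hypothetical titration: muconate (mM) vs. GFP readout (a.u.).
conc = np.array([0.0, 0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 10.0])
gfp = np.array([110, 120, 180, 400, 900, 1600, 1900, 1950])

params, _ = curve_fit(hill, conc, gfp, p0=[100, 2000, 0.3, 1.5], maxfev=10000)
basal, top, K, n = params
print(f"EC50 ~ {K:.2f} mM, dynamic range ~ {top / basal:.1f}x, Hill n ~ {n:.1f}")
```

Retuning a sensor for a late-stage, high-titer strain amounts to shifting K upward so the reporter isn't saturated at production-relevant concentrations.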
Similarly, we can apply these biosensors to look at metabolic pathways and the production of bioproducts of interest. We have a couple of examples where, instead of varying the transcription factor, we keep the biosensor fixed and manipulate an enzyme, on-target or off-target, that might affect production of our small molecule, such as muconate. Again we use computational protein engineering to make gene libraries, and we pair those with the biosensor system to screen for which cultures or cells make the most product of interest. We've shown that with this system we can make cells with higher productivity that reach maximum productivity in less time; we've used it to improve enzyme productivity in a pathway by reducing product inhibition; we've identified new off- and on-target genes tied to improved titers; and we've started using it to look at transporter sequences for the uptake of different molecules that might be important for bioprocesses.

So what else could we do with this? Where are we going next? This is the same graphic again, as a reminder that we're doing these library approaches and trying to get to the sequencing and machine learning piece. The examples I just gave you didn't have a machine learning piece: it was all either picking single colonies or, in the case of the riboregulators, looking at the sequences by eye. What we want to do next is, first, look at different types of libraries. We're really interested right now in whole-genome library screening. We have some past experience using adaptive laboratory evolution to change the diversity of our populations and do screening (that was published in the Bentley paper), and we have additional adaptive laboratory evolution experiments that we're wrapping up currently. In the meantime, we're developing the same type of process, and trying to demonstrate the entire process, for other types of knockdown and knockout libraries. We have three types of libraries we're working on, in parallel, across the ABF: one is trans-repressor based, roughly built on the cis-repressor RNA tool I just described; we have CRISPRi knockdown libraries within the ABF; and we're working on RB-TnSeq knockout libraries as well. Of course, all these types of libraries have been, and are being, used pretty regularly, but mostly in association with fitness, tying your product of interest directly to fitness. What coupling them to sensors does is allow you to directly detect your molecule of interest, instead of having to tie it to fitness with the complications that can come with that kind of more indirect screen. And then, for integrating with learn, this is something we're really excited about and focused on getting off the ground in the near future: instead of just sorting for the brightest cells, the best producers, we're interested in pulling multiple populations, the way we did in the cis-repressor RNA experiment. We want to screen for variations in productivity, essentially, pull out subpopulations of sequence variants tied to something like muconate productivity, do the sequencing, and feed that information to the learn team. The reason is that, for machine learning models, it's important to know not just what works but also what does not work, in order to better bound those models, and library-based screening is a really great way to support this. Again, it allows us to significantly parallelize the DBTL effort we have in the Foundry. And that's the end; there are obviously a lot of people involved in this, so thanks to the BioFoundry team, and to everyone listening today.

Great, thank you, Taraka. Hector, if you want to start sharing your slides; and everyone, if you have questions for Taraka, feel free to put them into the Zoom chat, and Taraka can watch for those while Hector presents. Go ahead, Hector.

Okay, can you see my slides now? Yes. Okay, great; thank you for the opportunity. I'm here to talk about machine learning and predictive biomanufacturing in the Agile BioFoundry. The first thing I'd like to say is that machine learning really has applications across the whole synthetic biology spectrum, all the way from deciding which product will satisfy a societal need, to designing the pathways, to engineering and optimizing the biological system, to helping scale it up, and on to the downstream processing that gets you the final bioproduct. We discussed many of these applications in our review in Metabolic Engineering; have a look at it if you're interested in this topic or want an introduction to machine learning for metabolic engineering. I'm going to concentrate right now on this part over here: optimizing the system and doing DNA design. The reason we focus on machine learning is that the transition from learn to design is the main bottleneck in the DBTL (design-build-test-learn) cycle. For design we have a variety of tools, like DIVA, Raven, or the Teselagen design module; for making strains we're starting to have a lot of CRISPR tools to modify strains, and the cost of synthesizing DNA is going down exponentially; in test, we're starting to see things like high-throughput workflows for standard mass spectrometry, as well as the kind of fluorescence biosensors Taraka was talking about in the previous talk. But in learn, often what we have is just a lot of data that we have to digest. Sometimes this involves things like kinetic models or genome-scale models, but they often don't work as expected.
Reconciling the model with the data involves thinking, and that's perhaps fine in an academic environment where you have five or six months and a mandate to think, but in an industrial environment you often don't have that time. The good news is that machine learning provides algorithms that systematically improve with more data: it doesn't depend on someone having a bright idea. The algorithms we have are designed to improve systematically as you generate more and more data, and they tell you where to look to get the data that will make them better; I'll give you an example of that. So an advantage of machine learning is that it can couple test to design in a really systematic way that doesn't take months; it can be done in days.

Now, to support capabilities like machine learning, the ABF has created a whole set of computational infrastructure for predictive biomanufacturing, including the Inventory of Composable Elements (ICE) to keep information on strains, and the Experiment Data Depot (EDD), where you can put all your omics and experimental data, visualize it, and then download it into machine learning algorithms that will recommend the next experiment. I'll talk about ART here; in the next talk you'll see some deep learning techniques that Phil Laible has been applying.

ART has been designed to suit synthetic biology needs. ART can work with very few instances, and I'll show you an example with as few as 27 instances. ART provides uncertainty quantification, which tells you how much to trust a given prediction: instead of a point prediction, it gives you the whole probability distribution of the prediction, from which you can quantify the uncertainty. If the distribution is very localized, it's a very certain prediction; if it's very broad, it's a very uncertain prediction. And ART provides not just predictions, which is where a lot of machine learning methods stop, but recommendations for the next step: it doesn't just take the information and predict the outcome, it uses that predictive power to recommend the next design cycle.

Let me give you an example of that, published in a couple of papers a couple of years ago, in which we worked with people from the Center for Biosustainability in Denmark to engineer yeast to take glucose and convert it into tryptophan. They had a particularly interesting technique in which they used CRISPR to create libraries of promoters: they could choose, say, five genes and six types of promoters and make whole libraries in a high-throughput fashion, and that kind of high-throughput capability is something we can leverage with machine learning. In this case we used a mixture of machine learning and mechanistic models. This is the reaction set all the way from glucose, through glycolysis and the pentose phosphate pathway, into tryptophan, and we used genome-scale models to predict the five reactions most likely to impact tryptophan productivity. We then had a library of six possible
promoters for each of those five genes. If we built all of them, all six promoters for all five genes, we would have around 8,000 combinations, which is quite a lot of combinations and experimentally quite challenging. What we did instead was build only a few percent of them, and from that data we extrapolated to find the promoters that would give the highest productivity. We did this with the Automated Recommendation Tool (ART), which is trained with data. In this case the training data are an input and a response: the input is the promoter combination (a specific promoter choice for each of the five genes) and the response is the corresponding tryptophan productivity, which is what we're trying to optimize. We have a variety of instances of that, which ART uses to train the model, and in this case we had 264 combinations out of the roughly 8,000, or 3.4 percent of all the possible combinations.
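To show the shape of such a training set, here is a minimal sketch of encoding promoter combinations and getting a predictive distribution for an untested design. This is not ART itself (ART uses a Bayesian ensemble); a random forest's per-tree spread stands in as a crude surrogate, the data are randomly generated placeholders, and a recent scikit-learn is assumed.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)

# Hypothetical training set: each row picks one of 6 promoters for each of
# 5 genes, with a measured productivity response; stands in for the ~264
# characterized combinations (the values here are synthetic placeholders).
X_raw = rng.integers(1, 7, size=(264, 5))
y = rng.gamma(shape=2.0, scale=1.0, size=264)

enc = OneHotEncoder(categories=[list(range(1, 7))] * 5, sparse_output=False)
X = enc.fit_transform(X_raw)

# Per-tree predictions give a rough predictive distribution, echoing (but
# much simpler than) ART's probabilistic output.
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
candidate = enc.transform([[3, 6, 1, 2, 5]])  # an untested promoter combination
per_tree = np.array([t.predict(candidate)[0] for t in model.estimators_])
print(f"mean {per_tree.mean():.2f}, 5-95% interval "
      f"[{np.percentile(per_tree, 5):.2f}, {np.percentile(per_tree, 95):.2f}]")
```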
Even though this was only around 300 combinations rather than 8,000, we still needed a high-throughput way of measuring productivity, and that was also a biosensor: one that produces a certain amount of fluorescent protein depending on the amount of tryptophan, which is what really enabled the high-throughput data set. With these data we created a predictive model that is probabilistic, as I said, so it tells you that a given recommendation has, say, a 10% chance of producing 10 milligrams per liter and a 20% chance of producing 5 milligrams per liter, and so on. It then uses that predictive model to make recommendations for the next cycle: if these promoter combinations produced these responses, try this promoter in gene 1, that promoter in gene 2, and so on, to produce this amount of tryptophan with this probability distribution.

Now, in this case we started out with six 96-well plates, which were supposed to give us around 600 strains and, in triplicate, around 1,800 samples. But of course, constructing and growing the strains ran into some problems, because this is biology and there are always things that don't necessarily work. Sometimes we didn't have the genotype information of which promoters had been put in each strain; sometimes the CRISPR assembly failed, so the promoters we expected weren't inserted and we couldn't read the genotype information; sometimes a strain didn't pass the growth threshold, so we couldn't do any sequencing; sometimes we weren't able to get rid of the complementation plasmid; and sometimes we didn't get a single population. All of this attrition ended up producing about a third of the initially expected data: around 782 instances, or samples, in total, covering 264 promoter combinations, around 3 percent of all the possible ones.

In order to use machine learning you need a target: you have to figure out what you're trying to optimize. In this case we used productivity, which is one of the least intuitive quantities to increase. Metabolic engineers have intuition for how to boost yields and titers, but productivity is much harder, and that's why we went after it. To measure it, we took the change in GFP (the tryptophan proxy) and divided it by the time that change took: that's our productivity. We tried to choose a time window that wasn't too early, so we avoided lag effects, but also wasn't too late, so we didn't get into aeration effects that might not be translatable to other ways of doing fermentation.
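As a concrete aside, here is a minimal sketch of that productivity proxy: the change in GFP per unit time over a window chosen to dodge the lag phase and the late plateau. The time course values and window bounds are hypothetical.

```python
import numpy as np

# Hypothetical GFP time course for one strain (hours, arbitrary units):
# a lag phase, a rise, then a plateau.
t = np.array([0, 2, 4, 6, 8, 10, 12, 14])
gfp = np.array([50, 55, 80, 200, 520, 900, 1100, 1150])

def productivity(t, gfp, t_start=4.0, t_end=10.0):
    """Proxy productivity: change in GFP (standing in for tryptophan, via the
    biosensor) per unit time, over a window late enough to skip the lag phase
    but early enough to avoid the plateau."""
    g0 = np.interp(t_start, t, gfp)
    g1 = np.interp(t_end, t, gfp)
    return (g1 - g0) / (t_end - t_start)

print(f"productivity ~ {productivity(t, gfp):.1f} a.u./h")
```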
So this is the actual productivity, the change in GFP (the tryptophan proxy) per unit time, shown for the first gene, second gene, third gene, fourth gene, and fifth gene, each with its choice among the six different promoters. You can tell there is no clear trend: it's not obvious that you should use, say, the first promoter or the sixth promoter in gene 1, because promoters of every type show up across the whole productivity spectrum. The only exception is maybe gene 3, where these promoters here seem to produce the higher productivities, but other than that there is no way to choose this promoter or that promoter just by looking at it. That's why we use machine learning: it is particularly good at ingesting all this data and figuring out the non-linear relationships among the different promoters.

Now, in this case we were lucky to work with people who really care a lot about data quality, and you can see that in the replicates. We had three replicates for each sample: this is the histogram of all the productivity values, and these plots show replicate 3 versus replicate 1, replicate 2 versus replicate 1, and so on. If there were perfect repeatability, all the results would fall on the diagonal; most of them do, but some, particularly this one, are really problematic, and those were eliminated. We filtered out around 15 replicates with high errors, because they would throw the machine learning approach off. ART then ingested all this data and suggested new promoter combinations, 15 of them: try this promoter in gene 1, that promoter in gene 2, and so on, and you should expect this amount of productivity with this probability distribution. This is the mean for each of the predictions, and this line here is the highest productivity obtained so far, so the tail of each probability distribution above it tells you the probability of exceeding the best productivity you've seen so far.

So how did it work out? Quite well. We were able to increase productivity by more than 100% over the base strain and about 70% over the best in the library. These are the observed productivities (via the GFP fluorescence proxy) against the predicted productivities: this is the initial base strain we started with, this is the library of 264 different promoter combinations built with the CRISPR method, and these are the recommendations. As you can tell, the best recommendations improved productivity by about 70% over the best in the library and around 105% over the base strain, and this was just one DBTL cycle. Now, that is just the fluorescence proxy; what happens when you actually use HPLC to measure the amount of tryptophan? We did that, and interestingly, none of the strains in the library actually surpassed the base strain, but the recommendations from ART increased production over both the best in the library and the base strain by 43 percent, and this was the result of one DBTL cycle. As I said before, we can take the new recommendations and measurement data, put them back into ART, and in a matter of 20 minutes have new recommendations and keep improving in a systematic way, without having to rack your brain; this is what is called active learning.

So, to recapitulate: ART is ideally suited to synthetic biology needs. It can work with few instances; it provides uncertainty quantification; and it provides recommendations, not just predictions. On working with few instances, you can see an example in the ART paper (Radivojević et al.), where we took 27 instances from a limonene-producing strain and made predictions for the second DBTL cycle that were quite accurate; indeed, the recommendations ART made were very close to the ones that were experimentally tested in that project.
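The "tail above the best so far" idea Hector describes is easy to compute once you have samples from each recommendation's predictive distribution. A minimal sketch, with hypothetical distributions standing in for ART's output:

```python
import numpy as np

rng = np.random.default_rng(1)
best_so_far = 3.2  # highest productivity observed in the library (made-up units)

# Hypothetical posterior-predictive samples for three recommended promoter
# combinations, standing in for the distributions ART returns.
predictive_samples = {
    "rec_A": rng.normal(3.0, 0.4, 5000),
    "rec_B": rng.normal(3.5, 0.6, 5000),
    "rec_C": rng.normal(2.8, 0.1, 5000),
}

# The mass of each distribution's tail above the incumbent is the probability
# that testing this recommendation beats the best strain found so far.
for name, samples in predictive_samples.items():
    p_exceed = float(np.mean(samples > best_so_far))
    print(f"{name}: mean {samples.mean():.2f}, P(> best) = {p_exceed:.2f}")
```

Note how a broad distribution with a modest mean (rec_B) can still be the most attractive pick, which is exactly why the full distribution, not a point estimate, drives the recommendations.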
So again, we can really leverage a low number of instances, as few as 27 so far, and create meaningful recommendations that guide metabolic engineering in an effective way. If you're interested in ART, there's a whole web page, art.lbl.gov, with tutorials and a paper that explains how to use ART with synthetic data and how to use it with EDD and ICE, and you're welcome to use it. There's also an ART front end at agilebiofoundry.org: if you're not interested in coding, you can just upload your data and the server will send you recommendations you can use.

That's it. Synthetic biology can do a lot of things, from biofuels to biomaterials and bioproducts and a variety of other things, but the problem is that we can't really predict what's going to happen, which is why biology is so hard to design. Machine learning can help with that, but it needs tons of data, and that can only truly be obtained effectively through automation and the high-throughput methods Taraka showed in the previous talk. I really want to thank the people in my group for all this work, and also all the different groups in the National Labs, in Berkeley and elsewhere, that have made this work possible and that make the National Labs the interdisciplinary environment this kind of work needs. For more details on these capabilities, you can always go to agilebiofoundry.org/capabilities and see our design-build-test-learn capabilities. I only had 15 minutes today, but there are many other things the Agile BioFoundry can offer your company. Thank you very much.

All right, thanks Hector. If people have questions for Hector, please put those in the chat for now. Phil, if you want to share your screen and get started.

Thanks Nathan, and thanks also to Emily and Stacey for helping organize this webinar, and for this opportunity. Today I'm expanding on what Taraka and Hector have told you and going into another capability of the ABF, where we're using deep learning for improvements in strain and process designs in biomanufacturing. Going back to the engineering cycle the Agile BioFoundry has adopted, I want to highlight that this is an effort that is highly integrated between the learn team and the test team, using advanced data sets coming from our test capabilities, and you'll see that we're not only expanding these capabilities but also integrating more strongly with design and with build as we go forward into future iterations and improvements using learn within the Agile BioFoundry. Everyone has done a pretty good job of letting you know how to find more information about our capabilities, and I'll just reiterate: learn can be found here on our web pages, and since both Hector and I are talking about AI approaches, you can find out more within these subpages. Today I also want you to know that the deep learning work actually takes advantage of a lot of other learn capabilities within the Agile BioFoundry, employing mechanistic models, metabolic models, and regulatory models; and to understand the predictive outputs of these models, we have complex data visualization that can help in those regards.
Over the last few years, within our part of the learn team, we have adopted an ecosystem for strain improvement, and this is depicted on the slide. We go from the left side, where we're organizing builds and strain-engineering efforts, all the way through test, using models to understand how information flows as processes scale, from miniaturized cultivation (in the bottom middle of the graphic) up to the scaled analytics found in pilot-level facilities, then through production phases and a clear determination of the phenotype of the engineered strains we're evaluating with our host and target teams. We'll go more into this and how deep learning is applied. What we're actually doing is building relationships between these different data types, at different scales and different levels of evaluation of the organism, building artificial neural networks between these layers, and stacking the layers such that the output of one layer becomes the input of another. You then have relationships between them, you can go cleanly from the start of this process to the outputs, and in the end you can be predictive about what new modifications would do if they were evaluated in the same way.

The deep learning exercise has mainly leveraged the large omics data sets available from our test team at Pacific Northwest National Laboratory: transcriptomics, proteomics, and metabolomics (both internal and external metabolites) are all available and used as layers in the deep modeling approaches, in combination with information from our build and strain-engineering efforts, all the way to metadata on the phenotype, with advanced analytics on the production end. To say it a different way, and show it diagrammatically: we're leveraging relevant biological frameworks to understand what the engineering is doing at the organism level, what's being up- and down-regulated, which proteins are produced at higher or lower levels within the cell, and how all these components of the cell interact and work together to change the output of the bioprocess and influence the final titers, productivity rates, and ultimate yields. And it's iterative: we work with the teams, get a data set, learn from it, predict, use those predictions in new rounds of design-build-test-learn, use the new data to reinforce our training sets and improve our predictions, and then re-evaluate performance to see whether we're getting better or need to bring in a new type of data set. These approaches are adopted from other fields, where it's very helpful to use information theory as a modeling scaffold: inputs coming from a fermentation process pass through transmitters, channels, and receivers to decode the information, and our models, expanding and adapting from a metabolic understanding of organisms, capture how regulatory networks turn sets of genes off and on and influence what the organism makes, how it grows, what it utilizes as feedstocks, and, in the end, for our needs, how all of that plays out in a biomanufacturing context. This approach can be used in a way complementary to the production-pathway optimization Hector described, but it has really shown its strength in identifying off-pathway engineering targets: helping turn off or modify pathways the organism no longer needs, pathways it may have needed to survive in nature, now that it's being repurposed for biomanufacturing.
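To make the "stacked layers" idea concrete, here is a minimal sketch of a network whose blocks mirror the omics layers Phil describes, with each layer's output feeding the next (transcriptome to proteome to metabolome to titer). This is an illustrative toy, not the ABF's models: the dimensions, data, and training loop are all hypothetical placeholders, and PyTorch is assumed.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes: 200 transcripts -> 150 proteins -> 40 metabolites -> 1 titer.
class OmicsStack(nn.Module):
    def __init__(self, n_tx=200, n_prot=150, n_met=40):
        super().__init__()
        self.tx_to_prot = nn.Sequential(nn.Linear(n_tx, n_prot), nn.ReLU())
        self.prot_to_met = nn.Sequential(nn.Linear(n_prot, n_met), nn.ReLU())
        self.met_to_titer = nn.Linear(n_met, 1)

    def forward(self, tx):
        prot_hat = self.tx_to_prot(tx)        # predicted proteome layer
        met_hat = self.prot_to_met(prot_hat)  # predicted metabolome layer
        return prot_hat, met_hat, self.met_to_titer(met_hat)

# Placeholder data: 64 fermentation samples with measured omics layers and titers.
tx = torch.randn(64, 200)
prot, met, titer = torch.rand(64, 150), torch.rand(64, 40), torch.rand(64, 1)

model = OmicsStack()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
mse = nn.MSELoss()
for step in range(200):
    prot_hat, met_hat, titer_hat = model(tx)
    # Supervise each intermediate layer with its measured omics data, not just
    # the final titer, so the stack stays anchored to the biology at every level.
    loss = mse(prot_hat, prot) + mse(met_hat, met) + mse(titer_hat, titer)
    opt.zero_grad(); loss.backward(); opt.step()
print(f"final training loss: {loss.item():.3f}")
```

Once trained, a hypothetical modification's transcriptome profile can be pushed through the stack to predict its downstream omics layers and titer, which is the "predictive of what new modifications would do" step.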
This is also important to consider in an integrated bioprocess environment, and that is where these deep learning approaches are really starting to excel.

Here's the generic approach we use for deep learning. We set up what you'll hear referred to in this presentation as a learn-friendly experimental approach. It's usually guided by a given experimental observation, or by the inability to get a good experimental observation, where you know what you want to do. You work with experimentalists to define an experimental goal (do you want to increase the titer, the rate, the productivity, or the overall yield of the process?) and then design experiments around those goals, ensuring that you have a matrix of experimental parameters that distinguishes between the different mechanisms that may be leading to the improvements, or the negative effects, you're observing, and a range of phenotypes, like Taraka was talking about. That way we know not only where we think we want to go, from a first-principles understanding, but also where it's not encouraging to go, and those are very important distinctions. From there, we can identify a modeling approach, design the modeling parameters, and, most importantly, interact with the test team to make sure the training data sets we receive will feed our modeling approaches. Some lessons learned from the first iterations and years of going through this process: establishing rapport and a common language is very important for building a great working relationship with engaged experimentalists, and that goes a long way; and productivity and the predictive outcomes are greatly enhanced by establishing learn-friendly experimental designs, like the ones we've talked about on the left-hand side of the slide.

Here's one example, of many, where we've used these approaches and learn-friendly experimental guidelines to work on improved biocatalysts within the Agile BioFoundry. In this case we're using Pseudomonas putida to produce muconic acid; Taraka talked about this a little with her biosensors. The observation we had was that a single gene deletion led to differential regulation of more than 18 percent of the transcriptome and more than 38 percent of the metabolome. So we established a goal of understanding the regulatory mechanisms by which small changes in the gene profile of this organism lead to very large changes in its metabolic structure. Through several rounds of these deep learning approaches with predictive outcomes, going through this learn ecosystem in different ways, we arrived at the identification of more than 30 different improvements that could lead to strain modifications and increased titers, rates, and yields for this organism. We have over 30 regulatory genes, some of which are important to the production pathway, others of which influence the growth of the organism or its utilization of feedstocks. We're also studying transportomes through artificial neural network models, and we've found three transporters that increase export of the product from the organism, decreasing its toxicity. And there are also 32 gene
targets, many unannotated, that were proposed to increase growth or product titers. These predictions are in various states of validation and reinforcement, but this is a key example of how you can layer these modeling efforts with AI approaches and end up with deep learning with predictive capability that can improve your biomanufacturing processes.

You can see that our modeling approaches have evolved: at the start of these projects they were relatively simple, and we can make them as complex as we need to reach the goals the modeling team and the experimentalists have agreed upon. Here's an example going from a simple information network, mainly focused on regulatory networks and relating them to the levels of metabolites outside and inside the cell, to one on the right where we're looking at interactions between different types of information within the cell, not only external but internal: how protein levels vary, and how that influences production of the product this host has been charged with making.

We have active academic and industrial projects where these approaches are being utilized, and here is a sampling of some of the goals of projects we've worked on in this capacity. One is the identification and maintenance of prolonged periods of maximum product synthesis: if you have an organism that reaches a stage where it's producing something at steady state, you want to keep it there, so can we learn how to do that and overcome departures from that steady state? Another is yield enhancement through elimination of unnecessary off-pathway metabolic processes; this really lifts your yields ultimately, and there are many examples where these enhancements are needed. Utilization of complex feedstocks with low, zero, or negative value is an area that is growing. There is also biocatalyst optimization: you might have a step or two in your pathway, or an off-target pathway you want to eliminate, or an in-pathway process you want to optimize, and you can do this by studying a range of biocatalysts. And lastly, as an example, the assembly, stabilization, and function of large multi-subunit macromolecular assemblies: you can look at how these assemblies, and their subunits, are synthesized independently and come together to form a complex that results in high levels of the product you're interested in manufacturing.

With that, toward the end and looking forward: we want to use these deep learning approaches to arrive at a learn-guided intelligent biomanufacturing scheme, where we can expand the types of models included in our deep learning exercises. On the front end, that means knowing more about the feedstocks, and whether you can get away with feedstocks of reduced value or with inherent variability. Toward the other end of the spectrum, on the advanced output side of these value chains, you want to know, if you're producing something, whether you can get it out of the system in a downstream processing scenario that is effective, and whether we need to include anything that would enable downstream processing early on in our understanding of the process, or of the global process in general. These are things we're considering incorporating in newer iterations of this deep learning approach. Lastly, there's the potential to include economic and environmental modeling efforts in the deep
learning approaches, to know how changes at earlier stages of the biomanufacturing steps may impact them downstream, so you could factor those in early on if it's of value.

Finally, we've talked about this learn ecosystem and how it's been functioning for us in the Agile BioFoundry, in some example processes and in expanding collaborations with academic and industrial partners. However, there's a lot of potential for ecosystem expansion, leveraging the things Taraka reviewed and Hector talked about, where we'll be able to learn with larger, more complex, and more complete data sets. Here we can use libraries of strains going into our evaluation phases, maybe libraries of engineered biocatalysts, and information from pan-genome analyses: we know what an organism needs for its normal function and survival in a particular environment, but many organisms have large expansions of their genomes that allow them to really excel with different product classes and function in varied ecosystems, so can we utilize those and take advantage of them in our future deep learning approaches? As Hector and Taraka alluded to, miniaturization of cultures and bioreactors is going to be really instrumental in moving our processes forward. One example is microfluidics: in the upper right we're showing that you can evaluate cells in tiny droplet environments that might be as small as 70 picoliters, and in those cases a single tube holds millions of these droplet bioreactors within it. If these can be leveraged to feed AI evaluations of strains, then we really have something, and we're getting very close to being able to use them in the Agile BioFoundry. It's certainly important to talk about higher-throughput means of strain evaluation. As Taraka alluded to, biosensors have helped the Agile BioFoundry in many ways, not only in the work Taraka described; Hector talked about them too, and we're utilizing them in our deep learning approaches. Flow cytometry is going to be very helpful as well, especially if you can couple it with the high-throughput sequencing-based analytics that are readily available today, in addition to high-quality, rapid genome resequencing efforts that would enable us to capitalize on adaptive laboratory evolution insights in deep learning approaches. That way, when you put selective pressure on an organism and it evolves into something that makes a product more efficiently and at lower cost, we can really understand it, get that information very rapidly into our AI approaches, and capitalize on it going forward.
2022-10-01