Cutting Edge Technologies for Accelerating Bioproduct Development


so welcome everybody to our Agile BioFoundry webinar today. I'm Nathan Hillson, the lead principal investigator of the Agile BioFoundry. I'd first like to thank Stacey Young within DOE for helping get this Zoom meeting and the webinar set up. I'd also like to thank, on the Agile BioFoundry side, Emily Nelson for helping us get the agenda coordinated and making sure we all know what we're doing. I'll be giving the agenda in the next slide so you know what's going to be happening over the next hour, and then I will transition into providing an overview of the Agile BioFoundry. I'd also like to thank DOE's Office of Energy Efficiency and Renewable Energy and the Bioenergy Technologies Office, and we're really lucky to have with us Gayle Bentley, who is our technology manager, so thanks to Gayle for supporting all of the work we've done in the past and the work that we're going to be doing going forward.

In terms of the agenda for this hour, I'll be spending about 10 minutes giving you an overview of the Agile BioFoundry, and then we'll transition into three 15-minute talks: one on biosensors from Taraka, one on machine learning from Hector Garcia Martin, and one on deep learning from Phil. At the end we will have an open-mic Q&A for all. We have enabled the chat functionality for everyone, so everyone should be able to chat; since we don't have that much time for Q&A, I would suggest doing most of the Q&A in chat, and that will give you the opportunity to get as much feedback on your questions as you can.

So I'm going to transition now into the overview of the Agile BioFoundry. The goal of the Agile BioFoundry is to enable biorefineries to achieve 50% reductions in time to bioprocess scale-up compared to the current average of around 10 years. Our objectives and outcomes are the development and deployment of technologies that enable commercially relevant biomanufacturing of a wide range of bioproducts by both new and established industrial hosts, and we'll hear more about some of the hosts in a bit. As a statement on relevance: we are a $20 million U.S. per year public infrastructure investment (I'll give you some more description of what we mean by that in a few slides), and we'll be increasing U.S. industrial competitiveness, enabling opportunities for private-sector growth and jobs.

The Agile BioFoundry directly supports the Department of Energy's Office of Energy Efficiency and Renewable Energy mission and objectives in three high-level ways. The first would be decarbonizing energy-intensive industries, not just by making the current industries as they operate now more efficient, but by going back and completely changing the way that things are done, for example by starting out from sustainable feedstocks as opposed to fossil feedstocks. The second category listed here would be around sustainable aviation fuels and generally decarbonizing transportation. And the third area would be diversity in science, technology, engineering, and mathematics: something we're really excited to be about to launch is one million dollars of support for minority-serving research and development consortium collaborations with us, so we're really looking forward to that.

In terms of public infrastructure investment, what I'm showing on the screen is going to look pretty familiar: a generic subway map, which we've adapted from the Bay Area Rapid Transit (BART) system around the San Francisco Bay Area. The dark lines are the subway lines and the circles are the subway stops, but the destinations people really want to get to, like restaurants, businesses, and schools, would be the little dots that wouldn't be directly serviced by the subway, but would be within a short walk, bike ride, or taxi ride from one of those stops. So the public infrastructure here would be the subway lines that get people fairly close to where they want to go, but it's really the private sector, taxis for example, that gets people the last mile. By analogy, that's how we're thinking about things too, except we're talking about a chemical or biochemical type of space. Here the Agile BioFoundry provides those dark subway lines that enable companies not to have to reinvent the wheel but to get pretty close in chemical space to where they want to go. Our subway stops here we're calling beachheads: intermediate molecules that wouldn't be the final destination but are pretty close, and that provide quick access to a variety of downstream target molecules that the private sector might want to be going after. In terms of the subway lines, we have different host organisms, and different host organisms have different metabolic capacities, different bioprocess compatibilities, and the ability to use different feedstocks. As specific examples, protocatechuate would be one example beachhead, and muconic acid might be one example of what we call an exemplar, a representative type of target molecule. Now, we don't just have this abstract map based on the BART system: we've adapted a metabolic map published by Sang Yup Lee in the past few years to our purposes. Shown in green would be, for example, our beachhead molecules, and in blue some of our exemplar molecules; some of these beachheads are established and some of them we're still prospecting, as if you were planning out a new subway stop sometime in the future.

In terms of the way that we operate, we collaborate with industry, or potentially other groups including academics. They come in with some idea of where they want to go in chemical space, and we can support those efforts in terms of techno-economic and life-cycle assessments (I'll show a slide on that in a bit) and host onboarding and development. You'll see a slide of the hosts we're working with currently, but if an industry partner has a new host that they want us to onboard, we have the capabilities to do that. We have our core engineering cycle around design, build, test, and learn, and importantly, in our test space we do have capabilities around process scale-up. Just a quick slide on techno-economic analyses and life-cycle assessments: these really help ourselves or our collaborators understand the economics of the process, as well as, really importantly for our mission and the Department of Energy's mission, the greenhouse gas emissions and decarbonization types of metrics.

In terms of our hosts, we have established within the Agile BioFoundry a tier system for increasing capabilities with each different host organism, and we have a manuscript about to be submitted to make this public, so more people can understand what we mean by these tiers and use it for their own purposes if they find it useful. Currently we have 11 different hosts, between bacteria and fungal organisms, at tier level one, which basically means that we can operate with them in the Agile BioFoundry context, and we have five hosts, between bacteria and fungal organisms, elevated to tier two, which basically unlocks additional, more sophisticated types of capabilities in terms of our design-build-test-learn infrastructure. And we do go around the full design-build-test-learn cycle: you'll be hearing in the next talk from
around some biosensors and then Hector  Garcia Martin will be telling you some more about   um predictable biomanufacturing and machine  learning and Phil Laible will be telling   you about deep learning if you want to hear  more about our capabilities within the Agile   BioFoundry I would refer you to our website in  particular capabilities page so in addition to   design build test and learn we have capabilities  listed there around scale up and host onboarding   and development so please check out that page if  you're interested in learning how to work together   with us so what are the the mechanisms that we  could partner with your company or if you're   you're an academic we have information there  too for you so please check that out and to   just emphasize um we are a distributed biofoundry  across the U.S some seven different National Labs   are participating and again we are supported um by  the EERE office the Bioenergy Technologies office   um and with that I will stop sharing my  screen and I think we're just at time   um so Taraka if you'd like to share  um yours your slides and take it away can you hear me yes um yeah hi everyone um thanks for  joining us today so today I will be   talking about increasing the throughput  of our design build test learn cycle   um through biosensors and um the approaches  that we use um associated with those um how to advance okay um so uh the design build test learn cycle  actually has the potential to be very high   throughput there are a number of advances that  have been made all throughout this cycle that   um allow us to both move through the cycle  relatively quickly in some cases and also   parallelize the cycle using combinatorial and  Library based approaches so in the design space   there are ways using combinatorial and  computational design to make very large   libraries or very large sets of variations in  strains or vectors for example and um within   the build space there are again opportunities um  to 
make large libraries, up to a million different variants or more, at least in organisms with high transformation efficiency. And in learn, if you're looking at things one by one, learn can be very slow, but as I think you'll hear from Hector and Phil later, there are various ways to computationally increase our learn throughput. In test it can be really varied: if you're just looking at single flasks, or doing your biochemical or analytical assays in a traditional way, it actually can be pretty slow, and even if you're thinking about 96-well or 384-well plates, you're looking at down to the thousands. So we are really interested in enabling higher throughput across this whole cycle, but specifically trying to improve the throughput of test, focusing on library-based screening where we can look at libraries of ten thousand up to a million variants in one single tube.

There are a number of ABF capabilities that enable this higher throughput, and these come from a variety of the National Labs associated with the ABF. In the design space we have expertise in computational protein and library design. In the build space there are automated build tools being worked on, as well as the development of approaches for building larger libraries, especially in non-model microbes where the transformation efficiencies may not be as high as in your traditional favorite laboratory microbes such as E. coli. In the test space, which is where I'll focus today, we have a lot of expertise in developing fluorescence-based assays and fluorescent proteins for these types of assays, custom biosensor development, which Nathan mentioned briefly, and using things like flow cytometry and microfluidics for that
rapid screening, and also other novel approaches for screening libraries, not just for survival but directly looking at the productivity of your strain. Then on the learn side, which again you'll hear more about today, we have capabilities and tools for identifying enriched sequences in populations, in addition to regular clonal sequencing, that ideally can then be fed into predictive models.

The general scheme of the type of approach I'll be talking about today is shown on this slide. There are a lot of steps, but it's a cartoon, so hopefully it'll be pretty easy to follow along. The idea is that we have a library, a population of a lot of different variants, again in one test tube. These variations can be in just a single protein or at the genome scale, and they can be on plasmids or integrated within the genome, but the idea is that you have a lot of different genetic variations in one spot, in one test tube, and you want to figure out how to parse those into what is interesting and what may be less interesting. The way we do that is we tie phenotype to fluorescence. Most often it's some type of fluorescent reporter, and whether that's just straight-up GFP or GFP associated with a biosensor, for example, depends a little bit on how we set up the experiment. But the idea is that your library, if you were to look at it, has differing fluorescence intensities for the different phenotypes, and you can see that if you put this type of library on something like a flow cytometer. A negative control is going to have very low fluorescence, maybe you have a positive control that has very high fluorescence, and then your library ideally is going to have this broad smear of different phenotypes. In this graphic, ideally your smear would extend
even past your best positive, so that ideally you're pulling out even more improved variants than what you started with. But nonetheless, the idea is that you have lots of different phenotypes, and then using something like fluorescence-activated cell sorting you can pull out subpopulations that have these different phenotypes, sequence those subpopulations as pools, and then look for enrichments in the different sequences to tie that phenotype to changes in the genetic content of your cells, and then feed that into machine learning. That's the general approach we are taking for a variety of different efforts.

As an example, we've applied this to developing a synthetic biology tool that we call cis-repressors. These are RNA-based regulators that help regulate messenger RNA translation; they sit right at the ribosomal binding site, and what they do is form a hairpin that to varying degrees occludes the ribosomal binding site, either allowing a ribosome in or keeping it out to varying degrees, such that you get different levels of protein expression. In this particular example, the hairpins are upstream of superfolder GFP on a vector, and what we did in this work was take a library with different variations within the hairpin that we expected to allow it to open or close to different equilibria in the cell. We were able to show that that library did in fact have this very broad distribution of fluorescence intensities, and we then used fluorescence-activated cell sorting to pull out eight different subpopulations. When we regrow those, they do in fact have different fluorescence intensities, and when we sequenced those subpopulations we were able to identify sequences that were well represented within particular subpopulations and
not so represented in the others. When we pulled those out and made the individual variants, we were able to show that we can tune the fluorescence intensity of GFP really well using these different sequences. We showed that this works not only for GFP but also for chloramphenicol resistance, and that it works on plasmids as well as in the genome; we've demonstrated it across, I think, five different organisms now, two of which are in this paper. We also tied it to muconate production in this particular example and showed that using these riboregulators we can tune cell growth as well as muconate productivity. The outcome was that we were able to use this approach of creating a library, tying it to fluorescence, and using cell sorting to pull out different variants, to identify a whole suite of this type of tool without having to know a priori which sequence was going to give us the best result for these intermediate phenotypes. This gives us a toolbox for tuning translation across a number of different systems.

Similarly, we use the same type of approach when we are developing biosensors. We are focused on transcription-factor-based biosensors. Transcription factors occur in native cells, and what they do is bind small molecules in the cell and either repress or turn on gene activity. What we do is harness that, piggyback off of it, and set up an artificial system where our transcription factors turn on GFP in the presence of a small molecule. So if our small molecule is a bioproduct of interest, such as muconate, then we can identify cells that have varied or very increased concentrations of muconate, because those cells will make more GFP and therefore they will glow
more green. In order to do this we have to make the biosensor first, and we use the same type of approach, except that instead of making a set of variations in an RNA, we're making variations in the transcription factor and in the DNA to which it binds. Those libraries are then screened for fluorescence output; we pull out and subculture the brightest cultures and then identify the sequence of that biosensor. We've been able to change the specificity of biosensors, and we can tune the sensitivity of a biosensor to different dynamic ranges depending on where a person is at in their strain development for a given product: if you're making very little product, maybe you want a very sensitive biosensor, but if you're pretty far along in your metabolic engineering, then you may need a less sensitive biosensor so that it's not saturated by the time you get to studying it. We've been able to tune these to different hosts, and so we've developed essentially a general approach for moving these sensors across organisms and for detecting new molecules. There are a couple of papers, and this is actually just a subset of papers, where we've described some of this work.

Similarly, we can apply these biosensors to look at metabolic pathways and the production of bioproducts of interest. We have a couple of different examples where, instead of messing with the transcription factor, we keep the biosensor of interest fixed, and now maybe we want to manipulate an enzyme, on-target or off-target enzymes that might affect the production of our small molecule such as muconate. Then again we're using computational protein engineering to make gene libraries, and pairing that with the biosensor system to screen which cultures or which cells make the most product of interest. And we've been able
to show that, again using this system, we're able to make cells with higher productivity that reach those maximum productivities in less time. We've also demonstrated that we can use this to improve enzyme productivities in a certain pathway by reducing product inhibition, we've identified new, often on-target genes tied to improved titers, and we've used this to start looking at transporter sequences for uptake of different molecules that might be important for bioprocesses.

So what else could we do with this, and where are we going next? This is the same graphic as a reminder that we're doing these library approaches and then trying to get to the sequencing and machine learning piece. The examples I just gave you did not have a machine learning piece: it was all either picking single colonies or, in the case of those riboregulators, looking at the sequences by eye. So what we want to do next is, first, look at different types of libraries. We're really interested right now in whole-genome library screening. We have some experience with doing adaptive laboratory evolution to change the diversity of our populations and then screen, which was published in the Bentley paper, and we have some additional adaptive laboratory evolution experiments that we're wrapping up currently. In the meantime we're developing the same type of process, and trying to demonstrate the entire process, for other types of knockdown and knockout libraries. We have three different types of libraries we're working on in parallel across the ABF: one is trans-repressor based, roughly based on the cis-repressor RNA tool I just described; we have CRISPRi knockdown libraries in the ABF; and we're working on RB-TnSeq knockout libraries as well. And of course all
these types of libraries have been and are being used pretty regularly, but mostly in fitness-based screens, tying your product of interest indirectly to fitness. What coupling this to sensors does is allow you to directly detect your molecule of interest, instead of having to tie it to fitness with the complications that may come with that more indirect type of screen. And then for integrating with learn, which is something we're really excited about and focused on getting off the ground in the near future: instead of just sorting for the brightest cells, the best producers, we're really interested in pulling out multiple populations, the way we did with that cis-repressor RNA experiment. We want to screen for variations in productivity, essentially, pull out subpopulations of sequence variants that are tied to something like muconate productivity, do the sequencing, and then feed that information into the learn team. The reason for this is that for machine learning models it's important to know not just what works but also what does not work, in order to better bound those models, and library-based screening is a really great way to support this. Again, it allows us to significantly parallelize the DBTL effort that we have in the Foundry. And that's the end; there's obviously a lot of people involved in this, so thanks to the Agile BioFoundry team and to everyone who was listening today.

great, thank you, Taraka. Hector, if you want to start sharing your slides; and everyone, if you have questions for Taraka, feel free to put them into the Zoom chat, and Taraka can be watching for those while Hector presents. go ahead, Hector

okay, can you see my slides now? yes, okay, great

thank you for the opportunity. I'm here to talk about machine learning and predictive
manufacturing in the Agile BioFoundry. The first thing I'd like to say is that machine learning really has applications across the whole synthetic biology spectrum: from deciding which product satisfies a societal need, to designing the pathways, to engineering the biological system and optimizing it to high performance, to helping scale it up and doing the downstream processing to get the final bioproduct. We discussed many of these applications in our review in Metabolic Engineering, and you can have a look at that if you are interested in this topic or want an introduction to machine learning for metabolic engineering.

I'm going to concentrate right now on this part over here, optimizing the system, or doing DNA design. The reason we focus on machine learning is that this transition from learn to design is the main bottleneck for the DBTL cycle, the design-build-test-learn cycle. For design we have a variety of tools like DIVA, Raven, or the Teselagen design module. For building strains, we're starting to have a lot of CRISPR tools to modify strains, and the cost of synthesizing DNA is really going down exponentially. In test we're starting to see things like high-throughput workflows for standard mass spectrometry, but also the kind of fluorescence biosensors that Taraka was talking about in the previous talk. But in learn, often what we have is just a lot of data, and we have to think: this sometimes involves things like kinetic models or genome-scale models, but they often don't work as expected, and reconciling the model with the data involves thinking. That's perhaps great if you're in an academic environment, where you have five or six months to think, but often in an industrial
environment you don't have that time. The good news is that machine learning provides algorithms that systematically improve with more data: it doesn't depend on someone having a happy idea. The algorithms we have are designed to improve systematically as you create more and more data, and they tell you where to look to get the data that makes them better; I'll give you an example of that. So that's the advantage of machine learning: it is able to couple test to design in a really systematic way that doesn't involve months, and can be done in days.

Now, to support these capabilities like machine learning, the ABF has created a whole set of computational infrastructure for predictive biomanufacturing, including the Inventory of Composable Elements (ICE) to keep the information on strains, and things like the Experiment Data Depot, where you can put all your omics data and all your experimental data, visualize it, and then download it into machine learning algorithms that will recommend the next experiment. I'll talk about ART here, and in the next talk you will see some deep learning techniques that Phil Laible has been applying.

ART has been designed to suit synthetic biology needs. ART can work with very few instances, and I'll show you an example with good results with as few as 27 instances. ART provides uncertainty quantification that tells you whether it trusts a given prediction a lot or not: instead of telling you "this is my prediction," it gives you the whole probability distribution of the prediction, from which you can quantify that uncertainty. If the distribution is very localized, it's a very certain prediction; if it's very broad, it's a very uncertain prediction. And also ART provides
not just predictions, which is what a lot of machine learning methods do, but recommendations for the next step: it doesn't just use the information to predict the outcome, but uses that predictive power to recommend the next design cycle.

Let me give you an example of that, published in a couple of papers a couple of years ago, in which we worked with people from the Center for Biosustainability in Denmark to engineer yeast to take glucose and convert it into tryptophan. In this case they had a particularly interesting technique in which they were able to use CRISPR to create libraries of promoters: they could choose, say, five genes and six types of promoters and make whole libraries in a high-throughput fashion, and that kind of high-throughput capability is something we can leverage with machine learning. Here we used a mixture of machine learning and mechanistic models. This is the reaction set all the way from glucose, through glycolysis and the pentose phosphate pathway, to tryptophan, and we used genome-scale models to predict the five reactions that would be most likely to impact the productivity of tryptophan. Then we had a library of six possible promoters for each of these five genes. If we did all of them, all six promoters for all five genes, we would have around 8,000 combinations, which is quite a lot of combinations to do and experimentally challenging. What we did instead was build only a few of them, around five percent, and from that we were able to use the data to extrapolate and find the promoters that would give the highest productivity. We did this with the Automated Recommendation Tool (ART), which is trained with data. In this case the training data consists of an input and a response: the
input is the promoter combination (which promoter was used for gene 1, which for gene 2, which for gene 3, and so on) and the response is the corresponding amount of tryptophan productivity, which is what we're trying to optimize. We have a variety of instances of that, which ART uses to train the model, and in this case we had 264 combinations out of the 8,000, or 3.4 percent of all the possible combinations.

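The predict-then-recommend loop Hector describes can be sketched in a few lines. To be clear, this is not ART's actual API; it is a minimal illustration, using a scikit-learn Gaussian-process surrogate and synthetic productivity values (both assumptions for the sketch), of how a probabilistic model trained on a small fraction of the 6^5 = 7,776 promoter combinations can rank every untested design by its probability of beating the best design seen so far.

```python
# Sketch of probabilistic predict-then-recommend for promoter combinations.
# Not ART's API: a toy Gaussian-process surrogate on synthetic data.
import itertools
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)
n_genes, n_promoters = 5, 6

# Every possible design: one of 6 promoters for each of 5 genes -> 7776.
all_designs = np.array(list(itertools.product(range(n_promoters), repeat=n_genes)))

def one_hot(designs):
    # Encode each design as concatenated one-hot promoter choices (shape n x 30).
    return np.eye(n_promoters)[designs].reshape(len(designs), -1)

# Pretend we measured ~3% of the design space (synthetic response values).
train_idx = rng.choice(len(all_designs), size=250, replace=False)
X = all_designs[train_idx]
y = np.sin(X.sum(axis=1)) + 0.1 * X[:, 2] + 0.05 * rng.standard_normal(len(X))

gp = GaussianProcessRegressor(normalize_y=True).fit(one_hot(X), y)

# Probabilistic prediction for every design: a mean and a spread, i.e. a
# distribution per design rather than a single number.
mean, std = gp.predict(one_hot(all_designs), return_std=True)

# Rank designs by the probability of exceeding the best observed productivity
# (the "tail of the distribution" idea from the talk), and recommend 15.
p_exceed = norm.sf(y.max(), loc=mean, scale=std + 1e-9)
recommended = all_designs[np.argsort(p_exceed)[::-1][:15]]
```

The key design point mirrored here is that the surrogate returns a distribution per candidate, so recommendations can target the upper tail rather than just the predicted mean.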
and even though this was still only ~300 combinations rather than 8,000, we still needed a high-throughput way of measuring productivity, which was also a biosensor, one that produces a certain amount of fluorescent protein depending on the amount of tryptophan, and that is what really enabled the high-throughput data set. With this data we created a predictive model that is probabilistic, as I said, so it tells you: this promoter combination has a 10 percent chance of producing 10 milligrams per liter, a 20 percent chance of producing 5 milligrams per liter, and so on. Then it uses this predictive model to make recommendations for the next cycle: if these combinations of promoters produced these responses, try this particular promoter in gene 1 and that one in gene 2, to produce this amount of tryptophan with this probability distribution.

Now, in this case we started out trying to do six 96-well plates, which was supposed to give us around 600 strains and, with replicates, around 1,800 samples. But of course, in growing the strains and constructing them there were some problems, because this is biology and there are always things that don't necessarily work. Sometimes we didn't have the genotype information of which promoters were put in each strain; sometimes there was a failed assembly in the CRISPR method, so we weren't able to put in the promoters we were expecting or to read the genotype information; sometimes a growth threshold was not passed, so we couldn't do any sequencing; sometimes we weren't able to get rid of the complementation plasmid; or sometimes we didn't get a single population. So all of this attrition ended up producing just a third of the initially expected data: around 782 instances or samples in total, which is 464 combinations of
promoters, which is three percent of all the possible ones. In order to use machine learning you need to figure out a target, what it is you're trying to optimize. In this case we used productivity, which is one of the things that is least intuitive to increase: metabolic engineers know intuitively how to boost yields and titers, but productivity is much harder, and that's why we tried it. In order to do that we measured the change in GFP, as a proxy for tryptophan, divided by the time that it took. We tried to choose a time that wasn't too early, so we didn't get lag effects, but wasn't too late either, so we didn't get into aeration effects that might not be translatable to other ways of doing fermentation. So this is the actual productivity, the change in GFP, or tryptophan, per unit time, for the first gene, second gene, third gene, fourth gene, and fifth gene, and each of them has a choice of six different promoters. You can tell there is no clear trend; it's not clear that you should use, say, the first or the sixth promoter in gene one, because they all span the whole range of productivity. The only exception maybe is gene three, where these promoters here seem to produce the higher productivities, but other than that there is no way you can choose, just by looking at it, this promoter, that promoter, and that promoter. That's why we are using machine learning, because machine learning is particularly good at ingesting all this data and figuring out the nonlinear relationships among the different promoters. Now in this case we were lucky to work with people who really care a lot about data quality, and you can see that in the replicates. So here are the
replicates: we had three replicates for each of them, and it's really good. One, two, and three is the histogram of all the productivity values, and this is replicate three versus one, replicate two versus one, and so on. If there were perfect repeatability all the results would be on the diagonal; most of them are, but there are some, particularly this one, that are really problematic, and those were eliminated. So we filtered out around 15 replicates with high errors, because they would throw the machine learning approach off. Then ART ingested all this data and suggested new promoters, 15 of them, and said: well, try the third promoter in gene one, the eighth promoter in gene two, and so on, and you should expect this amount of productivity with this probability distribution. This is the mean for each of the predictions, and this here is the highest amount of productivity obtained so far, so the tail of the probability distribution tells you what is the probability of exceeding the productivity that you had so far. So how did it work out? Well, quite well: we were able to increase productivity 100 percent over the wild type, well, over the base strain, and 70 percent over the best in the library. These are the observed productivities through the fluorescence proxy, the GFP, and this is the predicted productivity. This is the initial base strain that we started with; this is the library that we did, with the 272 different promoter combinations constructed by a CRISPR method; and these are the recommendations. As you can tell, the best recommendations improved productivity 70 percent over the best in the library and around 105 percent over the base strain, and this was just one DBTL cycle. Now, as you can tell, this is just a fluorescence proxy. How does this extend when you actually use HPLC to measure the amount of tryptophan? We did that, and interestingly none
of the strains in the library actually surpassed the base strain, but the recommendations by ART actually increased production over the best in the library and the base strain by 43 percent, and this was the result of one DBTL cycle. As I said before, we can take all these new recommendations and the data from the measurements, put them back into ART, and in a matter of 20 minutes you can have new recommendations, and keep increasing in a systematic way without really having to break your head. This is what is called active learning. So, to recapitulate: ART is ideally suited for synthetic biology needs. You can work with few instances, and in this case, well, I didn't talk about the case with only 27 instances, I'll do it in a minute, but we did show how we provide uncertainty quantification and how we provide recommendations, not just predictions. As an example of working with few instances, you can see in the ART paper (Radivojević et al.) that we were able to take 27 instances from a limonene-producing strain and make predictions for the second DBTL cycle that were quite accurate, and indeed you will see that the recommendations we made with ART are very close to the ones that were experimentally tested in that project. So again, we can really leverage a low number of instances, as low as 27 so far, and create meaningful recommendations that can guide the metabolic engineering in an effective way. Well, if you're interested in ART there is a whole webpage, art.lbl.gov, that has tutorials and a paper that explains how to use ART with synthetic data and how to use it with EDD and ICE, and you're welcome to use it. There is also an ART front end at agilebiofoundry.org: if you're not interested in coding, you can just upload your data and then the server will send you your
recommendations that you can use as you please. And that's it. Synthetic biology can do a lot of things, from biofuels to biomaterials and bioproducts to a variety of different things, but the problem is that we can't really predict what's going to happen, which is typical of biology, so it's really hard to design. Machine learning can help with that, but it needs tons of data, and that data can only truly be obtained effectively through the automation and high-throughput methods that Taraka showed in the previous talk. And that's it. I really want to thank, for all this work, the people in my group, but also all the different groups in the National Labs, both in Berkeley and elsewhere, that have really made this work possible and made the National Labs the interdisciplinary environment that they should be for this kind of work. For even more details on these capabilities you can always go to agilebiofoundry.org/capabilities and see our design, build, test, and learn types of capabilities. I only had 15 minutes today to show this, but there are many other things that the Agile BioFoundry can offer your company. Thank you very much.

All right, thanks Hector, and if people have questions for Hector please put those in chat for now. Phil, if you want to share your screen and get started.

Thanks Nathan, and also thanks to Emily and Stacey for helping organize this webinar, and for this opportunity. Today I'm expanding on what Taraka and Hector have told you, and I'm going into another capability of the ABF, where we're using deep learning for improvements in strain and process designs in biomanufacturing. So just to go back to the engineering cycle the Agile BioFoundry has adopted, I want to highlight right now that this is an effort that is highly integrated between the learn team and the test team, to use advanced data sets that are coming from our test
capabilities, and you'll see that we're not only expanding these capabilities but integrating more strongly with design and with build as we go forward into future iterations and improvements using learn within the Agile BioFoundry. Everyone has done a pretty good job of letting you know how to find more information about our capabilities, and I'll just reiterate that learn can be found here on our web pages. Both Hector and I are talking about AI approaches, and you can find out more by going to these sub-pages, but today I also wanted to let you know that the deep learning actually takes advantage of a lot of other learn capabilities within the Agile BioFoundry, employing mechanistic models, metabolic models, and regulatory models, and in order to understand the predictive outputs of these models we have complex data visualization that can help in those regards. Over the last few years, within our aspect of the learn team, we have adopted an ecosystem for strain improvement, and this is depicted on the slide, where we're going from the left side, where we're organizing builds and strain engineering efforts, all the way through test, using models to understand how information flows as processes scale, from miniaturized evaluation at the bottom of the middle part of the graph, up to scaled analytics that are found in pilot-level facilities, then through production phases and clear determination of the phenotype of the engineered strains that we are evaluating within our host-target teams. We'll go more into this and how deep learning is applied. What we're actually doing is building relationships between these different data types, at different scales and different levels of evaluation of the organism, and then building artificial neural networks between these layers and stacking those layers such that the output of
one layer becomes the input of another, so you have relationships between them and you can very clearly go from the start of this process to the outputs, and in the end be predictive of what new modifications would do if they're evaluated in the same way. Again, the deep learning exercise has mainly leveraged large omics data sets that are available from our test team at Pacific Northwest National Lab. Here we're talking about transcriptomics, proteomics, and metabolomics, both internal and external metabolites; these are all available and used as layers in deep modeling approaches, in combination with information from our build and strain engineering efforts, all the way to metadata on what the phenotype is doing, with advanced analytics on the product production end. So here, said in a different way and shown diagrammatically, we're leveraging relevant biological frameworks to understand what the engineering is doing at the organism level: what's being up- and down-regulated, what proteins are being produced at higher or lower levels within the cell, and how all these components of the cell interact and work together to change the output of this bioprocess and influence the final titers, productivity rates, and ultimate yields of these processes. And it's iterative: we'll work with the teams, get a data set, learn from it, predict, then use these predictions in new rounds of design, build, test, and learn, use the new data to reinforce our training data sets and improve our predictions, and then re-evaluate the performance to see if we're getting better or need to build in a new type of data set to help with what we're looking at. These approaches are being adopted from other fields of use, where it's very helpful to use information theory as a modeling scaffold, where we're taking inputs coming from a fermentation process and
using transmitters, channels, and receivers to decode the information, and using our models, which are expanding and adapting from a metabolic understanding of organisms, understanding how regulatory networks within the organism turn sets of genes off and on and influence what the organisms are making, how they're growing, what they're utilizing as feedstocks, and then, in the end, for our needs, how that's affecting them in a biomanufacturing context. This approach can be used in a complementary way to what Hector was describing, optimizing a production pathway, but it has really shown to excel at identifying off-pathway engineering targets that would help turn off or modify pathways that are no longer needed, pathways the organism maybe needed to survive in nature but that are now being repurposed in a biomanufacturing context. This is also important when considering an integrated bioprocess environment, and this is where these deep learning approaches are really starting to excel. So here's the generic approach that we use for deep learning. We set up what you'll hear referred to in this presentation as a learn-friendly experimental approach, which is usually guided by a given experimental observation, or the inability to get a good experimental observation, where you know what you want to do, and you work with experimentalists to define an experimental goal: do you want to increase the titers, the rates, the productivity, or the overall yield of the process? Then you design experiments around these goals, ensuring that you have a matrix of experimental parameters that distinguishes between different mechanisms, maybe leading to improvements or negative effects that you're observing, and a range of phenotypes, like Taraka was talking about, so we know not only where we think we want to go, with a first-principles understanding of the approach, but also where it's not encouraged to go, and those are
very important distinctions. From that, we can identify a modeling approach, design modeling parameters, and, most importantly, interact with the test team to make sure the training data sets we're getting are going to be able to feed our modeling approaches. Some lessons we have learned in the first iterations and years of going through this process: establishing rapport and a common language is very important for building a really great working relationship with engaged experimentalists, and that goes a long way; also, productivity and the predictive outcomes are greatly enhanced through the establishment of learn-friendly experimental designs, like we've talked about here on the left-hand side of the slide. So here's one example of many where we've used these approaches and these learn-friendly experimental guidelines to work on improved biocatalysts within the Agile BioFoundry. Here we're using Pseudomonas putida to produce muconic acid; Taraka talked about this a little with her biosensors. The observation that we had in this case is that a single gene deletion led to differential regulation of more than 18 percent of the transcriptome and more than 38 percent of the metabolome, so we established a goal to understand the regulatory mechanisms by which small changes in the gene profile of this organism lead to very large changes in the metabolic structure of the organism. Through several rounds of these deep-learning-type approaches with predictive outcomes, going through this learn ecosystem in different ways, we have arrived at the identification of more than 30 different improvements that could lead to strain modifications and increased titers, rates, and yields for this organism. Here we have over 30 regulatory genes, some of which are important to the
production pathway, others that are influencing the growth of this organism or the utilization of feedstocks by this organism. We are also studying transporters through artificial-neural-network models, and we have looked at three transporters that increase the export of the product from the organism to decrease its toxicity. There are also 32 gene targets, many unannotated, that were proposed to increase growth or product titers. These predictions are in various states of validation and reinforcement, but it's a key example of how you can actually layer these modeling efforts with AI approaches and end up with deep learning with predictive capability that can improve your biomanufacturing processes. You can see that our modeling approaches have evolved from being relatively simple at the start of these projects, and we can make them as complex as we need to in order to understand and reach the goals that the modeling team and the experimentalists have agreed upon. So here's an example where you're going from a simple information network on the left, which mainly focuses on regulatory networks and relates them to levels of metabolites both outside and inside the cell, to one on the right, where we're looking at interactions between different types of information within the cell, not only external but internal: how the protein levels vary and how that influences the production of the product that this host has been charged with making. We have active academic and industrial projects where these approaches are being utilized, and here's just a sampling of some of the goals of the projects we've worked on in this capacity. One is identification and maintenance of prolonged periods of maximum product synthesis: if you have an organism that goes into a stage where it's producing something in a steady state, you want to keep it there, and can we learn how to do that and overcome
departures from that steady state? Another is yield enhancement through elimination of unnecessary off-pathway metabolic processes; this will really improve your yields ultimately, and there are many examples where these enhancements are needed. Utilization of complex feedstocks that have low, zero, or negative value is an area that is growing. Also biocatalyst optimization: you might have a step or two in your pathway, or maybe an off-target pathway you want to eliminate, or a step in the pathway process you want to optimize, and you can do this by studying a range of biocatalysts and how you can optimize them. And lastly, as an example, assembly, stabilization, and function of large multi-subunit macromolecular assemblies: you can look at how these assemblies, and subunits of them, are synthesized independently and come together to form a complex that actually results in high levels of the product that you're interested in manufacturing. So with that, toward the end and looking to the future, we want to use these deep learning approaches to end up with a learn-guided intelligent biomanufacturing scheme, where we can expand the types of models included within our deep learning exercises: on the front end, to know more about the feedstocks and whether or not you can get away with feedstocks that are of reduced value or have inherent variability; and toward the other end of the spectrum, on the advanced output side of these value chains, you want to know, if you're producing this, can you get it out of the system in a downstream processing scenario that is effective, and do we need to include anything that would enable downstream processing early on in our understanding of the process, or the global process in general. These are things that we're considering incorporating in newer iterations of this deep learning approach. And lastly, there's the potential to include economic and
environmental modeling efforts in the deep learning approaches, to know how impacts in earlier stages of these steps in biomanufacturing may propagate, so you could factor those in early on if it is of value. Lastly, we talked about this learn ecosystem and how it's been functioning for us in the Agile BioFoundry, in some example processes and in expansion into collaborations with academic and industrial partners; however, there's a lot of potential for ecosystem expansion, leveraging things that Taraka has reviewed and that Hector has talked about, where we'll be able to learn with larger, more complex, and more complete data sets. Here we can use libraries of strains going into our evaluation phases, maybe libraries of engineered biocatalysts, and utilization of information from pan-genome analyses: we know what an organism needs for its normal function and survival in a particular environment, but many organisms have large expansions of their genome that allow them to really excel with different product classes and function in varied ecosystems, and so can we utilize those and take advantage of them in the future in our deep learning approaches? As Hector and Taraka have alluded to, miniaturization of cultures and bioreactors is going to be really instrumental in moving our processes forward. One example is through microfluidics, where in the upper right we're showing that you can actually evaluate cells in small test-tube-like environments that might be as small as 70 picoliters; in the upper right-hand corner, in those little droplets, those little microcentrifuge tubes are holding millions of bioreactors within them, and these can be leveraged to feed AI evaluations of these strains. Then we really have something, and we're getting very close to being able to use them in the Agile BioFoundry. So it's certainly important to talk about higher-throughput means of strain evaluation: as Taraka
alluded to, biosensors have helped the Agile BioFoundry in many ways, not only with what Taraka talked about: Hector talked about them too, and we're utilizing them in our deep learning approaches. Flow cytometry is going to be very helpful as well, especially if you can couple it with the high-throughput sequence-based analytics that are very available today, and in addition to that, high-quality rapid genome resequencing efforts that would enable us to capitalize on and utilize adaptive laboratory evolution insights in deep learning approaches, so we can really understand, when you put selective pressure on an organism and it evolves into something that's creating product more efficiently and at lower cost, how we can get this information very rapidly into our AI approaches so we can capitalize on that going forward.
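Phil's stacked-layer idea, in which the output of one omics-to-omics mapping becomes the input of the next until the final layer predicts the process outcome, can be sketched as a small feed-forward network trained end to end. Everything below is a toy under invented assumptions: the layer sizes, the synthetic multi-omics data, and the single titer output are all made up, whereas the real ABF models are trained on large experimental transcriptomics, proteomics, and metabolomics data sets.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins for stacked omics layers: transcripts -> proteins ->
# metabolites -> product titer.  Sizes and data are invented for illustration.
n, n_tx, n_prot, n_met = 200, 40, 20, 10
TX = rng.normal(size=(n, n_tx))                                  # "transcriptomics"
PROT = np.tanh(TX @ (rng.normal(size=(n_tx, n_prot)) / np.sqrt(n_tx)))
MET = np.tanh(PROT @ (rng.normal(size=(n_prot, n_met)) / np.sqrt(n_prot)))
titer = MET @ rng.normal(size=n_met) + rng.normal(scale=0.05, size=n)

relu = lambda z: np.maximum(z, 0.0)

# Stacked network: each weight matrix maps one omics layer to the next, and the
# last vector maps the metabolite-level layer to the predicted titer
W1 = rng.normal(size=(n_tx, n_prot)) * 0.1
W2 = rng.normal(size=(n_prot, n_met)) * 0.1
w3 = rng.normal(size=n_met) * 0.1

def forward(T):
    H1 = relu(T @ W1)        # predicted protein-level representation
    H2 = relu(H1 @ W2)       # predicted metabolite-level representation
    return H1, H2, H2 @ w3   # predicted titer

mse0 = float(np.mean((forward(TX)[2] - titer) ** 2))  # error before training

lr = 0.02
for _ in range(600):
    H1, H2, pred = forward(TX)
    err = pred - titer
    # backpropagate the squared error through the stack
    dH2 = np.outer(err, w3) * (H2 > 0)
    dH1 = (dH2 @ W2.T) * (H1 > 0)
    w3 -= lr * (H2.T @ err) / n
    W2 -= lr * (H1.T @ dH2) / n
    W1 -= lr * (TX.T @ dH1) / n

mse = float(np.mean((forward(TX)[2] - titer) ** 2))   # error after training
```

Keeping each intermediate representation aligned with an omics level is the design choice that mirrors the talk: the hidden layers are not arbitrary, so a trained model of this shape can be interrogated for where (transcription, translation, or metabolism) an engineering change appears to act.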

2022-10-01
