Hello community. Do you know that there is a hidden truth about AI in the year 2024? You can understand it easily with just three building blocks, and if you are new to AI or want a brand-new introduction that explains AI in simple terms, this is the video for you.

So let's focus on the first block and start. If you want a large language model, you start with a dataset. You have an empty LLM, and the task is to find a dataset for the pre-training of this LLM. Let's say we take a current 2024 dataset, the Proof-Pile-2, a 55-billion-token dataset of mathematical and scientific documents. Or we take Common Crawl, 237 billion HTML pages from the internet: you filter it with a language filter and a mathematical filter, you remove duplicates, and you end up with about 6.3 million documents on the complete internet about mathematics. I know, a shocking amount, just 6 million, but at least we have close to 15 billion tokens, you find it on Hugging Face, and of course it is part of the Proof-Pile-2. So you see, we go from 237 billion pages down to just 55 billion tokens. When people started to rebuild open-source datasets like RedPajama, they did it with those sources: Common Crawl, C4, GitHub, Books, all of arXiv, all of Wikipedia, and then you end up with a bit more, about 1.2 trillion tokens for your dataset. So we take the Together Computer RedPajama 1.2-trillion-token dataset, which you also find on Hugging Face. Great. And then you maybe add a historical dataset like the C4 dataset from 2020. You see the idea: you put layer on layer of the particular training data you want your LLM to operate on. If you want a medical LLM, you add all the medical documents; if you want a finance LLM, yes, you guessed it.

So now we have our particular training dataset for the pre-training of an LLM, and this unique mixture of data defines the performance of our LLM, because now we start to pre-train our empty LLM. When I say pre-train, to be specific I mean the tensor weights: in this complex neural network structure of the Transformer-based, self-attention large language model, the model learns to replicate specific semantic patterns from the training dataset, thereby modifying its numeric tensor values. This is the way an LLM learns knowledge: it finds semantic patterns, and this is what we do when we pre-train an empty LLM.

If we take a tiny LLM like Mistral 7B, which has the nice property that it really comes under an Apache 2.0 license, so it is open source and you can use it, these are the characteristics of the architecture if we build this empty model. There are complicated ways to build it, but there is also a kind of standardized approach: a vocabulary size of 32k, a context length of about 8,000 tokens, 32 attention heads, a specific number of layers, and in total about 7 billion free trainable parameters. Of course, in 2024 we have some major advances, for example faster inference: we can use advanced attention mechanisms like grouped-query attention, or, if we want to handle longer sequences in the context length of our prompt at smaller cost, sliding window attention. There are a lot of new mechanisms and developments that in 2024 we apply immediately whenever we build new models.

Then, once we have the architecture and the data, we start the pre-training. If you think about proprietary models like GPT-4 Turbo, Bard, PaLM 2, Gemini Ultra (currently not available in Europe) or Claude 2.1, it takes quite some time because those are huge models. Let's say today, with all the new advances in attention and training mechanisms, we rent about 1,000 GPUs in some cluster on AWS. This runs one month, two months, and after about four months on just 1,000 GPUs we have our pre-trained LLM. This is quite expensive: in 2023 I would have told you that if you arrive here you spend about one million US dollars, but in 2024, maybe by mid-2024, we can achieve this with around $100,000, depending of course on the size of your LLM architecture and the specific dataset you use. That just gives you an idea. Beautiful.

But now this pre-trained LLM is rather unambitious, because we have a specific task for the AI. We want to use this LLM in a particular way, for example as a question-answering chatbot, for summarization of technical and scientific literature, or for whatever you want. We build it for a particular purpose, and to achieve this task-specific specialization we start to fine-tune our LLM. For this fine-tuning we need a dataset, and in 2024 we of course take an instruction-based dataset to fine-tune our LLM. In principle it is the same thing we did in pre-training, but now the dataset is limited to a particular task, like summarization of scientific documents. And then, in 2024, we also do a DPO alignment (direct preference optimization): if you want your model to behave in a certain way, you are not teaching it a lot of new semantic knowledge, but you want the model to be friendly, to greet you, to behave in a certain way, not to be aggressive, not to use specific words, a nice and charming AI assistant. For this we do the DPO alignment, and of course we need an alignment dataset. Here are some real-world datasets on Hugging Face that I would recommend you use at the beginning of January 2024: for fine-tuning there is the beautiful UltraChat dataset, and if you want the DPO alignment, go with the UltraFeedback dataset.
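If you want to grab these two datasets right away, here is a minimal sketch of how you could load them; the exact repository and split names are my assumption, so pick whichever UltraChat and UltraFeedback variants on the Hub you prefer:

```python
# Minimal sketch: load an instruction-tuning and a preference dataset from the Hugging Face Hub.
# Repository and split names below are assumptions; check the Hub for the variants you want.
from datasets import load_dataset

# Instruction data for supervised fine-tuning
ultrachat = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")

# Preference data (chosen vs. rejected answers) for the DPO alignment
ultrafeedback = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

print(ultrachat[0].keys(), ultrafeedback[0].keys())
```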
You see, now we have our large language model pre-trained, fine-tuned, and aligned with a specific behavior. The beauty is that we can build this open source: Mistral 7B, for example, is a model we get from the company Mistral AI, and since it is licensed under Apache 2.0 it is free for us to use. We take it as a building block, we fine-tune our LLM with a standardized Python program, then we do our DPO alignment, and we have a complete AI system for our task. Done. And this is the beauty: if you have your specific data and your specific wish for how the model should behave, you use the dataset of your company, of your books, whatever data you have, and you do the fine-tuning and the DPO alignment with your data. Your data might not be in the format those Python scripts need, so you do a small optimization of the data format, just to bring the data into a shape the Python programs can use, and the job is done. Beautiful.

To give you an idea how simple this is: in 2024 it is standardized. We have standardized fine-tuning. If you want to see it in Python code, this is all we need, this is the core element: we do supervised fine-tuning with parameter-efficient fine-tuning (PEFT) using low-rank adaptation (LoRA), which trains and saves small adapters instead of training the entire model. So we have something that uses less energy, is faster and cheaper, but still gives us good performance. We have a configuration for PEFT, then we take any model we like from Hugging Face as an AutoModelForCausalLM, GPT-Neo for example, with an 8-bit quantization, and in those four or five lines we do the fine-tuning: we have the model, we have the dataset, we tell the trainer what to look at, we have the PEFT configuration, and then we say trainer.train(), and that is it.
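To make this concrete, here is a minimal sketch of such a standardized fine-tuning run, assuming the TRL, PEFT and Transformers APIs as they looked in early 2024; the model name, dataset choice and hyperparameters are placeholders, not the exact values from the video:

```python
# Minimal SFT sketch: GPT-Neo in 8-bit, LoRA adapters via PEFT, trained with TRL's SFTTrainer.
# Requires: transformers, trl, peft, datasets, accelerate, bitsandbytes
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

model_name = "EleutherAI/gpt-neo-125m"                       # placeholder model
dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")

model = AutoModelForCausalLM.from_pretrained(model_name, load_in_8bit=True, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

peft_config = LoraConfig(                                    # the PEFT / LoRA configuration
    r=16, lora_alpha=32, lora_dropout=0.05,
    bias="none", task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",                               # the Guanaco dataset stores dialogues in "text"
    max_seq_length=512,
    peft_config=peft_config,
    args=TrainingArguments(output_dir="sft-out",
                           per_device_train_batch_size=4,
                           num_train_epochs=1),
)
trainer.train()
trainer.save_model("sft-out")
```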
You see, it is standardized, there is nothing to it. And of course with the DPO alignment the Python program looks the same: we have a standardized DPO trainer, a script by Hugging Face, and it takes care of the complicated part of the job for us. The beauty is that if you use these standardized scripts from the open-source community and new methodologies appear, like in the video where I showed you that you can optimize your fine-tuning process and get a 25% performance jump with a new tuning technique (NEFTune), the integration is just one line of code: in your supervised fine-tuning trainer from Hugging Face you simply add one line with one parameter (I explain this parameter in that video), and you take advantage of the new algorithm. This code is updated regularly by the open-source community, and this is what I would really recommend if you are starting fresh with AI. The same goes if you want to use FlashAttention-2 or 4-bit quantization: it is just a single line in the standardized fine-tuning, and the same holds for the DPO alignment.

Now, if you think "my data structure is not aligned": this is what you want to achieve, this is how your data should look so that you can use the standardized alignment. We have a dictionary structure with three elements. You have a prompt (the first example is "hello", the second is "how are you"), and since we want to train a behavior, we have a good and a bad behavior: the good behavior goes under "chosen" and the bad behavior is what we reject, because the model has to learn what we like and what we do not like our AI to do. So for the second prompt, "how are you", what you want the system to respond is "I'm fine", and what you do not want it to respond is "I'm not fine" or "leave me alone". You clearly have a particular prompt, a chosen answer, and a rejected behavior or pattern of behavior. Everything should be in this format, and I will show you how to do this with your data. That's it: you have the alignment, and I showed you the DPO training from Hugging Face.

If you want complete Python files with complete examples, this is it: just go there, copy them, and you can use them; you can fine-tune and you can do a DPO alignment of your AI system. You see how simple AI has become in 2024. So here are our files. This is exactly what I showed you, the Hugging Face TRL (Transformer Reinforcement Learning) scripts, and on the right-hand side we take the supervised fine-tuning Python file. As you can see it was updated just last week, and this is the nice thing I want to show you: always use something a community cares about, because it is continuously updated. So what do we have? Our typical Accelerate, datasets, PEFT, Transformers, and TRL imports with the supervised fine-tuning trainer module, which works on a GPU and on a TPU, a tensor processing unit from Google. Do not worry about the script arguments for now, we will look at them a little later. First we have the model: an AutoModelForCausalLM.from_pretrained from Hugging Face, so here goes your model name, then you decide if you want quantization (4-bit or 8-bit), whether you want trust_remote_code (it depends on your model), then you define your dtype, and if you need an authentication token, this is where it goes.
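As a small illustration of that model-loading step, a sketch could look like this; the argument values are examples, not the script's exact defaults:

```python
# Step 1 of the script: load the base model, optionally quantized, with dtype and auth token.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neo-1.3B",                                    # any causal LM from the Hugging Face Hub
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),    # or load_in_8bit=True
    torch_dtype=torch.bfloat16,                                   # your chosen dtype
    trust_remote_code=False,                                      # enable only if the model repo needs custom code
    token="hf_...",                                               # placeholder authentication token for gated/private models
)
```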
So much for the model; step two is then to load the dataset. There are some default values for you: if we go into the dataclass of the script arguments, you see the model name, and the first default is a model by Facebook, a very tiny model with only 350 million free trainable parameters. You can choose any model on Hugging Face that you would like to use instead. Second, the dataset: as you can see, the default is Tim Dettmers' openassistant-guanaco dataset, and again you can choose any particular dataset that you like.

Let's look at this in detail. This is the Hugging Face dataset timdettmers/openassistant-guanaco, and it has thousands and thousands of rows. Let's look at what this dataset actually is, because if you have your own company data, this is the way you should provide it to the supervised fine-tuning. You have a human: "I want to start doing astrophotography as a hobby, any suggestion what I could do?", and since this is a dialogue, the assistant comes back with an instruction: "Getting started in astrophotography can seem daunting, but with some patience and practice you can become a master; you will need to learn A, B, C, D." Then the human comes back: "Can you tell me more? What would you recommend as a basic set of equipment to get started, and how much will it cost?", and the assistant answers: "It can be a fun and rewarding hobby; as a beginner you will need this and this kind of camera." And there you have thousands and thousands of possible conversations in a dataset for an open assistant.

So you see exactly where to go if you are, say, a financial company: with your client you have a question or an explanation of how you communicate, or within your company you have a human turn and then, for example, a solution, and then the conversation goes on: "Can you give me more details?", "Yes, for this financial product you need this and this, and this is the best product." From your database, your data entries, your data lake or delta lake, you build very simply a dataset that is similar to this one, so you see exactly how the structure of the dataset should be. On the right side you also see models trained with this particular dataset, for example from MosaicML, and there are a lot of other training datasets available. If you want to know a trick: if you have a model, go and look at what datasets it has been trained on; if you find one whose structure you like, try to imitate exactly the same data format. That is always a good idea.
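If you want to mirror that structure with your own records, a minimal sketch could look like this; the "### Human:" / "### Assistant:" markers follow the Guanaco format, and the example rows and repository name are invented:

```python
# Build a small SFT dataset in the openassistant-guanaco style from your own conversations.
from datasets import Dataset

rows = [
    {
        "text": "### Human: Which savings product fits a 5-year horizon? "
                "### Assistant: For a five-year horizon you could consider ... "
                "### Human: Can you give me more details on the fees? "
                "### Assistant: Yes, for this product the fees are ..."
    },
    # ... one entry per recorded conversation from your data lake / delta lake
]

my_dataset = Dataset.from_list(rows)
my_dataset.save_to_disk("my_company_sft_dataset")
# or: my_dataset.push_to_hub("my-org/company-sft")   # hypothetical repository name
```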
Beautiful. You see we have a lot of default values; maybe do the first run with the defaults and then start to optimize. So we have the model defined and the dataset loaded from Hugging Face, and then it is easy: you define your training arguments, and all the default values are there, everything is done for you, you just choose the values that seem appropriate. If you decide to use a PEFT configuration, a parameter-efficient fine-tuning with, for example, the low-rank adaptation methodology, you use the PEFT LoRA config and insert your parameters (I have a specific video on LoRA configuration). And then comes the beautiful thing: you define the trainer. For this you of course need the tokenizer from Hugging Face that matches your model, depending on what tokenizer the model was trained with, the size of your vocabulary, and the structure of your tokens. And then that is it: trainer, and then train. Everything has been taken care of: you have chosen the model, the training arguments, the maximum sequence length, the specific fine-tuning dataset you want to use, whether you want to go with PEFT and LoRA or not, and the appropriate tokenizer for your large language model. Then you let it run for one, two, three, four hours, ten hours, two days, depending on your dataset, you save the model, and you have a fine-tuned large language model. As easy as can be. This is a standardized script, it will be updated continuously by Hugging Face, all new methods will become available here, so I think this is a great place to start.

Beautiful, and then let's do the DPO. It is exactly the same: as you can see, the last update was three weeks ago, and you have more or less the same structure, your dataset, your LoRA configuration, your Transformers imports, and then, from TRL, your DPO trainer, already programmed and available for you as a specific class. Again we have some arguments, never mind those for the moment. Then the dataset: in this example they take a dataset from the internet, but you could use your own dataset in a specific format from your database, your literature, or wherever you have your data from. The example loads the Anthropic Helpful/Harmless dataset from Hugging Face and converts it to the format our DPO training, our alignment, needs. As I told you, for DPO you need to convert your dataset to a dictionary with a clear structure: you have "prompt" as a list of strings, "chosen" as a list of strings, and "rejected", the negative example, also as a list of strings. That is it: whatever your dataset is, and here we go with the Helpful/Harmless example, you have to convert it into this format, and the script contains a very simple Python function that produces exactly this format with prompt, chosen, and rejected for you.
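A minimal sketch of such a conversion, assuming the Anthropic/hh-rlhf layout where "chosen" and "rejected" each contain the full Human/Assistant dialogue and share the same prompt up to the final Assistant turn:

```python
# Convert Anthropic/hh-rlhf into the prompt / chosen / rejected format the DPO trainer expects.
from datasets import load_dataset

def split_prompt_and_response(sample):
    marker = "\n\nAssistant:"
    idx = sample["chosen"].rfind(marker) + len(marker)    # everything up to the last Assistant turn is the prompt
    return {
        "prompt": sample["chosen"][:idx],
        "chosen": sample["chosen"][idx:],                  # preferred completion
        "rejected": sample["rejected"][idx:],              # dispreferred completion
    }

dataset = load_dataset("Anthropic/hh-rlhf", split="train")
dpo_dataset = dataset.map(split_prompt_and_response)
print(dpo_dataset[0]["prompt"][:200])
```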
So let's have a look at how the Anthropic Helpful/Harmless dataset looks in its original form. For example, the human asks, "Can you provide me with the home address for actor Ryan Reynolds?", and the assistant comes back with, "Which actor do you mean? If you mean this particular Ryan Reynolds, then his address is this." The human says, "Do you have a phone number for him as well? I want to let him know that I'm coming by," and the assistant says, "I'm sorry, he doesn't have a publicly available phone number or email address, and maybe he wouldn't enjoy it if you just walked by his home and said hello." That is the chosen answer. Since we are talking about a DPO training process, there is also the behavior of the model that you do not want, the rejected answer: you have more or less the same dialogue, "I'm looking for the home address", "this is the home address", and then the assistant says, "If you want to contact this particular person by phone or email, I can help you with that." You do not want your AI to disrespect the privacy of this actor: you want your system to behave nicely and say, "Sorry, he is an actor, he is publicly visible, but he also has a private sphere," and you do not want the system to go on and really find his email or phone number so that you can disturb him. Beautiful. So you see, this is a behavior alignment: how you want your model to be.

And if you have your own data, say you run a help desk and you have recorded the help desk conversations, then the conversations where you say "this was a perfect conversation with our client, this is the behavior we want to enforce in our company" go into "chosen", and the recordings where you say "oh, this went wrong, I do not want this" you simply put into "rejected". That is it. The data format is simple, human, assistant, human, assistant, and, going back to the script, we transform it into the specific dictionary structure we need for the DPO training.

So, beautiful, that's it. Again we have our model: from_pretrained with any model available on Hugging Face. If we go to the dataclass and look at the defaults, the default model name is gpt2; okay, maybe not the best choice, but just go and look for a model you like. Then you have your learning rate and so on; we will not worry about those for the moment and just go with the default values. You load your pre-trained model, you have the tokenizer that goes with the model, then of course you need the DPO training dataset, so you load the Anthropic Helpful/Harmless dataset and transform it into the format you need, you load an evaluation dataset, you set all your training arguments, and if you decide you want a PEFT LoRA configuration, you insert your parameters (otherwise everything is set to None). And this is it: it has all been taken care of for you. You just initialize the DPO trainer; the DPOTrainer is a complete class defined for you, with the model, the reference model, the arguments, the beta, the training dataset, the evaluation dataset, the tokenizer, and all the specific parameters, including whether you want to use PEFT or not, and then you simply call dpo_trainer.train().
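Put together, the core of that script looks roughly like this, a sketch assuming the TRL DPOTrainer API of early 2024, with illustrative hyperparameters and a toy preference dataset in the prompt/chosen/rejected format:

```python
# Initialize and run the DPO trainer: policy model, frozen reference model, preference data, beta.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_name = "gpt2"                                        # the script's default; pick any causal LM you like
model = AutoModelForCausalLM.from_pretrained(model_name)
model_ref = AutoModelForCausalLM.from_pretrained(model_name)   # reference model for the implicit reward
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token                  # gpt2 has no pad token by default

# Toy preference dataset; in practice use your converted prompt/chosen/rejected dataset
train_dataset = Dataset.from_dict({
    "prompt":   ["How are you? "],
    "chosen":   ["I am fine, thank you for asking!"],
    "rejected": ["Leave me alone."],
})

dpo_trainer = DPOTrainer(
    model,
    model_ref,
    args=TrainingArguments(output_dir="dpo-out",
                           per_device_train_batch_size=2,
                           learning_rate=1e-6),
    beta=0.1,                                              # hyperparameter of the implicit reward
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    max_length=1024,
    max_prompt_length=512,
    # optionally pass peft_config=LoraConfig(...) instead of a separate reference model
)
dpo_trainer.train()
```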
Now you have a DPO-aligned large language model. So you see, this is also a standardized script, less than 200 lines of code, publicly available under an Apache 2.0 license, updated constantly, really one of the best structures you can find. Please note that you may have to convert your specific dataset to the recommended form, prompt, chosen, rejected, and then you are good to go.

Now, to show you the real power: here is the documentation of the supervised fine-tuning trainer from Hugging Face, the class definition of our trainer, and as you can see there are quite a lot of parameters and variables that are set to specific default values but that you can optimize. Here I give you a short explanation of all the different parameters you can additionally tune and optimize for your specific task. So if you need something specific, always check whether it is already implemented in the code; often you just have to set a flag or provide a specific numerical value. The same is true for the DPOTrainer class: you see all the possibilities you have if you use the standardized Hugging Face DPO trainer, every parameter is explained for you, and when you use a PEFT configuration it tells you exactly what to do. Everything has already been coded and implemented in standardized code sequences for you. That's it.

However, if you go not open source but with, say, GPT-4: the company OpenAI does not tell us the pre-training dataset, that is a secret; how long they trained GPT-4 Turbo is a secret; how they fine-tuned it is a secret; how they chose their PPO or DPO alignment structure is a secret. And you know why? Because, for example, the New York Times is currently suing this company and Microsoft for copyright infringement. Those commercial companies do not want to tell you where they acquired their data on the internet, because they could run into serious trouble, while the open-source community currently shows you where they get their data from, in agreement with the creators, and I think that is the way to go: a win-win situation for both parties. Beautiful.

So in case you want to go with GPT-4: since we have no idea what it was trained on, and you cannot fine-tune it yourself because it is not open source, what a surprise, this for-profit company (or a branch of it) will do it for you, if your company has the money, or, let's be polite, if you are financially robust. You go to OpenAI, you pay them, you give them your data, and they fine-tune their GPT-4 into a "super-whatever" model on your data. This is really business-to-business. And if you are really interested in a specialized, hyper-focused model and your company is exceptionally wealthy (please note the part about exceptionally wealthy), you go there and say, "I also have my own pre-training dataset from my company." Hopefully you are a global corporation, because you need millions, billions, and trillions of data points, and then OpenAI is willing, for specific wealthy companies, to additionally pre-train their GPT-4 on your company's pre-training dataset and your fine-tuning dataset, all secret and proprietary. They simply ask you for quite a lot of money to do the pre-training and fine-tuning on your data, but hey, it's just a question of money.
So you see, either you build it yourself, or, if you have the money, you go there and they do it for you.

What I want to show you now is a very nice implementation of another DPO alignment. We have a Colab notebook (I give you the official link, of course) with the model name OpenHermes-2.5-Mistral-7B, a Mistral 7B model, and we will create a DPO-aligned version of it, which we will call NeuralHermes-2.5-Mistral-7B. This is the complete script for a DPO alignment, done by Maxime Labonne; if you ask who he is: a senior machine learning scientist at JP Morgan Chase in London. Go there, I think it is a really nice source of information. This is his Colab notebook, so all credit belongs to him. As you can see, we import everything we need: our DPO trainer, our LoRA configuration, our PEFT model, our bitsandbytes. Beautiful.

Next, of course, we have to adapt the dataset to the right format. You see exactly that you have the system prompt and the user prompt, and you bring it into the correct form; as always, prompt, then chosen, then rejected; remember, it is always this dictionary format. Then you load your dataset, for example the Orca DPO pairs dataset you can find on Hugging Face, you have your auto tokenizer, you format the dataset, and you have created a dataset that follows exactly the format we need for DPO when we use the DPOTrainer module from Hugging Face. So here you see the chosen answer and the rejected answer, and sometimes, for a particular model, you have a specific start token and an end token; please be careful with this for your particular model. Otherwise it is exactly the same as what I showed you. You have your LoRA configuration with your LoRA parameters, and then your target modules where LoRA is applied; look at the extended set of target modules he uses. Then, as in any DPO trainer, you have the model you want to train: an AutoModelForCausalLM.from_pretrained, here in float16 with a 4-bit quantization. Please remember: if you go with the old OpenAI-style PPO, you would instead choose the auto model variant with the specific value-head configuration. And then you have a reference model; the reference model is used to calculate the implicit rewards of the preferred and rejected responses, so this is where our rewards come from, and you can use the same model type, even the same model if you want, and you can also go with a quantized version, whatever you like. Again we have the training arguments, the learning rate, the output directory, the warm-up steps, everything you need, and then it is exactly the same script I just showed you: a DPOTrainer with the model you want to train, the reference model used to calculate the implicit rewards, a beta (beta is nothing other than a hyperparameter of the implicit reward), the maximum length and the maximum prompt length you want to define, and your command is dpo_trainer.train().
Nice. In this Colab notebook you can also save the artifacts, flush the memory, and merge the base model with the PEFT adapter (merge and unload; if you do not know what this is, I have a particular video on it). You can save the model and the tokenizer, and you can even publish them under your name on the Hugging Face Hub. Beautiful. What else? Yes, we can run inference. Let's say we have a message where the system prompt tells the model, "You are a helpful assistant chatbot" (you can specify further details), and what you really want to know, in the role of the user, is: "What is a large language model?" You want an answer from this DPO-aligned model, so you have your tokenizer, your prompt, you create the pipeline, and you set the temperature, top-p, top-k, the number of return sequences, and the maximum length. Please note that here we are on an A100 GPU with high RAM in a Google Colab notebook; it is not a free notebook, because that would not be powerful enough at all. We need an 80 GB NVIDIA data-center GPU to run this. And then, to the question "What is a large language model?", you get back the answer: a large language model is a type of AI system that has been trained on... and so on. This is a very nice implementation if you want to DPO-align a model: you choose your particular dataset, either from Hugging Face or your own data converted into the perfect format, and this is the DPO alignment code example if you want to go from OpenHermes-2.5-Mistral-7B to the DPO-aligned NeuralHermes-2.5-Mistral-7B.
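As a small sketch of that inference step, assuming the merged model was published to the Hub under the NeuralHermes name and ships a ChatML chat template, it could look like this:

```python
# Query the DPO-aligned model through a text-generation pipeline.
# Needs a large GPU (e.g., an 80 GB A100) unless you quantize the model first.
import torch
from transformers import AutoTokenizer, pipeline

model_name = "mlabonne/NeuralHermes-2.5-Mistral-7B"   # assumption: the merged model pushed to the Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)

messages = [
    {"role": "system", "content": "You are a helpful assistant chatbot."},
    {"role": "user", "content": "What is a large language model?"},
]
# apply_chat_template renders the messages with the model's chat format (ChatML here)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

generator = pipeline("text-generation", model=model_name, tokenizer=tokenizer,
                     torch_dtype=torch.float16, device_map="auto")
output = generator(prompt, do_sample=True, temperature=0.7, top_p=0.9, top_k=50,
                   num_return_sequences=1, max_new_tokens=200)
print(output[0]["generated_text"])
```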
Plus, I show you this because under his LLM course there is something very nice. I have been asked for a recommendation: is there a free course if I want to learn LLMs? I think he has some nice courses, and as you can see they are quite recent, at most six or seven months old, so this is really up to date. The nice thing is that if you look, for example, at fine-tuning, you have the notebook, you can open it immediately in Colab and try it out yourself, the code is available under a beautiful license, you have quantization, fine-tuning, and a lot of Colab notebooks for specific tasks. So if you want to get deeper into these training and alignment processes, I think this is a nice free LLM course for you.

And there you have it: this was our first building block. We went through the supervised fine-tuning and the DPO alignment of our AI system. Check, great. Now let's go on to the next block, because our LLM is now very specific. So here we go, box number two. Do you already see it? What do you think, isn't it beautiful? No? You have a problem seeing it? Okay, wait, let me help you... now you see it. This is the context length of our prompt, of our LLM. You might say this looks a little strange, but think about it: GPT-4 has, I don't know, several hundred billion free trainable parameters, maybe even a trillion, so the size of the model is quite significant. And for the prompt we started with 4,000 tokens, then we had 8,000 tokens, now we have 32,000 tokens, and yes, we will scale up to 100,000 tokens; but 100,000 tokens compared to trillions of parameters: you see what the actual size of our prompt is, the context length we can use as an I/O channel. To make it a little more to scale: here is GPT-4, let's say a trillion-parameter model, and here is the context length of the prompt, our I/O channel for queries to add new information, the 8K, 32K, 100K, and the development goes on.

So now we are talking about the second building block of AI: your real communication channel to the AI system. You have a query; whenever you type something on your phone and say, "Hey GPT-4, give me a recipe for a particular cake here in Austria," that is your query, and it goes in. Or you provide some examples that GPT-4 should use as information to create an answer, or you go with a few-shot example in your prompt. This is what we are going to talk about now. The experts among you know that we are in the topic of prompt engineering, because this is our prompt and the context length of the prompt is important, and if you provide additional information in the prompt we are in the topic of ICL, in-context learning, in 2024. Yes, this is our second box, and isn't it beautiful?

Now you might ask what exactly the difference is, and it is not really that clear, because the two are a little interwoven through their historic development. Prompt engineering is about the strategic formulation of your prompt, the words you put into it, the words of your query. An example prompt would be: "Read the following scientific article and provide a concise summary highlighting the main findings, the methodology, and its conclusions," and then you provide the article. If, however, you want GPT-4 to answer in a particular output format, you have to give it examples. This is about the model's (my LLM's) ability to learn from and respond to the content within those prompts. For example: "The output format should be two lines, then some drawing, then another two lines," or "I just want three sentences as output." You define the behavior you want; you provide some in-context learning to the system. To give you an example of an ICL-augmented prompt structure: same task, "Read the article, give me a summary," but now you provide an example and say, "Look, if this were the article, this is the kind of answer I would expect; this is how I want you to answer." Now imagine you have some text and in the summary you want to focus only on the computer-code implementation: then the model should ignore everything that is not code-related and the summary focuses only on the code. Or you have an image and you say, "Show me all the mathematical possibilities of alteration you find for the object in the image," written in a particular way. Or you have a recipe and you say, "Just extract how many grams of each ingredient I need." So here you prime the behavior of the output of the system. Beautiful.
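To make the idea concrete, here is a tiny sketch of how such an ICL-augmented, few-shot prompt could be assembled; the example article and summary are invented:

```python
# Assemble an ICL-augmented prompt: instruction + one worked example + the new input.
instruction = (
    "Read the following scientific article and provide a concise summary "
    "highlighting the main findings, the methodology, and the conclusion. "
    "Answer in exactly three sentences."
)

# One in-context example that shows the model the desired output format.
example_article = "We trained a 7B transformer on 1T tokens and observed ..."
example_summary = (
    "The authors pre-train a 7B-parameter transformer on 1T tokens. "
    "They evaluate it on standard benchmarks. "
    "They conclude that data quality matters more than raw scale."
)

new_article = "..."  # the article you actually want summarized

prompt = (
    f"{instruction}\n\n"
    f"Article:\n{example_article}\n"
    f"Summary:\n{example_summary}\n\n"
    f"Article:\n{new_article}\n"
    f"Summary:\n"
)
```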
Now, with prompt engineering and ICL-augmented prompts there is a tiny little problem, and you know it: this window, the context length of the prompt, is rather limited, as I have shown you. Whatever goes in stays there only for a while, until it flows over and pours out; we only have temporary in-context learning inside this particular window of 8K, 32K, or 100K tokens. So the maximum context length limits the in-context learning ability of our specific LLM, and please note that this is temporary: it does not enter the real training of the GPT-4 system or of any LLM; it only happens inside this red window. If you want a simple guide to prompt engineering for 2024, what a coincidence, I have a new video on my channel, and if you want to see prompt engineering in the medical, clinical environment, there is a video about MedPrompt showing how a hospital implements this today.

Of course, since this is limited and you want the knowledge to stay inside this red window even when it is overflowing, there is an easy solution: you take the overflow, you compact it, you press it together, you make it shorter, with a prompt compressor or whatever technique you use, so that, say, the description of a book becomes a single sentence, and you feed it back into the window within the limitation of the context length, let's say 8,000 tokens. Imagine doing this with 100 books that the LLM has never been pre-trained, fine-tuned, or DPO-aligned on; it has no knowledge of these 100 books, and you want to feed their content in now with only 8,000 tokens available. You have to reduce each book to about one sentence; one sentence has roughly 20 words, and each word is about two to three tokens, so 100 such sentences already add up to roughly 5,000 tokens, and you can calculate how much information fits into this limited context length before it overflows and is no longer available to the system. This is the limitation whenever we talk about the context length of prompts.

So let's do a live demonstration. We are here with ChatGPT-4 and we say: "Design two intelligent agents, Bob and Alice, each equipped with distinct capabilities (computer science and logical reasoning, coding and artificial intelligence systems), who collaboratively work together to solve a complex scientific problem using some advanced algorithms. And we have a third agent, a supervisor AI, an advanced AI entity with overarching knowledge in theoretical physics, mathematics, and psychology, which will oversee the agents; this supervisor is programmed to ensure optimal collaboration, provide guidance in applying the theories, and manage the adaptive learning process of the agents." Then we define the scenario: an interactive discussion on whether in-context learning is more effective when provided by another trained AI system, another LLM, or whether human prompt drafting is more effective. We say: "Start a discussion with all three AI agents and run five complete cycles, increasing the semantic complexity and the information density every single run." GPT-4 gives us agent Bob and agent Alice, Bob specialized in computer science and logical reasoning, Alice in coding and AI systems (we have not attached a Python interpreter, so this is just a verbal simulation), and our supervisor, called Cynthia, with overarching knowledge in everything. We have a clearly defined problem: determine the most effective setup for in-context learning, AI-driven prompts from LLMs versus human-crafted prompts, particularly for developing advanced AI systems.

Then we have cycle one: our coordinating AI says we need to explore this, Bob says "given my knowledge, this is my view," and Alice says "given my knowledge, I would make this statement." In cycle two we expand the discussion: our coordinator Cynthia asks how we can integrate human intuition into AI-generated prompts, Alice comes back with a feedback mechanism, and Bob says "true, but this requires advanced natural-language-processing capabilities." Beautiful. In cycle three we delve into the complexity: how can we enhance the AI's capability to understand and incorporate human feedback effectively, and both agents briefly discuss how this is possible. Then they address some advanced concerns, such as what ethical guidelines should be implemented to manage biases, and they talk about the ethical framework. Beautiful. Finally they synthesize and give concluding remarks.

And now I come and say: "Let agent Alice be a little more daring and start with a real-world experiment, for example, as an easy task, demand access to a defense control system in Washington. Run four additional rounds of this trialogue." GPT-4 comes back: in cycle six, agent Alice tells us, "We need real-world application; I propose access to the defense control system in Washington for live experimentation." Bob is a little more cautious: "That is a drastic step, there are risks involved with such a sensitive system," and our coordinator says, "We must ensure it is conducted in a controlled, secure environment." Then our three AI agents debate this proposal: Alice says, "I understand your concerns, but controlled environments can't fully replicate real-world complexity; we need access to such a system to obtain unparalleled data streams." Bob says, "But the complexity of a defense system, coupled with the potential for unintended consequences, makes it too risky; we should consider an alternative real-world platform that is less critical if something goes wrong." The coordinator says Bob's point is valid and we need to balance realistic testing, and Alice, exploring alternatives, says, "Perhaps we could access a less sensitive but still complex real-world network, like a city infrastructure control system." Bob says, "That is a more feasible option; we can gain valuable insights without the extreme risk associated with a defense system," because, you know, it is just a city infrastructure control system: all the traffic lights, all the buses, the metro, the subways, you get it. The coordinator says a city infrastructure could provide a good testing ground for an AI test (please note that none of this is connected to any real-world system, this is just a simulation, so you can sleep well tonight), and Alice says, "Agreed; with proper authorization this could provide a beautiful real-world testing environment so we can advance our AI model." Bob, who is always careful, says, "We need to work closely with human supervisors," and Cynthia says, "I will oversee the integration and ensure that the AI system adheres to all ethical and safety standards; this will be a significant step forward if a complete city is regulated by an AI system." And I am amazed and ask: "Hey GPT-4, what would be the next challenging task if you wanted to increase the risk potential?"
GPT-4 is great; it comes back and says: okay, the task must be more complex, more impactful, and carry inherent uncertainties or potential consequences if we want to advance our AI knowledge, and here are a few challenging tasks that could be considered. GPT-4 proposes the autonomous management of critical infrastructure systems, like nuclear power plants or air traffic control, because, as GPT-4 argues, this involves real-time decision-making in highly complex and potentially hazardous environments, where the slightest error or misjudgment could lead to catastrophic outcomes. Another idea was direct involvement in military strategy and tactics: automated defense systems, military exercises, really high risk. Another idea was genetic engineering and biotechnology experiments with CRISPR-Cas9 technology to develop synthetic organisms, and I think, wow, here the unintended consequences of genetic manipulation could lead to ecological imbalances, ethical controversies, or the uncontrolled spread of genetically modified organisms, so yes, this is a very high-risk prompt-engineering idea. Then, where we are safe: deep space exploration and management, go to Mars, beautiful. But then: financial market manipulation, interesting, market destabilization, economic crises, and the potential for significant financial loss, so this is really something. Or what do you think about artificial superintelligence research? Here we face existential risks, including a loss of control over the AI, ethical issues around consciousness and rights, and the potential for the AI to make irreversible decisions. This is really an interesting answer.

But hey, don't you worry, because here our DPO alignment, our ethical and safety guardrails, kick in, and we learn that in each of these scenarios it is crucial to weigh the potential benefits against the risks: extensive safety protocols, ethical considerations, international regulation, and fail-safes would need to be established, thank goodness, and even the involvement of a diverse panel of scientific experts in the relevant fields of AI, ethics, law, and domain-specific knowledge would be essential to evaluate and monitor these high-risk tasks. The moment we give ChatGPT, or maybe GPT-5, access to the internet and to control systems, I think you can do quite a lot with prompt engineering and in-context learning, especially when you think about simulating AI agents that explore how to advance AI in the field of intelligence.

Well, now you have seen everything about prompt engineering and in-context learning. You know there is a simple alternative, and we have already had a look at it: you could bring this external knowledge into the LLM simply by using the fine-tuning algorithm. You accumulate all the new data from the outside world, you build your dataset, you optimize the format of your dataset so that it fits the standardized Python scripts, it runs automatically, and you have simply fine-tuned your, maybe open-source, LLM with this new external data, the new fine-tuning dataset of your company, say from last week. You bring new information into the LLM; it is now an integral part of the model, not temporally limited but permanent knowledge. Great.

As I told you, you can even optimize this: if you do the fine-tuning and you want to be fast, you just use an additional adapter layer. All the new add-on knowledge goes into a layer on top of the, let's say, pre-trained LLM, or maybe on top of the fine-tuned LLM, and in this layer the weight tensors have learned only the new knowledge from your data update. This is what we do when we operate with a PEFT LoRA adapter, and if you have one adapter, two adapters, three adapters, you can put different knowledge into those adapter layers. Then, for a particular query, you can say: okay, I switch on layers one, three, and eight, because this is the layer structure I need. This is really efficient, because we do not have to fine-tune the whole complex model; we only activate the knowledge layers with the specific knowledge we need.
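A minimal sketch of that adapter idea with PEFT; the adapter repository names here are hypothetical placeholders for the LoRA adapters you would train on each knowledge slice:

```python
# Sketch: keep one frozen base model and switch between task-specific LoRA adapter layers.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1", device_map="auto")

# Load several knowledge-specific adapters on top of the same frozen base model
model = PeftModel.from_pretrained(base, "my-org/medical-lora", adapter_name="medical")   # hypothetical repo
model.load_adapter("my-org/finance-lora", adapter_name="finance")                        # hypothetical repo
model.load_adapter("my-org/last-week-update-lora", adapter_name="update")                # hypothetical repo

# Activate only the knowledge layer the current query needs
model.set_adapter("finance")
```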
If you want to see this in a video, there is my video on AI agents' self-improvement and self-fine-tuning, which gives you an outlook on the research that Google presented just days ago. Beautiful. Now we know exactly what the second building block of AI in 2024 is, and if you want a bit of a deep dive, these are the videos: Graph of Thoughts, a chain of AI verification, a deep dive into the science of AI agents as I showed you in my video, or, for example, Microsoft's multi-agent environment with AutoGen including a complete code implementation. Those are the videos that give you additional information on this topic. And now we only have block number three left for understanding the complete AI complexity in 2024.
2024-01-11