ML seminar: Joshgun Sirajzade: Deep Learning Technologies for Natural Language Understanding
Good morning, everybody, and welcome to our machine learning seminar. Today we have the pleasure of a presentation from Joshgun Sirajzade, who will give a talk on the topic that is already visible on the screen. So, Joshgun, please go ahead; the virtual floor is yours. Thank you very much.

Thank you for introducing me. I have been working in the fields of natural language processing, AI, and machine and deep learning for the last couple of years, and in this talk I am going to talk about the newest deep learning technologies in natural language understanding, as we call it these days.

So let me give you an overview of my talk. I will first demonstrate what is very famous these days, which is ChatGPT. I'm not going to chat with it, but I will show what technology is behind it. Then I will give an overview of the subfields which are used in order to create such a tool. Then we can go more into the details of deep learning, to give some mathematical intuition for it, and then we will dive into the world of natural language understanding, as we say these days: into text representations, and how we actually go from word vectors to sentences.

So let me jump straight in with ChatGPT, which has impressed everyone. Let's look at what the technology behind it is. It's actually not that easy, on the one hand, but it's also not that hard. As you see, OpenAI shows many steps in the training of ChatGPT, so let's look at the first one. First, data is collected. Then a labeler (a labeler is a human being) demonstrates the desired output behavior, and this data is used to fine-tune GPT-3.5. Then there is another model, which is called a reward model: the labeler again ranks the outputs from best to worst, and this reward model is trained on those rankings. At the end, we have an optimization strategy; OpenAI calls it reinforcement learning, but we also know this kind of setup from adversarial learning, or from GANs in computer vision, where two models compete against each other a little bit.
So we have the chat model and the reward model. This is what's happening. It's a complicated setup, but the most important thing about ChatGPT is that it proved to everyone that a meaningful chatbot can be realized. That is the main point here.

So let me give you an overview of the field. For many people, ChatGPT appeared out of nothing, but our machine learning seminar of course consists of people who are in this field, in deep learning, and you know that it is the result of long-term developments in natural language processing and deep learning. So let me give you some intuition about where one part of it comes from.

This graph that I'm showing you needs to be read from the top left down to the bottom. On the one hand, we have knowledge from linguistics. Just go back and think of what we call a language model, or, classically, like in the 80s and 90s, hidden Markov models. Markov was a Russian mathematician who discovered that the characters in a text follow a certain pattern: in a given language, you can actually calculate the probability of the next character occurring in a word.
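That character-level idea fits in a few lines. A minimal illustration with an invented sample sentence (not from the talk's slides), estimating next-character probabilities from bigram counts:

```python
from collections import Counter

# A toy text (invented, not from the talk): estimate the probability of
# the next character from bigram counts, in the spirit of Markov's
# observation about character patterns in language.
text = "the cat sat on the mat"
pairs = Counter(zip(text, text[1:]))      # counts of adjacent char pairs
starts = Counter(text[:-1])               # counts of chars with a successor

def next_char_prob(prev, nxt):
    return pairs[(prev, nxt)] / starts[prev]

# Of the four 't's followed by another character, two are followed by 'h'
# ("the" twice) and two by a space ("cat ", "sat "), so P('h' | 't') = 0.5.
print(next_char_prob("t", "h"))
```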
After that, like I said, in the 80s and 90s, an algorithm was developed which is called hidden Markov models; it is actually a Bayesian, probabilistic predictive model, and it has hidden states, which would later on, in deep learning, also be called hidden layers. So there are some similarities.

On the other hand, in order to do natural language processing, or to create a chatbot like this one, much of the knowledge comes from information retrieval. In information retrieval, when documents were indexed, a strategy was created which is called term frequency-inverse document frequency, what we call TF-IDF. This actually represents a word's weight inside of a document; I will give you an example later on. It's a very powerful thing to do, which can automatically cluster, or gives you a numerical representation for clustering, the documents on the one hand, or the words on the other hand. This led to the vector space model, and then there are some other applications, like latent semantic analysis, which can even extract topics from documents.

So, as we see, in this field it is the same coin, but it has many sides: we can do many things with the same technology. That's why, with the emergence of deep learning technologies, the researchers started a unification process, and that's why it's called natural language understanding today: because it goes beyond natural language processing. In this middle bubble there is a kind of history; it's not strictly historical, but the technologies depend on each other. The deep learning wave is more or less depicted here; of course, logistic regression also goes back to statistics and was already there.
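The TF-IDF weighting mentioned above can be tried in a few lines. A minimal sketch using scikit-learn (the library the talk demonstrates later); the three documents are invented for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Three invented documents: TF-IDF gives high weight to words that are
# frequent in one document but rare across the whole collection.
docs = [
    "dogs and cats are pets",
    "cats chase mice",
    "algorithms sort numbers",
]
vec = TfidfVectorizer()
X = vec.fit_transform(docs)        # document-by-term weight matrix

# 'cats' appears in two of the three documents, 'algorithms' in only one,
# so 'algorithms' gets the larger inverse-document-frequency factor.
v = vec.vocabulary_
print(vec.idf_[v["algorithms"]] > vec.idf_[v["cats"]])
```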
Then we had the deep learning phase. In the nineties and the beginning of the 2000s, long short-term memory (LSTM) was developed, which can handle text data better. Then, in 2013, the word2vec algorithm was published, which actually combines a shallow network, which is basically a regression, with the notion of a vector space model: with the help of a regression, or of a shallow network, it creates vectors for words.

And what we cannot forget: the same was also done for speech, so there is a unification there as well. Right now we communicate with the chatbot through a text prompt or through a terminal, but in the future (of course, you already see it happening at the same time) we will also be able to communicate with it acoustically, verbally, just like I'm talking to you. In order to assist this process, algorithms like wav2vec were created, which make vectors from acoustic data.

And there are some other algorithms, or technologies, like sequence-to-sequence, which later on also used what we call the attention model. This attention model was used in what we call Transformers, and it culminated in a very famous language model which is called BERT. BERT is a bidirectional Transformer which transforms language, or a text; I will talk about it as well.

And all of this: the theory of artificial intelligence chatbots, the notion of chatbots, was also there, actually, from the beginning of the sixties, but it was rule-based, very primitive, very easy to confuse, and it felt very mechanical. But after these technologies emerged and were kind of unified, now, as we see in the case of ChatGPT, it's almost very human-like. Of course not absolutely, but it answers questions, it solves problems, it creates code snippets for you. So it's a very powerful machine. And you see on the left side again that this doesn't end with a chatbot; it also goes
beyond it and brings with it other components: text mining, topic modeling, and also sentiment analysis. The field is growing, and I'm sure you can imagine whole master's or bachelor's programs on natural language understanding alone, on only these technologies, because this field is growing so fast. These fields also have an impact on how a chatbot behaves, because the research from, let's say, sentiment analysis can also influence the chatbot: the chatbot can convey certain feelings or certain sentiments, so it's not as mechanical as it used to be. And you see where this knowledge is actually coming from: from the research.

So let me show you some subfields, which are also their own topics, or applications, you can call them as you wish, with some selected publications. These are really only selected publications; I think publication in this field goes so fast that you cannot catch up. But you can imagine: there is a field which is called machine translation, and much of the algorithms actually comes from machine translation, also seq2seq, or the encoders, as we call them. There is a field which is called question answering. There is a field which deals with conversations; this goes back a little bit to linguistics, and also to the social sciences, to how conversations are made. And there are also classical fields inside of computer science, like parsing: how do you actually parse the text best? And, of course, as I mentioned before, we are not only talking to a bot through a prompt; in the future, and not only in the future but now, actually, it is almost possible to talk to it verbally, acoustically. This is speech processing. Speech processing was also there for a long time; it was its own field, and it is now taking part in this unification process.

And you have, of course, many other fields, like, let's say, part-of-speech tagging, which is very specific to natural language processing, or named entity recognition. And it is probably fitting to end all these application fields with summarization: there are many fields of text mining, and one of them would be text summarization, which is also an interesting field. The publications you see in the references are just some picks, and the field is growing very fast.
So let me go back again and just think about what machine learning is and what the aim is. Why are we applying machine learning and deep learning in these fields? Let's imagine we have a demo app, a very, very simple chatbot: you just log in and it tells you good morning or good afternoon. What you could do, and what was done in chatbots maybe 20 or 30 years ago, was to hard-code it. In our example, it was just assumed: okay, until 12 o'clock the app can greet people with 'good morning'; as soon as the time goes past 12 o'clock, you greet them with 'good day'; and maybe after 17, 18, or 19 o'clock you say 'good evening'. This would be a very easy thing to hard-code.

But, of course, how you greet people may depend on the culture; it may depend on the enterprise. So the best thing is to go and learn it from data: how are people actually doing it? For many things, such as how people answer questions, it's a good idea to get this information from data and not hard-code it at all. But you see here that the whole thing actually emerges from the training data (you see it above; this is our training data) and from the learning process, the learning algorithm. If you imagine it in two-dimensional space, it's just a cutting line, you could say. The last person who said 'good morning' in our dataset said it at 11:06, so at 11:06 you put a line: you separate there and say 'good morning' before it. But this is a small dataset, and human beings do all kinds of things; it could be that there is someone who said 'good morning' in the afternoon, at 14:00 or 15:00. So for your algorithm, just making a linear separation is not that easy. That's why machine learning algorithms, in the beginning, also had a mechanical feel to them, which we solved, of course, with deep learning: with nonlinear separation, with nonlinear activation functions. And if the data is complex and has many dimensions, then it is a nonlinear hyperplane, actually.
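The greeting example can be made concrete. A minimal sketch that learns the morning/not-morning boundary from data instead of hard-coding "if hour < 12"; the hours and labels below are invented:

```python
from sklearn.linear_model import LogisticRegression

# Invented log of greeting times: hour of day, and whether "good morning"
# was the appropriate greeting (1) or not (0).
hours = [[8.0], [9.5], [11.1], [11.6], [13.0], [15.0], [18.0], [20.0]]
morning = [1, 1, 1, 1, 0, 0, 0, 0]

# Instead of hard-coding "if hour < 12", the decision boundary is fitted
# to the data; with messier labels, a nonlinear model would be the next step.
clf = LogisticRegression().fit(hours, morning)
print(clf.predict([[10.0], [16.0]]))
```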
Let's put it this way. Another thing: in this 'good morning' application we have a very understandable feature, a timestamp. But not everything in life is already a number, so another question which occurs is how you convert things into numbers; especially, in our case, how you convert the sound waves, or the written or typed text, into numbers. How do you represent them best? This is also one of the questions.

Another thing I would like to point out: in my opinion, in deep learning research there is a big distinction between supervised learning, unsupervised learning, and reinforcement learning, which is good, but I think at the end they are all the same and go back to the same thing. In our case, it doesn't matter whether our training data is historic data, say something that happened 20 years ago, or whether someone is sitting there training this app live: the app says 'good morning', the person, the labeler, says no, it's actually evening, and the algorithm is trained again. For me, these are again different sides of the same coin: basically, at the end, everything is data and data feedback. What is interesting is when you have adversarial training, when you train two networks playing against each other, let's say; this is, of course, a special case. And yes, it also works very well in some cases; it is just easier and can be automated very fast.

So these are some notions. You all know about artificial neural networks, so I will keep it short. What is important to say: we have an input layer, we have hidden layers, we have an output layer. But, as I said, what is most important from a mathematical point of view is two things. The first is the activation function we use: this activation function actually enables nonlinear separation in the dataset. And because we have many layers and many input nodes (the layers would run horizontally here, and the inputs vertically), we can create a very granular imprint of the data, which is very important in natural language understanding, because text consists of a lot of words and carries a lot of information; basically, we can create an endless amount of sentences. That's why deep learning performs so well. But this also comes with a downside: these models are very data-hungry and computationally very intense. That's why we also need computational power, supercomputers (I will also talk about that), which makes the life of machine learning researchers easier.
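The two ingredients just named, layered weights and a nonlinear activation, fit in a few lines. A minimal forward pass with made-up numbers (plus the softmax output that comes up again later in the talk):

```python
import numpy as np

# A tiny forward pass: weights and biases per layer, with a nonlinear
# activation (ReLU) in between and a softmax at the output.  All numbers
# are random, purely for illustration.
rng = np.random.default_rng(0)
x = rng.normal(size=3)                     # input features
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)

h = np.maximum(0.0, W1 @ x + b1)           # hidden layer, ReLU activation
logits = W2 @ h + b2
probs = np.exp(logits) / np.exp(logits).sum()  # softmax: sums to 1

print(probs.sum())
```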
Now I will just scratch the surface of speech recognition a little bit. As I said, you can create representations from texts, but you can also create representations from speech, from sound waves, which is called an acoustic model. And again, this is probably a perfect example of a language model: it is just a probabilistic model which assigns, to an uttered word, to a given audio wave, the word which it might be. But, as you see, it needs a lot of hours of speech, and it was historically done with hidden Markov models; these days we also use deep learning. And there are many ways of processing it. You can process it as a one-dimensional vector, which is created with some preprocessing, but you can also use a two-dimensional representation, like in computer vision, and actually process it similarly to computer vision. We will see more about this.
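Both representations can be sketched quickly. A minimal illustration on a synthetic sine tone: the raw 1-D sample vector, and a crude 2-D magnitude spectrogram of the kind that can be fed to vision-style models (the frame and hop sizes are arbitrary choices for this sketch):

```python
import numpy as np

# A synthetic 440 Hz tone, one second at 8 kHz: the raw samples are the
# 1-D representation; slicing into frames and taking FFT magnitudes gives
# a crude 2-D spectrogram that can be treated like an image.
sr = 8000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 440 * t)       # 1-D representation

frame, hop = 256, 128
frames = np.array([signal[i:i + frame]
                   for i in range(0, len(signal) - frame, hop)])
spec = np.abs(np.fft.rfft(frames, axis=1)) # 2-D time-frequency representation

print(signal.shape, spec.shape)
```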
So let's go back to text. What do we actually need for words? What we need is a numerical representation; this means converting text into numbers. And we can do it in many ways: we can count the words, or we can take their positions and context into account. But the most important question is probably how well this numerical value represents the word. This is the crucial question, and I have the feeling that in natural language understanding many people just go and apply all the algorithms without thinking too much about where these numbers come from and how well these numbers actually represent the phenomenon. So the metrics, and the understanding of the phenomenon, are very important. Of course, we create a numerical representation for a word, but then we can concatenate word vectors together and make a sentence representation, or we can also build a whole representation for a document. So it works on all these levels.
So, there is a hype about ChatGPT, and I wanted to give you a small, very funny example which I enjoy using and which is very interesting. Let me first show you this one, because this one best explains how you can vectorize the text, and how the text might be organized, or chunked, into a meaningful representation. Here we have our toy corpus; it has four texts in it. The first one is about animals, about pets: dogs, cats, bats. The second, the third, and the last one are about algorithms or computer scientists; I just invented them. And if I run this code... I use the scikit-learn library, which is very famous in machine learning and very easy to use. You see, it's not that many lines of code. So this is what the algorithm actually does. How well visible are the results, actually? It doesn't look that good...

Well, let me ask: can you tell us whether it is a pretrained model, or is this just a toy model that you are training? Because probably not everybody is familiar with those libraries.

Oh, okay. So what I do here: I import these libraries, but that's not actually that important. I have this variable here, which is called corpus, and I create a CountVectorizer. And this is what we are seeing here. I formatted the output; here it is, with a more readable layout. So, this is basic counting; nothing special happens there. In this step there is no mathematics behind it. What the algorithm does is create what we call a dictionary, which is just all the words in all documents. This is our dictionary (it goes on a little bit longer), and these are our documents, and it just puts in the count of a word when it occurs in a document; if not, the entry is 0. Does that make sense? So it just counts words. But this is a very powerful representation, because similar words, since they occur in the same documents, will have very similar vectors. Does that make sense? If you look here, for example, at 'cats' and 'bats' and 'dogs': in my toy example they have exactly identical vectors. Why? Because they occur in the same document.
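The demo's counting step can be reproduced in a few lines. A sketch with an invented corpus in the spirit of the talk's toy example (the exact documents from the slides are not reproduced here):

```python
from sklearn.feature_extraction.text import CountVectorizer

# An invented corpus in the spirit of the talk's toy example: two "animal"
# documents and two "computing" documents.
corpus = [
    "dogs cats bats",
    "algorithms sort data",
    "algorithms learn data",
    "dogs cats bats",
]
vec = CountVectorizer()
X = vec.fit_transform(corpus).toarray()    # rows = documents, columns = words
v = vec.vocabulary_                        # the "dictionary" of all words

# 'cats' and 'bats' occur in exactly the same documents, so their count
# vectors (columns of the matrix) come out identical.
print(X[:, v["cats"]], X[:, v["bats"]])
```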
Does that make sense?

Sorry, identical compared to the whole English dictionary, you mean?

No, no, it's just the words here, the words from this corpus.

Yes, I got that. I mean, the vectors are clearly very similar, and I'm asking if you meant that they are similar compared to the whole English language dictionary, if you created it from a bigger text.

In our case it's built from what we created it from. And these algorithms are what we call language-independent, so you can apply them to any language; if I had, say, German documents...

I got that, I got that. You said these vectors for these four documents are similar to each other because they're from one document, right?

The vectors for words, yes.

You said the vectors for document 1, document 2, document 3, and document 4 are similar because they are from one text, one corpus, a similar topic?

No.

You know what I mean? A vector is all of it, and each number is a scalar in it. So you didn't say that the document vectors are similar, right?

Right: the vectors for words are similar, those which occur in the same document.

Okay, thank you.

Yeah. So, for example, cats and bats are similar because they occur here in the same document as dogs. That's why they get similar vectors. This is the intuition behind it, and this is what leads later on to the algorithm
which I am going to present next, and which actually created this deep learning wave of natural language understanding. So, what I do here: I commented this part out because I only ran it once. I have here a small corpus; I train this model with this corpus, then I save it, and then I open it again. This is just side information.

So, what can it do? Let me run it... I have to find the run button... oh no, that's running the vectorizer; let me run it here. So I am running it, and what I'm asking this model is the following: I give it three words, which are tea, coffee, and car, and I ask the model which one does not match. And it shows me that the car is the one which does not match. This is the interesting thing: this has no hard-coded rules; the model only had a corpus. And it looks just into sentences, not into whole documents, only sentences. But because it knows that coffee and tea share similar words, similar sentences, these words are more similar in vector space than the word car. Does that make sense?

And we can go even further: we can also look at the closest words in the vector space. If we give it the word 'president', it gives us words like 'presidency', 'governor' (which is actually almost a synonym), 'chairman', even 'chancellor'. I mean, this is amazing, right? Or let's take 'city': for 'city' it gives you 'town', 'suburbs', 'downtown', 'Bronx' (also a borough). Or for 'coffee' it gives you 'sugar', 'cocoa', 'cotton', because cotton is also grown, you know. It's an amazing thing.

And to see how it works: again, I have here a toy corpus, again with dogs and cats, because I love them. And, you know, let's see: we put similar words around them, like 'friend'. Then the model will think: okay, dogs and cats are similar, because they are always used with similar words. This is the intuition behind it. The language, the text, has a certain structure, and what deep learning does is just go and leverage this structure.

So let me go back. These were two demos, and I showed them because these algorithms are very, very easy to use. If you want to experiment, if you want to code, building a chatbot might be a hard task for a scientist, for a researcher, but with algorithms like word2vec you can, in a couple of lines of code, train vectors from a text document and identify semantically related words: words that are similar in meaning, but not exactly synonyms. And we know it because they appear close to each other in the vector space; they simply have similar vectors.
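The odd-one-out demo's intuition can be rebuilt by hand, without any word2vec library. A sketch that represents each word by its sentence co-occurrence counts and picks the word least similar to the others; the sentences are invented, and real word2vec learns denser vectors with a shallow network rather than raw counts:

```python
import numpy as np

# Invented sentences; tea and coffee appear in near-identical contexts,
# the car does not.
sentences = [
    "i drink hot tea every morning",
    "i drink hot coffee every morning",
    "i park my car every morning",
]
vocab = sorted({w for s in sentences for w in s.split()})
idx = {w: i for i, w in enumerate(vocab)}

# Represent each word by how often it co-occurs with every other word
# inside a sentence.
C = np.zeros((len(vocab), len(vocab)))
for s in sentences:
    ws = s.split()
    for i, w in enumerate(ws):
        for u in ws[:i] + ws[i + 1:]:
            C[idx[w], idx[u]] += 1

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def doesnt_match(words):
    # the word with the lowest average similarity to the others
    return min(words, key=lambda w: np.mean(
        [cos(C[idx[w]], C[idx[u]]) for u in words if u != w]))

print(doesnt_match(["tea", "coffee", "car"]))   # prints: car
```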
And vectors are mathematical objects, so they could stand for anything. Mostly, if you create vectors from text, just looking at the numbers doesn't tell you that much, also because the dataset is usually so big in the case of text. But let's think about what we can do with it. (Do you hear me? Yes, yes, we can hear you. Thank you.)

Okay, so what can we do with it? What happens if we have a vector, a good numerical representation, for one word? We can identify similar sentences, even when we put the words together. Let's take a classical example from sentiment analysis, though it can also be applied to question answering, or even to machine translation. We have an example sentence, which is 'the president took the wrong decision'. Let's say either we want to create an answer to this sentence, or we want to mark it as negative in the sense of sentiment analysis. Because the model knows all the words similar to it, it can go ahead and identify other formulations. So if someone says 'this governor takes a false decision', which is probably not likely to be said, but let's imagine someone said it like this, or someone had a typo, the model is still very flexible, and it understands the meaning of what was said, and not just the words. This is what we call a semantic representation: it captures semantic meaning.
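The paraphrase-matching idea can be sketched with toy numbers. The 2-d word vectors below are hand-made for illustration (real embeddings are learned and much higher-dimensional); the point is only that averaged sentence vectors of the two formulations end up close, while an unrelated sentence does not:

```python
import numpy as np

# Hand-made 2-d "word vectors" (toys, not trained embeddings): related
# words are given nearby vectors on purpose.
wv = {
    "president": np.array([1.0, 0.1]), "governor": np.array([0.9, 0.2]),
    "wrong":     np.array([0.1, 1.0]), "false":    np.array([0.2, 0.9]),
    "took":      np.array([0.5, 0.5]), "decision": np.array([0.4, 0.6]),
    "banana":    np.array([-1.0, 0.3]),
}

def sent_vec(s):
    # a simple sentence representation: the average of its word vectors
    return np.mean([wv[w] for w in s.split()], axis=0)

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

query = "president took wrong decision"
paraphrase = "governor took false decision"
unrelated = "banana took banana decision"
print(cos(sent_vec(query), sent_vec(paraphrase)),
      cos(sent_vec(query), sent_vec(unrelated)))
```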
That's why you can ask the chatbot 'Who is the president of the United States?', or 'Who is the president of America?', or just 'Who is the president?' (I can't think of other formulations right now), and the model will still know that you are asking about the president of the United States. Or, because the new models are also very context-sensitive, you can even ask 'Who runs the United States?', and the model will still know that you are asking about the president. So this is probably a very easy way to think about what happens, what the phenomenon actually is.

With this said, let's again dive a little bit into the newest technologies. What we have is a sequence-to-sequence network model. It uses embeddings with sparse features. 'Sparse features' is also very confusing: it doesn't mean that the number of features is sparse; it just means we have a sparse matrix or a sparse vector, and sparse means it has lots of zeros and just a couple of ones, because a sentence can be short. What we do is create dense vectors from these sparse features. Then we have recurrent neural networks, or it could be a Transformer, an attention model, which again creates dense features from these features. And at the end, what we have is a softmax function; you know it from deep learning, of course, and I won't go into that much detail, but this softmax function actually creates discrete predictions, which in our case would be the words.

Here you see again a very simple visualization, actually a graph, from words to features, and the same happens one layer after another. Here we have the weights, so it goes from features through weights. This is the famous function with inputs, weights, and bias, which is our neural network, mathematically formulated. And this is what happens in the language model at the end: there is a softmax function which runs over the whole matrix and creates, as the product of the probabilities, a probability for one answer. So this is actually what happens technically behind the scenes.
Um, technically behind and, uh. And I also, uh, often get questions. So what what deep learning algorithms or architectures are used? So, actually, it, it's. It's so wild. So it's, um, uh. Many, uh, chat bots, they use a bird, like, in in the case of. Check it it, it uses, uh, yeah, uh, models. Um, so, uh, and they, uh, they are again, uh.
Can be a based on Transformers attention. So, I'm actually at the end. Yeah, it's on on your hand also to choose, uh, the the desired architecture. Uh, and, uh, yeah, there are many now in the, in the science. And in in the industry, actually, and, uh. Yeah, just to give you, uh, last last.
1 of the last things to think about. So also what is interesting what is what comes from, uh, from, uh, actually a machine translation. Is this encoder decoder philosophy? Where, uh, we try to input, let's say here. 1 language then encode it.
And then the decoder decodes it and outputs another language. But this could also be question and answer; this could be a long text to a short text, in the case of summarization. So this architecture is also very useful.
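A toy sketch of this encoder-decoder idea: the encoder compresses the input sequence into one context vector, and the decoder produces output from that vector. The "encoder" and "decoder" here are deliberately trivial placeholders (mean of embeddings, nearest output embedding), not a real trained model.

```python
import numpy as np

def encode(tokens, emb):
    # compress the input sequence into a single context vector
    return np.mean([emb[t] for t in tokens], axis=0)

def decode(context, out_emb):
    # score every output "word" against the context and pick the best one
    scores = out_emb @ context
    return int(np.argmax(scores))

rng = np.random.default_rng(1)
emb = {"hello": rng.normal(size=3), "world": rng.normal(size=3)}
out_emb = rng.normal(size=(5, 3))       # 5 words in the output vocabulary

context = encode(["hello", "world"], emb)
predicted = decode(context, out_emb)    # index of one output word
```

The same shape works whether the output is another language, an answer to a question, or a summary; only the vocabularies and the trained weights differ.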
So, my conclusion, my summary, is: AI deep learning algorithms get better, definitely. On the other hand, training data gets better; with the demonstration of tools like this, we understand that we need even more data. We need to know our data; we need to know the phenomenon.
Right, it gets better. And at the same time, computational power gets better, right? We have supercomputers; there are of course limits to silicon, but quantum computing is coming, you know, and special architectures are coming for processing deep learning networks. So of course we are witnessing great times, where AI is improving a lot. And with this, I thank you for listening, and let's discuss
together, if you have questions. And so, thank you, Joshgun, it was a very interesting talk, and we have a little time for discussion. Maybe first I will, let's say, read or recap the questions that appeared in the chat during your talk. So first, there is a request whether you would be happy to share the slides.
We put them on our website for everybody, and perhaps, if you could also share those codes, if it is okay with you, that would be a very useful data set that we can provide to future listeners. So, one note: I would actually need the references that you introduced, as references. This one.
No, no, no. Yeah, that makes sense. Well, thank you. I am going to go on with this presentation. Yeah. So, what I want to say also: I have a YouTube channel, and these code snippets are on GitHub, so they are all available online. If you Google it, you will find it; just Google. But, you know, just to add some metadata to the description of your presentation, it will be easier to link
what you presented with your YouTube channel, your codes, and so on and so forth. I will send it to you, yes. Yeah, and quite an interesting question from Laura: I think it relates to the bias of the data, and why there was a chairman, not a chairwoman, as one of the closest words to "president".
I think the answer is maybe quite obvious, but maybe you want to comment on that. Yeah, what is also interesting: we have, at the University of Luxembourg, a small group working on legal aspects, on bias aspects.
Of course, it is the dataset, absolutely. This comes from my dataset, because there is "chairman" more often in this dataset. There are lots of biases, and the
deep learning community works against that, yeah, and tries to improve it. And I hope both things: I hope one day our society will improve so much
that we produce data, you know, datasets, without biases. But on the other hand, no matter what you do, because these are algorithms that just learn from what they see, it can still always be biased, and we need to control that. That's why we probably still need human beings, to control and to
eliminate the biases. But absolutely, a very right question, a very legitimate question. You wanted to ask a question? Please go ahead. We cannot hear you. Hello, are you there?
We cannot hear you. I am also not hearing anything; am I lost? You can write in the chat; yeah, maybe that will be the easiest. I can read it out loud if you write it in the chat.
So, in the meantime, I will ask one question. You said, and I found it a very interesting idea, that you have to choose the best numerical representation of the data, and with the vocal communication you said that one way would be to pick a window for taking a data sample, right? And here is the thought: it is already a working technology, right? So do we use a different window frame for different languages? Yeah,
I think this is a question, or a topic, which was actually researched: what should this window actually be? There are several solutions; there are, let's say, language-dependent solutions. So let me give you an example.
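In code, such a symmetric context window looks roughly like this; the sentence is a toy example, and the default size of 5 words to each side mirrors word2vec's usual default:

```python
def context_window(tokens, index, size=5):
    # words to the left and right of position `index`, clipped at the edges
    left = tokens[max(0, index - size):index]
    right = tokens[index + 1:index + 1 + size]
    return left + right

tokens = "the quick brown fox jumps over the lazy dog".split()
print(context_window(tokens, 4, size=2))   # ['brown', 'fox', 'over', 'the']
```

A language-dependent or sentence-bounded variant would simply clip `left` and `right` at sentence boundaries instead of only at the text edges.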
By default, the word2vec algorithm considers 5 words before the word and 5 words after. And so the window, for one occurrence of the term, of the word, is
right, 5 to the left and 5 to the right. And then there is also, internally, a possibility to adjust this window dynamically, to kind of emulate the sentence. Because, you know, your word might
be at the beginning of a sentence, and the last sentence does not have anything to do with it, or a chapter or something like that. So you can set it to be dynamic, so it only considers words inside of one sentence. And plus, yeah, there are some ideas about different languages, because some languages
have grammatical markers bound to the word, so they say the same thing with maybe two words, in the case of agglutinative languages, for example. And there are some other languages where the grammatical information is actually expressed by particles, by words, you know, like by prepositions and postpositions.
There, maybe, you need a smaller number of words. So there is research also on this one. So this is, you know... Thank you. You're welcome. No, I cannot see any of your comments here in the chat. Do you want to try to use your microphone? We still cannot hear you. So.
Maybe I will... Cool. Am I actually still in? In... yeah, I will stop, so I can see you better. Yeah, now I am seeing you better. Yeah. Okay, so maybe, well, I suggest that if you do have a question for Joshgun,
you can ask him via email, or meet him in person later on, because the time is coming to an end and we should finish, unless there is a short question to be asked by somebody. But if not, I would like to thank you so much, Joshgun, for the very nice, insightful presentation, and all of you for your attention and for the discussion.
And I would like to invite you to the seminar next week. I will keep you posted about the topic and the link to the meeting. So have a good day, and see you next time. Thank you, thank you. Thank you.