NW-NLP 2018: Ben Taskar Invited Talk; Learning and Reasoning about the World using Language
It's my pleasure to introduce Yejin Choi, who is an associate professor at the University of Washington and also a senior research manager at AI2. She's done a number of really interesting research projects over the years, and it's always fun to watch what Yejin has been doing, including work on common sense knowledge, language generation, and more recently AI for social good. She's had a number of really impressive accolades, including being named one of IEEE AI's "10 to Watch" in 2015, receiving the Marr Prize at ICCV, and being a member of the team that won the Alexa Prize challenge in 2017. So, lots of amazing things to watch for.

All right. Speaking of the Alexa Prize challenge, let me start by sharing a brief experience with it. The goal of this challenge was to create conversational AI that can hold a coherent and engaging conversation with humans. The good news is that the team I was part of, thanks to the amazing students, won the competition, and it felt great to be the winner. The bad news is that conversational AI remains more or less unsolved.

To our surprise, or perhaps not a surprising factor at all, the winning recipe was not based on neural networks with the brute-force approach of more data and deeper networks, spiked with reinforcement learning; that just didn't work. Now, that's curious: why would that be? Because we thought that neural networks do amazing things: superhuman performance on object recognition and image captioning; much of industry replacing their long-standing statistical machine translation systems with neural models; speech recognition working really well; and even more recently, human-level performance on reading comprehension. Wow. And all of these are based on very large amounts of training data and sufficiently deep networks, so this should work.

However, if we look closely, there are significant performance gaps across different types of tasks. First of all, nobody is really reporting superhuman performance on making a conversation, or summarizing a document, or composing an email on my behalf, or identifying fake news, despite the fact that we have a lot of data for all of this (or if not me personally, at least companies have a lot of data for any of this, and they have a lot of GPUs too). So why is it that we don't hear about this in the news? And even for those applications where neural networks do perform at a superhuman level on some datasets, they are usually not robust when given unfamiliar, out-of-domain, or adversarial examples, as Yonatan was talking about this morning.

So what's going on here? I'm going to argue that in fact there are fundamentally two different types of applications out there, and we are primarily seeing advances on type-one tasks, where shallow understanding can get you quite far. An example is translating the English sentence "bananas are green" into the Polish sentence "banany są zielone." For this, neural networks are very good at learning word-level or phrase-level pattern matching, so they can learn to translate from one language to the other without really understanding much about either language. Whereas if I were making conversation and told you "bananas are green," you might tell me "no, they're not," or you might tell me "they're not ripe." If we try to learn the mapping function between input-output pairings this way, there is only a very weak alignment between the input and the output, and in fact we need some sort of abstraction, cognition, or reasoning in order to really know what to say next, which in turn requires knowledge, especially common sense knowledge.
Ignoring this factor altogether and trying to learn the mapping function just doesn't make a lot of sense.

To make another point about how important it is to be able to read between the lines, let's think about the news headline "cheeseburger stabbing." If you give this to a state-of-the-art parser today, it's going to tell you that "cheeseburger" is a noun modifying another word, "stabbing," which may be a noun or a verb depending on which parser you use. But it doesn't matter: the fact that one word modifies the other doesn't tell us whether it means someone stabbed a cheeseburger, or a cheeseburger stabbed someone, or a cheeseburger stabbed another cheeseburger, and so forth. And you know what the actual meaning of this news headline probably was, because you're able to fill in the missing information: there were probably two people involved in this act, even though they were not even mentioned in the original headline. When we do that, it's common sense knowledge that we rely on, especially physical common sense: it's not possible to stab somebody using a cheeseburger, because it's too soft. And stabbing someone is not good; it's immoral; so it's more likely to be newsworthy, whereas if you stab a cheeseburger, who cares?

When we think about different types of knowledge, there has in fact been a tremendous amount of research on learning and extracting encyclopedic knowledge, like who is the president of which country and who was born in what year. But without knowing any of that, I can still make a fairly okay conversation. It's really common sense knowledge that we need in order to make an okay conversation, and this naive physics knowledge and these social norms have not been studied very much. If you look at the ACL Anthology, which as we all know is the repository of NLP papers, most papers that dare to mention the words "common sense" are either from the 80s or from the past few years; nothing in between, a complete void. I think this reflects the tremendous failure of common sense work in the 80s, and only recently have people started forgetting about that past failure.

Recently there have been, I feel, very interesting datasets, one of which is from Microsoft: the ROC Stories common sense dataset. For all of these tasks, basically, brute force (more data, larger networks) doesn't go very far, and we somehow need a different game plan. To be honest, I don't really know what that should be, but for lack of any better idea, I'm going to talk about two different directions: (a) thinking about neural network architectures that can better model the latent process in our mind when understanding and reasoning about text, abstracting away from surface patterns; and, simultaneously, (b) thinking more about representation formalisms orthogonal to neural networks, and really asking how we can possibly organize and learn common-sense knowledge using language as a scaffolding.
Now, usually in this space I talk about the three pieces shown here, but the second one will be described by Antoine today, so I'm only going to give you a teaser for that, and instead talk about dynamic entities in neural language models. Let's begin with the first part.

The neural checklist model was originally designed for a particular kind of language generation challenge. In particular, there are three different types of generation settings: case one is small input, small output, like machine translation; case two is big input, small output, like document summarization. These two cases have been studied a lot more in the past, whereas case three is the case where there may be only a very sketchy, small input, and suddenly we have to generate a bigger output. For this there are two unique challenges: (a) there's an information gap, so the machine has to have the creative power to fill in a lot of information that wasn't really in the input; and (b) the moment you start generating more than one sentence, it's very, very easy to make mistakes, so coherence becomes a big challenge as well.

A little more formally, the task we're going to think about is this: the input is a title, plus an agenda, which is a list of items you want to talk about; the output is a paragraph or document with multiple sentences that tries to achieve the goal implied by the title while making use of only, and all of, the agenda items. For now we're going to use recipe generation as a running example, but later the model will be applied to dialogue response generation as well.

At the beginning, we thought that in order to generate a recipe we should first parse it, because that's what a lot of people do in NLP. Once we parse it, we abstract away some prototypical graph structure, from which perhaps we can sample a graph, which is like planning, and then convert that graph down to raw text. That was our original plan, and we even worked on the first two segments and managed to write a paper about parsing recipes into action graphs. But then we realized that doing this entire pipeline is quite hard, because errors start propagating, and it was really unclear what to do with that. I still want to do this, but in the meanwhile we were wondering whether we could somehow simplify the whole process. That's when we heard about the encoder-decoder architecture in the neural network literature, which apparently worked great for a lot of tasks like machine translation. So let's encode the recipe title into a vector, decode the entire recipe out, and see what happens. "Sausage sandwiches": this is what happens.

All right, I picked the worst example on purpose. However, recurrent neural networks do tend to repeat themselves a lot, like "a sandwich, a sandwich." Eventually it overcomes the sandwich overdose, but still, things get repeated; the list appears twice. So why does this not work well for us, when it works really well for a lot of other tasks? Well, it doesn't work well when there is only a very weak alignment between the input and the output, which is what I was trying to tell you earlier about different input-output configurations. In this particular task, only 6 to 10% of the words in the output came from the input, which means 90 to 94 percent of the time the output word came out of almost nothing; at least, it looks that way to the encoder-decoder architecture. The rest has to come from somewhere else, and the off-the-shelf models don't handle that very well.

With this in mind, the high-level idea is: is it possible to compose a neural network architecture that could have a mental map of the different items that you might want to check off as you go? That's what we did with the neural checklist model. But since the full model is a little hard to stare at, let's look at a much simplified version instead. We're going to make garlic tomato salsa, based on this visualization: we encode the title into a vector, there's a checklist that holds the list of agenda items, and then at each time step there is one unit of a recurrent neural network.
In our case the recurrent unit is a GRU, but that particular choice doesn't matter. Internally, the network makes a three-way decision: whether it is about to generate a non-ingredient word, a new ingredient word, or one of the already-used ingredients.

Let's see how it works. Let's say we have generated "chop the" so far; internally the model looks at its own context and decides that it's time to use one of the new items. It does a softmax over all the new items; say "tomatoes" gets the highest score, so we check that off the checklist and move forward. The second sentence might start with "dice the," and at this point the network might internally decide to use another unused item; the softmax puts probability mass only over the unused portion, and say "onions" is the best choice, so we check that off and move forward. Now something interesting is about to happen. At "add the," contextually it's the moment to go back to one of the already-introduced items; that's just what the discourse cues indicate to us, and that's also what the network picks up. So which should we use, tomatoes or onions? Tomatoes, because that's what was introduced earlier on, and the discourse convention is that that's what we re-mention; the network learns to do that.

In fact, this entire checklist is probabilistic; it's not black and white. It's based on the accumulation of attention scores, which reflect how much the model thinks it has made use of a particular ingredient. The three-way classification is also soft, in the sense that it's an interpolation of three different language-model-like components: the first is a generic language model, like ordinary GRUs or LSTMs, whereas the other two are based on attention mechanisms looking at either the used portion or the unused portion of the agenda items.

We did comparisons with automatic metrics, and of course the story is that we do better than other baselines. But we borrowed machine translation measures, and when you just look at those numbers they look really low, so you probably want to see actual samples, and I'm going to show you those. Briefly, this figure shows what fraction of the agenda items each model makes use of: the regular encoder-decoder architecture, with or without attention, practically ignores what's given in the input, because when the input-output alignment is so weak, they're not very good at actually learning the mapping, so the inputs are mostly ignored, whereas the checklist model makes use of a much bigger portion.

An example: "skillet chicken rice." The baseline, a regular RNN: "in a large skillet, brown chicken in oil; add chicken and rice; cook; add more rice and broth and cook some more," and so forth. So it keeps adding more rice and keeps cooking; the result is probably edible, safe to eat in the end, but somehow there's not much coherence. The checklist model, on the other hand, does a much better job: "in a large skillet, heat rice and onion; add carrots and mushrooms; cook, stirring constantly, until hot and bubbly; stir in seasonings; cook some more; stir in chicken and then cook some more; then serve over rice." Overall, the coherence suddenly became much better, even though in this model we are not doing any fancy modeling of discourse in any way. It was only the checklist, an external memory structure, that allows recurrent neural networks to better manage what's happening in the long-term context, and that's going to be a repeating message in the first half of this talk. It works really well when the input is familiar, and apparently skillet chicken rice is quite a common recipe in the dataset. And this is the human recipe; it's probably a little bit better.

Let's look at a failure case: chocolate-covered potato chips. I had never heard of this before. The baseline: "preheat oven," probably a good idea, but then it greases and flours a pan and starts baking right away. Because the input is unfamiliar, the model doesn't know what to do with it and just ignores it. The checklist model pays more attention to the chocolate, though it also does weird things, like suddenly baking something right away.
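As an aside, one decoding step of the checklist mechanism described above can be sketched in code. This is a simplified, illustrative sketch, not the paper's actual implementation: the function and weight names (`checklist_step`, `W_new`, `W_used`, `w_gate`) are mine, and the real model learns these parameters jointly with a GRU decoder.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def checklist_step(h, agenda, used, W_new, W_used, w_gate):
    """One decoding step of a checklist-style decoder (simplified sketch).

    h      : (d,) current GRU hidden state
    agenda : (k, d) embeddings of the k agenda items (e.g., ingredients)
    used   : (k,) soft checklist in [0, 1], how much each item was used so far
    Returns the three-way gate, attention over new and used items,
    and the updated soft checklist.
    """
    # Soft three-way decision: plain language-model word,
    # new (unused) agenda item, or already-used agenda item.
    gate = softmax(w_gate @ h)                                # (3,)

    # Attention over agenda items, softly masked toward the
    # unused vs. used portions of the checklist.
    scores_new = (agenda @ W_new @ h) + np.log(1.0 - used + 1e-8)
    scores_used = (agenda @ W_used @ h) + np.log(used + 1e-8)
    attn_new = softmax(scores_new)
    attn_used = softmax(scores_used)

    # The checklist is probabilistic: accumulate the attention mass the
    # model spent on introducing new items, capped at 1.
    used_next = np.clip(used + gate[1] * attn_new, 0.0, 1.0)
    return gate, attn_new, attn_used, used_next
```

The point of the sketch is the bookkeeping: the checklist is a soft vector updated from attention scores, not a hard symbolic list, so the whole thing stays differentiable.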
Then you'd realized oh I haven't done anything with chocolate, so let's melt it and, then it says silly. Things like add the potato mixture to the potato mixture which is grammatically. Correct, it's just a little bit silly. Eventually. It's going to fried that in a hot oil and I don't think it's a good idea to fry something covered, in melt to the chocolate. So. That, says something about AI safety. And. I'm. Not joking, here so. The. Fundamental. Problem with, the neural networks, today despite, all these amazing results is that they struggle for unfamiliar. Input, and as a, result, you. Know we cannot really really really really trust them so human, recipe, is much better and I'm going to come back to this point in a bit but, let me very briefly mention. How, the. Same, model was able to. Achieve. Better, performance on a particular. Dialogue. Response, data, set as well so. Moving. On to the teaser. Of the neural, process networks. Which I'm going, to only give you motivation, for so. This, is another example, globally. Cool here oh sorry deep, fry the cauliflower, that, was the title, and. Then neural checklist model generate, the following wash. And, try the cauliflower, heat the oil in, the skillet and fry the sauce until they're golden brown drain, on paper towels add the sauce to the sauce and mix well serve hot or cold what's wrong with this. What's. A, cauliflower. Is. A steer it's a clean, but. Hasn't. Been cooked yet. So. Again you. Know I mean against, cauliflower, is okay to eat raw but so. When. I gave this talk about neural checklist, model at Harvard University. A student, asked me after the talk our arena is a month without a brain and I wasn't sure how to respond, to that I guess. I sort of agreed with her so, we. Need common, sense knowledge, to, reason about this sort of unfamiliar. Situations. And humans, can still do the right thing whereas, we. Don't know what's going what, machines will do you know networks will do so. That's. 
Sort of the motivation like, can we. Think. About, new. Architecture. That. Could read, between the lines and, reason about the unspoken, but obvious effects, like for example fright, off in the pan we, know just. Common sense wise we know that then, location, of tofu must be in the pen and temperature, of tofu must become hot even though the. Sentence, doesn't say any of this, so. In. A way it's similar to how we. Might be able to do mental simulation, up about a lot of things when we understand. The stories and understand, others, when. Making conversation, with them and in fact there have been some literature. In. Cognitive, science, or psychology that. That's. What humans seem to be doing exactly how he's, always debatable. But, we seem to be doing something like this and, so can, me then challenge. Ourself by.
Asking This question, instead of this question, so a lot, of NLP research, has been based on. Labeling. Let's, label every, word in a sentence with the syntactic and semantic categories. And that. Has been really, very useful, but. It focuses, by-and-large, on what is a set in sentence, can, we also think about in addition to that. Simulating. The causal effects that's obvious, but not, spoken and. Abstracting. Away from, the surface strings. And in fact there have been some recent efforts, of that flavor, but, of course when people writing. Papers, in, at the beginning no, paper will solve anything completely, so. We. Have another one coming up your process Network you. Can hear more about this, in a bit today but. I will leave it there and then talk about the, next component, dynamic. Entities, in neural language models, so. That's, work with. Other. Colleagues, that you don't do, always here, I swear. Today and so. The. Problem. With the neural networks, this, is another sample, that I found, from a paper that. Introduced. Lambada, data said I think so, human says what's your job motion says I'm a lawyer human says what do you do now motion, says I'm a doctor too. So. I mean, somewhat understandable, if, you think about how I, embedded. Here and I embedding here when it starts with the word embedding it's studying, like at the same word it's the same word and so. Jane. In Jane Eyre maybe the same as Jane and Mentalist, in terms. Of how they start with neural. Representation. And then when. I say she it to mean Jane now. Suddenly they kind of look like different words, in. Addition. To that there's this so philosophical, question of is. It possible to encode, very, large context, into one vector and the. Ray said no it's not possible I. Don't. 
Maybe some of you in the audience who are very young have not heard about this, but he said it, and I'm going to extend what he said by adding that a sequence of vectors is not enough either. A recurrent language model is basically a sequence of vectors, and our model adds external memory that can hold a better long-term representation, as follows. Let's say this word is some entity mention. I'm going to create a separate memory chain, which is also recurrent over time, and whenever I see a new entity mention, I create a new chain. These chains just copy themselves forward over time until there is a coreference to the same (blue) entity, at which point I combine the context coming from the recurrent neural network with whatever I was storing in the separate memory chain, and so forth. That's the gist of the model: keep a separate chain in which we store more precise information about the different mentions corresponding to the same entity. In doing so, we now have a recurrent entity model in addition to the recurrent language model. Whenever we need to compute a new entity embedding in this chain, we interpolate between whatever comes from the language model part and whatever was stored in the chain, and that interpolation is based on gating. That's a fairly standard thing to do, and the gating can be parameterized in many different ways; this is one particular way of parameterizing it, but it doesn't have to be this way.

Once we have that, what can we do with it? Usually in the language model part you sample words out in sequence, like this. Instead of conditioning only on the hidden state of the recurrent neural network, we can now also condition on whatever is relevant to the entity, which gives us more precise conditioning on the most relevant context in a way that would be hard otherwise. In some sense, a feedforward network with two layers should be able to approximate anything in theory, but nobody can train it to do anything useful; that's why the computer vision community keeps inventing new architectures like ResNet, DenseNet, CondenseNet, and so forth. Similarly, I think we need to do more research like this: this may not be the best architecture yet, but it's the kind of architecture that allows us to carry long-term context more precisely.

That's the intuition. I'm going to give you just a very brief sketch of what we do for training and testing. We went with a generative model, such that for each word there are a number of bookkeeping variables: a boolean variable that says whether we are currently inside an entity mention or not; if so, which entity the word belongs to; a length variable that decreases over time until it hits zero (that's one way to do it; there may be others); and finally coreference relations. This slide just shows how these variables refer to different pieces of information. During training, in this work, we simply assume that we have the perfect, correct chains, given from the sky. That's probably an unrealistic assumption to make, but as a first step, that's what we did. During testing, of course, it's a little less kosher to assume that, so we sample the variables and marginalize them out using importance sampling.

By doing this, what can we do? We can think about different use cases. First, we looked at using this as a language model that marginalizes out the different coreference decisions, and in that case it lowers the perplexity on the CoNLL test set. Second, we used it to improve an existing coreference resolution system that we could piggyback on, by adding our model's scores as additional scores to help with its rankings.
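As an aside, the gated entity-memory update described above can be sketched roughly as follows. This is a simplified illustration in the spirit of the model, not the authors' exact parameterization: the function name `update_entity` and the weight matrices `W_g` and `W_e` are my labels for this sketch.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def update_entity(h_t, e_prev, W_g, W_e):
    """Gated update of one entity's memory vector (simplified sketch).

    h_t    : (d,) language-model hidden state at the coreferent mention
    e_prev : (d,) the entity's current memory vector
    The new entity vector interpolates between the old memory and a
    context-based proposal, with a learned scalar gate deciding the mix.
    """
    g = sigmoid(h_t @ W_g @ e_prev)      # scalar gate in (0, 1)
    proposal = np.tanh(W_e @ h_t)        # proposal built from current context
    e_new = g * proposal + (1.0 - g) * e_prev
    # Renormalize so entity embeddings stay on a comparable scale.
    return e_new / (np.linalg.norm(e_new) + 1e-8)
```

At generation time, the word distribution can then condition on both the recurrent hidden state and the relevant entity vector, which is the "more precise conditioning" point above.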
We plugged our system into a ranking-based system and were able to improve its performance. It's not state of the art, because people keep improving on this quite a bit, and our colleague at UW, Luke, has much stronger results now. Finally, there was also a very interesting new dataset, a little bit common-sense inspired, about predicting which entity may be mentioned next; on that task we were also able to perform much better than other strong baselines.

So that's another story that repeats this theme, or spirit, of building new neural network architectures that might better represent the latent process behind the way text is written and understood. Now I'm going to switch gears quite a bit and talk about verb physics, which is more on the formalism side.

This work is motivated by the hope that in the future we may have a home robot that should understand a lot about different household items so that it can interact with them: it should have an understanding of relative size, weight, rigidity, strength, and so forth. It's good to be aware that usually people are larger than a chair. If you look this up on the web: nobody says this (well, now I've said it, so you can search for it), but when I searched for it the first time, I couldn't find that or anything like it, because nobody says any of it; it's trivially true. This is the known phenomenon of reporting bias: people don't state the obvious, and when people finally do say something, it's the exceptional case, so we might wrongly conclude that horses are similar in size to dogs. So what do we do? I thought the answer must be in computer vision: let's look at images. We worked on this, and one of our papers looked at lots of images and estimated relative size differences, adjusted by depth differences, and we were able to learn some knowledge, like dogs are usually bigger than cats. But the takeaway from this work was that I thought NLP was hard, and computer vision is also hard; in addition, computer vision just takes more computation, because images are bigger to process. After processing lots of images, I only got hundreds of knowledge nuggets, which seemed kind of small. And beyond that: okay, size we can measure, but what about relative weight, or strength differences, or speed differences? These are visible to human eyes, but not really to computer vision yet; to some degree yes, but not in a way reliable enough to depend on. So our revised plan is: well, one could just wait, but since we don't have anything better to do, let's go back to language, with a different plan.

The key insight is this: even though nobody says any of this, people do say that they threw a pen, or a stone, or a chair, and so forth. All of these are possible when the agent is typically bigger than the object and heavier than the object, and as a result the object is, temporarily, moving faster than the agent. The representation, which we named verb physics, goes as follows. It's very similar to frame semantics, but it talks a little more about the pragmatic implications of language. For any pair of potential arguments of a predicate, we can think about what may be implied. "I walked into the house" may imply that I am probably smaller than my house, and lighter than my house, and moving faster than my house. If I squash a bug with my boot, probably the bug is smaller than my boot and lighter than my boot, and so forth. There is this sort of long list of obviously likely truths about the world that comes to mind, and perhaps we can learn to detect that as well.

In terms of the model, we decided that perhaps we can solve two related problems simultaneously: first, the physical properties implied by verbs, that is, what different actions imply about their arguments; and second, in general, relative knowledge between objects, across the five different attributes in this work.
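The frame-implication representation just described can be illustrated with a toy sketch. The specific entries and relation values below are hypothetical examples of mine for illustration, not the model's learned knowledge, which in the actual work is inferred jointly over a large graph rather than looked up in a table.

```python
# Toy illustration of verb-physics-style frame implications.
# Each verb frame implies relative knowledge about its arguments
# along attributes such as size, weight, and speed.
GREATER, LESS, SIMILAR = ">", "<", "~"

FRAME_IMPLICATIONS = {
    # "x threw y": the agent is typically bigger and heavier than the
    # thrown object, and the object temporarily moves faster than the agent.
    ("threw", "agent", "theme"): {"size": GREATER, "weight": GREATER, "speed": LESS},
    # "x walked into y": the agent is smaller and lighter than the location.
    ("walked into", "agent", "location"): {"size": LESS, "weight": LESS},
}

def implied(verb, role_a, role_b, attribute):
    """Look up what a frame implies about role_a relative to role_b,
    or None if this toy table says nothing about it."""
    return FRAME_IMPLICATIONS.get((verb, role_a, role_b), {}).get(attribute)
```

For example, `implied("threw", "agent", "theme", "size")` returns `">"` under this toy table: "I threw my bag" suggests the thrower is bigger than the bag.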
One can extend this to include other attributes as well. And if you have been waiting for the deep learning part: in this work we didn't use deep learning per se, although we did make use of neural embeddings. Otherwise, it's a factor graph with lots of variables that have a probabilistic interpretation.

It goes something like this; I'm going to give you a very sketchy idea of what we're doing. We throw in a bunch of random variables. On one side, each might mean something like "P is either bigger than, smaller than, or comparable to Q in terms of size," so each can take one of three value assignments. On the other side of the graph, we also throw in random variables that encode action implications; again, the value can be bigger, smaller, or same, and each is about a particular predicate, like "throw," with respect to a particular attribute, like size. In fact, each predicate variable is internally a collection of random variables, because the same verb, like "throw," can be used in many different frames; we use different random variables for different instantiations of a particular predicate, but for now we can assume they behave similarly. At this point, though, there's no direct evidence about the truth assignment of any of these variables, because nobody says "my house is bigger than me," and nobody says "when I throw something, I am usually bigger than the object"; no one says any of this. The only thing we do observe from language is how people use different arguments together with different actions; for example, "I threw my bag as I walked into my house." People do say this, and that relational evidence is the only evidence we have.

Translating this intuition into the graphical model, we throw in potential functions that quantify the selectional preference between a verb and its arguments, and then, similarly, a lot of potential functions for object-object similarity, as well as verb similarity and frame similarity. Finally, we cannot suddenly decode knowledge out of nothing, so there's a little bit of seed knowledge given as unary potential functions, so that we can reason about the entire network. So far we've only been talking about size, but we can keep going until the network gets very, very messy, by adding the weight part and the other attributes as well.

There's a well-known inference algorithm called loopy belief propagation, and that's what we used for inference. The conclusion, very briefly: random assignment gives one-third accuracy; the majority baseline is about 44 to 50; just using neural embeddings to reason about each piece of knowledge independently, one node at a time without the graph, does this much; and adding the graph inference improves the performance further. On one side we are looking at frame predictions; on the other side we are looking at object-object knowledge.
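The style of inference used here, sum-product loopy belief propagation, can be sketched minimally as follows. This is a toy version under my own simplifying assumptions: three states per variable, one shared pairwise potential, synchronous message updates. The actual verb-physics graph is far larger, with distinct learned potentials for selectional preference, verb similarity, frame similarity, and object-object similarity.

```python
import numpy as np

def loopy_bp(unary, edges, pairwise, iters=20):
    """Minimal sum-product (loopy) belief propagation sketch.

    unary    : (n, s) unary potentials for n variables with s states each,
               e.g., states {smaller, similar, bigger} seeded from a small
               amount of background knowledge
    edges    : list of (i, j) variable pairs connected by a potential
    pairwise : (s, s) potential, e.g., rewarding agreement between
               similar verbs or similar object pairs
    Returns normalized beliefs, shape (n, s).
    """
    n, s = unary.shape
    # msgs[(i, j)] is the message from variable i to variable j.
    msgs = {(i, j): np.ones(s) for a, b in edges for i, j in ((a, b), (b, a))}
    for _ in range(iters):
        new = {}
        for (i, j) in msgs:
            # Product of i's unary potential and all incoming messages
            # except the one coming back from j.
            incoming = unary[i].copy()
            for (k, t) in msgs:
                if t == i and k != j:
                    incoming = incoming * msgs[(k, t)]
            m = pairwise.T @ incoming     # sum over i's states
            new[(i, j)] = m / m.sum()
        msgs = new
    beliefs = unary.copy()
    for (k, t), m in msgs.items():
        beliefs[t] = beliefs[t] * m
    return beliefs / beliefs.sum(axis=1, keepdims=True)
```

With an agreement-rewarding pairwise potential, a single seeded variable (say, "a person is bigger than a chair") pulls its neighbors' beliefs toward the same state, which is the intuition behind propagating a little seed knowledge through the whole network.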
Even though people don't state the obvious facts, perhaps we can design a model or inference algorithm that can still reason about what is unspoken but mutually assumed, which systematically influences the way people use language, which in turn gives us some clues about what may be true about the world. And we can do this without having robots that have an embodiment and interact with the world.

All right, so now I'm going to conclude with some remarks about what future research directions we might pursue further. As far as neural network architectures go, there have been some debates about whether there should be innate architecture or not; I'm obviously on the side of having innate architecture. In the NLP community there has been, relatively speaking, more emphasis on linguistic, sentential structure, where there usually is some explicit structure and it is very well studied, whereas the moment we start thinking about document-level or discourse-level context, or just generally long-term context, or the latent processes of how the world works, it becomes much less clear how to handle things. Nonetheless, I think this part is very important to pursue further.

As far as formalisms go, I think we should really start thinking more about what to do with common sense. I was told not to use the word for some time, I think because it was a major failure before, but I realized at some point that it is just nonsense to conclude that that direction is going to be forever impossible, when the past failures were based on weak computing power, not much data, no crowdsourcing, and no strong computational models, and it was also done by non-NLP people. I think with language we can do a better job, or at least we should try before concluding that it is not going to work.
In terms of physical common sense, the potential impact could be greatest in zero-shot reasoning cases, especially in language, vision, and robotics applications. But we can also do similar things with social common sense, which I haven't talked about today; we have been working on something along this line with connotation frames, and we also have a new ACL paper coming up, which I'm going to give you a teaser of.

It is based on the ROC Stories corpus, developed at Microsoft and Rochester, where a story goes like this: the band instructor told the band to start playing; he often stopped the music when players were off; they got tired of this and the playing got worse after a while; and so forth. When there are people involved in a story, we can oftentimes reason about what those people might feel. Here the instructor was furious and threw his chair. The story doesn't say what the band members feel about that situation, but we can imagine that they must feel fear, or feel angry or sad, even though it is not mentioned.
So we can have annotations, fairly low-level annotations, about what people might feel and what people's motivations might be before and after different actions and events, and we have a bunch of new annotations and baseline models coming up soon.

Related work, also at ACL, will be about commonsense inference. For example, if someone cooks Thanksgiving dinner, probably that person's intent is to impress their family; afterwards, maybe they will feel tired, but still feel a sense of belonging; and finally, other people, even though they are not mentioned in the sentence, will probably feel impressed or happy, and so forth. So we can reason about what people might typically do before and after events, and about their mental states. This is also coming up at ACL, with a bunch of new datasets.

Finally, I thought I would say that this new modeling power creates some new opportunities for doing more human-centric applications, so that people don't just fear AI as a negative, potentially scary thing; I think we could try doing more positive things in that space. Personally, I think we should work more on interactive AI, because that is what robotics people really want as well. And especially with neural networks, we can do a lot of creative things in text generation too. So I will just read this sonnet that was generated by a machine. "Turing" was the title, and it goes like this: "Creating some a locked electric slot machine / of hearing those familiar voices say / awake behind an empty picture mean / the window is open over Alan Kay / consider me of regular expression / or maybe something very quickly typed / and I forget about an old obsession / a hundred thousand people getting hyped / become
the biggest part of my computer / and take another journey down the road / or set a balance on a minor scooter / or even better use Enigma code / and music takes control of all the means / surrounded by a world of strange machines." It couldn't fool anybody yet. Thank you.

Thanks very much for an awesome talk, Yejin. We have about five minutes for questions, so I'll run a mic.

I had a question, and I wanted to know your thoughts on this: a while back there was work done on OpenCyc, or the Cyc knowledge base, which encodes common-sense knowledge in a knowledge graph, and I was wondering about your thoughts on the utility of using that for this kind of inference.

I think there have been a couple of papers that tried to use at least ConceptNet, which is maybe analogous to OpenCyc, and tried to see whether it can improve some downstream applications, like entailment and dialogue. It's a great question. Personally I didn't look at OpenCyc, but I heard from other people who did look at it that it is very much an is-a hierarchy, so it is very much dictionary knowledge or taxonomy knowledge.
ConceptNet, which I did look at myself, is also very taxonomic knowledge, or predominantly that, though they do have a lot of other relations that I really appreciate and find very interesting. Unfortunately, the coverage of those more interesting cases is a little bit lacking, which sort of motivates our new dataset that tries to cover them. Motivations, for example, are included in ConceptNet's relation definitions; however, the coverage is relatively small. So that is one way to increase the coverage; eventually it may be useful, but we will have to find out.

So this is supervised learning using human-annotated data, and is the hope that you can use this, with the correct architecture, to help the system understand in general how to make these types of inferences?

Yes. What we do show in the paper is that by using an encoder-decoder architecture, basically, we can learn to encode any textual description of an event and then reason about what might happen to people's mental states, such that given a previously unseen event description in the test dataset, the model can still reason about the likely mental states, which is what we basically have to be able to do. If we were using old-fashioned logic-based systems, that would not work unless knowledge had been populated for everything, whereas with a neural network we can learn to compose the likely anticipation instead of having to crowdsource literally all of it, which I think is just not possible.

Maybe we have time for one more quick question before the end of this session.

So how do you know when you are done? How do you measure coverage, or recall? You can measure precision, but what about recall?

So in this particular work, we have test-set cases where the test set starts with the left-hand side of the logic, and then we try to predict the right-hand side. But I wouldn't be able to say, in terms of recall with respect to everything that I ever know of; I wouldn't know yet. Maybe if we start thinking about it, someone will come up with a number.

Thanks once again, Yejin.
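The generalization behavior described in that last exchange, encoding an event description and predicting a plausible mental state even for unseen events, can be caricatured with a tiny sketch. The actual system is a neural encoder-decoder trained on crowdsourced annotations; here a bag-of-words encoder and a nearest-centroid "decoder" stand in for it, and every event and label below is made up purely for illustration, not drawn from the real dataset.

```python
# Skeletal stand-in for "encode an event, predict a likely mental state".
# The real model is a neural encoder-decoder; this sketch substitutes a
# bag-of-words encoder and a nearest-centroid decoder. All training events
# and mental-state labels are invented examples.
from collections import Counter
from math import sqrt

TRAIN = [
    ("personx cooks a big dinner", "tired"),
    ("personx bakes bread all day", "tired"),
    ("personx wins the piano competition", "proud"),
    ("personx wins a chess match", "proud"),
]

def encode(event):
    """Bag-of-words 'encoder': event text -> sparse count vector."""
    return Counter(event.lower().split())

def centroids(pairs):
    """Average the encoded events for each mental-state label."""
    sums, counts = {}, Counter()
    for event, label in pairs:
        counts[label] += 1
        sums.setdefault(label, Counter()).update(encode(event))
    return {lab: {w: c / counts[lab] for w, c in acc.items()}
            for lab, acc in sums.items()}

def cosine(u, v):
    dot = sum(u[w] * v.get(w, 0.0) for w in u)
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def predict(event, cents):
    """'Decode' by picking the closest mental-state centroid."""
    vec = encode(event)
    return max(cents, key=lambda lab: cosine(vec, cents[lab]))

cents = centroids(TRAIN)
print(predict("personx cooks a huge meal", cents))  # unseen cooking event
print(predict("personx wins the marathon", cents))  # unseen winning event
```

Even these two unseen event descriptions get sensible predictions ("tired" and "proud") because the encoder maps them near previously seen events; that ability to compose an answer for inputs never annotated, rather than looking everything up in a fixed knowledge store, is the point made in the answer above.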