NW-NLP 2018: Semantic Matching Against a Corpus
Hello, everyone, and welcome to the presentation. We'll have three talks in this session. The first talk, "Semantic Matching Against a Corpus: New Applications and Methods," is presented by Lucy Lin. The talk will be twenty minutes. Thank you.

Okay, awesome, thanks. So yeah, hi, I'm Lucy, and (once the slide comes back) I'll be talking about semantic matching against text corpora. This is in collaboration with Scott Miles and Noah Smith at the University of Washington. We've clearly made a lot of recent advances in modeling sentential semantics: a lot of the NLI tasks, semantic similarity, paraphrase, and so on. So what we propose is that a natural next step is to match natural language propositions against a corpus. Possible end users we see are, for example, a historian of science interested in tracking the idea "vaccines cause autism"; you might expect a spike in that in the news after, say, the Andrew Wakefield article on the subject. Similarly, political scientists might be interested in how different framings of policy issues occur, or public servants dealing with recovery after natural disasters might be interested in ideas about well-being or recovery. So this is the kind of use case we imagine: a user specifies a query like "dealing with authorities is causing stress and anxiety," we query the corpus, and we return a list of matched sentences, for example ones about those in charge of recovery, or about growing anger among homeowners. At the end of the talk I'll briefly touch on what we might be able to do with the matched sentences, which is maybe to use them as a proxy measurement across a variable like time. As I mentioned, this is related to work on textual entailment, but there's also related work in information retrieval, specifically passage retrieval for question answering, and also work
in similar spirit to dynamic topic models and related work.

Here's how this is going to go. I'm going to formalize this a little, and then talk about two different applications: one in which we use examples from a codebook, in the media framing domain, and a second in which we have our domain expert, Scott, specify queries for ideas he's interested in around disaster recovery; there we ran a user study to validate it and see whether there are interested end users. Finally, I'll briefly discuss this idea of using the output for measurement.

Basically, as inputs we have a corpus of sentences, which we'll call C, and a proposition query containing the idea of interest, which we'll call s_P. We'll just treat these as sentences; we're not going to consider document structure in this work, though that would be interesting future work. What we would like to do is score all the sentences s in
C, such that the score is high if and only if s expresses the proposition query s_P. From there you can take the top n sentences and return a ranked list.

So, the first application. Here we use the Media Frames Corpus, which contains thousands of articles about different policy issues; the one we'll focus on is immigration. Spans of text are annotated with different framing dimensions, which are things like whether the issue is cast in legal terms, in economic terms, as a quality-of-life issue, and so on. We also have the annotation codebook for that corpus, and we take 30 annotation codebook examples as the ideas of interest. One example is "immigration rules have changed unfairly over time," which invokes the fairness-and-equality framing with relation to immigration. The idea, basically, is that if we match sentences against an example from the codebook, the matches should evoke the frame the example illustrates. Just as a side note, there are many ways to evoke a frame outside the codebook, so we're not looking for high recall at all; we're just looking at precision here.

Because we're looking at frames, which are broad categories, we use a really simple scoring function: we represent each sentence as the average of its word vectors, and the score is the cosine similarity between the two sentence vectors. We try out two different pre-trained word vector variants. One is the paraphrastic word vectors of Wieting et al.; these are trained on PPDB and are designed for exactly these sorts of similarity tasks. The other is the very well-known word2vec, pre-trained on Google News; the nice thing there is that, because the vectors are trained on such a large dataset, they might cover a lot of the entities we're interested in.

So, we're interested in how well our output lines up with the corpus annotations. We look at this at the sentence level: if we match a sentence, does its frame line up with the annotated frame? What we find, basically, is that the paraphrastic vectors do better than the word2vec vectors, which in turn do better than a tf-idf baseline. There are clearly better methods that would get much higher precision, but this is a first step, and it does really well given how simple it is. So, there we go.

Secondly, I'm going to talk about matching more specific queries, in a domain where researchers and public servants don't necessarily have a lot of empirical data and want to understand what challenges a community is facing, as a kind of post-disaster retrospective. Our thinking is that we can use these techniques to find relevant sentences they can learn from. We're going to focus on the Canterbury earthquakes in New Zealand, from 2010-2011. We pulled a corpus of about a thousand news articles from local news sources, and as ideas of interest we had our domain expert, Scott, provide 20 queries he's interested in for some of his work, covering things like community well-being, infrastructure, decision making, utilities, and so on. For example: "the council should have consulted residents before making decisions." It seems obvious; apparently it's not, to certain governments. In this case, unlike the framing case, we're really interested in finding exact matches; we really want to match this idea.
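The scoring function just described (average the word vectors, then take cosine similarity) can be sketched as below. The embeddings here are tiny made-up vectors standing in for the Wieting et al. or word2vec ones, so treat this as an illustration rather than the actual system:

```python
import numpy as np

# Toy pre-trained embeddings; in the talk these would be the paraphrastic
# Wieting et al. vectors or word2vec trained on Google News.
EMB = {
    "dealing": np.array([0.9, 0.1, 0.0]),
    "with": np.array([0.1, 0.8, 0.1]),
    "authorities": np.array([0.7, 0.2, 0.3]),
    "causes": np.array([0.2, 0.9, 0.1]),
    "stress": np.array([0.8, 0.3, 0.5]),
}

def sentence_vector(tokens):
    """Represent a sentence as the average of its word vectors."""
    vecs = [EMB[t] for t in tokens if t in EMB]
    return np.mean(vecs, axis=0) if vecs else np.zeros(3)

def score(query_tokens, candidate_tokens):
    """Cosine similarity between the two averaged sentence vectors."""
    q, c = sentence_vector(query_tokens), sentence_vector(candidate_tokens)
    denom = np.linalg.norm(q) * np.linalg.norm(c)
    return float(q @ c / denom) if denom else 0.0

def top_n(query_tokens, corpus, n=5):
    """Rank corpus sentences by score and return the best n."""
    return sorted(corpus, key=lambda s: score(query_tokens, s), reverse=True)[:n]
```

With real embeddings, the query would be s_P and the corpus the sentences of C; the top-n list is what gets returned to the user.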
Because of that, here we use a more entailment-style model. It's based on syntax, and I'm going to walk through it very quickly. Given two sentences, the candidate sentence from the corpus and the proposition query, we get the dependency parses of each. We're
then going to find a sequence of what we're calling tree operations, which transform one tree into the other. The intuition, basically, is that the difference between the two sentences somehow indicates the relationship between them, whether they're saying the same thing as each other or not, and this syntactic transformation is one way of getting at that. To be more concrete, take this example: we start with the sentence "unfamiliar bureaucratic systems are causing stress," and we want to find a set of transformations such that we end up with the tree at the bottom. You might delete a couple of the words, relabel, and then finally insert, and you get the end tree. I'm not going to go into detail about how we do this, but it's basically a greedy search.

From there, we use that sequence as features for a classifier. The original paper from Heilman and Smith that we worked off of used a logistic regression classifier based on counts of features. This is 2018, and it's a sequence, so we also put it into a neural network. We trained on SNLI, the Stanford Natural Language Inference corpus, collapsed to two classes: entailment versus neutral or contradiction. To make this run in a reasonable amount of time, we first do the fast word-vector-based matching on the entire corpus, obtain some subset of matches, and then run the entailment-based model on that subset; we tried out different combinations of this.

To evaluate, we surveyed twenty emergency managers: we took the output of the different systems and had them rate, from one to five, how good a match each sentence was compared to the query. An example we showed them is "there's a shortage of construction workers," where a one means "this is not really expressing the idea at all" and a five means "hey, this is great."
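The two-stage setup (a cheap first-pass filter over the whole corpus, then a slower model over the survivors) can be sketched as follows. Both scoring functions here are deliberately crude stand-ins (token overlap and token coverage), not the actual word-vector cosine or tree-operation classifier from the talk:

```python
def cheap_score(query, sentence):
    # Stand-in for the fast word-vector cosine: token-set Jaccard overlap.
    q, s = set(query.split()), set(sentence.split())
    return len(q & s) / len(q | s) if q | s else 0.0

def entailment_score(query, sentence):
    # Stand-in for the syntax-based entailment classifier:
    # fraction of query tokens covered by the candidate sentence.
    q = query.split()
    return sum(t in sentence.split() for t in q) / len(q)

def match(query, corpus, k_filter=10, n_final=3):
    """First pass keeps the k_filter best candidates; the slower model
    then re-ranks just those and returns the top n_final."""
    survivors = sorted(corpus, key=lambda s: cheap_score(query, s),
                       reverse=True)[:k_filter]
    return sorted(survivors, key=lambda s: entailment_score(query, s),
                  reverse=True)[:n_final]
```

The point of the structure is that the expensive scorer only ever sees k_filter sentences, however large the corpus is.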
So, what we see is that on the left, the leftmost bars use the paraphrastic vectors in that first-pass filter, and between the two sides are the logistic regression and LSTM versions of the syntax-based model. In general, the paraphrastic vectors again do better than the word2vec ones, and this makes a difference for what you do afterwards: if you have mediocre output from that first pass, you're not really going to make things better downstream.

We also asked respondents whether they're interested in this application in general; most of them said yes, which is great. And half of them were actually interested in a follow-up study, which is in progress: basically, we're having them specify their own idea queries, so it's kind of a back-and-forth study,
which is cool.

So, finally, I'm going to talk about using this sort of matching output for measurement. What I mean by that is: say we have the query "dealing with authorities is causing stress and anxiety" that I've been using throughout this talk. If we take the top 50 sentences and look at the metadata of the articles they came from, each was published on a certain date, which indicates, "hey, this idea is showing up at this time." If we bin these into three-month periods, we get the following histogram. We showed this to Scott, and he found it interesting, and kind of worrisome, unfortunately. Basically, right after the earthquakes, around 2011, people still felt okay, because they were getting help: the government was involved, giving aid in the form of recovery services, and so on. As time goes on, worry and stress start to set in, because people's needs aren't being fulfilled, or they feel like they're being forgotten, and that translates into the subsequent coverage.

So, some things we found in both studies that motivate future work. One is the complexity of an idea: some queries are too general or too specific, and that impacts the kinds of matched sentences you get. Also, can we give users guidance for writing better queries? That's in progress with the follow-up study.

Another problem we ran into was entities and coreference. This is particularly interesting in the New Zealand earthquake data, because a lot of those entities are local; they don't really show up in our news. An example is CERA, the Canterbury Earthquake Recovery Authority, and apparently also the last name of an actor who I didn't
know existed. Finally, there's the fact that working at sentence boundaries is a limiting factor, in two ways: one, the surrounding context can invalidate a match, and two, a potential match can be spread across a sentence boundary, so we'll miss it. In future work it would be interesting to incorporate that, or document structure, and so on.

So, just to wrap up: in this talk we showed the viability of different semantic matching methods in two pretty different application domains, we performed a user study to establish that some end users are interested, at least in this particular application, and we motivated future work on semantic matching and on other measurement applications. Thanks.

Audience: When you write queries, how do you evaluate what a better query is?

Sure. So you're asking, basically, how we get them to write better queries and how we might evaluate that. In this follow-up study, we asked them what they would write as a query, without giving suggestions, and their queries are actually somewhat different from the ones that Scott provided. And so
it's probably going to be more of this back-and-forth thing: us seeing what kind of output we get from these new queries, how well the output scores against what they expected to see, and then going back and saying, "well, maybe you should simplify this," and so on. Does that make sense? More questions? We have time.

Audience: About the entity ambiguity, the authority you found and the actor of the same name: this is going to be a problem across a lot of domains. What is your idea for handling that in the next project?

So the question is how we plan on handling entities that share the same name but appear in different domains. I'm not sure yet, to be honest. One thing we thought about was getting a small amount of labeled data in the target domain and then doing a bit of domain adaptation to make that perhaps less of a problem. But that's all still in progress. Any more questions?

Audience: Just an interesting thought, or maybe more of a comment: one thing that might be interesting to play with is the corpus of CNN and Daily Mail articles, if you've encountered it. They have the highlights of the article and then the article itself, so there's an abstractive alignment there. That could be fun.

Right, yes. So, I guess the comment, if people didn't hear it, was that the CNN/Daily Mail corpus would be an interesting thing to apply this to.

Audience: I have a question.

So the question is whether we tried other methods besides just this kind of simple addition of the word vectors for the basic word-vector matching. Yeah, so this was motivated by that Wieting paper, in which they found that simple averaging of the word vectors actually worked better than using an LSTM or something. I think they've since had subsequent
results which have modified that somewhat, so I would probably try something else in the future. But one thing that's nice about averaging is just how simple it is, and it still gives a decent signal. Any more questions? Okay, let's thank the speaker again.

Next up: "Synthetic and Natural Noise Both Break Neural Machine Translation."

So, hi. I'm Yonatan Bisk, and this is joint work with Yonatan Belinkov (yes, we are both Yonatan). As mentioned, this is our talk, "Synthetic and Natural Noise Both Break Neural Machine Translation." What I've gone ahead and done for you here is leave in where the spell checker has decided there's an error: that's all these red lines. And the beautiful thing is, you might be thinking, "well, maybe people make typos." These are actually all real mistakes that people have made: this comes from a Wikipedia edits corpus, and I just went ahead and swapped in the mistakes people make. And you might be thinking, "not to worry, machine translation is really robust, spell checkers are really robust," so first I'll run the spell checker and see what it fixes. Not to worry: what we end up with is a much better version, "Synthetic and Natural Noise Doth Break Neural Machine in Translation." It has also figured out my pseudonym, which is "Yucatan busk." So we've made a lot of progress here. It's also much more Shakespearean, which I feel gives it a sort of gravitas that was missing from the title.

So clearly spell checking is not perfect. It's nice that in English in particular we have a lot of "did you mean" kind of data, which really helps. And obviously, here's the actual title, and here are our actual names. The sort of basic, you
know, place where this situates itself in the literature is in the context of things like adversarial examples. We're all very familiar with this classic example from the vision literature, where you have an image which was classified as a panda, you add what is seemingly imperceptible
noise, and now you have a misclassification. The question becomes: what does that look like for natural language? What's important is that the noise distorts the perceptual process in some way but doesn't actually hinder semantic understanding. Maybe you did notice that the second image was a little blurrier, but it didn't affect your ability to understand what it was. And this is not atypical of our perceptual system: for example, our eyes saccade all over the place, all the time. This next part is not in the paper, but it's some work we've been doing since then. Take some text and introduce the same kind of errors that I showed earlier; here we have "a lawsuit was filed March 27th and the preliminary investigation was opened by the prosecutors," with errors in it. First of all, it's important to note that no one has any trouble reading this. And just to prove that point, we went ahead and ran an eye tracker. We have people sit down and actually read the sentence: they fixate here, jump a little bit, move forward an entire phrase, backtrack, and then move through the rest of the sentence. The reason this is important is that there are 14 words but only nine fixations. So when people are reading this text (and we have them answer questions at the end, to make sure they did comprehend it), they're not paying a lot of attention to a lot of the words. In fact, based on what we know about how much you see when you look at a specific part of the sentence, what probably happened at this fixation is that they saw the entire word, completely
skipped "was," and then moved on. So they probably didn't even notice it, which we see, for instance, in some interesting work in the psych literature: when you interview people, they didn't even notice that there was an error there. And we do this all the time, right? We make mistakes in our own writing, we reread it and totally miss them, and we need a new pair of eyes to notice them. So perception is fundamentally noisy and sparse, and maybe the reason we aren't disturbed by these kinds of adversarial examples is that we've been training on them our entire lives. Another place you might be training on these things is messy handwriting: your entire life you're trying to decipher doctors' notes or other kinds of things, maybe something your child wrote; you may yourself have really messy handwriting. So I went ahead and did some research: I went online and I searched, and it looks like if you have messy handwriting, you might have emotional baggage. So
that, you know, just take that home. On the other hand, I did continue searching, and if you go further down in the results, you might also be a genius. So it's a little unclear what the research shows there, but either way, you've probably been exposed to noisy text throughout your life.

So, the setup; I'll give you the super quick version. We have a bunch of models which we're going to test to see how brittle they are and how noise affects them. First we use Nematus, a machine translation framework which performed very well on a bunch of WMT tasks last year. It uses BPE-based representations; if you're not familiar, that just means that instead of using words as input, it finds subword chunks of characters, so, for instance, a morphological ending like "ing" gets chunked out. This is a really helpful way of increasing robustness to out-of-vocabulary effects. We're also going to use char2char, work from TACL last year; this is a very impressive model that doesn't take segmentation information into account at all, so it treats the entire input as one big character stream and translates to characters. And finally, as a simple baseline, what you might code up in an afternoon: a sequence-to-sequence model with attention, with character-based convolutions for the representation. The details aren't super important; the thing to keep in mind is that this final model has notions of what word boundaries are but looks at individual characters, the char2char model doesn't even care about word boundaries, and the top model has maybe slightly more intelligent ways of figuring out morphological units and so forth. For the data, we use a spoken text corpus: for the
IWSLT competition a couple of years back, there were a bunch of TED talks and their translations, and that's what we use for the evaluation, and also for some of the training, as I'll show you.

First, let's talk about synthetic errors. The beauty of adversarial examples in, for instance, the vision literature is that you can basically just sample random noise. What would that look like for us, if we wanted to create errors automatically? Well, something we do all the time is swap two letters: you type too quickly and you flip two adjacent letters. We're going to restrict that to longer words and only to the middle of the word, to make it a little easier; so "noise" goes to "nosie," or something. Second, there's a meme you might remember from many years back: according to "research at Cambridge," or something to this effect, if you scramble all the letters in a word, you can still read it as long as the first and last letters stay put. So we're going to pretend the Internet knows things, and follow the meme for our second type of noise. Third,
what if we just ignore that and randomly permute all the letters in the word? And finally, we're all hitting a wrong key by accident all the time, so what happens if we swap one of the letters with a key within a radius of one on the keyboard?

The effect of this on the two pre-trained, state-of-the-art, fancy models is that, as you increase the number of tokens that are permuted, or messed up, you see the BLEU scores drop off. Not terribly surprising. What we see, for example, is that by the time we get to something like twenty percent noise, we've lost almost half of our performance; ten percent is still a pretty steep drop-off in both of these models. So it doesn't take a lot to mess with translation. I'll come back to what we might do about it a little later.

The second thing you might ask is: these are some nice synthetic results, but what does it look like if we use real noise? As I said, we have wiki edits and things like that. It turns out there are some beautiful people out there who have created collections of second-language speakers' writing and the edits to it, essay corpora, Wikipedia corpora, things like this. So we can find a bunch of naturally occurring errors in the wild and use them to permute our text. I don't speak a word of German, but I'm told that what's going on here is, for instance, phonetic distinctions like t and d, or s and z, getting mixed up; dropped letters, like the missing t in "babysitter" here; and then we get into the more interesting space, which is things like morphological
problems, where someone has conjugated something incorrectly. This is the kind of stuff where, if we really want to generate it automatically, we're going to need a much stronger notion of linguistics than the simple errors I was presenting before.

So the question becomes: do these errors affect our models in the same way the synthetic errors do? Yes. All I've done here is superimpose the natural-error line on both of those plots. It's just a hair better, so maybe there are certain types of errors that are less likely to be made than just willy-nilly swapping or randomizing of letters. For example, maybe people are less likely to mess up capitalization; maybe
people may choose the wrong conjugation, but at least it's a valid conjugation; these kinds of things. So there's quite a bit more exploration necessary there. The other thing is that we don't have full token coverage, because we simply don't have errors for every single word, so this maxes out at about 40 percent.

Okay. As I started with, though, we have spell checkers. Obviously, I said they're not perfect, but they're probably not terrible. So here's what happens with our French, German, and Czech models if we introduce a bunch of natural errors: here are the BLEU scores, and here's what happens if we just select the first spell-checker suggestion for every single word in the error-filled corpus. What you first note is that for French and German we see pretty significant gains, something to the tune of five to six BLEU points; so spell checkers are doing something. The other thing I should mention, though, is that for French and German you typically see only one or two suggestions from the spell checker for each word. That is not the case for Czech: when you look at the Czech errors, what ends up happening is you get this insane list of all the things that could possibly be meant. Again, I don't speak any Czech, so I just picked the first suggestion, and it turns out that on average I actually made things worse. So maybe, for the small-error condition, given sufficient context, a rich spell checker will be beneficial, but it is also going to break down. And just as a reminder, all of these are still quite a bit different from what you'd get if you just trained and tested on clean text. So, excuse
me. So now let's see whether we can do something about robustness. What about training with noise?
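For reference, the four synthetic noise types described earlier (adjacent swap, middle scramble, full scramble, keyboard typo) are simple to reproduce. This sketch uses a deliberately tiny, made-up keyboard-neighbor map rather than a full layout:

```python
import random

def swap_adjacent(word, rng):
    """Swap one random adjacent pair of interior letters (longer words only)."""
    if len(word) < 4:
        return word
    i = rng.randrange(1, len(word) - 2)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]

def scramble_middle(word, rng):
    """Permute everything but the first and last letter (the 'Cambridge meme')."""
    if len(word) < 4:
        return word
    mid = list(word[1:-1])
    rng.shuffle(mid)
    return word[0] + "".join(mid) + word[-1]

def scramble_all(word, rng):
    """Fully randomize the letter order."""
    letters = list(word)
    rng.shuffle(letters)
    return "".join(letters)

# Tiny illustrative neighbor map; a real one would cover the whole keyboard.
NEIGHBORS = {"a": "qwsz", "s": "awedxz", "e": "wsdr"}

def keyboard_typo(word, rng):
    """Replace one letter with a key adjacent to it on the keyboard."""
    candidates = [i for i, c in enumerate(word) if c in NEIGHBORS]
    if not candidates:
        return word
    i = rng.choice(candidates)
    return word[:i] + rng.choice(NEIGHBORS[word[i]]) + word[i + 1:]
```

Applying any of these to some fraction of the tokens in a corpus reproduces the noise conditions from the plots.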
There are way too many numbers to tune here, so if you're really bored, we've put charts and charts in the paper; I'm going to try to synthesize a couple of quick lessons from them. One: if I train on noisy text but then test on clean, unadulterated text, how do I do? Here, "vanilla" is completely clean, these are the different types of noise I mentioned earlier, and these are ensembles of the noise types. For French and Czech (so that this doesn't get too cluttered), here's where we would ideally like to be, these lines, and here's how well the models actually do. The thing I want to point out is that the natural errors are, for some reason, quite a bit harder. Despite the fact that this condition is insane (I've literally scrambled the entire word), it doesn't affect the model as much, when testing on the original data, as introducing natural errors does. So there's something we still need to investigate there. One somewhat heartening thing, though, is that once you take the natural noise and add other noise to it, the model becomes more generally robust.

Another thing I'll point out very quickly is what happens if you test on all the different types of noise. Here, we train on one specific type of noise, and I'm presenting the average BLEU across five different types of noise at test time. To make this concrete: here we train with just the letter swap inside words, but the BLEU score shown is the average over testing on swapped, scrambled-middle, fully random, keyboard, and natural noise. The first thing I'd like to point out is that the
model does seem, in some sense, to have a direction of robustness: if a model is trained on words that are totally randomized, then it does better on average, including on things that are just swaps, than if it sees less noise during training. And this seems to carry over to the ensemble case: once I throw tons and tons of different noise at it, the model seems to do the best overall.

All right, so what would you like to do? If I were somehow amazing at cognitive modeling, I would build a model which not only took noise into account but was simply impervious to it, just didn't care. We tried a bunch of fancy things and they didn't work; I can talk about the self-attention-type things later. What ended up seeming to work fine was to just try the mean character representation: if I have a model whose word representation is literally the average of the vectors for all the word's characters, then it doesn't matter what order they're in; it's by definition resilient to this. So, there's the good news: it means I can train on completely clean text with no errors, present it with completely scrambled text, and it still does okay. It takes about a twenty percent hit, which is not the end of the world. And then, the thing that's not surprising to anybody in this room who has ever thought about language is that there are morphologically rich languages.
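The order-invariance of the mean character representation is easy to verify. In this sketch, fixed random vectors stand in for learned character embeddings:

```python
import numpy as np

def char_vectors(dim=8, seed=0):
    """One fixed random vector per character, standing in for learned embeddings."""
    rng = np.random.default_rng(seed)
    return {c: rng.standard_normal(dim) for c in "abcdefghijklmnopqrstuvwxyz"}

CHARS = char_vectors()

def mean_char_repr(word):
    """Word representation = average of its character vectors.
    By construction this ignores letter order entirely."""
    return np.mean([CHARS[c] for c in word], axis=0)
```

Any anagram of a word maps to the same vector, which is exactly the resilience, and, for morphologically rich languages, the weakness, that the talk describes.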
For those languages, it turns out, you really don't want to just average all the characters; there's something in the structure of the word. So that's clearly not the full solution.

One final note, though. I do think we ideally want to move in this direction; I'd like to be in a situation where models are more inherently robust. So one thing we wanted to do is at least analyze what the filters of the model were learning when we show it various types of noise. It's a little hard to interpret all of this, so let me just focus on one point. This is what happens if you look at the weights in all of the filters learned for the character embeddings in our models when they're trained in four different settings: with keyboard errors, natural errors, completely randomized ordering of the letters in the word, and the ensemble. Note that these have pretty high variance: different filters fire in different settings. And then note that when you train on random noise, they've completely collapsed. What we think is happening is that, when you completely randomize the input like that, the model is basically rediscovering that mean character model from before: it's giving up on trying to learn structure, so it just takes an average of everything in the word and moves forward. That seems to be a kind of baseline default. Maybe we humans even do something like that, but we have additional systems which can trigger when we're not understanding: we can reason about things, we can piece together how we might unscramble a word, and so forth. That's where we'd like to move in terms of future research. Thanks.

We have time for questions.

Audience: Thanks for a great talk. So, have
you thought about what would happen in an adversarial-noise scenario — what would really screw up the system? For image processing there were random perturbations that would sometimes fool the system; is there an adversarial version of this that is realistic?

So, it depends on what you mean. There are kind of two versions of adversarial. One is: can you just break all these systems? Obviously you can break all these systems. What we didn't do — we played around a little bit, it's just very time-consuming — there are cases where if I change one of the letters, I will change the translation. So it's quite possible that if I wanted to trick a translation system, for example, into saying something in particular, I might be able to make some changes to the input which would cause that, but which a human wouldn't think of as affecting it. That, I think, is the ideal adversarial case. I don't have a great notion for it; we tried to do this manually and were able to find some examples. I don't have a pipeline as of yet, but it would be awesome if you could come up with these kinds of examples.

Hi, great talk. I'm Kyle from Boeing. We deal with a lot of very noisy text data; for example, mechanics all over the world maintain airplanes, and they write poorly. So your talk was really interesting. Did you try to deal with abbreviations? They chop off some of the letters, and sometimes they insert letters too.

Yeah, so we didn't explicitly do anything — actually, I skipped one of the slides. They're not exactly abbreviations, but omissions are pretty common in the edit corpora; it is pretty common for people to just completely miss a letter, for example, which
kind of starts to get at this. But with an abbreviation in particular, you have even less to go on. I suspect it will just make everything even harder, but I don't have an answer for what to do about any of it. It's a great problem, though.

Also, do you have a sense of how many misspelled words need to be involved, percentage-wise in a sentence, before you're really not able to do as well?

Yes — it becomes very clear. We all have our experiences with machine translation, where we run something through and then, even in the best of cases, we still have to do some detective work to figure out what was actually intended. In that sense, even at that ten percent mark — if I'm imagining a sentence and now two of its words have something randomly new in them — the detective work becomes very, very difficult. There's an example in the paper, for instance, where we run one of these fully scrambled inputs through Google Translate, and also through these other systems. In some cases, the models default to copying: they just take the German straight through and hand it back to you. In other cases, you get its attempt at a sentence, but it's completely meaningless; it has no bearing on what was originally there. So I think the answer is: probably not a lot, and maybe a spell checker will be able to help you. But I also want to note that there's a huge difference between spell checkers — there's a difference between, for instance, the Google Translate spell checking we were doing here and running a spell checker on your UNIX machine.
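For concreteness, here is a rough sketch of the kinds of synthetic noise discussed in the talk — full scrambling, adjacent swaps, and the letter omissions raised in the question. The function names are my own, not from the paper's code.

```python
import random

def fully_scramble(word, rng):
    """Randomly reorder all characters (the 'completely randomized' setting)."""
    chars = list(word)
    rng.shuffle(chars)
    return "".join(chars)

def adjacent_swap(word, rng):
    """Swap one random pair of neighboring characters."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word) - 1)
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def drop_letter(word, rng):
    """Delete one random character (the omission errors from the question)."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word))
    return word[:i] + word[i + 1:]

rng = random.Random(0)
noisy = " ".join(fully_scramble(w, rng) for w in "the system is robust".split())
```

Applying `fully_scramble` word by word preserves each word's letter multiset while destroying its order, which is exactly the input condition the mean-character model is immune to.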
So having some sort of sentential context is really important for being able to address those errors.

Okay, any more questions?

For your corpus of natural errors, I noticed you had some that were Wikipedia typos, probably from native speakers, but then you also had language learners. From a linguistic standpoint,
the errors made by language learners tend to be rather different from those of natives, so I was wondering if you looked into how those differ.

No, unfortunately — we were mostly focused on just finding these errors. But you're absolutely right. I'd have to double-check the details, but I think in the case of the German and the Czech, those are both from language learners, and in the case of the French and English, those are Wikipedia edits. That's partially just an issue of availability: there just isn't as much of that kind of data. But I think that's another fascinating point. If we were to build these models to be ideally robust, we'd like to be able to introduce these kinds of errors, which requires someone to have first collected all these kinds of examples — and those resources thin out as the language in question becomes not English. So, yeah.

So, the next talk in this session is by Swabha, who will talk about syntactic scaffolds for semantic structures.

Hi, I'm Swabha — can you hear me okay? Yeah. I'll be talking about syntactic scaffolds for semantic structures. This is work done jointly with my co-authors Sam Thomson, Kenton Lee, Luke Zettlemoyer, Chris Dyer, and Noah Smith — lots of affiliations, literally too many logos.

All right, so the task we are interested in is semantic structure prediction. Now, depending on the formalism, these semantic structures may look very different. For example, take the sentence "Ivanka told Feifei that she is inspiring." The frame-semantic structure for this sentence looks like this: there are two frames of meaning, one about the event of telling and another about the event of subjective influence. For the Telling frame, the speaker is Ivanka, the addressee is Feifei, and the message is that she is inspiring. And for this Subjective Influence frame,
the entity is "she," and she is providing the influence. So this is what frame-semantic structures look like. A different kind of structure is given by coreference, where you have, in green, all the entities present in the sentence, as well as the relationships between them — between the word "she" and Feifei; hopefully "she" refers to Feifei here. So this is the kind of structure you get from coreference chains.

As you can see, these structures look very different — the frame semantics looks quite different from the coreference structure — but what they have in common is the syntactic framework. For example, in the constituency tree for the same sentence, we can see that the different arguments line up with constituents: the speaker is a noun phrase, the addressee is also a noun phrase, and the message is an S-bar. So what you can see from here is that if you have the syntactic information for a sentence, it might be easy to get to the semantics of the sentence.

And indeed, this has been tried extensively in previous work. Prior work uses a pipeline where, given a sentence, you first do the syntactic processing of the sentence to get the full syntactic tree, then use features from that tree to
extract the semantic graphs from it. Now, the problem with this is that it is very expensive, and since you're doing things in a pipeline, you're very likely to cascade errors: if you get the syntax wrong, you're very likely to also get the semantics wrong. Also, the tree we saw in the previous example is far more complicated, whereas the semantics was relatively simple. Like Yonatan said, when people read sentences and try to understand them semantically, they do not read through the entire sentence; some words can be skipped, some words are semantically vacuous, and so on. So using syntax in a pipeline works, but it is probably not the best solution.

In recent times, people have come up with neural end-to-end models which have completely done away with syntax. These end-to-end models just start with the sentence and try to predict the entire semantic graph from it. Now, even though these models work without using syntax, it has not been conclusively shown that syntax does not help further. In fact, Luheng He, who is somewhere over here, showed in her paper that using constraints from gold syntax actually helps. So there is still hope — in fact, there is a lot of hope. What we are suggesting in this work is somewhere in between these two worlds: we want to use syntax, but we don't want to go through all of the pipelining, error cascading, and so on. What we are suggesting is this method called syntactic scaffolds, which, as the name suggests, uses syntax but does not go through the whole pipelining grind.

All right, so hopefully I've introduced the problem to you. Let's
look in detail at what these syntactic scaffolds are. This is essentially a multi-task learning setup where one of the tasks is the primary task we are interested in — the semantic structure prediction task, frame semantics or coreference in our case — and the second task is syntactic prediction.
However, in contrast to a traditional multi-task learning setup, where both tasks are equally important, for us the syntactic task is a secondary, auxiliary task, and the primary task is the semantic task we are interested in. Multi-task learning gives us the advantage that we can use shared parameters to learn better contextualized representations for the spans and whatever substructures are semantically non-vacuous. Also, in this setup, as in multi-task learning generally, the supervision for semantics and syntax does not need to come from the same dataset: we can use two different datasets, one with only semantic annotations and the other with only syntactic annotations, and still be able to learn — which is difficult in a regular pipeline setup, because you need the syntax on the same data before you can predict semantics on it. Another thing the syntactic scaffold approach offers is that we do not have to predict the entire complicated syntactic tree with all its bells and whistles; we just have to focus on the substructures in that tree which are meaningful for the semantic prediction task. We'll look at some of these substructures later on. And finally, as the name suggests, it's a scaffold, so it is only used at learning time, to train the model with all the syntactic information in it; at test time we can totally discard it and move on.

All right, so the learning problem looks something like this. Equation one shows the objective. The first term is the primary objective — the semantic structure prediction objective, which we will look into later — and the second term is the scaffold objective. Both are interpolated by this term delta, a mixing ratio that tells you how we weight these two objectives.
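In pseudocode terms, the interpolated objective is just a weighted sum. This is a minimal sketch with hypothetical names; in the actual models, the two terms are the semantic-task loss and the scaffold's span-classification loss.

```python
def scaffold_objective(semantic_loss, syntactic_loss, delta):
    """Interpolated multi-task objective: the semantic task is primary,
    and the syntactic scaffold is an auxiliary task weighted by the
    mixing ratio delta (delta = 0 recovers the scaffold-free model)."""
    return semantic_loss + delta * syntactic_loss

# The two losses can come from disjoint datasets: e.g. the semantic loss
# from a FrameNet batch and the scaffold loss from an OntoNotes batch.
total = scaffold_objective(semantic_loss=2.0, syntactic_loss=0.5, delta=0.1)
```

The interpolation is what makes the scaffold cheap to bolt on: only the loss changes, while the span representations are shared between the two tasks.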
So all you have to do is plug the scaffolding objective into your semantic objective, and you will be able to learn information from the syntactic treebank. If you look into the scaffold objective, it's a simple logistic-regression classifier: it takes all possible spans in the sentence up to a certain length, and for each span, it tries to predict what the syntactic substructure of that span could be — whether or not it's a noun phrase, what kind of phrase it is, what the parent of the phrase could be, and so on. To formalize it, the different substructures we try to predict are: phrase identity, that is, whether or not a span is a constituent (a syntactic phrase); the phrase type of the span, where we predict null if the span is not a syntactic phrase; the phrase type together with the parent phrase type of the span, the parent being the constituent which immediately dominates it in the tree; and special phrase types, such as noun phrase and prepositional phrase — that is, given a span, whether or not it is a noun phrase or a prepositional phrase. We do this last one because, as we saw, the arguments "Ivanka" and "Feifei" were noun phrases, and in general a lot of semantic arguments tend to be noun phrases, a lot tend to be prepositional phrases, and so on.

Okay, so let's look at the two downstream tasks we incorporated this framework into. The first one is frame semantics. Again, this is what a frame-semantics graph looks like. We are using the FrameNet corpus, which was created at Berkeley, and in this particular work we focus only on the task of identifying and labeling these spans. So, given a sentence as well as the
frames in the sentence — that is, if we know there is a Telling frame in the sentence, and we know there is a Subjective Influence frame in the sentence — are we able to predict what the arguments for those frames are? The frame identification and target identification tasks are relatively straightforward and have been solved previously with high accuracy, but argument identification and labeling are difficult tasks, so we focus only on those.

As a baseline, we use a model called the semi-Markov CRF, which is a generalization of a CRF. The generalization comes from the simple fact that, given a sentence, you predict a segmentation of the sentence. In a general CRF, each segment has the same fixed length, typically one; a semi-Markov CRF offers you the freedom of also modeling the length of the segment you're predicting. This works very well for our case, because arguments can have arbitrary lengths, so it's a natural fit for the task of frame-semantic parsing. Without going into too many details, I'll just say that we focus on spans of length up to D. If you had to model every possible span in a sentence of length n, you would run into O(n²) complexity; we pare it down to O(nD) by limiting ourselves to spans of length at most D. (A factor of L also comes in from the different possible labels you could give a span.)

To model the spans themselves, we use an architecture from co-authors Kenton Lee and Luke Zettlemoyer. How this model works: given a sentence, we first get the pre-trained embeddings for that sentence — we use GloVe here — then we pass them through two bidirectional LSTMs to get the contextualized representations of each token in the sentence; these are given by the yellow embeddings.
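As a rough sketch of the bookkeeping here: enumerating spans only up to width D keeps the span count linear in sentence length, and a span can then be embedded, in simplified form, from its boundary tokens (the talk's full model also concatenates an attention-weighted head-word vector). Names and shapes below are illustrative, not the paper's code.

```python
def enumerate_spans(n, max_width):
    """All (start, end) token spans of width at most max_width in a
    length-n sentence: O(n * D) spans rather than O(n^2)."""
    return [(i, j)
            for i in range(n)
            for j in range(i + 1, min(i + max_width, n) + 1)]

def span_representation(hidden, start, end):
    """Simplified span embedding: concatenate the contextualized vectors
    of the span's first and last tokens. (The full model additionally
    concatenates an attention-weighted head-word vector.)"""
    return hidden[start] + hidden[end - 1]  # list '+' here is concatenation
```

For a 5-token sentence with max width 2, this yields 9 candidate spans instead of the 15 unrestricted ones, and the same span embeddings can be shared between the semantic task and the scaffold classifier.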
To represent a span, what we do is take the first word in the span and the last word and concatenate the hidden representations of both words. We also use a head-finding mechanism — a simple attention mechanism — to figure out which word in the span carries the most syntactic information; that is given by the red nodes over there. The final span representation is given as the green vectors.

Okay, so all we do in the scaffolded frame SRL is plug in this objective. The primary objective is shown on top — it is a modified log-likelihood objective — and all we do now is use the same parameters to learn these span representations and plug in the scaffolding objective. The scaffolding objective is trained on OntoNotes data, which is built on top of the Penn Treebank, and the primary objective is trained on FrameNet. So here, since we have two different corpora, we can get by even though there is no data overlap.

All right, looking at some results: our baseline semi-CRF model gets a significant improvement over previous state-of-the-art models, and as we add the different scaffolding models, we keep seeing improvements, which shows that adding syntactic information is still useful. The best model we have, which is highlighted, is the particular scaffold which takes a span and predicts whether or not it is a noun phrase or a prepositional phrase; so getting very specific seems to help.

I'll quickly go over the next task, which is coreference resolution. This is also a span-based task. We use the model from Kenton Lee and others, where the task is treated as a series of classification decisions: spans in the sentence are labeled as entities,
and for each entity, you label what its antecedent entity is. This way you can build up these simple classification decisions into entity clusters. Again, since we are doing span-based modeling for the coreference model, all we do is use the same span representations to classify what kind of syntactic span each one is. So it's again very simple: just plug in the scaffold and interpolate with delta.

Again, we see an improvement when we plug in the scaffolding objective. The improvement is not as large as what we saw on the FrameNet task, and this could be attributed to the fact that we do this again on OntoNotes, where the coreference mentions were annotated on top of these syntactic constituents. So you already have that information in the data, and you're reusing the same data to learn these syntactic constituents; it is possible that that is why the improvement is not as large.

So, in summary: we saw that syntax is very useful for semantics, but syntax is expensive, and scaffolds are an inexpensive alternative that can get the best of both worlds. Okay, so what's next? This is not something that is
confined to only semantic tasks. Basically, any task that uses syntactic span information could benefit from such scaffolding; syntactic dependencies could be used as scaffolds too; and finally, you could use semantics as scaffolds for various downstream tasks. So, that's all.

Can you state the differences between a recurrent neural network with a classification layer on top of it and a standard recursive neural network?

So, you can think of recursive neural nets as a special case of recurrent neural networks. Basically, to build a recursive neural net you need to know what structure it is modeling, for which you need some kind of syntactic tree, and you use that syntactic tree to compose the nodes together. But here we are just modeling the sentences; we don't have any information about structure, so we cannot really use recursive neural nets in this case. It is, however, another way to represent information such as graph or tree-specific information.

Great presentation. If you wanted to go beyond frame induction and capture word semantics like possession and negation and some of these other types of things, would you need to go more toward grammatical-dependency types of relationships rather than the span-based representations?

So, we've actually been thinking about doing similar things for natural-language-inference-based tasks, where relationships between words have been shown to be enough to predict inference relationships between hypotheses and premises, and that is where we hope the syntactic-dependency scaffolds would help the most. But yeah, it really depends. So here we had two span-based tasks, and we saw that these span-based scaffolding
things help, but it really depends on the end task. If you want multi-task learning to work well, you typically want to learn structures which are similar; but there is nothing to say that if you learn from different structures you can't gain anything. I don't have experimental results to show that, but it's also possible.

I was just curious if you've tried only POS tagging as the other objective, instead of constituencies.

Yeah, that's a good question. There has been related work that does POS tagging as one of the tasks in this multi-task learning scenario. We did not do that because we were more interested in these more complicated structural tasks, but people have used POS as the auxiliary part of the scaffold — there's work from Yoav Goldberg and others who have done similar things and seen improvements. But yeah, that's also another alternative. Thank you.