Dragoș Ciobanu - Speech-enabled MT post-editing
Imagine you are working with another professional linguist who very kindly takes the time to read your translations, and perhaps also your post-editing work, and thus helps you improve the quality of your deliverables. If you are already working with such a person, then this presentation probably only applies to those times when your colleague is on holiday.
For everyone else, I guess the far more frequent question is: if we don't have such a trusted colleague to work with, is there anything artificial intelligence can do nowadays to help us improve the quality of our work? My name is Dragoș Ciobanu, and this presentation is about speech-enabled machine translation post-editing. I started out as a technical writer and translator before I joined academia, and I have spent quite a lot of time working with language technologies, and translation technologies in particular. It is really exciting to be talking about speech technologies nowadays because, as we see in the industry, there is more and more focus on this side of technology, which is very welcome. For instance, if you search Slator for 2022 coverage, you will find a lot of attention dedicated to this field: case studies and reports about companies improving their own systems or implementing third-party ones, so it is all really quite exciting. For those of you who are not big fans of reading articles, but would rather hear people talking about these topics, 2022 has also been a good year for speech technologies. By the time you listen to this recording, the Translating Europe Forum will already have taken place in Brussels, with quite a few sessions dedicated to this topic of translators using speech technologies more effectively.
And again, depending on when you listen to or watch this presentation, the following may or may not have taken place yet: the Translating and the Computer conference, TC44, where we will be talking a fair bit about the impact which speech technologies have on the work of professional linguists. So, as I was saying, I have been working with language technologies for a while, and right now I am at the University of Vienna, leading the Human and Artificial Intelligence in Translation research group, HAITrans for short, where, together with my colleagues Alina and Miguel and our doctoral students Justus, Raluca and Claudia, we are looking into ways to help professional linguists take advantage of these new technologies and integrate them better into their existing workflows. So feel free to check us out when you have the time.
Among our priorities is the use by professional translators of both dictation, or automatic speech recognition, and speech synthesis, and I will discuss these two elements shortly. We also look at technology more broadly and how it supports the work of professional linguists: one of our latest projects focused on adapting machine translation engines and studying their quality and their impact, with a focus on the medical domain. We also look at accessibility and how technology can enable it; another recent project involved studying audio introductions to theatre plays and examining the kinds of biases present in them.
And throughout our work, we are also committed to doing our very best to train the next generation of translators and to advance literacy in machine translation, translation technologies and language technologies in general. So in this presentation, I am going to go over the what, the why and the how of using speech technologies for post-editing. Just in case you have not had much exposure to speech technologies in general, I will start by distinguishing between automatic speech recognition, otherwise called speech-to-text or simply dictation, which has been talked about a fair bit.
And another area: speech synthesis, or automatic speech synthesis, also known as text-to-speech, which involves a computer program reading out the electronic text on the screen. It is this second area that I will focus on most closely. Why would you want to use speech technologies, in particular speech synthesis, when you are post-editing, but also when you are doing other tasks such as translation or revision? Mainly because of increased attention during post-editing. We are all really excited about the progress which machine translation has registered lately, and it is very interesting to read new reports such as the Intento 2022 report and see that, according to some experiments, in some cases (meaning in some language pairs, some domains, with some particular texts), up to 97% of the machine translation output did not require any human intervention.
But for the rest of us, who work with custom machine translation engines, specific client requirements, very strict style guides and terminology, sentences of various lengths, and language combinations which are not that well supported by machine translation, such reports are interesting to read, but they do not really match what we see. We therefore need more tools to help us deal with the machine translation output, and this is where speech technologies come into play. Essentially, when we ask professional linguists to perform machine translation post-editing, it is a bit like putting them in an autonomous vehicle, a self-driving car, and telling them: "Trust the vehicle and only intervene when you notice a major error." At the same time, this is extremely difficult and very stressful, because we do not really know how to predict these errors, they appear when we least expect them, and you cannot identify them without keeping your eyes on the road at all times. So it is a really complex, difficult task, unlike what the name would suggest: "post-editing" implies something very easy, fast and quick, but unfortunately in a lot of cases it is not.
So, just to give a few examples of the kinds of errors which post-editors have to deal with: essentially, these errors concern aspects other than fluency, because fluency is precisely one of the biggest problems with neural machine translation output. The output is quite often rather fluent, but that fluency hides errors such as terminology errors. For instance, we are not talking about "objectives" when we refer to the UN goals; we are talking about "goals". There is also fluent output that actually contradicts what is said in the source text. In this case, you will have to take my word for it if you cannot read Romanian.
The Romanian output says that the 2030 Agenda aims to eradicate not only hunger, but also food security, which is rather the opposite of what the source intends; yet the output is extremely fluent, and such an error would be very easy to overlook. Other machine translation problems are related to numbers; decimal points are a small example, but when you are talking about billions, decimal points really matter. Or again prepositions, which do not count for much in, say, an edit distance report, but actually change the meaning of the sentence completely.
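To make that numbers point concrete, here is a small sketch (the figures and the segment text are invented for illustration, not taken from a real project) showing how a one-character difference, which an edit-distance report would barely register, can shift a value by billions:

```python
# Illustrative only: invented figures showing why a tiny edit distance
# can hide a huge change in meaning when numbers are involved.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

source_figure = "2.5 billion"
mt_figure = "25 billion"   # the engine dropped the decimal point

edits = levenshtein(source_figure, mt_figure)   # a single character edit
difference = (25 - 2.5) * 1_000_000_000         # yet the value is 22.5 billion off
```

An edit-distance-based quality metric would score these two strings as nearly identical, which is exactly why just skimming fluent output is risky.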
And again, it is very easy to overlook such errors, because the sentence is fluent and the task is supposed to be completed extremely quickly. So simply reading the output might not be enough. Other kinds of errors introduced by machine translation engines concern names.
And again, thanks to my students, we have a list here of rather amusing, but actually serious, errors produced by neural engines, and even with fine-tuned neural machine translation engines we still have errors to deal with. There is a common belief that neural machine translation is already absolutely fantastic everywhere, in every situation, and that if we fine-tune it even more, it will be amazing every single time. But in a recent study which Alina, Miguel, Raluca and I conducted, involving medical translations from English into Romanian, even with a fine-tuned engine we still saw errors: the number of terminology-related errors dropped, but they were still present (the publication is forthcoming). So post-editing is not as easy as it seems, and you really need to pay attention. Where did we get the idea that speech technologies might help with the post-editing task? We got it while investigating their impact on revision.
And we noticed, and published these findings in an article which you can access in your own time, that revision improved when speech was involved. Together with Valentina and Alina, I ran a project at the University of Leeds, and we noticed that revisers of texts which were also "spoken out", in inverted commas, by a synthetic voice actually performed much better than revisers who did the revision in the usual manner: in silence, with a keyboard and no extra technologies. When sound was present, the revisers identified and corrected more accuracy errors, and also more style and fluency errors, but accuracy was the aspect which really stood out. So then we wondered whether this would apply to other tasks.
As for the effect which speech technologies had on the participants in our revision study, as you can see on the screen, they generally said that the technology helped them focus more on the text, ensure the logic and the content were there, check for subtle content errors, and even do some kind of proof-listening. Of course, it is a change of working manner for some linguists, and you should expect some disruption in the beginning, but it was promising enough that we decided to investigate further.
So we conducted a study, this time with students, in our new HAITrans research group. Together with Claudia, Alina and Justus, we looked at post-editing in particular, and we talked about our findings at a couple of conferences, including the European Association for Machine Translation conference earlier this year. We noticed that although there was a drop in the quality of our student participants' work the first time they post-edited with speech technologies, that is, the first time they changed their normal way of working, by the end they were actually recording a higher percentage of machine translation errors successfully corrected. The conditions were: one with synthetic speech only for the source segment, another with synthetic speech for the target segment, and a final condition which involved listening to both the source and the target segment while performing post-editing.
In terms of productivity, it was interesting to see that, compared to post-editing in silence, productivity rates had gone up by the end of the experiment, despite the intuition that having to stop and listen to the source, the target, or both being spoken out might actually slow people down. We did not find that. Our participants in this study also said that listening to the source, the target, or both was really helpful for detecting errors more easily, as opposed to just reading them, and for understanding the content in general: even content that at first glance seemed rather alien started to make more sense to some of our participants once they listened to it.
These are interesting observations. So, coming to the practical part: we have seen what we are talking about and why, and especially what our studies indicate as benefits. Some of you might be wondering how to go about testing such technologies, because of course you should do your own testing rather than rely on whatever you hear around you. There is some support for automatic speech recognition and dictation, and also for speech synthesis, if you are already using tools such as Dragon NaturallySpeaking: within some CAT tools you can dictate, and you can also ask Dragon to read the dictation back to you. The downside is that not a lot of languages are supported by such tools.
For those of you who work with Trados Studio, there is a better implementation of text-to-speech, of speech synthesis, which you can use for your translation, revision, post-editing, anything you do within a Studio project. It comes via an additional plugin which sits on top of your screen and allows you to request speech for either the source or the target. I will very quickly demonstrate this. I am in Studio, and what I need to do is click on the file I want to open and request the TTS plugin. The plugin is connected; you have to set it up by connecting it to a Microsoft Azure account, and once it is there you need to configure it to use a female or a male voice, as you prefer, and to read either the source or the target, because it does not do both at the same time.
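For the curious: cloud TTS services such as Azure typically select the voice and language via an SSML document sent per request, which is one reason a plugin like this reads either the source or the target, but not both at once. The sketch below is my own illustration, not the plugin's actual code, and the helper function and the choice of Azure neural voice names are assumptions:

```python
# Minimal sketch of building an SSML request for a cloud TTS service.
# build_ssml is a hypothetical helper; en-US-JennyNeural and
# ro-RO-AlinaNeural are examples of Azure neural voice names.

def build_ssml(text: str, lang: str, voice: str) -> str:
    """Wrap one segment in an SSML <speak> document for a TTS request."""
    return (
        f'<speak version="1.0" '
        f'xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="{lang}">'
        f'<voice name="{voice}">{text}</voice>'
        f"</speak>"
    )

# One document per segment: a separate request for source and for target.
source_ssml = build_ssml("Migration shapes the society.",
                         "en-US", "en-US-JennyNeural")
target_ssml = build_ssml("Migrația modelează societatea.",
                         "ro-RO", "ro-RO-AlinaNeural")
```

Each resulting string would then be posted to the synthesis endpoint, which streams back audio for that single segment.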
So just to give you an idea of what the artificial voice for the English sounds like, let's have a listen. Migration shapes the society. And let's listen to the second segment as well.
In my research, I want to draw attention to the complexity of this phenomena. So that was the artificial voice for the source. Let's have a look at the target as well. And that's Romanian and it's neural machine translation output.
Migrația modelează societatea. And the second one as well. În cercetările mele, vreau să atrag atenția asupra complexității acestui fenomen. So even if you don't speak Romanian, you have probably got the idea, or the feeling, that the result of these artificial voices is actually pretty nice and rather fluent.
The intonation is not terrible. So we are really far away from the robotic voices that some people still say are around. What else is available? There's also the option to use word processors such as Microsoft Word for instance, to read text out to you.
And again, let's have a little demo of artificial voices, this time inside Microsoft Word, to see how well it supports several languages. So I am in Microsoft Word with a multilingual document. I click on Review, then on Read Aloud. How translators and post-editors benefit from speech technologies, computer-assisted translation.
CAT tools, for the most part, are based on the traditional input modes of keyboard and mouse, according to a recent study. La reconnaissance vocale, future meilleure amie du traducteur? [Voice recognition: the translator's future best friend?] Los pro y los contra de los programas de reconocimiento de voz para "procesar texto". Un programa de reconocimiento de voz es una herramienta muy útil para un traductor, pero exige que estemos muy atentos a las palabras que escribe. [The pros and cons of voice recognition programs for "processing text". A voice recognition program is a very useful tool for a translator, but it demands that we pay close attention to the words it writes.] Tastatura și mouse-ul au creat o adevărată revoluție în modul în care oamenii interacționau cu computerele. [The keyboard and the mouse created a true revolution in the way people interacted with computers.]
Acum asistăm la un nou trend care devine din ce în ce mai popular printre utilizatorii de computere: utilizarea aplicațiilor de recunoaștere a vocii și a vorbirii pentru a-și controla dispozitivele. [We are now witnessing a new trend that is becoming more and more popular among computer users: using voice and speech recognition applications to control their devices.] Ranking programów Text to Speech: najlepsze programy zmieniające tekst na mowę. Popularne ostatnio programy zmieniające tekst na mowę mogą dla niektórych być bardzo przydatnym narzędziem podczas codziennej pracy. Idealnie sprawdzają się dla osób słabowidzących lub użytkowników, którzy wolą słuchać czytanych artykułów czy informacji, niż lustrować je wzrokiem. [Ranking of text-to-speech programs: the best programs for turning text into speech. The recently popular text-to-speech programs can be a very useful tool for some people in their daily work. They are ideal for people with low vision, or for users who prefer listening to articles or news being read out rather than scanning them with their eyes.] So what you heard was live, automatic synthesis in English, then French, then Spanish, then Romanian, then Polish, produced by the standard Microsoft Read Aloud feature on the 9th of October, 2022.
And of course, because you can export files into Microsoft Word or rich text format from most CAT tools, you could conduct your experiments in Microsoft Word and see if the voices help you spot errors, awkward turns of phrase, or other issues in your post-editing, as well as in your translation or revision, if that is the task at hand, before you import the result back into your CAT tool, finish the project and deliver it to your client. So it is fairly straightforward to conduct such experiments, and to actually use such tools in production nowadays. As I mentioned, Read Aloud and its play bar are present in Word. There is also the option of using the additional accessibility tools built into your operating system, because not everyone works on Windows and not everyone works with Office.
So again, there are those options there. And I will finish with a really exciting development that my team is very proud to be part of. We have been working together with Translated, the company maintaining and promoting Matecat, and the result of our collaboration is that soon Matecat will also integrate text-to-speech. We are the recipients of a grant, and what will happen is that, in addition to the dictation option inside Matecat, which has been available for a while now through the microphone icon, users will also be able to listen to the source as well as the target if they want, in order to produce higher-quality machine translation post-editing, translation or revision, depending on the task they are conducting. So that's it for me. I hope this presentation has, not so much given you an incentive, but motivated you
to try out speech technologies in your own work and see the impact they bring. We would be very excited to hear from you; we are always looking for stories, and not only success stories, because what did not work is also of great importance to the community. We are also working on integrating translation and interpreting programs more closely, so that our students are more uniformly exposed to all the technologies that will influence their careers later on. And I am very keen to welcome more professionals taking part in our studies, because we find that it is a mix of attitude, software user-friendliness and working context that leads to the best examples of technology uptake and of generally sensible use of such technologies.
So we are looking forward to collaborating with education as well as industry representatives, and we already have such collaborations through our partnerships. Thank you for your time, and let's stay in touch.