On November 9, 2021, Airbnb announced that it had deployed Translation Engine, which allows users to automatically read translations of reviews and descriptions in more than 60 languages without having to click a translate button. Going against the current paradigm, the interface provides users with a see-original-language button instead.
Marco Trombetti, CEO of Translated, which has been working with the home-rental platform for the past three years and provided Airbnb with both human and machine translation, told Slator, “What is unique is the fact that, for the first time, the two are very symbiotic and integrated. Every single correction from the localization team improves the machine translation instantly.”
Airbnb runs on ModernMT, the Translated-led open-source project co-founded by Fondazione Bruno Kessler, the University of Edinburgh, and the European Commission. ModernMT is an adaptive neural machine translation system with a range of applications, including IP and life sciences translations.
“Translated initially provided the baseline pre-trained models [for Airbnb’s Translation Engine],” Trombetti said. Those models continuously improve based on corrections from the thousands of linguists who have been working on Airbnb content over the past years. As previously mentioned, Airbnb “human translated” more than 100 million words in 2019, pre-pandemic.
According to the Airbnb press statement, “Translation Engine improves the quality of more than 99% of Airbnb listings,” based on a study it commissioned with a machine translation evaluation company across the platform’s top 10 languages.
Trombetti said Airbnb commissioned custom evaluations on the platform’s content through “independent parties, not Translated.” However, he said the 99%+ quality improvement is in line with Translated’s internal evaluations. “Translated performs monthly assessments of our ModernMT models using our Airbnb qualified linguists,” Trombetti said.
He added that, while “many other companies experimented, pre-translation, with a small subset of their content, typically reviews, to my knowledge this is the first time it is done for all content and especially on this scale.”
He pointed out how site visitors will not only be able to read content in their own language, but also be able to find what was previously inaccessible to them. “It is not just about removing a button; it is about allowing everyone to explore in a new way,” Trombetti said.
UGC: Complex for AI
Asked about the challenge of culling data points from user-generated content (UGC) compared to training engines on content created by writers or professional linguists, Trombetti said, “UGC is complex for AI because everyone has a different style.”
He explained that because UGC is often written by non-native speakers and, most likely, by non-professional content writers, “there is a lot of flexibility that the AI needs to learn to translate well. It is not like training a custom model on a very narrow terminology.”
Trombetti added, “The indirect challenge with UGC is scale. Often UGC scale can be a million times bigger than content produced by localization teams; and the volume spikes are much more unpredictable.”
What’s more, he noted that 10x lower latency is also needed to integrate machine translation into the production infrastructure. “In human translation, engineering quality is really not an issue,” he said. For machine translating UGC, however, “it is the critical asset.”
On top of that, there is the business element. The Translated CEO said, “When you manage UGC, you are a horizontal service. You need to interact with many divisions and stakeholders. So the level and complexity of discussions goes up. [Airbnb Head of Localization] Salvatore Giammarresi’s leadership, empathy, and his capacity to interact with the upper management made this all possible.”