Monday, October 11, 2021

The Trouble of Translation: Squid Game - Honi Soit - Translation

The Intento 2021 State of Machine Translation Report - Your Cheatsheet to the MT Landscape - Yahoo Finance - Translation

SAN FRANCISCO, Oct. 12, 2021 /PRNewswire/ -- Intento, the leading AI integration platform, has released its annual State of Machine Translation report, giving those working in and around the MT landscape an in-depth analysis of the current vendors and best strategies to successfully leverage their offerings. The report is conducted in collaboration with TAUS, the central authority in language data, offering the largest industry-shared repository of data, deep know-how in language engineering, and a network of Human Language Project workers around the globe. You can download the report here.

The 2021 edition delivers everything you need to know to choose the best-fit MT engines for your language pair and industry sector. It provides:

  • The performance of different MT engines across 7 industries (Education, Finance, Healthcare, Hospitality, Legal, Entertainment, and General) and 13 key language pairs.

  • The latest data on 24 commercial MT engines (Alibaba eCommerce and General, Amazon, Apptek, Baidu, DeepL, Elia, Globalese, Google, GTCom, IBM Watson, Microsoft, ModernMT, Naver, Kawamura / NICT, Pangeanic, PROMT, Rozetta, Systran, Tilde, Tencent, Yandex, Youdao, and XL8)

  • Data on 5 open-source pre-trained models (M2M-100-1.2B, M2M-100-418M, mBART50-EN2M, mBART50-M2M, and OpusMT)

  • The principal scores to rely on when evaluating MT output, such as the similarity metrics COMET, BERTScore, PRISM, TER, and hLEPOR (a brief scoring sketch follows this list)

  • A thorough comparison of scores: hLEPOR, BERTScore, PRISM, and COMET.

  • Coverage of language support, which jumped from 16k to 100k language pairs in 2021

  • Price comparisons
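
Several of the similarity metrics named above are available as open-source Python packages. The snippet below is a rough, illustrative sketch rather than material from the report: it shows how BERTScore and TER can be computed with the bert-score and sacrebleu libraries, using invented example sentences. COMET, PRISM, and hLEPOR ship separately and follow a similar pattern.

    # Illustrative only: score one MT hypothesis against one reference with
    # BERTScore (semantic similarity) and TER (edit distance).
    # Requires `pip install bert-score sacrebleu`; the sentences are made up.
    from bert_score import score
    from sacrebleu.metrics import TER

    hypotheses = ["The patient should take the medicine twice a day."]
    references = ["The patient should take the medication twice daily."]

    # BERTScore returns per-sentence precision, recall, and F1 tensors.
    P, R, F1 = score(hypotheses, references, lang="en")
    print(f"BERTScore F1: {F1.mean().item():.3f}")

    # TER counts the edits needed to turn the hypothesis into the reference (lower is better).
    ter = TER()
    print(f"TER: {ter.corpus_score(hypotheses, [references]).score:.1f}")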

This year's report is chock-full of novel insights and consists of two parts. The first, 'Automatic semantic similarity scoring,' documents how the MT landscape has changed over the past year, including information on all the new players on the market.

The second part provides a deep-dive linguistic analysis of 3 language pairs (EN → ES, EN → IT, EN → NL). Essential takeaways from this breakdown include:

  • A comparison of texts across 5 industry sectors: Education, Finance, Healthcare, Legal, and Travel (ES).

  • Key conclusions on how automatic metrics relate to human evaluation of translations.

  • Recommendations on the best-fit MT engines for analyzed language pairs and industry sectors.

  • Insights on how to enhance the power of all MT engines available on the market.

Intento is trusted by global companies to help select, deploy, and improve the best-fit machine translation and other AI services, including sentiment analysis, voice synthesis, image tagging, and optical character recognition. The report aims to provide an expert view of the constantly changing MT landscape and to save businesses that operate internationally both human and financial capital. A deep understanding of the MT landscape benefits your company whatever your experience with machine translation, as the report offers significant insights for implementing AI and machine translation across departments to boost productivity and growth. Download the report here.

"Working with MT is like living on an erupting volcano. We had 16,000 language pairs available from 34 MT providers just a year ago, and today it's about 100,000 from 46. We don't have datasets to evaluate them all, but by working with TAUS we get a look into 13 language pairs and 7 domains," says Konstantin Savenkov, Intento CEO. "The level of quality we see from stock models in 2021 is unprecedented. However, real-world business applications demand even more, and simply knowing the best stock model is not enough to succeed with MT. Make sure you have domain adaptation, glossaries, tone of voice control, and other tools on your belt."

Savenkov continues, "This year, together with TAUS, we had a particular focus on using high-quality domain-specific data. It took more time to prepare, but the results should be relevant to a wider audience and applicable to more use cases than before. One key highlight we see from this year is the emergence of new semantic similarity metrics, such as COMET."

"The availability of high-quality, domain-specific language data powering MT models has become ever so significant as AI-enabled automatic translation becomes more and more common. We are pleased to have offered test datasets in 13 language pairs and 7 domains to Intento to be used in their State of the MT 2021 Report. We believe the findings will shed light on many use cases providing guidance on which MT engines are best suitable for users' given requirements and above all demonstrate the value of high-quality, domain-specific data in increasing the quality of the final output." Jaap van der Meer, TAUS Director

About Intento:

Intento, the leading AI integration platform, helps global companies utilize the best-fit cognitive AI services and automate content creation (text synthesis), content transformation (between text, speech, and image), and content localization (machine translation), enabling enterprises to translate 20x more content on their existing budgets.

This year, Intento was recognized as a 2021 Cool Vendor in Conversational and Natural Language Technologies by Gartner for its success in enhancing the supply chain of the global translation business. The Intento AI Hub gives global corporations direct, easy access to a multitude of MT engines (such as Amazon, Google AutoML, or Microsoft Cognitive Services) and seamlessly connects them with all of their business systems.

Launched in 2017, Intento offers its patented, ISO-27001 and ISO-9001 certified platform to global companies across all industries, augmenting their Localization, Content Management, Customer Support, and Marketing Operations with AI. For more information, visit https://inten.to.

About TAUS:

TAUS was founded in 2005 as a think tank with a mission to automate and innovate translation. Ideas transformed into actions. TAUS has become the one-stop language data shop, established through deep knowledge of the language industry, globally sourced community talent, and in-house NLP expertise. We create and enhance language data for the training of better, human-informed AI services.

Our mission today is to empower global enterprises and their service and technology providers with data solutions that help them to communicate in all languages, faster, better, and more efficiently. For more information, visit https://www.taus.net/.

Contact:
James Hjerpe
james.hjerpe@inten.to

SOURCE Intento

Microsoft taps AI techniques to bring Translator to 100 languages - VentureBeat - Translation

Today, Microsoft announced that Microsoft Translator, its AI-powered text translation service, now supports more than 100 different languages and dialects. With the addition of 12 new languages including Georgian, Macedonian, Tibetan, and Uyghur, Microsoft claims that Translator can now make text and information in documents accessible to 5.66 billion people worldwide.

Translator isn’t the first service to support more than 100 languages — Google Translate reached that milestone in February 2016. (Amazon Translate supports only 71.) But Microsoft says the new languages are underpinned by unique advances in AI and will be available in the Translator apps, Office, and Translator for Bing, as well as Azure Cognitive Services Translator and Azure Cognitive Services Speech.
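
For developers, the added languages surface through the same Azure Cognitive Services Translator REST API as the existing ones. The sketch below is a minimal illustration rather than Microsoft documentation: the endpoint and request shape follow the public Translator v3.0 API, while the key, region, and the choice of Georgian ("ka") as the target are placeholder assumptions for the example.

    # Illustrative sketch: translate English text into Georgian ("ka"), one of the
    # newly added languages, via the Azure Cognitive Services Translator v3.0 REST API.
    # YOUR_KEY and YOUR_REGION are placeholders for a real Azure Translator resource.
    import uuid
    import requests

    endpoint = "https://api.cognitive.microsofttranslator.com/translate"
    params = {"api-version": "3.0", "from": "en", "to": ["ka"]}
    headers = {
        "Ocp-Apim-Subscription-Key": "YOUR_KEY",
        "Ocp-Apim-Subscription-Region": "YOUR_REGION",
        "Content-Type": "application/json",
        "X-ClientTraceId": str(uuid.uuid4()),
    }
    body = [{"text": "Documents should be accessible in every language."}]

    response = requests.post(endpoint, params=params, headers=headers, json=body)
    for item in response.json():
        for translation in item["translations"]:
            print(translation["to"], translation["text"])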

“One hundred languages is a good milestone for us to achieve our ambition for everyone to be able to communicate regardless of the language they speak,” Microsoft Azure AI chief technology officer Xuedong Huang said in a statement. “We can leverage [commonalities between languages] and use that … to improve whole language famil[ies].”

Z-code

As of today, Translator supports the following new languages, which Microsoft says are natively spoken by 84.6 million people collectively:

  • Bashkir
  • Dhivehi
  • Georgian
  • Kyrgyz
  • Macedonian
  • Mongolian (Cyrillic)
  • Mongolian (Traditional)
  • Tatar
  • Tibetan
  • Turkmen
  • Uyghur
  • Uzbek (Latin)

Powering Translator’s upgrades is Z-code, part of Microsoft’s larger XYZ-code initiative to combine AI models for text, vision, audio, and language in order to create AI systems that can speak, see, hear, and understand. The Z-code team comprises scientists and engineers from Azure AI and the Project Turing research group who focus on building multilingual, large-scale language models that support various production teams.

Z-code provides the framework, architecture, and models for text-based, multilingual AI translation across whole families of languages. By sharing linguistic elements across similar languages and using transfer learning, which applies knowledge from one task to a related one, Microsoft claims it has dramatically improved quality and reduced costs for its machine translation capabilities.

With Z-code, Microsoft is using transfer learning to move beyond the most common languages and improve translation accuracy for “low-resource” languages, meaning languages with fewer than 1 million sentences of training data. (Like all models, Microsoft’s learn from examples in large datasets sourced from a mixture of public and private archives.) Approximately 1,500 known languages fit this criterion, which is why Microsoft developed a multilingual translation training process that marries language families and language models.
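
Microsoft’s Z-code models are not publicly available, but the open-source M2M-100 model mentioned earlier in this digest illustrates the same many-to-many, transfer-learning idea: one model trained across roughly 100 languages, including many low-resource ones. The sketch below uses Hugging Face Transformers and is an illustration only; Georgian ("ka") is chosen on the assumption that it is among M2M-100’s supported languages.

    # Illustrative sketch: many-to-many multilingual translation with the open
    # M2M-100 model (facebook/m2m100_418M) via Hugging Face Transformers.
    # This is not Z-code; it simply demonstrates a single model covering ~100 languages.
    from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

    model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")
    tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")

    tokenizer.src_lang = "en"
    encoded = tokenizer("Machine translation is improving quickly.", return_tensors="pt")

    # Force the decoder to start generating in the target language (here Georgian, "ka").
    generated = model.generate(**encoded, forced_bos_token_id=tokenizer.get_lang_id("ka"))
    print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])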

Techniques like neural machine translation, rewriting-based paradigms, and on-device processing have led to quantifiable leaps in machine translation accuracy. But until recently, even the state-of-the-art algorithms lagged behind human performance. Efforts beyond Microsoft illustrate the magnitude of the problem — the Masakhane project, which aims to render thousands of languages on the African continent automatically translatable, has yet to move beyond the data-gathering and transcription phase. Additionally, Common Voice, Mozilla’s effort to build an open source collection of transcribed speech data, has vetted only dozens of languages since its 2017 launch.

Z-code language models are trained multilingually across many languages, and that knowledge is transferred between languages. Another round of training transfers knowledge between translation tasks. For example, the models’ translation skills (“machine translation”) are used to help improve their ability to understand natural language (“natural language understanding”).

In August, Microsoft said that a Z-code model with 10 billion parameters could achieve state-of-the-art results on machine translation and cross-lingual summarization tasks. In machine learning, parameters are the internal configuration variables a model uses when making predictions, and their values largely, though not entirely, determine the model’s skill on a given problem.

Microsoft is also working to train a 200-billion-parameter version of the aforementioned benchmark-beating model. For reference, OpenAI’s GPT-3, one of the world’s largest language models, has 175 billion parameters.

Market momentum

Chief rival Google is also using emerging AI techniques to improve translation quality across its service. Not to be outdone, Facebook recently revealed a model that uses a combination of word-for-word translations and back-translations to outperform systems for more than 100 language pairings. And in academia, MIT CSAIL researchers have presented an unsupervised model — i.e., one that learns from data that hasn’t been explicitly labeled or categorized — that can translate between texts in two languages without direct translational data between them.

Of course, no machine translation system is perfect. Some researchers claim that AI-translated text is less “lexically” rich than human translations, and there’s ample evidence that language models amplify biases present in the datasets they’re trained on. AI researchers from MIT, Intel, and the Canadian initiative CIFAR have found high levels of bias from language models including BERT, XLNet, OpenAI’s GPT-2, and RoBERTa. Beyond this, Google identified (and claims to have addressed) gender bias in the translation models underpinning Google Translate, particularly with regard to resource-poor languages like Turkish, Finnish, Persian, and Hungarian.

Microsoft, for its part, points to Translator’s traction as evidence of the platform’s sophistication. In a blog post, the company notes that thousands of organizations around the world use Translator for their translation needs, including Volkswagen.

“The Volkswagen Group is using the machine translation technology to serve customers in more than 60 languages — translating more than 1 billion words each year,” Microsoft’s John Roach writes. “The reduced data requirements … enable the Translator team to build models for languages with limited resources or that are endangered due to dwindling populations of native speakers.”

Sunday, October 10, 2021

26 Korean Words Added to English Dictionary - HYPEBAE - Dictionary

With the rise of South Korea‘s influence on music, entertainment, food and more, the Oxford English Dictionary (OED) has now been updated with 26 Korean words.

The country’s popular culture has risen to global fame thanks to Bong Joon-Ho‘s award-winning Parasite, K-pop groups BTS and BLACKPINK and most recently, Netflix‘s Squid Game. The fashion industry is looking to Korea for some of the most exciting up-and-coming designers, while beauty fanatics are stocking their vanities with K-beauty products. Recognizing the Korean wave (also known as hallyu), the OED has added dozens of entries to its vocabulary.

Standouts include K-drama, which is defined as “a television series in the Korean language and produced in South Korea.” A batch of dishes have also been added, including chimaek (fried chicken served with beer), galbi (beef short ribs, usually marinated in soy sauce, garlic, and sugar, and sometimes cooked on a grill) and bulgogi (thin slices of beef or pork which are marinated then grilled or stir-fried).

Elsewhere, hanbok — the traditional Korean costume typically worn on formal or ceremonial occasions — has been introduced, as well as aegyo, defined as a kind of “cuteness or charm, esp. of a sort considered characteristic of Korean popular culture.” Mukbang, which has become a significant category in the world of YouTube, has also made it to the list.

Scroll down to see the full list of newly added and updated Korean words in the OED.

aegyo
banchan
bulgogi
chimaek
daebak
dongchimi
fighting
galbi
hallyu
hanbok
Hangul
japchae
K-drama
kimbap
Konglish
manhwa
mukbang
noona
oppa
PC bang
samgyeopsal
sijo
skinship
taekwondo
Tang Soo Do
unni

'Squid Game' is the latest example of when subtitles are a little off - NPR - Translation

Netflix's Squid Game is a huge hit, but some say its subtitles are inaccurate. Podcast host Youngmi Mayer and translation professor Denise Kripper explain why things got lost in translation.

LULU GARCIA-NAVARRO, HOST:

If you haven't already watched "Squid Game," you have probably heard about it. Netflix's new survival drama is set in South Korea, and its premise is not a happy one.

(SOUNDBITE OF MUSIC)

GARCIA-NAVARRO: Each of its players are deep in debt, but if they win, they'll have enough prize money to pay those loans off. The catch - losing costs you your life.

(SOUNDBITE OF TV SHOW, "SQUID GAME")

UNIDENTIFIED ACTOR #1: (As character, non-English language spoken).

GARCIA-NAVARRO: "Squid Game" is yet another example of how Korean media is dominating the global market, but some viewers have noticed its English subtitles are a little off.

YOUNGMI MAYER: I'm Youngmi Mayer. I am a comedian based in New York City, and I'm also the co-host of "Feeling Asian" podcast.

GARCIA-NAVARRO: Youngmi Mayer is fluent in Korean. And while watching "Squid Game," she noticed that the show's English captions didn't quite reflect what the characters were actually saying. She took her thoughts to TikTok - where else would you take this? - along with a scene from the show.

(SOUNDBITE OF TV SHOW, "SQUID GAME")

UNIDENTIFIED ACTOR #2: (As character, non-English language spoken).

MAYER: Translation says, oh, I'm not a genius, but I can work it out. What she actually said was, I am very smart. I just never got a chance to study.

GARCIA-NAVARRO: And because of those inaccuracies, Youngmi Mayer says that audiences may not understand the show's cultural references.

MAYER: That is a huge trope in Korean media - the poor person that's smart and clever and just isn't wealthy.

GARCIA-NAVARRO: Now, translating subtitles for TV can be tricky. There are rules.

DENISE KRIPPER: There's space limitation that you have to keep in mind.

GARCIA-NAVARRO: Denise Kripper is a translation scholar and assistant professor at Lake Forest College in Illinois. She also has experience translating TV shows from English into Spanish.

KRIPPER: Translation in subtitles is usually two lines, and there's a certain number of characters that you cannot pass.

GARCIA-NAVARRO: There's also trying to fit it all within the constraints of character limits and scene speed. But Kripper says there's another challenge, one that's far trickier. Languages have different structures and different metaphors, so it can be really hard to accurately convey meaning. Jokes can be especially difficult, like this scene she had to translate from the sitcom "Friends."
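
To make the constraint concrete, here is a rough, hedged sketch of fitting a translated line into a two-line subtitle. The 42-characters-per-line limit is an assumption drawn from common streaming subtitle style guides; it is not a figure quoted in this interview.

    # Illustrative sketch: wrap a line of dialogue into at most two subtitle lines.
    # MAX_CHARS and MAX_LINES are assumed values from common subtitle guidelines,
    # not numbers given by the speakers.
    import textwrap

    MAX_CHARS = 42
    MAX_LINES = 2

    def to_subtitle(text: str) -> str:
        lines = textwrap.wrap(text, width=MAX_CHARS)
        if len(lines) > MAX_LINES:
            # The dialogue does not fit: the translator must condense or rephrase it.
            raise ValueError(f"needs condensing: {len(lines)} lines for {text!r}")
        return "\n".join(lines)

    print(to_subtitle("I am very smart. I just never got a chance to study."))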

KRIPPER: Chandler is waiting for the phone to ring, to hear from some woman, I think.

GARCIA-NAVARRO: Meanwhile, Ross and Phoebe are doing a crossword.

(SOUNDBITE OF TV SHOW, "FRIENDS")

DAVID SCHWIMMER: (As Ross Geller) Four letters - circle or hoop.

MATTHEW PERRY: (As Chandler Bing) Ring, damn it, ring.

SCHWIMMER: (As Ross Geller) Thanks.

(LAUGHTER)

GARCIA-NAVARRO: Denise Kripper says an exact translation of the scene doesn't really work in Spanish.

KRIPPER: To ring - the phone to ring is one word in Spanish, and a ring that you can wear on your fingers - a totally different one. So, again, this is a lot to work with for such a short amount of time, right?

GARCIA-NAVARRO: Kripper says in cases like this, the translator may have to change the dialogue in a scene rather than translate word for word and leave viewers confused. Youngmi Mayer says she knows translators are limited in what they can do but worries viewers who rely on subtitles when they watch fast-paced shows like "Squid Game" are getting short-changed.

MAYER: It just seems like maybe this is a time for us to just pause and rethink and restructure the old way of translations. Those metaphors and deep, like, very intelligently written ideas and ideologies that the writer's trying to express to us - they're getting literally just taken out of the script because the translation can't translate that in real time.

GARCIA-NAVARRO: In the meantime, "Squid Game" fans who want to have a fuller understanding of the context of the show can do like Youngmi Mayer did when she watched "Breaking Bad" - or, really, any of us who enjoy parsing episodes of any series - and just pick up a smartphone, Google and never miss a pop culture reference again.

(SOUNDBITE OF JUNG JAEIL'S "UNFOLDED...")

English translation of iconic Bangla kids' collection Thakurmar Jhuli - The Tribune - Translation

Sutapa Basu’s translation of Thakurmar Jhuli, an iconic work of children’s literature written more than 100 years ago by Dakshinaranjan Mitra Majumdar, was released recently. The book has been published under Readomania’s children’s imprint, Reado Junior.

Dakshinaranjan Mitra Majumdar collected folktales from villages and towns across Bengal and rendered them into a unique Bengali collection of children’s fiction, titled Thakurmar Jhuli. Enjoyed by children over generations, the anthology became synonymous with the cultural heritage of the region.

Basu’s translation promises to take readers to an enchanting land sprinkled with flying horses, speaking birds, cunning foxes, indestructible monsters, bold princes, and even bolder, beautiful princesses.

The book reflects the region’s cultural heritage in its semi-realistic illustrations and icons reminiscent of the rice-paste alpona patterns, a familiar sight at all auspicious occasions in Bengal.

The translator says, "This edition subtitled Princesses, Monsters and Magical Creatures is a translation and not a transliteration. I intended to and have adhered strictly to the original narrative. Nevertheless, a few adjustments have been unavoidable, primarily due to differences in linguistic nuances between the two languages."

Sutapa Basu is a best-selling author. Her latest book, The Curse of Nader Shah, won the Best Fiction Award at the AutHer Awards 2020, instituted by JK Papers and The Times of India.
