Wednesday, January 31, 2024

How Large Language Models Fare Against 'Classic' Machine Translation Challenges - Slator - Translation

In a January 17, 2024 paper, a group of researchers from the University of Macau, University College London (UCL), and Tencent AI Lab explored the performance of large language models (LLMs) against “classic” machine translation (MT) challenges.

The six MT challenges, originally proposed by Philipp Koehn and Rebecca Knowles in 2017, include domain mismatch, amount of parallel data, rare word prediction, translation of long sentences, attention model as word alignment, and sub-optimal beam search.

For their experiments, the researchers used the Llama2-7b model, focusing on the German-to-English language pair. They explained that “English and German are high-resource languages in the Llama2 pretraining data, which ensures the model’s proficiency in these two languages.”

They found that LLMs reduce dependence on parallel data during pretraining for major languages and they improve the translation of long sentences and entire documents. Yet, challenges like domain mismatch and rare word prediction persist. Unlike neural MT models, LLMs face new challenges: translation of low-resource languages and human-aligned evaluation.

Document-Level

Specifically, the researchers found that LLMs mitigate reliance on bilingual data during pretraining for high-resource languages, with even a small amount of parallel data boosting translation performance. Surprisingly, an increased abundance of parallel data yields only marginal improvement and, in some cases, a decline in LLM translation system performance, challenging the common belief that more parallel data enhances translation quality. The researchers recommended supervised fine-tuning as a more advantageous approach for leveraging additional parallel data compared to continued pretraining.

The research community should “consider how to efficiently utilize parallel data for the enhancement of LLM translation systems, thereby offering a potential direction for future studies to optimize bilingual knowledge in the pursuit of improved MT performance using LLMs,” according to the researchers.

Another challenge the researchers addressed was the translation of long sentences, a significant hurdle for MT systems. LLMs tackled this challenge effectively, excelling at translating sentences with fewer than 80 words and performing consistently well at the document level, with documents of approximately 500 words.

“LLMs excel in translating extended sentences and entire documents, underscoring their effectiveness as a promising solution for addressing challenges associated with long-sentence and document-level translation tasks,” they said.
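As a rough illustration of how a length-based analysis like this might be set up, the sketch below groups sentences into word-count buckets and averages a quality score per bucket. The bucket edges, the function names, and the whitespace-based word count are all assumptions made for illustration; the article does not describe the researchers' actual procedure.

```python
from collections import defaultdict

def length_bucket(sentence, edges=(10, 20, 40, 80)):
    """Return a label for the word-count bucket a sentence falls into.

    The edges are hypothetical; 80 is used as the top edge only because
    the article mentions sentences with fewer than 80 words.
    """
    n_words = len(sentence.split())  # naive whitespace tokenization
    lower = 0
    for edge in edges:
        if n_words < edge:
            return f"{lower}-{edge - 1}"
        lower = edge
    return f"{lower}+"

def mean_score_by_length(sentences, scores, edges=(10, 20, 40, 80)):
    """Average a per-sentence quality score within each length bucket."""
    buckets = defaultdict(list)
    for sentence, score in zip(sentences, scores):
        buckets[length_bucket(sentence, edges)].append(score)
    return {label: sum(vals) / len(vals) for label, vals in buckets.items()}
```

Plotting the per-bucket averages for two systems side by side is one simple way to see whether quality degrades as sentences get longer.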

Unresolved Challenges

The researchers explored whether the rich knowledge of LLMs could address domain mismatch in translation tasks. While LLMs showed robust performance in in-domain translation tasks, their progress in out-of-domain tasks was modest, encountering challenges like terminology mismatch, style discrepancies, and hallucinations.

Predicting rare words in the realm of LLMs remains another significant challenge, leading to omissions in translations. The researchers underscored the persistent and unresolved nature of this issue, emphasizing its significance in the field.

Mixed Results

Word alignment, involving the identification of word pairs with similar semantic information in a given translation pair, was also explored. The researchers tested the feasibility of extracting word alignment from LLM attention weights, revealing that it was not a viable option. Despite this, the process provided valuable insights into model interpretability, they said.

At inference time, the researchers identified two major issues: inference strategies, including beam search and sampling, and inference efficiency, which suffers from the sheer size of LLMs. They first compared beam search with sampling and found that beam search is not necessarily suboptimal in LLMs.

In terms of inference efficiency, they found that LLMs require an average of 30 seconds, compared with 0.3 seconds for MT models, a roughly hundredfold gap. “The longer inference time of LLMs may impede their real-time deployment in scenarios where fast translation is required,” they said.

New Challenges

Besides these six “classic” MT challenges, they identified two new challenges within the realm of LLMs. One pertains to the translation quality for language pairs inadequately represented during the pretraining stage and the other involves evaluating translation quality.

The researchers found that translation performance is significantly affected by the available resources for each language, emphasizing the need for a diverse and balanced dataset during the pretraining of LLMs to ensure equitable performance across languages.

Evaluation issues have also come to the forefront. They tested the quality of LLMs using both automatic — BLEU and COMET — and human evaluation metrics and found a moderate negative correlation between them. This emphasizes the importance of combining both evaluation methods and indicates that current metrics may not fully capture the nuances appreciated by human evaluators.
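To make the idea of correlating automatic and human scores concrete, here is a minimal sketch that computes the Pearson correlation between two lists of per-segment scores. The score values are invented for illustration, and the choice of Pearson correlation is an assumption: the article does not say which correlation statistic the researchers used.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: an automatic metric score and a human rating per segment.
metric_scores = [0.62, 0.70, 0.75, 0.81, 0.88]
human_ratings = [4.5, 4.0, 3.8, 3.1, 2.9]

r = pearson(metric_scores, human_ratings)
print(f"Pearson r = {r:.2f}")  # negative here: the two rankings disagree
```

A value near +1 would mean the automatic metric ranks translations the way human raters do; a negative value, as in this made-up data, means higher metric scores tend to coincide with lower human ratings.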

According to the researchers, this calls for further research to develop and refine evaluation methods aligned with human preferences, especially as language models become more complex and capable. “This human-centered approach to evaluation will be crucial in ensuring that our translation models are not only technically proficient but also practically useful and acceptable to end users,” they said.

Finally, the researchers called for future research to focus on refining evaluation methods and testing approaches on more advanced models.

Authors: Jianhui Pang, Fanghua Ye, Longyue Wang, Dian Yu, Derek F. Wong, Shuming Shi, and Zhaopeng Tu.

Tuesday, January 30, 2024

Paris to use AI translation app for Olympics visitors - Yahoo News - Translation

STORY: (English) "How can I go to the Stadium of France?"

(Arabic): "How can I get to Stade de France?"

(Korean): "How do I get to the Olympic opening ceremony?"

As Paris welcomes the world for the Olympics this summer, the city’s public transport system, known as RATP, will be using artificial intelligence to help thousands of international visitors navigate through the capital.

The handheld Tradivia device can translate between French and 16 different languages including Mandarin, Arabic and Korean, with text appearing on a screen as well as being read out loud.

The RATP will provide more than 3,000 agents with this device, ready to assist all international visitors.

RATP representative Gregoire de Lasteyrie:

"The goal for us in Ile-de-France Mobilite is for them to travel in the best possible conditions, and therefore, being able to speak to them in as many languages as possible and helping them find their way in Paris is extremely important."

Metro workers, like Raphael Gassette, say the device gives them more confidence.

“We no longer have this fear of thinking, 'We're not going to understand each other.' Here, we know straight away, with regard to the languages, to press straight on 'Hindi,' and immediately have clear, more precise information, and we can be sure that when the visitor leaves, they are satisfied."

The service will remain in Paris after the Games, which will be held from July 26 to Aug. 11.

American Institutes for Research Honored by Anthem Awards for Knowledge Translation Expertise - Yahoo Finance - Translation

Anthem Awards 2024 Winner

Arlington, Va., Jan. 30, 2024 (GLOBE NEWSWIRE) -- The American Institutes for Research (AIR) has won two Anthem Awards for its work operating the Model System Knowledge Translation Center (MSKTC), which translates health information into easy-to-understand language and formats for people living with spinal cord injury (SCI), traumatic brain injury (TBI), and burn injury and their families and caregivers. The Anthem Awards, presented by the International Academy of Digital Arts and Sciences, recognizes innovation in the work of mission-driven organizations committed to amplifying social impact causes that spark global change.

The MSKTC was recognized in the community engagement and public service categories for its innovative approach to engaging Spanish-speaking audiences through the translation and dissemination of pertinent health information to meet the needs of individuals with traumatic injuries. MSKTC’s Spanish-language resources earned gold recognition in the nonprofit sector for best-in-class Public Service health project and a silver honor in the category of best-in-class Community Outreach project.

“We are honored that our work to support people living with SCI, TBI, and burn injuries is resonating across communities and our broader society across the globe,” said Xinsheng “Cindy” Cai, principal researcher at AIR who directs the MSKTC. “We will continue our commitment to overcoming language barriers and ensure the latest research findings are being used in health care decision-making.”

As a federally funded national center, the MSKTC works closely with medical professionals from the Model System centers, who conduct innovative and high-quality research and provide multidisciplinary rehabilitation care to meet the information needs of individuals living with SCI, TBI, and burn injury by identifying health information needs, summarizing research, and developing and disseminating information resources. Both the MSKTC and Model System centers are funded by the National Institute on Disability, Independent Living, and Rehabilitation Research (NIDILRR), Administration for Community Living, U.S. Department of Health and Human Services.

Since 2021, the suite of digital resources on the MSKTC website has received more than 7.1 million pageviews from 4.1 million visitors globally, reaching Spanish-speaking countries, including Mexico, Spain, Colombia, and Argentina. In total, the project has reached 240 countries, reflecting AIR’s strong commitment to evidence-based learning and increasing equitable access to education and health services.

As a winner of the Anthem Awards, AIR is in the company of other distinguished organizations, including AARP, the CDC Foundation, the Johns Hopkins Bloomberg School of Public Health, and others.

Launched in 2021 by The Webby Awards, the Anthem Awards honors the purpose and mission-driven work of people, companies and organizations worldwide. Social impact projects are evaluated across seven core causes: Diversity, Equity, and Inclusion; Education, Art, and Culture; Health; Human and Civil Rights; Humanitarian Action and Services; Responsible Technology; and Sustainability, Environment, and Climate. This year, more than 2,000 entries spanning 30 countries were submitted.

About AIR   
Established in 1946, the American Institutes for Research (AIR) is a nonpartisan, not-for-profit institution that conducts behavioral and social science research and delivers technical assistance both domestically and internationally in the areas of health, education, and the workforce. AIR's work is driven by its mission to generate and use rigorous evidence that contributes to a better, more equitable world. With headquarters in Arlington, Virginia, AIR has offices across the U.S. and abroad. For more information, visit www.air.org.

CONTACT: Dana Tofig American Institutes for Research 202-403-6347 dtofig@air.org

Jimmy Failla's new book 'Cancel Culture Dictionary' puts spotlight on the outrage era plaguing society - Fox News - Dictionary

Jimmy Failla’s book "Cancel Culture Dictionary" hit stores on Tuesday, putting a spotlight on the outrage era plaguing society. 

"Cancel Culture is the only movement where the biggest winners are a bunch of losers," Failla told Fox News Digital. 

"It's populated by people who spend all day on social media looking for something to get offended by so they can leverage the world's outrage into their ‘likes,’" he continued. "As the book shows, nothing has been improved in the outrage era -- crime is higher, test scores are lower, and we're all a lot fatter despite what the Instagram filters show you."

"Cancel Culture Dictionary: An A to Z Guide to Winning the War On Fun," is the latest offering from Fox News Books. Failla, a former New York City taxi driver, said the purpose of the project was to simply show how cancel culture has "broken our compass." 

He also believes cancel culture has gotten to the point where things aren't being canceled "in the name of progress," but rather for power or personal gain. He cited everything from backlash to comedian Dave Chappelle, the vanishing of syrup icon Aunt Jemima and a school declaring that Abraham Lincoln didn’t prove that Black lives matter to him as some of the most egregious examples of cancel culture gone wrong. 

Failla feels that people don’t know the difference between a joke and a hate crime these days, but hopes the latest offering from Fox News Books can help right the ship. 

"This book is my attempt to get society back on track. And yes, I'm aware of just how bad things have gotten if a former New York City cab driver who plays video games in his 40's is now the voice of reason," Failla said. 

"In short, this book may not save the world," he continued. "But if you like reading at a third-grade level you'll still be glad you bought it."

"Cancel Culture Dictionary: An A to Z Guide to Winning the War On Fun" is available now. (FOX News Books)

Failla previously called the book "a step-by-step guide to how everybody can live their life in a way, you know, that will really recalibrate society."

"It's not a call to arms. It's a call to chill the f--k out," Failla said. 

"Cancel Culture Dictionary" is available now. 

Monday, January 29, 2024

A New Translation Unveils The Dark Side Of The Travels Of Marco Polo - Worldcrunch - Translation

ROME — When I am in charge of a translation, the first question I ask myself is not about the text itself, but how to approach the work.

For example, when I embarked on translating Travels in the Congo, the travel diary by French author André Gide, I decided to experiment with an idea I had had for some time: to try and reproduce the original French version in Italian. The book's language was sometimes contracted and fragmentary, sometimes stretched out into lyrical slashes or even soaring in invective. So my mission was difficult: I needed not only to translate it in the most faithful way possible, but in the most corresponding way too, almost as if I was copying it, creating an imitation rather than a translation.

This experiment — of course it will be the readers who will assess how successful it was — came to me out of a growing dissatisfaction with the fact that good, even excellent translations may completely lose the syntactic, grammatical relationship to the original version.


I love languages both for their lexicon and for their structure, for the way they order words in a precise sequence. The notion that a translated text should not "feel" translated, but should instead read as if it had been written in the target language in the first place, is useful for those who translate a lot and work with books of wide circulation. When dealing with refined and rare texts, in my opinion, it is necessary to aim for a more radical approach.

Marco Polo's story

And here we come to Marco Polo's Il Milione (The Million), better known in English as The Travels of Marco Polo, which I have recently translated into modern Italian, a language that even fifteen-year-olds can read.

Il Milione (according to the most credited hypothesis, the title derives from Emilione, the nickname of the Polo family) was penned by Rustichello da Pisa, a modest 13th-century novelist who wrote in the Franco-Venetian language, at the dictation of Marco Polo the explorer.

They both found themselves prisoners of the Genoese, and they both had a desire to put to good use that time of enforced confinement and inactivity. Marco Polo is for all intents and purposes the source and creator, but the language and the hand are Rustichello's.

This happened precisely at the end of the 13th century. The original manuscript was soon lost due to the many copies that were made: a case of disintegration due to too much success, fatal at the time of technical non-reproducibility. There are therefore several copies, threaded (how faithfully?) from the original text, and a translator, in the first place, must select one.

My choice fell on the codex that, according to scholars, is the closest to the original, the manuscript named Fr. 1116 in the National Library in Paris. Fr. 1116 has specificities that are not at all original: breaks, interruptions, abrupt suspensions, even lacunae and omissions that, far from making it shoddy or shapeless, characterize it and make it singularly modern.

An illustration of Kublai Khan by Évrard d'Espinques for "The Travels Of Marco Polo."

Wikimedia

Earthy language to unveil

It is so modern that the Tuscan version of the fourteenth century, by a translator who worked on a codex very similar to the manuscript preserved in Paris, perceived an uneasiness in it, and considered rounding off the ending with a spurious epilogue that guaranteed a circularity and a happy ending. That spurious ending, which in my translation I have obviously taken care not to repeat, was widely considered part of the original story.

Il Milione is a text that is indeed marvelous (in the literal sense of the word) and that justifies the exhilarating and smooth rereadings or transpositions that insist on exoticism, on the titanic figure of Kublai Khan and the astonishing imperial palace of Shangdu, on flora and fauna seemingly out of the dawn of time. But at the same time it is also a book shot through with a pragmatism, a brutality that is absolutely concrete, in which pages and pages are devoted to trade, money, and war. Taut, concise pages, in which every word counts, for example in the detailed description of paper money in use by the Mongols.

As this harsh, ironclad side of Il Milione came to light, my translation took on burnished tones, moving far away from the other available modern Italian translations, which willingly yield to the one-sided image of Il Milione as a soft, precious silk cloth embroidered with legends.

And while the beauty and finesse of the artifacts are omnipresent in the text, the truth is that the original Marco Polo never tires of mentioning these objects' economic value, price, and the lavish earnings they guarantee. Beauty and money, discovery and interest, travel and enrichment, pomp and war.

And perhaps it was providential that Marco Polo, in his Genoese captivity, met not a sublime poet but the earthy Rustichello. There were, however, legions of translators who embellished, sugarcoated, and smoothed the text, and whom I tried to stop at the gates of the citadel.

From Japan to the world: How to translate a game - Japan Today - Translation

Behind the global success of Japanese video games lies a delicate task: appealing to overseas players whose expectations on issues such as sexism are increasingly influencing the content of major titles.

With the majority of sales for big games now outside Japan, everything from slang words to characters' costumes must be carefully considered for a global audience.

It is a complex process that has come a long way since the "Wild West" of the 1980s and 90s, one high-profile "localization" team told AFP.

"There were no rules, no 'industry standards', and the quality of localisation could vary greatly from one title to the next," said the SEGA of America team who worked on "Like a Dragon: Infinite Wealth" -- the latest title in the hit "Yakuza" series, which was released on Friday.

Back then, translators faced constraints including too-small text boxes, and sometimes game developers did the job themselves in less-than-perfect English.

It also meant that many games from the era, especially dialogue-heavy ones, never made it out of Japan.

"Thankfully, the industry -- and perhaps more importantly consumers -- have changed a lot since those days, and we are now able to be more faithful to the cultural and emotional content of Japanese games than ever," the SEGA team said.

Localization is now integral to the design process, with international gamers in mind from the start.

One key example is "how Japanese game developers dress their heroines" as the #MeToo movement changes mindsets, said Franck Genty, senior localization manager at Japanese game giant Bandai Namco.

"We tell them that the cleavage is a bit too exposed, or the skirt is a bit too short," he told AFP. "Before, they weren't very flexible, but they've become more proactive on such subjects."

The puzzle of game localization dates back at least to the 1980 arcade sensation "Pac-Man", whose direct translation "Puck Man" was deemed too risky because the name could be vandalised.

Some top-selling games including Mario, Final Fantasy and Pokemon involve fantasy worlds that are not overtly Japanese, offering some flexibility for their adaptation.

But the task becomes trickier for series such as "Yakuza", which are set in real-life locations and use slang from Japan's underworld. Getting it right is important: around 70 percent of revenue from recent titles in the "Yakuza" series is from overseas.

But in recent years, booming interest in manga comics, anime cartoons and wider Japanese culture has made the job easier. "People know what ramen is now... we don't need to say 'noodles' any more," Genty said.

His team at the European headquarters of Bandai Namco has adapted games including the "Tekken" fighting series and the smash-hit role-playing game "Elden Ring" into a dozen languages.

The job is as much a cultural challenge as a linguistic one, said Pierre Froget, localization project manager at Bandai. "The player, whichever country they're from, should understand and feel the same thing as someone playing in the original language," he said.

A better understanding of Japanese culture among players means adaptations can be more subtle -- the "Yakuza" series is now called "Like a Dragon", closer to the original Japanese.

LGBTQ caricatures and sexist cliches have also been axed.

"Many representations which were normal in Japan in the first 'Like a Dragon' games are no longer acceptable today," Masayoshi Yokoyama, the series' executive producer, told AFP. "We ask our teams in the United States and Europe to read the game's script, and they tell us if they see things that wouldn't be acceptable in their country."

Changes often focus on "alcohol, politics or religion", Froget said, while cultural reference points also differ.

"When there are people dressed in black boots and big leather coats, in Europe that could bring to mind a Nazi uniform," he said.

With global release dates now the norm, these decisions must be made under tighter deadlines than before.

And despite improved communication between developers and localization teams, challenges remain -- especially when translating a game into languages other than English.

"Efforts have been made to understand the needs of the English-speaking world," Froget said.

But for German, which has longer sentences and other linguistic quirks, localization is sometimes "seen as an extra difficulty" by design teams.

Even so, Froget believes in his mission: "To create connections to Japanese culture and help Europeans discover its depth, while respecting both the game and the player."

© 2024 AFP

We tested Galaxy S24′s real-time AI translation: it works, sort of - 조선일보 - Translation
