Tuesday, July 5, 2022

A Recipe for Better Machine Translation - Slator - Translation

We are still missing out on the full opportunities of machine translation despite seventy years of research. This article is an appeal to everyone involved in the translation ecosystem to come off the fence, realize the full benefits of MT and adopt MT-centric translation strategies. We can do better!

Today, most MT is sourced from the big tech companies such as Amazon, Google and Microsoft. They are the driving force behind the industrialization of MT with the scale and the capital to develop the massive models. 

Disturbingly enough, the massive MT models are black boxes. Even the researchers who train them can’t pinpoint exactly why one performs better than another. The model work is glamorous and cool, but the intellectual insight that would allow us to reproduce bugs and remove them is hard to get. To get models to work in production, data engineering is more vital than research. Well-executed data engineering can bring in the nuances that are required for robust performance in a real-world domain. The issue, however, is that most researchers like to do the model work, not the data work, as the Google Research article Data Cascades in High-Stakes AI also points out.

Customization has become a built-in feature of many MT platforms, allowing users to upload translation data and handle their own data engineering. These features, however, as TAUS found out, require a lot of experimentation and experience.* In-domain training data have an unpredictable, often low, and sometimes even negative impact on the performance of the engines. It seems that the big tech companies treat their customization features as stop-gap measures for the time it takes until human parity is reached. Five to ten years?

To support and facilitate the industrialization of MT, the big tech MT developers can do better. This is how:

1. Don’t bet the future entirely on the brute force of the massive models 

2. Improve your customization features to better support your business customers in building production-ready engines.

MT Users

Although nothing spectacular or revolutionary took place in the past few years, the adoption of MT has still increased. The MT engines are simply plugged into the existing workflows to be used as complementary sources for translation matches. Translators see their tasks shifting more and more into post-editing. The new technology is used primarily to help the business drive for continuous efficiency gains and lower word rates, very much so in the tradition of thirty years of leveraging translation memories.

Blue-sky thinking is what we miss in the translation industry overall. Apart from a few start-up innovators, a defensive approach towards MT technology is adopted by most of the actors in the translation industry. The result is a general negative sentiment with emphasis on cost reductions, compromises in translation quality, disruption in the workforce and pessimistic perspectives on the industry’s future. The problem is that we are all so deeply rooted in our traditions, we can’t see through the present.

MT technology can be a force multiplier for those operators in the translation industry that are capable of shifting from a defensive to a proactive approach.

To support and facilitate the industrialization of MT, MT users, LSPs and enterprises can do better. This is how:

1. Focus on data engineering. Do not accept that the quality output of, among others, the Amazon, Google, Microsoft and Systran engines is as good as it can get. Significant improvements can be made using core competencies such as domain knowledge and linguistic expertise.

2. Design end-to-end MT-centric workflows. Do not think of MT as just an add-on to your current process and workflow but make it the core of new solutions serving new customers, translating content that was never translated before.

3. Provide new opportunities for linguists. Post-editing is not the end-game. Create new perspectives by leveraging intellectual insights for better automation.

TAUS Recipe for Better MT

TAUS has been an industry advocate for translation automation since 2005. We have developed a unique recipe for better MT as outlined below.

1. Evaluate

The first step in every MT project is to measure and evaluate the translation quality. Most MT users are just measuring and comparing the baseline engines. TAUS takes the evaluation a step further. We train and customize different MT engines and then select the engine with the maximum achievable quality in the customer domain. See TAUS DeMT™ Evaluate.

2. Build

The second step is the creation of in-domain customer-specific training datasets, using a context-based ranking technique. Language data are sourced from the TAUS Data Marketplace, from the customer’s repositories or created on the Human Language Project platform. Advanced automatic cleaning features are applied. See TAUS DeMT™ Build.

3. Translate

The third step is generating the improved machine translation. Demonstrated score improvements range between 11% and 25% over the baseline engines from Amazon, Google and Microsoft. In many cases, this brings the quality up to levels equal to human translation or post-edited MT. Some customers refer to DeMT™ Translate as ‘zero-shot localization’, meaning that translated content goes directly to customers without post-editing. TAUS offers DeMT™ Translate via an API to LSPs and enterprises as a white-label product.

* MT customization features require a lot of experimentation and experience. See TAUS DeMT™ Evaluation Report and contact a TAUS expert to learn how to best work with MT customization. 

Saturday, July 2, 2022

Opinion: Mind your P'S & Q's, tomahtos-tomaytos - Economic Times - Dictionary

A few Sundays ago, upon reading a column that appeared on this page ('The Sycophant in the Room: The World Is Even Flatterer Than It Seems,' bit.ly/3xZX9gn), I learnt that I had, all this while, been mispronouncing the word sycophancy. Instead of 'see-co-fancy', I had been saying 'sy-co-fancy' - like psychology. The discovery was mildly amusing in the sense of, 'Huh, I wonder how no one noticed it till now?'

In my younger years, something like this would have been calamitous. Pronunciation was a weapon in high school -- used to attack the lack of sophistication of any poor soul that uttered an English word incorrectly. With the benefit of all the intervening years, and with exposure to many non-English speakers who have attained distinction in their chosen profession in English-speaking countries, I understand now that improper pronunciation is not really a reflection of anything - other than the inability to remember the confusing rules, or the unwillingness to conform to the demands of the language.

In most instances, it lays bare the inconsistent structure of the language. 'It's not me, it's you,' seems to be an apt response to the English language at every pronunciation stumble, a turnaround of the classic relationship-ender line. Another tangential validation comes from William Strunk Jr, a Cornell University English professor whose claim to fame is his 1920 textbook, The Elements of Style. Strunk is remembered for exhorting his students: 'If you don't know how to say a word, say it out loud!'


Any missive about the inconsistencies in English spelling and pronunciation is incomplete - at least for a Hindi movie fan - without reference to the immortal dialogue from the 1975 Hrishikesh Mukherjee movie Chupke Chupke. In that unforgettable scene, the multiple-degree-wielding rich old man is made hapless by this question from the young professor who is masquerading as an uneducated driver: 'If T-O is pronounced as 'too' and if D-O is pronounced as 'doo' then why isn't G-O pronounced as 'goo'?' - which is, of course, 'poo' in Hindi.

The absurdity of the English language is recognised even by the keeper of its rules -- the dictionary. The Guide to Pronunciation section (online version) of the Merriam-Webster dictionary has this to say: 'For some languages, such as Spanish, Swahili, and Finnish, the correspondence between orthography and pronunciation is so close that a dictionary need only spell a word correctly to indicate its pronunciation. Modern English, however, displays no such consistency in sound and spelling, and so a dictionary of English must devote considerable attention to the pronunciation of the language.'

It does acknowledge the frustration of the user: '[T]his disparity between sound and spelling is just a continual nuisance at school or work.' One marvels at the amount of energy the keepers of the English language must expend to prop up all the contradictions in pronunciation with labyrinthine rules that require the good graces of the brain's memory centre as much as its seat of language.

The one segment that benefits is the industry that makes money off 'teaching' the proper use of English -- 'spoken English' centres, writing guides, grammar-checking services. There's even a kids' competition in the US which is considered a national institution: the annual Spelling Bee. The first prize is $50,000, and 20 of the past 25 winners have been of Indian origin. Do immigrant parents, carrying with them the burden of memories of their own high-school pronunciation mishaps, train their kids to be perfect spellers the way others make their children pursue music or sports?

Experts on communication extol the virtues of clarity, brevity, and conviction. What irony that the tool most sought after to achieve these ideals is as unwieldy as the English language. Recently, a friend loaned me Jared Diamond's book, Guns, Germs, and Steel, which argues that one major cause of western Europe's dominion over the globe was its long history of complex societal structures that were necessitated by the agrarian economy. Could the ability to hold a civilisation together with a language that has head-scratching rules be one of the reasons as well?

The writer is managing director,

Laboratories, Bengaluru

Friday, July 1, 2022

How to build a language: inside the Oxford English Dictionary - Audio Long Reads - The New Statesman - Dictionary

The New Statesman’s Pippa Bailey has long had a professional as well as a personal interest in the OED: she and the team of sub-editors she leads rely on the world’s most comprehensive dictionary to answer questions of meaning and spelling. So it was a labour of love when she visited its Oxford HQ to meet the lexicographers whose decisions – about which words are added, revised, or rendered obsolete – help shape the world’s most-spoken language.

In this richly researched and beautifully observed deep dive, Bailey charts the course of the dictionary from its mid-19th-century origins to its most recent “new words” update (“terf”, “stealthing” and “sportswashing” were among the June 2022 inclusions). She visits the archive and hears from the specialists hard at work on the dictionary’s third edition – a job that began in 1994 (and the OED is still only halfway revised). Should they trace the first written use of “burner phone” to The Wire, or further back to a 1996 rap by Kingpin Skinny Pimp? Should they add the phrase “very traffic”? And why is it so hard to tell the origin story of “bucket list”?

This article first appeared on newstatesman.com on 22 June and in the magazine on 24 June 2022. You can read the text version here.

Written by Pippa Bailey and read by Emma Haslett.

You also might enjoy listening to How the trial of the Colston Four was won by Tom Lamont.

Podcast listeners can subscribe to the New Statesman for just £1 a week for 12 weeks using our special offer. Just visit newstatesman.com/podcastoffer.

How to listen to Audio Long Reads

1. In podcast apps

Audio Long Reads is available on all major podcast players, including Apple Podcasts, Spotify, Google Podcasts, YouTube and more.

Either click the links above to open in your preferred player, or open the podcast app on your device and search for “Audio Long Reads”.

Follow or subscribe in your podcast app to receive new episodes as soon as they publish.

2. On the New Statesman website

The podcast is also available to listen to right here on the New Statesman website. Bookmark https://ift.tt/dAW1Q2i, where we will publish new episodes every Saturday morning.

3. On your smart speaker

If you have an Amazon Echo, Google Home or Apple HomePod, ask it to “play the latest episode of Audio Long Reads from the New Statesman”.

The command will also work on other smart devices equipped with Alexa, Google Assistant or Siri.

Google Chrome set to expand translate options to include selected text, more languages - 9to5Google - Translation

Google Translate makes it easier to peruse the web without running into the language barrier, but sometimes the all-or-nothing approach of Google Chrome’s built-in Translate integration can be the wrong way to handle it. Now, Google Chrome is preparing the ability to translate selected text.

As spotted by Leopeva64 on Reddit, Google Chrome is currently testing out a new ability to translate “partial” text on a webpage. Specifically, it will be able to translate text highlighted on the page by the user.

Today, Chrome has built-in translation, but it applies to the page as a whole, rather than just bits on the page. This generally works out fine, but may not be ideal for pages that only have certain portions in another language.

It also works with only one target language at a time, translating pages from whatever language is displayed into the user’s single preferred language.

A new “bubble” UI in Google Chrome appears in the omnibox (address bar) and can translate text as it is selected on the page. This can be accessed either by pressing that button in the omnibox, or right-clicking the text and pressing “translate to.” The new UI also has an option to “translate full page.”

Right now, this isn’t a functional feature, as it doesn’t actually translate the text, but we can see where Google is going with this.

The other big perk to this revamped translation experience is that you can more easily switch between languages. As it stands today, changing the translation language requires digging into settings and adding more languages.

With this new experience, Google Chrome will present alternate languages front and center, with a full list of everything Google Translate has to offer in a scrolling list.

This feature is live – but not functional, as mentioned – in the latest Chrome Canary update. It should make its way to stable builds over time, but it remains to be seen exactly when.

Legal Settlement Requires Chicago To Offer Translation Services To Parents Of Students With Disabilities - Block Club Chicago - Translation

CHICAGO — It took Maggie Przytulinski seven years to get her younger brother, Mark, the help he needed in school. 

Przytulinski said Mark, who has autism and Down syndrome and is non-verbal, had an Individualized Education Program, or IEP, a legally binding document that outlines the services for students with disabilities. It requires multiple meetings every year and a significant amount of legal paperwork.

Adding to the complexity? Przytulinski’s Polish-speaking mother knows only basic English. 

Przytulinski, who speaks both English and Polish, said she often found herself taking on the role of translator in the IEP process. 

“It was difficult because not only I had to fight for the service for my brother and place him in the appropriate placement,” Przytulinski said, “but at the same time, I had to translate for my mom.”

That should no longer be the case, thanks to a legal settlement reached earlier this month between Chicago Public Schools and a group of families, including Przytulinski’s. It will guarantee language interpretation services to the families of students with disabilities. The Illinois State Board of Education reached a similar settlement late last year. 

The settlement mandates Chicago Public Schools provide language translation services for non-English speaking parents at all IEP meetings, which are required by federal law for students with disabilities who are receiving services.

Under the settlement, CPS also agreed to hire 10 full-time certified interpreters or translators, five of whom will serve only in those roles, and to provide translated versions of documents including reports, evaluations, and recommendations within 30 days of IEP meetings.

Parents can also request the interpreter not be a part of the IEP team. It can be difficult for interpreters to be impartial if they are playing a dual role, which makes the provision a critical component of the settlement, said Olga Pribyl, program vice president of the special education rights clinic at Equip for Equality. 

CPS does not comment on settlements, but is committed to ensuring that the needs of students with disabilities are met and will continue to work with ISBE, staff, and parents to provide the best educational experience and opportunities, according to a district spokesperson.

Equip for Equality has been working on the challenges faced by families for years. Pribyl said the advocacy group tried to resolve the problem directly with the state and district. 

“We alerted both the State Board of Education and Chicago Public Schools about our concerns with language access issues before we filed the complaint,” Pribyl said. “We had discussions with them, so it’s something that we were hoping we could resolve collaboratively.”

In January 2018, Equip for Equality and Kirkland & Ellis filed a class-action lawsuit on behalf of more than a dozen families, including Przytulinski’s.  

Grant Jones, litigation associate at Kirkland & Ellis LLP, represented the families. The case was personal for him, he said, as a former sixth grade English teacher and the child of an immigrant whose first language was not English. 

Jones said it took time and creativity to structure a settlement that all parties could agree on. But in the end, he said, he felt everyone had the same goal: to help students and their families access the services they needed to meaningfully participate in IEP meetings.

For the next two years, CPS must file reports each semester to show they’ve complied with the settlement. ISBE will also propose regulations to make sure qualified interpreters and translated documents are provided in districts across the state.

Although federal law requires family members be included in IEP meetings, it does not specify details such as which documents must be translated for families.  

“It’s hopefully some precedent that we can set to maybe make changes even further outside of Chicago and Illinois,” Jones said.

The case was dismissed without prejudice, meaning that if CPS and the ISBE do not comply, the plaintiffs can bring the lawsuit back, Jones said. Pribyl encouraged families who do not receive language interpretation and translation services to contact Equip for Equality. 

Przytulinski’s brother is now 22. Mark attended CPS schools from preschool until age 14, when he transitioned to a private day school for students with similar disability-related needs, according to Amanda Klemas, senior attorney in the special education rights clinic at Equip for Equality. Przytulinski said she and her mother had to push the district to allow him to attend the therapeutic day school, where CPS covered tuition – a service the district sometimes provides for students with severe disabilities.

She said she’s happy to know it will now be a little easier for families like hers to get the help they need. 

Eileen Pomeroy is a reporting intern for Chalkbeat Chicago. Contact Eileen at epomeroy@chalkbeat.org.

Chalkbeat is a nonprofit news site covering educational change in public schools.
