Wednesday, January 19, 2022

Google Play's developer support has become so terrible, its own translation tools are causing apps to get flagged - Android Police - Translation

Independent developers have struggled with Google's Play Store developer support for years. The company claims to be doing more to make things better recently, with more actual human beings and fewer automated tools on the other end of those appeals, but it could still be doing much more in its role as the toll-taking gatekeeper for Android apps. Unfortunately for at least one developer, the company's reliance on automated tools has struck again: It looks like Google incorrectly flagged an open-source app because of its own reliance on machine translation as part of its new war against the word "free," and F-Droid's semi-official Nearby app also ran into trouble.

The first app in question is Catima, a simple loyalty card and ticket manager. It's free, open-source, and popular within its niche — 10,000+ downloads and an impressive 5.0 rating with 139 reviews. It's still available on the Play Store and hasn't been taken down, but the developer behind it has documented a months-long saga involving the app's title and the use (or, more accurately, the non-use) of the word "free."

Previously, the app went by the name "Catima — The Libre Card Wallet" on the Play Store, with a similarly derived (and, we should note, seemingly human-translated) name in other markets. Open-source fans and technologists should understand the term "libre" in this context as the free-as-in-freedom intention behind open source software, with freely available code and the ability to modify it, as in the context of LibreOffice. The developer behind the project, Sylvia van Os, took pains to translate the title for each supported locale, with volunteers finding the correct analogs for "libre" in their respective languages. But Google apparently didn't return the favor.


Last October, she was informed out of the blue that her app had been "rejected" from the Play Store due to its Dutch and Norwegian titles, neither of which, she tells us, had changed in months. A review of the recent policy changes led her to believe that Google might be misunderstanding the Dutch word "vrij" and its Norwegian counterpart, both of which mean "libre" in that free-as-in-freedom sense and not "free" as in no cost. See, Google decided it didn't like the word "free" in titles since it's usually attached to spammy, low-quality apps, and Google is trying to clean up how its Play Store looks. And to be fair, you can see for yourself whether an app is free. But why was Google flagging Catima when it doesn't use the word in that sense?

Sylvia suspected that Google's error came down to how it was performing its translations, which a later title review would seem to confirm — Google later objected to the word "free" when it wasn't even in the English title, showing that it was "translating" even that version, and doing it incorrectly. Playing around ourselves with some titles Google objected to, it appears the company is relying on automated translations for its title reviews in other markets, with examples like the Dutch "vrije" and the German "freie" both clearly meaning free as in freedom, openness, and liberty, rather than free as in price. But this distinction is lost on automated translation tools like Google Translate, which go for a hard-and-fast word analog, ignoring the imprecision and multitude of meanings of the word "free" in English.

This initial title review caused a cascade of issues for Sylvia after she rephrased the two not-actually-erroneous titles, with a flood of other seemingly machine-translated errors hitting her inbox over the next few days, as Google incorrectly took issue with words like "libre" and "libero," even later claiming that the English title used the word "free" when it never did. Clearly, Google was translating even the English app name's use of the word "libre" incorrectly.

Free as in Google Translate is free to misunderstand

A "bug" in the Play Console further compounds the issue: the developer can't save titles for any language localization unless all of them are under the new 30-character limit, and she has to wait on her Bulgarian Weblate volunteers to come up with a "fixed" Bulgarian title that addresses Google's incorrect translation. (Humorously, the developer may have had more human input on her title translations than Google employed.) Google even falsely rejected her app just a few days ago with a baseless claim that it requires login credentials to review, even though it doesn't.

The issues are mostly resolved now, and Catima is available on the Play Store under a new title, but this is an all-too-familiar saga for independent developers dealing with Google's Play platform support. We reached out to Google for more information, as well as to explicitly confirm whether it's using machine translations of titles for rule enforcement rather than human translation, but the company didn't offer a response.

Catima isn't the only app that's drawn Google's ire over the word "free," though. F-Droid, the popular open-source app repository (seen by some as a de facto Play Store alternative), says it's unable to promote its officially unofficial F-Droid Nearby app on Google Play due to its use of the word "free." In this case, it's actually the word itself, but in the same libre-like open-source context, and it's not clear whether Google understands the distinction.

The developers behind the app (who, I'm told, are tied to F-Droid even if the developer account doesn't carry that branding; it's a long story) say they never received an email from Google about this issue, and just happened to notice it "by chance" in the Google Play Console — it wasn't even in the console's inbox. (Apparently, Google doesn't even notify developers of reduced search rankings for these sorts of offenses now.)

After covering Play Store support issues like these for years now, I can't help but be critical of Google's support and review process. Whatever lip service it pays in blog posts and announcements, and for all its excellent developer documentation and events like I/O, the folks actually running the Play Store simply refuse to make the investments necessary to provide a high-quality support experience that developers can rely on. Unless your name is big enough to merit special treatment, you're constantly at the whims of automated systems that fail to take into account the true granularity and gradients of any subjective review process. While we've raised enough of a stink in the past for issues like these to be resolved on a case-by-case basis, I sincerely doubt this is the last time I'll be writing one of these stories.

All Google needs to do to fix this is spend a little money on more warm bodies to justify the actual billions of dollars it's making from its Play Store cut. For context, Apple has over 500 experts involved in its App Store reviews — though even people make mistakes.


Precautions To Take When Translating Legal Documents - The Good Men Project - Translation

[unable to retrieve full-text content]


TAUS Launches Data-Enhanced Machine Translation - Slator - Translation



Amsterdam, January 19, 2022 – TAUS, the one-stop language data shop established through deep knowledge of the language industry, globally sourced community talent, and in-house NLP expertise, launches a new service: Data-Enhanced Machine Translation (DEMT) on their Data Marketplace. 

MT customization essentially requires two elements: an MT engine and training data. By combining both into a single online service, DEMT offers an end-to-end solution to those who wish to produce customized MT output for their specific domains, without the hassle of going through the actual MT training process. Users can simply drop the file they would like to machine translate and select the datasets that they wish to be used in their customization. In the background, our technology processes the file through an Amazon Active Custom Translate integration by feeding the selected training dataset into the engine to produce a highly customized output. The translated file is then directly sent to the user’s inbox.
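TAUS hasn't published its integration code, but Amazon Translate's Active Custom Translation mechanism, which the service builds on, works roughly as described above: register a parallel dataset, then reference it when starting a batch translation job. Below is a minimal sketch of assembling such a job request — the bucket paths, dataset name, and IAM role are placeholders, and this is our illustration rather than TAUS's actual pipeline:

```python
# Sketch of customizing Amazon Translate with a parallel dataset, the
# mechanism behind Active Custom Translation. S3 paths, the IAM role,
# and the dataset name are placeholders; swap in your own resources.

def build_translation_job(dataset_name, input_s3, output_s3, role_arn):
    """Assemble the request for translate.start_text_translation_job()."""
    return {
        "JobName": "demt-style-job",
        "InputDataConfig": {"S3Uri": input_s3, "ContentType": "text/plain"},
        "OutputDataConfig": {"S3Uri": output_s3},
        "DataAccessRoleArn": role_arn,
        "SourceLanguageCode": "en",
        "TargetLanguageCodes": ["de"],
        # The parallel data registered earlier steers the generic engine
        # toward the chosen domain's terminology and style.
        "ParallelDataNames": [dataset_name],
    }

# With AWS credentials configured, the calls would look like this:
# import boto3
# translate = boto3.client("translate")
# translate.create_parallel_data(
#     Name="medical-en-de",
#     ParallelDataConfig={"S3Uri": "s3://my-bucket/medical-en-de.tmx",
#                         "Format": "TMX"})
# translate.start_text_translation_job(**build_translation_job(
#     "medical-en-de", "s3://my-bucket/input/", "s3://my-bucket/output/",
#     "arn:aws:iam::123456789012:role/TranslateJobRole"))
```

The commented-out boto3 calls show where the request would actually be submitted; the translated output then lands in the configured S3 bucket rather than an inbox, so the email delivery step is TAUS's own layer on top.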

“Generic MT engines are widely available. But to ensure that MT can handle domain-specific content well, proper customization is key,” says Jaap van der Meer, Director at TAUS. “With the TAUS DEMT service, we have made customized, affordable and high-quality machine translation accessible to anyone, regardless of their expertise or access to relevant training data.”

The impact of the training datasets available for the DEMT service has been independently evaluated by Polyglot Technology LLC. “In total, we evaluated 8 language pairs for the E-Commerce domain, 18 language pairs for the Medical/Pharma domain and 4 language pairs for the Financial domain,” says Achim Ruopp, Owner at Polyglot Technology. “The customization of Amazon Translate with TAUS Data always improved the BLEU score measured on the test sets by more than 6 BLEU points on average and 2 BLEU points at a minimum. These are significant improvements that demonstrate the superiority of this customized translation for the E-Commerce, Medical/Pharma and Financial domains over non-customized MT outputs.”
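BLEU, the metric quoted above, scores a machine translation by its n-gram overlap with a human reference, so a gain of 6 points is a substantial jump. A minimal single-sentence sketch of the idea follows — real evaluations use a corpus-level, smoothed implementation such as sacreBLEU:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Simplified single-sentence BLEU: geometric mean of clipped n-gram
    precisions times a brevity penalty, scaled to 0-100."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clip each n-gram's count by how often it appears in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        precisions.append(clipped / total if clipped else 1e-9)
    # Penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return 100 * bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

A perfect match scores 100; a candidate sharing no n-grams with the reference scores near zero, which is why even a 2-point minimum improvement across many test sets is meaningful.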

The detailed analysis can be downloaded here. The library of available datasets for the DEMT service is planned to grow. You can try the TAUS DEMT service here.

About TAUS

TAUS was founded in 2005 as a think tank with a mission to automate and innovate translation. Ideas transformed into actions. TAUS has become the one-stop language data shop, established through deep knowledge of the language industry, globally sourced community talent, and in-house NLP expertise. We create and enhance language data for the training of better, human-informed AI services.

Our mission today is to empower global enterprises and their service and technology providers with data solutions that help them to communicate in all languages, faster, better, and more efficiently.

For more information, visit https://www.taus.net/ 


Tuesday, January 18, 2022

‘Don’t Look Up’ is an important message lost in translation - Montana Kaimin - Translation

[unable to retrieve full-text content]


‘Insurrection’ is a tale of two dictionaries, two Americas | The Grammarian - The Philadelphia Inquirer - Dictionary

2021: For the word insurrection, it was the best of times, it was the worst of times.

In 2022, we’re pretty much looking at just the worst of times.

A year after insurrectionists/seditionists sacked and looted the Capitol, too many Americans can’t agree on either the nouns or the verbs in that last phrase. We saw this last week, as the FBI’s arrest of 11 people on sedition charges caused Merriam-Webster lookups of the words sedition and seditious to spike 15,000%.

But it’s not just because these words are less common that so many are grappling with their definitions. It’s because there are deliberate efforts afoot to warp the definitions themselves.


No wonder people are confused.

Though the word insurrection was common through much of the 19th century, its usage fell off a cliff after the Civil War, and never really recovered … until Jan. 6, 2021. Similarly, no one really cared about the word sedition after the U.K.’s Seditious Meetings Act of 1817 expired the following year (save for a small blip during the First World War when the U.S. passed the short-lived Sedition Act of 1918). Following a cicada-like once-a-century trend, sedition also spiked exactly one year ago.

But examine insurrection’s twisted journey in 2021 to see how nefarious actors try to reappropriate a word.

» READ MORE: Did Trump incite violence on Jan. 6? Depends which dictionary you use. | The Grammarian


It was the best of times in that the word reentered our lexicon, maintaining a place in our vernacular for more than just a flash. Words get sad when they fall into disuse, and insurrection has maintained its comeback, with many mainstream publications using the word to describe what happened on Jan. 6.

But it was the worst of times when you look at how the word changed in 2021. For the first nine months or so, insurrection was most commonly searched alongside words like capitol, incitement, 25th amendment, Trump — words you’d expect. But starting around September, the Google hits changed. Then you started seeing search terms like legal insurrection and legal insurrection Kyle Rittenhouse spiking in their place. If you want to twist a word’s definition, start associating it with other, seemingly unrelated terms that play into your own pet conspiracy theories. LegalInsurrection.com is a hyper-right-wing blog site and tinfoil-hat factory — in its own words, “one of the most widely cited and influential conservative websites.”

Two months later, Tucker Carlson — who hosts the top-rated show on the most watched cable “news” network — aired his Patriot Purge series, which pushed the lie that Jan. 6, 2021 wasn’t an “insurrection” at all. It’s not just Carlson; all of Fox News spent much of 2021 downplaying Jan. 6, such as when the Senate released its bipartisan insurrection report in June, and, while most news outlets covered it extensively, Fox News largely ignored it.

Fast-forward to the last week of December, when a lowly Inquirer grammar columnist made a passing, inconsequential “insurrection” reference, which prompted multiple readers to respond with letters asking, “What insurrection?”

Merriam-Webster says insurrection is “an act or instance of revolting against civil authority or an established government” — which is exactly what happened on Jan. 6, 2021. Ditto the Oxford English Dictionary: “The action of rising in arms or open resistance against established authority or governmental restraint.” The debate here should be over. If anything, the term is too mild.

But when it comes to definitions, insurrection isn’t just a tale of two cities, or even two dictionaries; it’s a tale of two Americas.

The Grammarian, otherwise known as Jeffrey Barg, looks at how language, grammar, and punctuation shape our world, and appears biweekly. Send comments, questions, and sturdy indefensibles to jeff@theangrygrammarian.com.

Read more from The Grammarian

Now more than ever, you need to know these 9 phrases to avoid like the plague in 2022

Two little letters that could skew the Pa. Senate race

Biden’s ‘I’ll be darned’ packs a bigger punch than Trump’s F-bombs

Calling people ‘the unvaccinated’ could be a deadly shift in language


Google Research Brings 'Massively Multilingual' Machine Translation to 200+ Languages - Slator - Translation


Everything old is new again — including Google’s latest machine translation (MT) research. Co-authors Ankur Bapna, Orhan Firat, Yuan Cao, and Mia Xu Chen, who collaborated on a July 2019 paper presenting the culmination of five years’ work on a “massively multilingual” MT model, were joined this time around by Aditya Siddhant, Isaac Caswell, and Xavier Garcia.

Google’s January 2022 paper, Towards the Next 1,000 Languages in Multilingual Machine Translation, again takes up the cause of universal translation, addressing the challenge of scaling a massively multilingual model without simply training on more parallel data. Beyond the prohibitive cost of collecting and curating parallel data for so many language pairs, that solution is typically unhelpful for the many low-resource languages with limited data.

“Beyond the highest resourced 100 languages, bilingual data is a scarce resource often limited only to narrow domain religious texts,” the authors wrote. To build and train an MT model that covers more than 200 languages, Google researchers employed a mix of supervised and self-supervised objectives, depending on the data available for languages.
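The mixed-objective idea can be pictured as a routing rule: examples from language pairs that have parallel data train a supervised translation loss, while monolingual-only languages train a self-supervised denoising loss (restoring masked tokens, in the spirit of MASS/BART-style pretraining). The sketch below is our simplification — the data sets, function names, and masking scheme are illustrative, not from the paper:

```python
import random

# Toy routing of training examples to objectives. Pairs with bilingual
# data use supervised translation; monolingual-only languages fall back
# to self-supervised denoising. All names here are illustrative.

PARALLEL = {("de", "en"), ("fr", "en")}   # pairs with parallel corpora
MONOLINGUAL_ONLY = {"kk", "gu"}           # only monolingual text available

def pick_objective(example):
    """Choose a training objective based on what data exists for the pair."""
    src, tgt = example.get("src_lang"), example.get("tgt_lang")
    if (src, tgt) in PARALLEL:
        return "supervised_translation"
    return "self_supervised_denoising"

def mask_tokens(tokens, mask_rate=0.35, seed=0):
    """Self-supervised input: hide tokens that the model must reconstruct."""
    rng = random.Random(seed)
    return [t if rng.random() > mask_rate else "<mask>" for t in tokens]
```

The intuition from the paper is that the denoising objective teaches the model the low-resource language itself, while the supervised pairs teach it how to translate, and transfer learning bridges the two.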

This “pragmatic approach,” as described by the authors, can enable a multilingual model to learn to translate effectively, even for severely under-resourced language pairs with no parallel data and little monolingual data. Moreover, they wrote, the results of their experiments “demonstrate the feasibility of scaling up to hundreds of languages without the need for parallel data annotations.”

Conceptually, the researchers explained, “one could think of this as monolingual data and self-supervised objectives […] helping the model learn the language and the supervised translation in other language pairs teaching the model how to translate by transfer learning.”

Pragmatic though it may be, the design is not new, ModelFront CEO and co-founder Adam Bittlingmayer told Slator, with “almost all competitive systems” now using some target-side monolingual data, even for major language pairs.

However, Bittlingmayer added, “it is in contrast to the recent publications from Facebook on this front.” For Facebook’s M2M-100, designed to avoid English as an intermediary between source and target languages, researchers manually created data for all pairs, while the social networking company snagged a November 2021 WMT win by focusing exclusively on translation to and from English.

Parallel or Monolingual Data?

The Google team performed two experiments, the first using parallel and monolingual data from the WMT corpus to train 15 different multilingual models, for 15 languages to and from English. Each model omitted parallel data for one language, simulating a realistic scenario in which parallel data is unavailable for all language pairs.

For each language, researchers then compared the performance of the “zero-resource model” (i.e., without parallel data) to a multilingual baseline trained on all language pairs using all parallel data available via the WMT corpus.


For high-resource languages, this setup was able to match the performance of fully supervised multilingual baselines, but it was not enough to help the lowest-resource languages in the study (e.g., Kazakh and Gujarati) achieve high-quality translation. Adding monolingual data for those languages had a significant positive impact, improving translation quality above that of a supervised model.

“Even for high-resource languages, the method can achieve similar translation quality by leaving out parallel data entirely (for the language under evaluation) and throwing in 3–4 times monolingual examples, which would be easier to obtain,” the researchers wrote.

The team found that adding zero-resource languages in the same model diminishes performance across languages, while adding more languages with parallel data helps in all cases, since an unsupervised language learns something from each supervised pair. In the same vein, a lack of parallel data seems to be slightly more detrimental to translation quality, compared to a lack of monolingual data. 

Kenneth Heafield, Reader in MT at the University of Edinburgh, told Slator that these findings are not particularly surprising. “Using all the available data, parallel and monolingual, is usually best, provided it is clean,” he said, adding that of course, there are exceptions, such as extreme cases of domain mismatch: “Trying to translate software manuals when your only parallel data is the King James Bible is difficult.”

Beyond WMT

While high-quality, the WMT dataset is relatively small and covers a limited number of languages. To scale the model to cover more than 200 languages, the researchers conducted a second experiment, starting with a highly multilingual crawl of the web for monolingual and parallel data. 

They cleaned up the noisy dataset for the 100 lowest-resource languages to use for back translation. The cleaner version of the monolingual data was then translated into English, generating synthetic data for the zero-resource language pairs.
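Back-translation, as used here, pairs each monolingual sentence with a machine translation of it, manufacturing synthetic parallel data for languages that have no human-translated corpora. A toy sketch, with a trivial word lookup standing in for the real MT model:

```python
# Toy back-translation: build synthetic (source, English) training pairs
# from monolingual text. `mt_into_english` is a stand-in for a trained MT
# engine; the two-word lexicon exists only to make the sketch runnable.

def mt_into_english(sentence):
    """Placeholder model; a real system would run a trained MT engine."""
    lexicon = {"hallo": "hello", "welt": "world"}
    return " ".join(lexicon.get(tok, tok) for tok in sentence.split())

def back_translate(monolingual_sentences):
    """Pair each monolingual sentence with its machine translation."""
    return [(s, mt_into_english(s)) for s in monolingual_sentences]
```

The resulting pairs are noisy on the synthetic side, which is why the researchers cleaned the monolingual data first: errors in the input compound once the model trains on its own output.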

In this scenario, the authors wrote, “We find that xx→en and en→xx translation quality exhibit different trends.” 

Translation quality into English did not correlate well with the amount of monolingual data available for the non-English language; rather, the languages that performed well were typically those with similar languages in the supervised set. (In this context, the languages are not necessarily similar from a linguistic perspective, but have similar representations and labels learned within a massively multilingual MT model).

BLEU scores for English translation into other languages were high only for languages with high into-English translation quality, as well as relatively large amounts of monolingual data.

While the paper did not provide a timeline for when Google Translate users might benefit from this research, there is certainly widespread demand.

“On the product side, at this point, our median fellow human — more than four billion of us — is an Internet user and does not understand English. And there is a content explosion,” Bittlingmayer said. “So there is just a strong pull from the market, even if spend lags views.”


Column: Translating UNC's COVID-19 communications - The Daily Tar Heel - Translation

Editor's Note: This article is satire.

We all know that as eloquent and articulate as his words can be, Chancellor Kevin Guskiewicz’s emails can be, well, hard to read. Our generation’s attention spans are rapidly decreasing, leaving in their wake a trail of impatience, boredom and a general distaste toward any written work that can’t be found on SparkNotes. 

But worry no more: there is no longer a need to fill an Adderall prescription every time another headache (but hopefully not cough, congestion or fever)-inducing COVID-19 update email is sent to the students and faculty of UNC. At The Daily Tar Heel, we have taken it upon ourselves to simplify these Shakespearean-esque soliloquies into a mere couple of sentences:

Rajee Ganesan
Snippet from general notice from UNC

Translation: Testing will be inaccessible so that the number of positive cases looks lower. If you do have COVID-19, you should probably figure it out yourself elsewhere.

Rajee Ganesan
Snippet from general notice from UNC

Translation: You need to isolate yourself if you are exposed to COVID-19, but we have gotten rid of quarantine dorms and have nowhere to send you. If you live anywhere that’s not North Carolina, I guess your roommate is in for a stressful next five to 10 days!

Rajee Ganesan
Snippet from general notice from UNC

Translation: You will very likely get COVID-19 this semester and have to miss class, but professors are not required to stream or record their lectures. Let’s hope you have friends in your classes, or else your GPA is going to drop faster than you can say the word “omicron.”

Rajee Ganesan
Snippet from general notice from UNC

Translation: Once again, it is your job to find a friend in your classes rather than the professor taking it upon themselves to give you a virtual version of the lesson. Oh, you don’t know anyone in your class? Why don’t you try speaking up in a breakout room of 15 strangers with their cameras off? Have you ever considered taking advantage of Zoom’s private messaging feature?

Rajee Ganesan
Snippet from general notice from UNC

Translation: Here, we are not so subtly flexing that we are the nation’s leader in infectious disease research, but are also likely soon to be the nation’s leader in infectious disease outbreaks on a college campus.

Rajee Ganesan
Snippet from general notice from UNC

Translation: Your compliance means everything to us.

Behind all of the three-syllable words and complex syntax is a complete lack of care for the students and faculty during one of the worst COVID-19 outbreaks yet.

I wish I could decipher even more from these emails, but after six hours of nonstop reading, I have a headache like a 7.0 magnitude earthquake. Unless ... I hope it’s not … I'd better get tested.

Has anyone read the emails well enough to know where I can do that?

@_hannahkaufman

opinion@dailytarheel.com

