Friday, May 27, 2022

Translation, the new (benign) colonialism - Economic Times - Translation

English is not just a language; it is also the most politically empowered one. The latest demonstration of this is the far wider recognition Geetanjali Shree has gained as a writer since winning this year's International Booker Prize. This prize, a companion to the better-known Booker Prize (which is awarded only to fiction written in English), honours works in other languages translated into English. What makes it valuable, apart from the reach the host language (English) brings to a work written in the guest language (in this case, Hindi), is the push (read: branding) the prize itself bestows. Remember, the translation has to be published in Britain or Ireland to qualify. Shree's 2018 novel Ret Samadhi, translated into English by the American translator Daisy Rockwell and published in 2021 as Tomb of Sand, now reaches a non-Hindi-reading readership, Indian readers included, bearing the stamp of 'Booker aesthetics' excellence.

English needed centuries of consistent political push, both overt and covert, to wield the kind of 'visa-less' access it has today. Britain's colonial enterprise, of course, was a primal source and fuel of that longue durée proliferation. By the time the language was unhitched from the British Empire, it had welded itself across the world both as lingua franca and as the high language of choice, with its regional manifestations (British English, American English, Indian English, etc). A language's reach is determined by the economic and political heft of its source users. So, for a language to thrive beyond its native terrain, in both stature and reach, the source society must have commensurate heft.

With colonisation, thankfully, no longer an option, it is translation that now holds the key. Translating quality contemporary literature in Indian languages into other Indian languages and into English, Mandarin, Spanish and other languages whose readers appreciate literature can bring the soft power and prestige that native-language speakers crave.

Dharamsala: Tibetan dictionary released - The Tribune India - Dictionary

Taltan Dictionary Project aims to preserve endangered Indigenous language - CFTKTV - Dictionary

Thursday, May 26, 2022

Translate scanned PDF documents with Document translation - Microsoft - Translation

Today, the Document translation feature of Translator, a Microsoft Azure Cognitive Service, adds the ability to translate PDF documents containing scanned image content, eliminating the need for customers to preprocess them through an OCR engine before translation.

Document translation became generally available last year, on May 25, 2021, allowing customers to translate entire documents and batches of documents into more than 110 languages and dialects while preserving the layout and formatting of the original file. It supports a variety of file types, including Word, PowerPoint and PDF, and customers can use either pre-built or custom machine translation models. Document translation is enterprise-ready, with Azure Active Directory authentication providing secure access between the service and storage through Managed Identity.

Translating PDFs with scanned image content has been a highly requested feature among Document translation customers. Customers find it difficult to use automation to separate PDF documents containing regular text from those containing scanned image content. This complicates their workflows, because PDF documents with scanned image content must first be routed to an OCR engine before being sent to Document translation.

Document translation can now:

  • identify whether a PDF document contains scanned image content,
  • route PDFs containing scanned image content internally to an OCR engine to extract the text, and
  • reconstruct the translated content as a regular text PDF while retaining the original layout and structure.
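The detection-and-routing step described above is the kind of pre-sorting customers previously had to automate themselves. A very rough client-side approximation is sketched here, assuming a naive scan of raw PDF bytes for font and image markers. `classify_pdf` and `route` are hypothetical names; this is not how Document translation decides, and it will miss objects hidden inside compressed object streams.

```python
import re

# Hypothetical client-side heuristic, NOT the service's actual detection logic.
# Text-based PDFs normally declare font resources (/Font) for their glyphs;
# image-only scans usually carry image XObjects (/Subtype /Image) and no fonts.
_FONT_RE = re.compile(rb"/Font\b")
_IMAGE_RE = re.compile(rb"/Subtype\s*/Image\b")


def classify_pdf(data: bytes) -> str:
    """Guess 'scanned', 'text', or 'unknown' from raw PDF bytes.

    Only uncompressed object dictionaries are inspected, so PDFs that pack
    their objects into compressed object streams may come back 'unknown'.
    """
    has_font = _FONT_RE.search(data) is not None
    has_image = _IMAGE_RE.search(data) is not None
    if has_font:
        return "text"  # may also contain images, but has a real text layer
    if has_image:
        return "scanned"
    return "unknown"


def route(data: bytes) -> str:
    """Choose a pipeline: OCR first unless the file clearly has a text layer."""
    if classify_pdf(data) == "text":
        return "translate"
    return "ocr-then-translate"
```

A PDF that mixes text pages with scanned pages would be labelled "text" by this heuristic, which illustrates why hand-rolled sorting is brittle and why moving the decision into the service is useful.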

Font formatting such as bold, italics, underline and highlighting is not retained for scanned PDF content, because OCR technology does not currently capture it. Font formatting is, however, preserved when translating regular text PDF documents.

Document translation currently supports PDF documents containing scanned image content from 68 source languages into 87 target languages. Support for additional source and target languages will be added in due course.

Customers can now send all PDF documents directly to Document translation and let the service decide when and how to use the OCR engine.

Customers already using Document translation need no code changes to use this new feature: PDF documents with scanned content can be submitted for translation like any other supported document format.
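Because scanned PDFs go through the same request as any other document, a client that already submits batches needs no special handling. The sketch here builds such a request with only the Python standard library. It assumes the v1.0 batch REST endpoint (`/translator/text/batch/v1.0/batches`), key-based authentication via the `Ocp-Apim-Subscription-Key` header, and container SAS URLs for source and target storage; check these against the current reference at aka.ms/DocumentTranslationDocs before relying on them. `build_batch_request` is a hypothetical helper name.

```python
import json
import urllib.request

# Assumed path of the Document translation v1.0 batch REST API; verify against current docs.
BATCH_PATH = "/translator/text/batch/v1.0/batches"


def build_batch_request(endpoint: str, key: str, source_container_sas: str,
                        targets: dict[str, str]) -> urllib.request.Request:
    """Build (but do not send) a batch translation request.

    `targets` maps a language code (e.g. "de") to a writable container SAS URL.
    Nothing in the body says whether the PDFs are scanned or text-based:
    per the announcement, the service detects that on its own.
    """
    body = {
        "inputs": [
            {
                "source": {"sourceUrl": source_container_sas},
                "targets": [
                    {"targetUrl": url, "language": lang}
                    for lang, url in targets.items()
                ],
            }
        ]
    }
    return urllib.request.Request(
        endpoint.rstrip("/") + BATCH_PATH,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would submit the job; per the REST reference, a successful submission returns HTTP 202 with an `Operation-Location` header that can be polled for status. The endpoint, key and SAS URLs are placeholders. The `azure-ai-translation-document` Python package wraps the same operation for those who prefer an SDK.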

We are also pleased to announce that Document translation supports scanned PDF content at no additional charge to customers. Two pricing plans are available for Document translation through Azure: the Pay-as-you-go plan and the D3 volume discount plan for higher volumes of document translation. Pricing details can be found at aka.ms/TranslatorPricing.

Learn how to get started with Document translation at aka.ms/DocumentTranslationDocs.
Send your feedback to mtfb@microsoft.com.

AppTek Achieves Top Ranking at the International Workshop on Spoken Language Translation (IWSLT) 2022 Evaluation Campaign - PR Newswire - Translation

Company's Spoken Language Translation System Ranks First in Isometric Speech Translation Track Which Is Critical in Improving Automatic Dubbing and Subtitling Workflows

MCLEAN, Va., May 26, 2022 /PRNewswire/ -- AppTek, a leader in Artificial Intelligence (AI), Machine Learning (ML), Automatic Speech Recognition (ASR), Neural Machine Translation (NMT), Text-to-Speech (TTS) and Natural Language Processing / Understanding (NLP/U) technologies, announced that its spoken language translation (SLT) system ranked first in the isometric speech translation track at the 19th annual International Workshop on Spoken Language Translation's (IWSLT 2022) evaluation campaign.

Isometric translation is a new research area in machine translation concerned with generating translations that are similar in length to the source input. It is particularly relevant to downstream applications such as subtitling and automatic dubbing, and to the translation of text that is subject to length constraints, such as in software and gaming applications.

"We are thrilled with the results of the track," said Evgeny Matusov, Lead Science Architect, Machine Translation, at AppTek. "This is a testament to the hard work and skill of our team, who have been focusing on developing customized solutions for the media and entertainment vertical."

AppTek entered the competition to measure the performance of its isometric SLT system against other leading platforms developed by corporate and academic science teams around the world. Participants were asked to translate YouTube video transcriptions such that the length of each translation stays within 10% of the length of the original transcription, measured in characters. AppTek took part in the constrained task for the English-German language pair, which, of the three pairs evaluated at IWSLT, has the highest target-to-source length ratio in character count.
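The length constraint described above can be checked mechanically. A minimal sketch, assuming compliance means the translation's character count is within ±10% of the source's (a length ratio between 0.9 and 1.1, inclusive); the function names are hypothetical, and this is not the official IWSLT evaluation script, which also scores translation quality.

```python
from typing import Iterable, Tuple


def length_ratio(source: str, translation: str) -> float:
    """Target-to-source length ratio, counted in characters."""
    if not source:
        raise ValueError("source must be non-empty")
    return len(translation) / len(source)


def is_length_compliant(source: str, translation: str, tolerance_pct: int = 10) -> bool:
    """True if the translation length is within +/- tolerance_pct % of the source length.

    Integer arithmetic keeps exact boundary cases (e.g. 22 characters against
    a 20-character source) from being rejected by floating-point rounding.
    """
    if not source:
        raise ValueError("source must be non-empty")
    return abs(len(translation) - len(source)) * 100 <= tolerance_pct * len(source)


def length_compliance(pairs: Iterable[Tuple[str, str]], tolerance_pct: int = 10) -> float:
    """Fraction of (source, translation) pairs that satisfy the length constraint."""
    results = [is_length_compliant(s, t, tolerance_pct) for s, t in pairs]
    return sum(results) / len(results) if results else 0.0
```

For instance, "Thank you very much." (20 characters) rendered as "Vielen herzlichen Dank." (23 characters) has a ratio of 1.15 and fails the constraint, consistent with the point that English-German has the highest target-to-source length ratio of the evaluated pairs.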

Submissions were evaluated on two dimensions: translation quality and length compliance with respect to the source input. Both automatic and human assessment found that AppTek's translations outperformed competing submissions in both quality and length compliance, with performance matching "unconstrained" systems trained on significantly more data. An additional evaluation by the task organizers showed that synthetic speech generated from AppTek's system output yields automatically dubbed videos with a smooth speaking rate and higher perceived quality than those produced with the competing systems.

"The superior performance of AppTek's isometric speech translation system is another step towards delivering the next generation of speech-enabled technologies for the broadcast and media markets," said Kyle Maddock, AppTek's SVP of Marketing. "We are committed to delivering the state-of-the-art for demanding markets such as media and entertainment, and isometric translation is a key component for more accurate automatic subtitling and dubbing workflows."

AppTek scientists Patrick Wilken and Evgeny Matusov will present the details of AppTek's submission at this year's IWSLT conference held in Dublin on May 26-27, 2022.

The full IWSLT 2022 results have been published by the workshop organizers.

About AppTek
AppTek is a global leader in artificial intelligence (AI) and machine learning (ML) technologies for automatic speech recognition (ASR), neural machine translation (NMT), natural language processing/understanding (NLP/U) and text-to-speech (TTS). The AppTek platform delivers industry-leading, real-time streaming and batch technology solutions in the cloud or on-premises for organizations across a breadth of global markets such as media and entertainment, call centers, government, enterprise business, and more. Built by scientists and research engineers who are recognized among the best in the world, AppTek's multidimensional 4D for HLT (human language technology) solutions with slice and dice methodology covering hundreds of languages/dialects, domains, channels and demographics drive high impact results with speed and precision. For more information, please visit http://www.apptek.com.

Media Contact:
Kyle Maddock
202-413-8654

SOURCE AppTek

Wednesday, May 25, 2022

Lewis County Elks Dictionary Project - Lewis Herald - Dictionary