Friday, October 27, 2023

Microsoft Says the ComSL Model Outperforms Other Models in Speech Translation - Slator - Translation

On October 14, 2023, researchers at Microsoft Cloud and AI, Microsoft Research Asia, and Shanghai Jiao Tong University published updated results for the capabilities of ComSL (Composite Speech-Language Model), a speech-language model originally introduced in a paper in May 2023.

According to the researchers, the ComSL model is based on public pretrained speech-only (audio data) and language-only (text data) models and has been optimized for spoken language tasks by integrating both modalities into its training.

The main differentiator of the ComSL model, explained the researchers, is that it outperforms the results achieved through “end-to-end modeling,” the most widely used training methodology thus far. End-to-end modeling uses audio and text data separately even if, the researchers say, they “may not be optimal for each other.”

In the composite model, the researchers obtained a simpler form of cross-modality learning that uses speech-text mapping/matching. The training allows the model to perform better and does not require any force-aligned speech and text.

For their methodology, the researchers applied machine translation (MT) and automatic speech recognition (ASR) as what they call “auxiliary tasks” in a multi-task learning mode during the optimization of the end-to-end speech translation (ST) model.


Multi-task learning (MTL) mode implies “sharing common knowledge among different tasks” so that the MT task can guide the ST task. However, the researchers stated that, because of the mismatch between speech and text modalities, the guidance was not as effective.

The ComSL model was trained with existing, fine-tuned models, including speech-only input and text-only input, as well as with ST, ASR, and MT as tasks and a “cross-modality learning (CML)” approach based on paired speech-text input instead of forced-alignment. 

The training steps consisted of fine-tuning the language model (with all the paired text data), multi-task learning (the tasks were ST, MT, ASR, and CML), regularization on the MT output (fine-tuning with MT tasks), and freezing the speech encoder (retaining speech representations at the start of fine-tuning).
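
One common way to train on several tasks at once is to add up the per-task losses with a weight for each task. The sketch below illustrates that idea for the four tasks named above. The function name, the loss values, and the weights are all hypothetical; the article does not report how the researchers combined or weighted the losses.

```python
from typing import Dict


def combined_loss(task_losses: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of per-task losses.

    Tasks missing from `weights` default to a weight of 1.0.
    """
    return sum(weights.get(task, 1.0) * loss for task, loss in task_losses.items())


# Hypothetical per-batch loss values, for illustration only.
losses = {"st": 2.0, "mt": 1.0, "asr": 1.5, "cml": 0.5}

# Hypothetical weights; the article does not state the values used.
weights = {"st": 1.0, "mt": 0.5, "asr": 0.5, "cml": 0.2}

total = combined_loss(losses, weights)
```

In a setup like this, lowering the weight of an auxiliary task reduces its influence on the shared parameters without removing it from training.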

400 hours of English

The experiments in this study involved the CoVoST 2 dataset, which comprises translations from 21 languages into English and from English into 15 languages, and approximately 400 hours of English recordings and 900 hours of recordings from 21 additional languages. 

The researchers focused mainly on speech translation from non-English languages into English, measuring performance with BLEU scores on the CoVoST 2 test set. The models used as baselines were Whisper and mBART-50, themselves fine-tuned on CoVoST 2.

The composite model was found to outperform the base speech model (Whisper) and the combination of speech and language models (Whisper+mBART). The incorporation of ST data contributed to a high score on the CoVoST 2 test set, and the composite model also achieved better speech-to-text translation results than those reported for end-to-end models trained on the same ST, ASR, and MT tasks.


How Effective Are Large Language Models in Low-Resource Language Translation - Slator - Translation

Large language models (LLMs), such as ChatGPT, have shown remarkable capabilities in performing a range of language tasks, including machine translation (MT). But how effective are they when it comes to low-resource languages (LRLs)?

A research paper published on September 14, 2023, delves into the translation prowess of ChatGPT and other LLMs across a diverse set of 204 languages, encompassing both high- and low-resource languages. According to the authors, this is “the first experimental evidence for an expansive set of 204 languages.”

Nathaniel R. Robinson, Perez Ogayo, David R. Mortensen, and Graham Neubig from Carnegie Mellon University underscored the need for such an investigation, noting that there exists a wide variety of languages for which recent LLM MT performance has never been evaluated. As a result, it is difficult for speakers of the world’s diverse languages to know how and whether they can use LLMs for their linguistic needs.

In addition, the authors emphasized that “the majority of LRLs are largely neglected in language technologies” in general with current MT systems either performing poorly on them or not including them at all. “Some commercial systems like Google Translate support a number of LRLs, but many systems do not support any,” they said.

The authors pointed out that their work differs from existing studies since the focus here is on end users. The inclusion of a remarkable 204 languages, which incorporates 168 LRLs, underscores the commitment to addressing the diverse needs of LRL communities, which are frequently overlooked in the discourse on language technology. “We include more languages than any existing work […] to address the needs of various LRL communities,” they explained.


To conduct their research, the team used data from FLORES-200 (an evaluation benchmark) and queried the OpenAI API to translate their test set from English into the target languages. 

They evaluated ChatGPT’s MT performance across the entire language set and compared it with NLLB-MOE as their baseline, as it is the current state-of-the-art open-source MT model with wide language coverage. Comparative evaluations were also carried out against results from subsets of selected languages using Google Translate and GPT-4.

In their exploration of MT prompts, they employed both zero- and five-shot approaches for ChatGPT MT. The evaluation metrics, spBLEU and chrF2++, provided a robust basis for assessing the outputs.
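
To give a sense of what a character-based metric measures, here is a minimal sketch of a character n-gram F-score in the spirit of chrF. It is a simplification written for illustration, not the official chrF2++ implementation (which also uses word n-grams and handles whitespace differently), and it is not the code the authors ran. The function names and default parameters are assumptions.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count all character n-grams of length n in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def simple_chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF-style score in [0, 1].

    Averages n-gram precision and recall over n = 1..max_n, then combines
    them into an F-beta score (beta = 2 weights recall more than precision).
    """
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # one side is too short to contain n-grams of this length
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


identical = simple_chrf("the cat sat", "the cat sat")
partial = simple_chrf("the cat sat", "the cat sits")
disjoint = simple_chrf("abc", "xyz")
```

Because it compares characters rather than whole words, a score of this kind gives partial credit for near-miss word forms, which is one reason character-level metrics are often used for morphologically rich languages.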

The results suggest that while ChatGPT models approach or even surpass the performance of traditional MT models for some high-resource languages, they consistently lag for LRLs. Notably, African languages emerge as a particular challenge, with ChatGPT underperforming traditional MT in a substantial 84.1% of the languages studied.

Language Resources and Costs

The researchers also examined language features, including language resources, language family, and script, to assess the effectiveness of LLMs. 

This analysis aimed to uncover trends that could guide end users in selecting the most appropriate MT system for their specific language. “Analyzing this may reveal trends helpful to end users deciding which MT system to use, especially if their language is not represented here but shares some of the features we consider,” they said.

According to the authors, a language’s resource level is the most important feature in predicting ChatGPT’s MT effectiveness, while script is the least important.

The authors stressed financial aspects as well, particularly as they pertain to LLM users. “We evaluate monetary costs, since they are a concern for LLM users,” the authors said. Few-shot prompts, despite their potential for modest improvements in translation quality, come at a higher cost due to charges for both input and output tokens.
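
The cost difference can be illustrated with simple arithmetic. The sketch below assumes a pricing scheme that charges separately per 1,000 input and output tokens; the prices, token counts, and function name are hypothetical placeholders, not figures from the paper or from any provider's price list.

```python
def prompt_cost(input_tokens: int, output_tokens: int,
                input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Cost of one request when input and output tokens are billed separately."""
    return (input_tokens / 1000 * input_price_per_1k
            + output_tokens / 1000 * output_price_per_1k)


# Hypothetical prices per 1,000 tokens.
INPUT_PRICE = 0.001
OUTPUT_PRICE = 0.002

# Hypothetical token counts for translating one sentence.
INSTRUCTION_AND_SOURCE_TOKENS = 50  # prompt instruction plus the sentence to translate
TRANSLATION_TOKENS = 40             # the model's output
TOKENS_PER_EXAMPLE = 80             # one in-context example (source plus translation)

zero_shot = prompt_cost(INSTRUCTION_AND_SOURCE_TOKENS, TRANSLATION_TOKENS,
                        INPUT_PRICE, OUTPUT_PRICE)
five_shot = prompt_cost(INSTRUCTION_AND_SOURCE_TOKENS + 5 * TOKENS_PER_EXAMPLE,
                        TRANSLATION_TOKENS, INPUT_PRICE, OUTPUT_PRICE)
```

Under these assumed numbers the output cost is identical in both cases; the five-shot request costs more only because the five examples are billed as extra input tokens on every call.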

The authors emphasized that they want to help end users of various language communities know how and when to use LLM MT. “We expect that our contributions may benefit both direct end users, such as LRL speakers in need of translation, and indirect users, such as researchers of LRL translation considering ChatGPT to enhance specialized MT systems,” they concluded.


Thursday, October 26, 2023

Interpreter says Puska asked for confession translation - RTE.ie - Translation

An interpreter has told the Central Criminal Court that the man accused of the murder of schoolteacher Ashling Murphy asked him to translate his confession to gardaí, two days after she was killed.

Miroslav Sedlacek was giving evidence at the trial of 33-year-old Jozef Puska who has pleaded not guilty to the murder of Ms Murphy in January 2022.

Miroslav Sedlacek is originally from the Czech Republic and provides translation services in German, Czech and Slovak.

On 14 January 2022, he provided translation services in Slovak twice to gardaí in St James' Hospital in Dublin, on a phone line.

He told the court the second conversation took place at around 6pm on that evening, and lasted around 20 minutes.

He said the conversation began with gardaí telling Jozef Puska about the search warrant they had and explaining that his personal belongings would have to be seized for an investigation into a murder in Tullamore.

He told the court Mr Puska wanted to know how this was related to him and wanted to know if he was a suspect. Mr Sedlacek said gardaí told him he was a person of interest and explained what this meant.

Mr Sedlacek said he remembered very well what followed after this. He said it was at this point that Mr Puska asked him personally to translate his confession.

He said Mr Puska asked him to translate accurately and exactly what he was saying. He said Mr Puska told him to tell the gardaí that he did it, that he killed her and that he did not do it intentionally.

Mr Sedlacek said this was still between him and Mr Puska before he had the chance to translate – it was quite spontaneous, he said, and everything came quickly.

He said Mr Puska said he did not want to do it, that he was very sorry that he did it and that it happened. Mr Sedlacek said he translated to gardaí word for word and gardaí cautioned Mr Puska. He said he translated the caution and Mr Puska said he understood.

Mr Sedlacek said Mr Puska then started asking some questions.

He said Mr Puska was very concerned about the safety of his family. His first concern was whether or not his family members’ names would go public. Gardaí said his own name would go public.

He also asked if there was any possibility the girl’s family would like to take any revenge on his own family for what he had done to her. He said gardaí explained Ms Murphy’s family would certainly not take revenge on his family.

Mr Sedlacek said Mr Puska’s voice was very different from the first conversation he had with him earlier on the same day. He said he was quite emotional and his voice was trembling, adding his sentences were quite disjointed. He said he supposed this was as a result of the situation he was in.

He said Mr Puska wanted to stress that he did not do anything intentionally.

He said the garda then told him that Mr Puska was not feeling well and they would have to end the call.

He said Mr Puska asked what would happen next and the garda explained that when he recovered he would be brought to Tullamore garda station and would be interviewed there.

Mr Sedlacek said he would describe Mr Puska as being in very low spirits after the confession. "I would even say desperate," he told the court.

Earlier, the site nurse manager at St James’ Hospital, Roz Gillen, told the court she had been approached by Detective Sergeant Pamela Nugent on the evening of 14 January. The garda had a copy of a search warrant and Ms Gillen decided to move Mr Puska to a single room.

Under cross examination from defence counsel Michael Bowman, Ms Gillen said there was never any request by gardaí to speak to a treating doctor. She was not asked to refer to his medical notes and had no understanding of Mr Puska’s state of mind or medical circumstances.

She agreed she had no function in determining the fitness of someone to deal with gardaí.

Asked if a request to deal with a treating doctor could have been accommodated, she said she did not know if a doctor would have been there as it was a Friday evening.


The rise and inevitable fall of Joy Pocket Dictionary - The Business Standard - Dictionary

The red cover caught my attention on a recent afternoon stroll down a footpath at Old Paltan. It was a small Joy Pocket Dictionary that stood out from the hundreds of books by celebrated and amateur writers. 

I instantly recognised it. In my school days, I always carried the dictionary with me. I studied new words and found my favourite ones. Truth be told, I often forgot what I learned, so I would then try to memorise the words' meanings again. This went on for a few years.

There was a time when the Joy Pocket Dictionary was ubiquitous across the country. School and college-going students who wanted to improve their English language skills would always carry it. 

The Joy Dictionary had to compete with its Indian rival, AT Dev's pocket-sized dictionary, and lived through the tough competition of the dictionary business in the 1990s and 2000s.

However, it was the emergence of websites, and later mobile phone applications, which ultimately proved to be the final nail in Joy Pocket Dictionary's coffin. But before it met its demise and became a thing of nostalgia, the dictionary saw outstanding business.

The highs and lows  

At the height of its popularity between 1990 and 2000, the sale of a single category of dictionary reached 10,000 copies per month. Sometimes, special discount periods like Pahela Baishakh saw higher sales.

"We would jointly make efforts to scale up the business," said Shahid Hasan Tarafder, the owner of publishing company Gyankosh Prokashoni, adding, "The binding and the cover were also attractive." 

At the time, there were around 10 product lines including pocket dictionaries, learner's dictionaries, advanced learner's dictionaries, and Joy concise dictionaries. The company used to publish English language learning books like Six-in-One and Three-in-One. There were some religious books too.

In 1988, Shahid became the sole agent for Joy Dictionaries in Dhaka city.

He bought the copyrights in 2006. By then, the internet had already reached city homes, businesses, offices and cyber cafes in district towns, but people were not quite accustomed to it. Also, there was the factor of regular accessibility to the internet. As a result, the Joy Dictionary continued to enjoy massive popularity. 

Shahid said that Joy's pocket dictionaries, as well as the medium-sized Joy Advanced Pocket Dictionary, were sold at the same pace. The other Joy dictionaries include different versions — English-to-Bangla, Bangla-to-English and Bangla-to-Bangla. The most popular of them is still the English-to-Bangla dictionary. 

Gyankosh Prokashoni saw a boom in Joy Dictionary sales for approximately 10 years. 

However, sales gradually began to decline. During 2015-16, with the emergence of mobile phone apps, sales took a nosedive. Fast forward to 2023, Shahid said that the number of sales of a single dictionary has now come down to 500 copies per year.

In 2006, the price of Joy Pocket Dictionary was more or less Tk20. Now the wholesale price is Tk40. 

Every year, Shahid's publishing company publishes around 10,000 copies of Joy dictionaries to last the whole year. The first edition of the dictionary came out in 1985. Another edition came out in 1990. The dictionary was reprinted in 2023.

Additionally, Shahid said more than 100 words have been added to the dictionary in the last decade by the editors.  

The rise of Joy Dictionary 

SK Ahmed was the original publisher of the Joy Pocket Dictionary. In the mid-1980s, publishing company Joy Books International started to publish the dictionary. 

"He [SK Ahmed] produced the dictionary and I would distribute the dictionaries across the country," said Shahid, now a 67-year-old man.  

"In the 2000s, at one point, SK Ahmed lost interest in the book business. He proposed that I buy out his company," recalls Shahid. "For Tk50 lakh in 2006."    

"He is one of my distant relatives, and I knew the ins and outs of the market of the Joy Dictionary," Shahid added.

He had another reason to buy out Joy Dictionary. Gyankosh, Shahid's stationery shop which started in 1980 mainly with academic textbooks, became popular with customers because of this dictionary.

SK Ahmed Publishing Company was the first local private book publishing company to publish pocket-size and medium-size dictionaries in Bangladesh, Shahid said. 

"[And] the quality of the dictionary was always good," said Shahid. He said that many Bangladeshi publishers later took the initiative and published dictionaries but failed to replicate Joy Dictionary's success. 

Shahid also recounted how SK Ahmed had a printing press in the New Market area. "This man had the capacity to do something innovative. The Indian imported dictionaries received a blow because of the Joy Pocket Dictionary for its quality," said Shahid.  

At that time, some Indian pocket dictionaries of AT Dev reached the market in Bangladesh. But Joy Dictionary put a stop to those imports. 

However, the book-size Samsad Dictionary continued to reach Bangladesh, and to date, some still do. 

Joy turns to despair 

The making of a pocket-size dictionary is difficult. The bookbinders who once used to bind pocket-size dictionaries now show no interest. Shahid explained that the price of the paper has also contributed to the near-demise of the small-size dictionary. The profit margin has fallen significantly. 

In 2018, the price of one ream of double-demy paper stood at Tk800 to Tk900. Now one ream of double-demy paper is Tk1,700 to Tk1,800. The wages of the bookbinders have also gone up.

"The profits are not even half of what we used to make in the past," said Shahid.

But it is the weight of the disappearing interest in hard copy dictionaries that decided Joy Dictionary's demise. "In the past people would buy a dictionary with enthusiasm. That enthusiasm has gone away," explained Shahid.    

Currently, they sell Joy Pocket Dictionary (English to Bengali), Joy Pocket Dictionary (Bengali to English), Joy Nobo Obidhan, Joy Advanced Pocket Dictionary (Bengali to English), Joy Standard Pocket Dictionary, and Joy Shabdha Shanchayeta.

A new dictionary on the cards?

Gyankosh Prokashoni has taken the initiative to publish a book-size Joy Advanced Learners Dictionary this year. 

"Many publishers have a book-size dictionary. As businessmen, we have to always keep up with the competition," said Shahid, adding, "We have made some progress."    

Asked about a plan to make a mobile application for Joy Dictionary, he said, "My son Abdul Wasif has gotten involved in the business and he will decide on the matter. He has some plans for something like that." 

Gyankosh Prokashoni has not changed the logo or colour of Joy Dictionary till now, having only added their names as the publisher. They even kept the name of the former publisher. The dictionary was edited by SK Ahmed in collaboration with experienced professors and headmasters. 

"I kept the name because it is a matter of courtesy and honour," said Shahid. 


Author Sarah Ogilvie uncovers the Oxford English Dictionary's 'unsung heroes' after a surprise finding - ABC News - Dictionary


Wednesday, October 25, 2023

The Great Gatsby: Fitzgerald's Classic, Impressively Staged, Once Again Defies Translation - New York Stage Review - Translation

Jeremy Jordan, Eva Noblezada, and Samantha Pauly in The Great Gatsby. Photo: Jeremy Daniel

F. Scott Fitzgerald’s The Great Gatsby has been an eternal source of fascination ever since it was published in 1925. It has been evoked numerous times on stage and screen. The latest incarnation is the musical, currently at the Paper Mill Playhouse, which is rumored to be Broadway-bound. It certainly has the look of a big Broadway-caliber production with magnificent sets, effects, costumes, choreography, and a bravura cast of major Broadway talents.

But as impressive as the production looks and sounds, it falls short of successfully translating Fitzgerald’s deeply dark themes concerning America’s obsession with wealth and class onto the stage. With its downbeat message of misplaced dreams and moral corruption, the novel seems to defy adaptation. This is a complex tale with no clear path to a cathartic or redemptive ending, and I have yet to see a winning version of Fitzgerald’s classic—aside from off-Broadway’s Elevator Repair Service production of Gatz, which spent six-plus hours featuring actors reading the entire novel cover to cover. The writing is so richly nuanced, it seems impossible to do it justice any other way.

The Paper Mill is putting on a game effort, principally with lead performers Jeremy Jordan and Eva Noblezada, who are simply sensational as Jay Gatsby and Daisy Buchanan. The two characters depict the hollow extravagances of the 1920s at the center of the novel and they are hopelessly flawed. Gatsby is a failed dreamer. The beautiful Daisy, as written, is superficial and materialistic.

As the story unfolds, the year is 1922. Gatsby has amassed a fortune in the hopes of winning over Daisy, with whom he had fallen in love five years earlier when he was a lowly army officer. World War I separated them and when it was over, Gatsby discovered Daisy had married and moved on.

The book is narrated by Daisy’s earnest cousin Nick Carraway, a young man who happens to move next door to Gatsby’s opulent Long Island mansion where wild parties are a nightly constant. Gatsby befriends Nick in an effort to reconnect with Daisy, who lives lavishly with her philandering husband, Tom Buchanan, and their young daughter right across the Sound. Gatsby deliberately placed his house in their direct line of view where he constantly sees a green light emanating from their dock. It’s a symbolic device representing the American Dream, which becomes unattainable even for him.

The musical’s book, written by Kait Kerrigan, is problematic. She does away with Nick’s narration and focuses on the Gatsby-Daisy love story. By dispensing with Nick’s narration, the story loses a valuable perspective—essentially the moral conscience that Fitzgerald amplified in the book. Without that, it’s a pretty straightforward love story about careless people who are hard to care about.

Kerrigan attempts to address that by depicting Daisy in a more sympathetically vulnerable light so we’re more invested in their relationship. By show’s end, Daisy does an about-face, reverting to the shallow character described by Nick in the book as “careless,” someone who “smashed up things and creatures and then retreated back into [her] money.” Those final climactic scenes are muddled and unfortunately leave us cold.

The score, composed by Jason Howland with lyricist Nathan Tysen, is engagingly tuneful, featuring an eclectic mix of rousing jazz-age numbers and soulful ballads.

Noblezada, fresh from Hadestown and Miss Saigon, is three-for-three now with her powerful voice on that tiny frame. She’s proven herself a genuine star. Her 11 o’clock number—singing “The best thing a girl can be in this world is a beautiful little fool” (a line taken right out of the book)—offers one of the few emotionally engaging moments in the show.

Another is Jordan’s solo “Past Is Catching Up to Me.” Embodying the enigmatic Gatsby, Jordan plays the charismatic recluse to perfection. His act one finale with Noblezada, “My Green Light,” is beautifully rendered.

Every one of the leads is a standout. As Nick, Noah J. Ricketts is terrific; and paired with the excellent Samantha Pauly as the cynical, marriage-averse golf pro Jordan Baker, their courtship almost steals the show.

Representing the have-nots, Paul Whitty and Sara Chase as the tragic George and Myrtle Wilson are equally strong.

And John Zdrojeski, as the entitled chauvinistic brute Tom Buchanan, is a very convincing villain.

Staged efficiently by Marc Bruni, the two-and-a-half-hour production is fairly tight though it could benefit from some surgical cutting.

The tremendous sets and projections designed by Paul Tate dePoo III are truly impressive, along with Linda Cho’s costumes. An inspired touch is a dance number performed by the ensemble in open trench coats that whirl in cadence. Dominique Kelley’s choreography deserves kudos all around for its originality, merging the 1920s dances with the current century’s stylized movements.

It’s easy to see the appeal of adapting Gatsby onto the stage, especially now as we emerge from a pandemic echoing the Spanish influenza that plagued Fitzgerald’s era. And now get ready for a lot more productions, as the book has just entered the public domain (which means it can forever be produced without the need to pay royalties). There is, in fact, another Gatsby musical bound for Broadway helmed by Hadestown director Rachel Chavkin. The jury’s still out on that one. As for the Paper Mill production, I certainly wouldn’t count it out. Given all the talent involved, there is great promise—though I wouldn’t give it a green light just yet.

The Great Gatsby opened Oct. 22, 2023, at Paper Mill Playhouse and runs through Nov. 12. Tickets and information: papermill.org


Vestal Third-Graders Receive New Dictionaries - WICZ - Dictionary
