WASHINGTON - Bill Waawaate is Indigenous, smart, educated, and the millionaire-founder of a highly successful snowmobile company. He also is a comic book superhero from a First Nation in Canada.
"The aim here is to help Canadians understand Indigenous culture and to erase the stereotypes about First Nations communities," said Joseph Johns, the Montreal-based designed and publisher of the Citizen Canada comic book series.
Johns wanted his feather-caped superhero to speak English, French and Cree, a language spoken by more than 95,000 First Nations people in Canada. He assumed he could rely on Google Translate for help.
But the app, which supports 109 languages, does not offer Cree or any of the other roughly 150 Indigenous languages spoken today in North America.
So Johns started up an online petition urging Google to add Cree to its translation engine. That petition has so far received nearly all the 7,500 signatures he had hoped for.
"For me, it just doesn’t make sense," Johns told VOA. "Google Translate does offer Maori, the Indigenous language of New Zealand, which is spoken by only about 50,000 persons. How can a company with 135,000 people working for it in 40 nations across the globe not find the resources to add Canada's most widely spoken Indigenous language?"
VOA posed the question to Google.
"Indigenous languages are incredibly important to us," Google spokesperson Justin Burr said via email. As it turns out, though, Cree is a "low resource" language, which means there aren’t enough written translations of Cree documents to populate and "train" automated translation systems like Google’s.
Burr said Google is actively working toward adding more low resource languages.
"One of those ways is we lean heavily on our contributor community, which allows native speakers to add valuable feedback, verify translations, et cetera, to languages that we do support, as well as languages we have yet to support," said Burr. "Beyond that, we are working on new machine learning techniques that allow us to support the low resource languages with less training data."
University of Colorado linguist Andrew Cowell specializes in Indigenous-language documentation. He explained to VOA some of the challenges machines face in translating Indigenous languages.
"Most of the world’s languages aren’t written. They are spoken as household or community languages that are not regularly used in any kind of literate way," said Cowell. "The pattern all over the world is that someone speaks one language at home and then they write in the national language. And so that language isn’t represented online. And even if it is, there won’t be any standardized writing system because people make it up as they go."
Adding a language to Google Translate requires the input of "hundreds of millions of words," according to Cowell. "And it needs to be what's called 'clean data,' which means that you have the same spelling and grammar conventions."
Cree is actually a series of dialects that gradually change across Canada.
"Cree is actually considered to be multiple different languages by linguists -- East Cree, Wood Cree, Swampy Cree, Plains Cree, et cetera," said Cowell. "Even within those languages, there is a good deal of regional variation. So, the 'Cree language' is more complex — and each community of speakers is smaller — than would be suggested by statements that '95,000 people speak Cree.'"
Projects in the works
Google says plans are under way to add Guarani, an Indigenous language spoken in Paraguay, Brazil and Bolivia; plus Inuktitut, spoken across the North American Arctic and in Greenland; and Tsalagi, the Cherokee language, which has plenty of translated material.
In the early 1800s, a Cherokee named Sequoyah developed a Tsalagi syllabary, a traditional writing system made up of symbols. In 1828, the tribe began publishing the Cherokee Phoenix newspaper. All historic materials, including religious texts, the Cherokee constitution and laws, use Sequoyah's syllabary, and today, learning materials are still being written in syllabics.
The Cherokee Nation’s Language Program department spent nearly two years working with Google to translate more than 50,000 technology terms into Cherokee and developed a syllabary font that Google already has added to its search engine, as well as Gmail, Chromebooks and Android.
But adding Tsalagi to Google Translate will take more time, and more money.
"We are just researching the amount of resources and manpower it will require," said Roy Boney, manager of the Cherokee Nation’s Language Program, in an emailed statement to VOA. "Currently, we are consulting with linguists at the University of New Mexico and University of Mexico City and also exploring grant opportunities in order to expand our research base."
In the meantime, Cherokee linguists are studying and documenting Tsalagi grammar and syntax, with the goal of pairing it up with already translated texts.
"This will help us develop the proper Cherokee language data to start training machine translation engines," Boney said.
As for Cree and many other Indigenous languages, Cowell says speakers will have to wait, adding, "I think there are going to be [an] increasing number of communities starting to write with some kind of standardized orthography. I'm hopeful that additional Indigenous languages will be added to translation engines like Google in the future."