“Hallucinations” aside, today’s sophisticated chatbots can sometimes seem like magic — passing standardized tests with flying colors, or conjuring up multilingual poetry in the blink of an eye.
Well… depending on what language you speak. A recent paper awaiting peer review from a group of researchers at Amazon and the University of California, Santa Barbara found that chatbots’ linguistic skills might be threatened by ghosts from a past era of AI, raising significant questions about their ability to communicate effectively in lesser-used languages on the web (think regional dialects from Africa or Oceania). Analyzing a database of billions of sentences, they found that a huge chunk of the digital text likely to be hoovered into LLMs from those languages wasn’t written by native speakers but instead was crudely machine-translated by older AIs.
That means today’s cutting-edge multilingual models are training on very low-quality data, leading to lower-quality output in some languages, more hallucination, and potentially amplifying the web’s already-existing shortcomings and biases.
That’s obviously bad in its own right, but it raises a larger question about the future of generative AI: Is it doomed, as some have predicted, by the “garbage in, garbage out” principle?
I spoke today with Ethan Mollick, an AI researcher and professor at the University of Pennsylvania’s Wharton School, and asked him what he thought about the findings given his work on how people actually interact with AI models in a professional or classroom setting. He was skeptical that messy, photocopy-of-a-photocopy results like those the Amazon and UC Santa Barbara researchers found could lead to the “model collapse” that some researchers fear, but said he could see a need for AI companies to tackle language issues head-on.
“There are worlds where this is a big problem, and data quality and data quantity both matter,” Mollick said. “The real question is whether there’s going to be a deliberate effort, like I think Google has done with Bard, to try and train these models for other languages.”
Usually, large language models are trained with extra weight given to heavily edited, high-quality sources like Wikipedia or officially published books and news media. In the case of lesser-used languages, there's simply less native, high-quality content in that vein on the web. The researchers found that AI models then disproportionately train on machine-translated articles they describe as “low quality,” about “topics like being taken more seriously at work, being careful about your choices, six tips for new boat owners, deciding to be happy, etc.”
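To make that weighting concrete, here's a minimal, hypothetical sketch of how a training pipeline might oversample curated sources relative to a raw web crawl. The source names, weights and `sample_documents` helper are illustrative assumptions, not the researchers' actual setup.

```python
import random

# Hypothetical quality weights: curated sources get sampled more often
# than generic web-crawled text, which is where machine-translated
# filler tends to pile up. Real pipelines tune these values empirically.
SOURCE_WEIGHTS = {
    "wikipedia": 3.0,        # heavily edited encyclopedic text
    "published_books": 2.5,  # professionally edited long-form text
    "news_media": 2.0,       # edited journalism
    "web_crawl": 1.0,        # unfiltered web pages
}

def sample_documents(corpus_by_source, n_samples, rng=random.Random(0)):
    """Draw a training mix, oversampling the higher-quality sources.

    corpus_by_source: dict mapping source name -> list of documents.
    Returns a list of (source, document) pairs of length n_samples.
    """
    sources = list(corpus_by_source)
    weights = [SOURCE_WEIGHTS.get(s, 1.0) for s in sources]
    mix = []
    for _ in range(n_samples):
        src = rng.choices(sources, weights=weights, k=1)[0]
        mix.append((src, rng.choice(corpus_by_source[src])))
    return mix
```

The catch the paper points to is that for low-resource languages the curated buckets are nearly empty, so even a scheme like this ends up drawing mostly from the web-crawl bucket, machine translations and all.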
All it takes to determine what the “garbage in” to an AI model might be, then, is a quick web search. The “garbage out” is, of course, apparent from one’s interaction with the model, but exactly how it got made is less clear — and Mollick says the sheer size and sophistication of current AI models means that process remains opaque to researchers for the moment.
“Even with open-source models, we just fundamentally don’t know” how, or why, certain AI models operate better or worse in any given language, Mollick said. “There are dueling papers about how much the quality versus quantity of data matters and how you train better in foreign languages.”
So, for those keeping score: Old, low-quality machine-translated foreign-language content does predominate in more obscure languages, reducing AI models’ fluency with them. But we don’t know exactly how this happens within any given AI model, and we also still don’t know exactly the extent to which AI development is threatened by training on AI-generated content.
Mehak Dhaliwal, a former AWS intern and current PhD student at UC Santa Barbara, told Vice’s Motherboard that the team initiated the study because they saw the lack of quality firsthand.
“We actually got interested in this topic because several colleagues who work in MT [machine translation] and are native speakers of low-resource languages noted that much of the internet in their native language appeared to be MT generated,” he said.
So what can actually be done about it? Brian Thompson, a senior scientist at Amazon AWS AI who is one of the paper’s authors and its listed contact, told DFD via email that he couldn’t comment. But he pointed to his fellow researchers’ conclusion that model trainers could use tools to identify and eliminate machine-translated content before it gums up the model’s works.
Both researchers and the data analysts fine-tuning these models are able to flag and classify data at an almost psychedelically minute level, meaning it should be no problem to at least attempt a prophylactic against badly translated content. Still, with the most sophisticated AI models like GPT-4 rumored to have roughly 1.8 trillion parameters, those scientists could have their work cut out for them.
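As a rough illustration of that kind of prophylactic, the sketch below screens a corpus with a machine-translation detector before training. The `looks_machine_translated` heuristic and the 0.8 threshold are stand-ins for illustration, not the tooling the Amazon team describes.

```python
from dataclasses import dataclass

@dataclass
class Document:
    text: str
    language: str

def looks_machine_translated(doc: Document) -> float:
    """Toy placeholder score in [0, 1]: crude machine translation often has
    low lexical diversity, so use type-token ratio as a stand-in. A real
    pipeline would use a trained detector or cross-language signals."""
    words = doc.text.lower().split()
    if not words:
        return 0.0
    return 1.0 - len(set(words)) / len(words)

def filter_training_corpus(docs, threshold=0.8):
    """Split a corpus into documents to keep and documents to drop.

    Anything scoring at or above the (assumed) threshold is set aside, so
    low-resource languages aren't dominated by low-quality translated filler.
    """
    kept, dropped = [], []
    for doc in docs:
        (dropped if looks_machine_translated(doc) >= threshold else kept).append(doc)
    return kept, dropped
```

The filtering step itself is trivial; the hard work is in the detector, which would need to spot machine-translated text reliably at web scale.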
franco files
Germany and France are pushing the European Union’s AI Act negotiations to their limit.
POLITICO’s Gian Volpicelli, Océane Herrero, and Hans von der Burchard reported on the back-and-forth for Pro subscribers today, as the two bloc heavyweights press for more business-friendly strictures in the law ahead of a vote on its final text scheduled for Feb. 2.
“There is no final text,” a representative for the cabinet of French Economy Minister Bruno Le Maire told POLITICO Wednesday. “The regulation is still being negotiated, and that will result in another round of three-way negotiations.”
A particular sticking point is copyright, where France says the AI Act’s planned requirement that companies disclose the copyrighted material used in training AI would be an impediment to AI startups. Still, one diplomat familiar with the negotiations told POLITICO the law’s text is unlikely to be changed or blocked by force, saying France, Germany, and Italy are “isolated” in their preference for less stringent regulation.
good chatbot, good
Take a deep breath: The RAND Corporation says we don’t have to worry just yet about chatbots unleashing biological weapons.
In a report published this morning, RAND researchers say that “biological weapon attack planning currently lies beyond the capability frontier of LLMs as assistive tools” and that “LLMs do not substantially increase the risks associated with biological weapon attack planning.”
They ran a traditional controlled experiment, where one group of security experts planned an imaginary biological attack with LLMs and one planned an attack without them. The chatbots were of negligible help to the researchers.
Still, they say there might be room for, uh, “improvement” on that front: “It remains uncertain whether these risks lie ‘just beyond’ the frontier and, thus, whether upcoming LLM iterations will push the capability frontier far enough to encompass tasks as complex as biological weapon attack planning, or whether the task of planning a biological weapon attack is so complex and multifaceted as to always remain outside the frontier of LLMs,” they write in the conclusion.
The RAND researchers recommend scaling up various “red-teaming” exercises meant to detect malevolent AI activity before its use in the wild, and bolstering the research community around negative AI capabilities.
Stay in touch with the whole team: Ben Schreckinger ([email protected]); Derek Robertson ([email protected]); Mohar Chatterjee ([email protected]); Steve Heuser ([email protected]); Nate Robson ([email protected]); Daniella Cheslow ([email protected]); and Christine Mui ([email protected]).
If you’ve had this newsletter forwarded to you, you can sign up and read our mission statement at the links provided.