With Google Translate, converting any sentence into more than 100 languages is a snap, but those who use it regularly know there's room for improvement.
In theory, large language models (LLMs) like ChatGPT should usher in the next era of language translation. They consume vast volumes of text-based training data, plus feedback from millions of users around the world, and learn to "speak" a wide range of languages in coherent, human-like sentences.
But we've heard the "ChatGPT is going to replace everything" refrain before, only to find that the technology is often inaccurate, which is the worst-case scenario for translation. So we put it to the test, asking fluent speakers of eight non-English languages to rank translations from multiple AI services in a blind test.
First, we compared ChatGPT (the free version) to Google Translate, as well as competing chatbots Microsoft Copilot and Google Gemini. Then, we took a closer look at just ChatGPT, comparing the free and paid versions and the customized AI agents in OpenAI's new GPT Store.
Keep in mind, this is by no means a comprehensive study. "Please consider that small blind tests are insufficient; more rigorous testing is needed to properly evaluate and compare these tools with statistical significance," says Federico Pascual, an AI industry veteran. Still, the results are surprisingly consistent, providing a fascinating glimpse into how AI models work.
Test 1: ChatGPT vs. Google vs. Microsoft
(Credit: Wara1982 / Getty Images)
This first test occurred back in June 2023, making PCMag one of the first to test these supposedly all-knowing, new chatbots for language translation.
We asked bilingual speakers of seven languages to blind-rank translations of two paragraphs by Google Translate, ChatGPT, Gemini (then known as Bard), and Copilot (then Microsoft Bing Chat). Once they completed the exercise, we revealed which service produced each one.
- Languages Tested: Polish, French, Korean, Spanish, Arabic, Tagalog, Amharic
- Translation Services: Google Translate, Google Bard, ChatGPT, Microsoft Bing
- Test Paragraph 1: "Hello! Do you speak English? I need some help with directions. I am trying to find a vegetarian restaurant because my sister does not eat meat. What do you recommend? We also want to stay within a few miles of here, and don’t want to spend more than $50. If they have cocktails, that would be a bonus. We’ve had a long day of traveling and need to blow off some steam! You’re welcome to join us. Cheers!"
- Test Paragraph 2: "How do I buy tickets to the boat party? Do we need to pay in advance, or can we buy them at the dock when we arrive? I need to be on the upper deck because sometimes I get seasick when I’m too close to the water. Also, I want to be as far away as possible from the young hooligans who want to pop champagne constantly during the voyage. That’s dangerous and not my kind of fun!"
Result 1: AI Chatbots Beat Google Translate
The results were shockingly consistent. Across the 12 examples we sent, our participants consistently preferred the AI chatbots (ChatGPT, Google Bard, or Microsoft Bing) to Google Translate. ChatGPT topped them all, expertly converting colloquialisms such as "blow off some steam," whereas Google Translate tended toward literal translations that fell flat across cultures.
The table below contains our participants' rankings for each service. Participants who received both paragraphs are marked with a (1) and (2); the others received only the first. Some languages have no fourth-place rank because Google Bard refused the translation task and recommended Google Translate instead, likely an effort by Google to avoid cannibalizing its own product.
"In my opinion, [ChatGPT] is the closest to a normal conversation," says Ana Romero, who ranked the Spanish translations. "The level of formality between the two key questions are consistent (informal) and the right translation of ‘to blow off steam’ is used."
Romero also appreciated that ChatGPT's translation gives the option to end certain words in the masculine or feminine, rather than selecting one for you. For example, it wrote: eres bienvenido/a a unirte a nosotros—"you are welcome to join us"—which would vary based on the gender of the speaker's invitee.
A consistent pitfall for Google Translate was its tendency toward literal interpretation. In French, for example, Google Translate left the word "hooligans" in English, while the chatbots knew to use the culturally appropriate slang term voyous.
"The secret sauce of chatbots like ChatGPT is RLHF, which is reinforcement learning with human feedback," says Nazneen Rajani, research lead at Hugging Face, maker of AI-based Hugging Chat. "[They] collect human preferences on model responses for dimensions such as truthfulness, harmlessness, helpfulness, etc. The human preferences help with selecting the ones that are more culturally appropriate, especially for non-native speakers."
However, none of the AI chatbots were a one-to-one replacement for a fluent speaker. All the chatbots still suffered from awkward and inaccurate word choice at times; they just had fewer instances of it. For example, in Polish, Microsoft Bing translated "You're welcome to join us [at the restaurant]," to "Zapraszamy Cię do nas," which is actually an invitation to "come to my house," says Barbara Pavone, PCMag's senior manager of content distribution.
Google Translate Wins For Niche Languages
Traditional Ethiopian bowls (Credit: Evgenii Zotov / Getty Images)
Google Translate beat out ChatGPT for the less common languages we tested: Tagalog (Philippines) and Amharic (Ethiopia). Of the languages we tested, these two have the fewest native speakers: Tagalog has 33 million people worldwide who claim it as their mother tongue, and Amharic has 25 million, according to WorldData.info. (By comparison, Spanish has 450 million and Korean has 80 million.)
Colin Salao, who ranked the Tagalog translations, noted that ChatGPT used words that are "super formal" and reserved for public announcements. He found Bing to be "the most literal translation," and ranked it below ChatGPT and Google Translate.
"[AI models] wouldn't generalize well for languages with low resources or for which enough human preferences were not collected," Rajani says. For Amharic and Tagalog, we suspect the chatbots lacked enough data to make a nuanced response that fit the context of the paragraph. Instead, they appeared more literal than Google Translate, the opposite of what we saw for the other languages.
Microsoft Bing struggled even more with Amharic, leaving a portion of each paragraph in English. This was the only time any of the services left part of a passage untranslated, even compared with other languages that use non-Latin scripts, like Korean and Arabic:
- Paragraph 1: [Amharic text, with "cocktails" left in English] We’ve had a long day of traveling and need to blow off some steam! You’re welcome to join us. Cheers!
- Paragraph 2: [Amharic text, with "dock," "upper deck," "champagne," and "young hooligans" left in English] That’s dangerous and not my kind of fun!
Test 2: Is ChatGPT Plus Worth It for Translation?
(Credit: fotoVoyager / Getty Images)
Since AI chatbots generally beat Google Translate at translation, a new question emerges: Which version of ChatGPT is best?
OpenAI offers a free plan, which runs on a model called GPT-3.5, as well as a paid Plus plan for $20 per month. With a Plus account, you can use ChatGPT's more advanced model, GPT-4, and access a new offering called GPTs. These are customized versions of ChatGPT built for specific tasks, like translating a language (or even acting as a romantic partner).
In February 2024—eight months after the initial test—our trusty translators did another blind test, this time comparing ChatGPT's various versions to one another. We also snuck in Google Translate's results to see if it still ranked lower, given how fast these technologies evolve.
- Languages: Polish, French, Korean, German, Arabic, Tagalog (Note: German was not tested in the first round, and we did not include Spanish or Amharic in this second test due to availability issues.)
- Translation Services: Google Translate, free ChatGPT (GPT-3.5), paid ChatGPT (GPT-4), paid ChatGPT (a GPT from the GPT Store built to translate each specific language)
- Test Paragraph: From Harry Potter: "Harry felt as though he had barely lain down to sleep in Ron’s room when he was being shaken awake by Mrs. Weasley. “Time to go, Harry, dear,” she whispered, moving away to wake Ron. Harry felt around for his glasses, put them on, and sat up. It was still dark outside. Ron muttered indistinctly as his mother roused him. At the foot of Harry’s mattress he saw two large, disheveled shapes emerging from tangles of blankets. “’S’ time already?” said Fred groggily."
The GPT Store offered a GPT for each language we tested. The chat interface looks nearly identical to the main ChatGPT page, except for a few starter prompts such as "translate to German" or "convert PDF to Italian," which suggest these GPTs are tailored for translation tasks.
ChatGPT Plus offers GPTs customized for more advanced translation tasks. (Credit: OpenAI)
Result 2: Paid ChatGPT Wins, But Google Translate Surprises Us
(Credit: OpenAI Blog)
Nearly every time, ChatGPT Plus offered the best translation. Our testers ranked either GPT-4 (the more advanced model available only with a Plus account) or a language-specific GPT as number one for five of the six languages. The small sample size means this is not a definitive answer, but the consistency of the results suggests that more advanced, highly trained models do make a difference.
"[The GPT for Tagalog] is easily the best for me this time," says Salao. "Most of the grammar was correct, and the main thought of each sentence was properly translated. There were a few parts that could be considered mistakes — like using 'gusot' as the translation for both 'tangles' and 'disheveled,' but those are minor."
Google Translate did surprisingly well, however, performing better than it had in the first test eight months earlier. It ranked first in German, and second in Tagalog and Arabic.
"If I compare this to the nonsense that Google Translate used to come up with in the past, this is night and day," says our German tester, Sandra. "I'm super impressed."
The free version of ChatGPT (GPT-3.5), meanwhile, ranked surprisingly low. Its best showing was second place in German; in every other language, it came in third or dead last. It's unclear whether OpenAI intentionally limits the free version's capabilities to push users toward a paid Plus account, but all things considered, the free version of ChatGPT and Google Translate performed roughly on par in this latest test.