At the OpenAI Spring Update, OpenAI CTO Mira Murati unveiled GPT-4o, a new flagship model with ‘omni’ capabilities across text, vision, and audio, which will roll out iteratively to the company’s developer and consumer products over the coming weeks.
“They are releasing a combined text-audio-vision model that processes all three modalities in one single neural network, which can then do real-time voice translation as a special case afterthought, if you ask it to,” said former OpenAI computer scientist Andrej Karpathy, who was quick to respond to the release.
“The new voice (and video) mode is the best computer interface I’ve ever used. It feels like AI from the movies; and it’s still a bit surprising to me that it’s real. Getting to human-level response times and expressiveness turns out to be a big change,” said OpenAI chief Sam Altman, who wants to bring ‘Universal Basic Compute’ to everyone in the world.
Further, he said that the original ChatGPT hinted at what was possible with language interfaces: “this new thing feels viscerally different. It is fast, smart, fun, natural, and helpful.”
Altman said that talking to a computer has never felt really natural for him. “Now it does,” he said, hopeful about the future where people will be using computers to do more than ever before.
What’s really interesting about GPT-4o is that it will be available to ChatGPT Plus (with some personalisation features) and ChatGPT free users soon. “We are a business and will find plenty of things to charge for, and that will help us provide free, outstanding AI service to (hopefully) billions of people,” said Altman.
“Thanks to Jensen and the NVIDIA team for bringing us the most advanced GPUs to make this demo possible today,” said Murati during her closing remarks.
Meanwhile, OpenAI president and co-founder Greg Brockman also demonstrated human-computer interaction (and even human-computer-computer interaction, with two GPT-4o models conversing with each other), giving users a glimpse of pre-AGI vibes.
RIP Google Translate?
In a demonstration of GPT-4o’s real-time translation capabilities, the model seamlessly translated between English and Italian, showing off its linguistic flexibility. Many believe this new feature is likely to replace Google Translate.
“OpenAI just killed Google Translate with their real-time translator (near 0 delay in response),” said Fraser.
Meanwhile, Google is getting ready to make some major announcements at Google I/O tomorrow. “Super excited for my first Google I/O tomorrow and to share what we’ve been working on!” said Google DeepMind chief Demis Hassabis, teasing a similar glimpse of Google’s own multi-modal AI assistant.
It was not just Google; many were quick to predict the end of several AI startups offering similar solutions and features.
“OpenAI just shot Rabbit in the face,” said AI developer Benjamin De Kraker.
Interestingly, OpenAI also announced the launch of the GPT-4o API, which developers can use to build new products and solutions.
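For developers, a minimal sketch of what a GPT-4o call looks like through OpenAI’s official Python SDK is shown below; the prompt is illustrative, and the snippet assumes an OPENAI_API_KEY environment variable is set:

```python
# Minimal GPT-4o chat completion via the official OpenAI Python SDK.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarise the GPT-4o launch in one sentence."},
    ],
)

print(response.choices[0].message.content)
```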
Meanwhile, Hume AI, which recently released EVI (Empathic Voice Interface), also felt the pressure, launching its API the same day alongside other planned improvements.
Improves Non-English Language Performance
Interestingly, OpenAI has also expanded the model’s language capabilities, with support for over 50 languages, including Indian languages. GPT-4o’s new tokenizer significantly reduces token usage for Indian languages: Gujarati text now needs 4.4x fewer tokens, Telugu 3.5x, Tamil 3.3x, and Marathi and Hindi 2.9x fewer.
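To see the tokenizer savings concretely, here is a rough sketch comparing token counts with OpenAI’s tiktoken library; it assumes a recent tiktoken version that ships the o200k_base encoding used by GPT-4o, and the Hindi sample sentence is illustrative:

```python
# Compare how many tokens the same Hindi sentence needs under GPT-4's
# cl100k_base tokenizer versus GPT-4o's newer o200k_base tokenizer.
import tiktoken

text = "नमस्ते, आप कैसे हैं?"  # "Hello, how are you?" in Hindi

gpt4_enc = tiktoken.get_encoding("cl100k_base")  # GPT-4 / GPT-3.5
gpt4o_enc = tiktoken.get_encoding("o200k_base")  # GPT-4o

gpt4_count = len(gpt4_enc.encode(text))
gpt4o_count = len(gpt4o_enc.encode(text))

print(f"cl100k_base (GPT-4):  {gpt4_count} tokens")
print(f"o200k_base  (GPT-4o): {gpt4o_count} tokens")
print(f"reduction: {gpt4_count / gpt4o_count:.1f}x")
```

Fewer tokens per sentence translates directly into lower API costs and more usable context length for these languages.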
GPT-4o can engage in natural, real-time voice conversations, and users can also converse with ChatGPT over real-time video. It understands the emotional tone of the speaker and can adjust its own tone and modulation accordingly.
Moreover, the latest model can understand and discuss images, allowing users to take a picture of a menu in a foreign language and translate it, learn about the food’s history and significance, and receive recommendations.
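As a rough sketch of how the menu scenario might be wired up against the GPT-4o API (the image URL is a placeholder and the prompt is illustrative):

```python
# Sending an image to GPT-4o for translation and discussion.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Translate this menu into English and recommend one dish."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/menu.jpg"}},  # placeholder
        ],
    }],
)

print(response.choices[0].message.content)
```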
One Step Closer to Autonomous Agents
Another interesting update was OpenAI’s announcement of the ChatGPT (GPT-4o) desktop app, which can read your screen in real time. The app allows for voice conversations, screenshot discussions, and instant access to ChatGPT.
When will GPT-4 ‘Omni’ Arrive?
GPT-4o’s text and image capabilities are starting to roll out today in ChatGPT. Developers can now access GPT-4o in the API as a text and vision model.
The company is rolling out GPT-4o to ChatGPT Plus and Team users, with availability for Enterprise users to follow soon. ChatGPT Free users will also get access to advanced tools, including GPT-4-level intelligence, responses that draw on the web, data analysis, and file uploads.
However, ChatGPT Free users will have a message limit, which will increase as usage and demand grow. When the limit is reached, the app will automatically switch to GPT-3.5 so conversations can continue uninterrupted.
Last but not least, the company has also introduced a simplified look and feel for ChatGPT, featuring a new home screen, message layout, and more. The new design is meant to be friendlier and more conversational.