- OpenAI is holding an event on Monday that could see the announcement of a new multimodal digital assistant.
- Being multimodal would enable the assistant to use visual cues, such as recognising and interpreting a street sign, as prompts.
- This would pose a direct threat to Google’s digital assistants, Google Assistant and the recently released Gemini.
According to a recent report from The Information, OpenAI has been demonstrating to some of its clients a new multimodal AI model that can recognise objects and converse with you. The outlet, citing anonymous sources who have seen the model, speculates that this may be a preview of what the company will unveil later today.
New multimodal AI model
Multimodal refers to the AI’s ability to process more than just text as input. This supposed digital assistant would be able to connect to a camera, process data from the outside world, and then respond to you with additional details about what it observed. For instance, you could point a camera at a sign written in a language other than your own and ask ChatGPT to recognise and translate it, and the AI would then converse with you about it.
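As a rough illustration of what "image plus text in one prompt" looks like in practice, the sketch below builds a request payload in the style of OpenAI's existing Chat Completions image-input format. The model name is a placeholder assumption, not a confirmed release, and this is not necessarily how the rumoured assistant will be exposed:

```python
# Sketch of a multimodal chat request payload, modelled on OpenAI's
# existing Chat Completions image-input message format.
# The model name is a hypothetical placeholder.

def build_translation_request(image_url: str) -> dict:
    """Build a request asking the model to read and translate a sign."""
    return {
        "model": "multimodal-assistant-placeholder",  # hypothetical name
        "messages": [
            {
                "role": "user",
                # One message can mix text and image parts.
                "content": [
                    {"type": "text",
                     "text": "What does this sign say? Translate it to English."},
                    {"type": "image_url",
                     "image_url": {"url": image_url}},
                ],
            }
        ],
    }

request = build_translation_request("https://example.com/sign.jpg")
print(request["messages"][0]["content"][1]["type"])  # image_url
```

The key idea is that the user turn carries a list of typed content parts rather than a single string, so the model sees the photograph and the question together.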
If this sounds familiar, it’s because Google Lens, Google Assistant, and—most recently—Google Gemini have all already accomplished this. ChatGPT is already capable of doing this, albeit not via a single interface.
The new model can reportedly interpret images and audio faster and more accurately than OpenAI’s separate transcription and text-to-speech models. The Information claims that the model could “theoretically” assist students with math or translate real-world signs, and that it would be able to help customer service representatives by “better understanding the intonation of callers’ voices or whether they’re being sarcastic.”
In other words, a direct competitor to Gemini (and, subsequently, Google Assistant and Apple’s Siri).
The model can “answer some types of questions” better than GPT-4 Turbo, but it can still make confident mistakes, according to the outlet’s sources.
Speculation on OpenAI
Developer Ananay Arora shared a screenshot of call-related code, suggesting that OpenAI may be preparing a new built-in calling feature for ChatGPT as well. Arora also found evidence that OpenAI had set up servers intended for real-time audio and video chat.
Additionally, Altman has stated that the company is not releasing a new AI-powered search engine. Even so, if The Information’s report is accurate, the announcement may still deflate expectations ahead of Google’s I/O developer conference.