MM1: Apple’s first multimodal AI model

  • Rivaling Google’s Gemini: With up to 30 billion parameters, MM1 competes with the earliest versions of Google’s Gemini models.
  • In-context learning: MM1 can understand and answer new queries using the context of the current conversation, without retraining.

Apple has revealed MM1, a new generation of multimodal models that can seamlessly interpret and interact with both images and text, setting the stage for a more intuitive and responsive Siri and iMessage experience.

MM1: pioneering multimodal AI

Apple has introduced MM1, a suite of multimodal AI models that process both images and text. The models have up to 30 billion parameters, which makes them competitive with the earliest iterations of Google’s Gemini models.

The MM1 models are equipped with the ability to interpret and execute instructions that involve both visual and textual elements. For instance, the AI can calculate the combined cost of two beverages by analysing the pricing information displayed on a menu.

One of MM1’s standout features is in-context learning: the model can answer questions using information supplied in the ongoing conversation, without being retrained or fine-tuned for each new query or task.

This capability could enable the model to describe images in detail or answer questions about photo-based prompts, even when it has not seen similar content before.
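To make the idea concrete, here is a minimal sketch of how a few-shot multimodal prompt might be assembled. It is not Apple's API or MM1's actual input format: the `Image` and `Example` classes, the `build_few_shot_prompt` function, the file names, and the menu prices are all hypothetical, chosen only to show demonstrations and a new query interleaved in a single prompt.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Image:
    """Placeholder for an image input; `path` is a hypothetical file name."""
    path: str


@dataclass
class Example:
    """One worked demonstration: an image, a question, and its answer."""
    image: Image
    question: str
    answer: str


# A prompt is an ordered mix of images and text segments.
Segment = Union[Image, str]


def build_few_shot_prompt(
    examples: List[Example], query_image: Image, query: str
) -> List[Segment]:
    """Interleave demonstrations with the new query.

    No model weights are updated here; the demonstrations exist only
    inside the prompt, which is what "in-context" refers to.
    """
    prompt: List[Segment] = []
    for ex in examples:
        prompt.append(ex.image)
        prompt.append(f"Q: {ex.question}\nA: {ex.answer}")
    # The final segment leaves the answer open for the model to complete.
    prompt.append(query_image)
    prompt.append(f"Q: {query}\nA:")
    return prompt


# Hypothetical demonstrations (invented prices) followed by a new question.
examples = [
    Example(Image("menu_cafe.jpg"), "What do two coffees cost in total?",
            "2 x 3.50 = 7.00"),
    Example(Image("menu_bar.jpg"), "What do a beer and a soda cost in total?",
            "6.00 + 2.50 = 8.50"),
]
prompt = build_few_shot_prompt(
    examples, Image("menu_new.jpg"), "What do two beers cost in total?"
)
```

The resulting list alternates images and text, ending with an unanswered question. A model with in-context learning would be expected to infer the task from the two demonstrations alone.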

Enhancing user experience

Apple could use MM1’s multimodal comprehension to improve its voice assistant, Siri, allowing it to answer questions grounded in visual data such as images. MM1 could also help interpret the context of images and text messages shared via iMessage, giving users more relevant reply suggestions.


Tilly Lu

Tilly Lu is an intern reporter at BTW media specializing in Fintech and Blockchain. She is studying Broadcasting and Hosting at Sanming University. Send tips to
