Trends
Meta releases early versions of Llama 3 multimodal AI model
Meta Platforms released early versions of its latest large language model, Llama 3, with new computer coding capabilities and the ability to process image commands. The equipped image generator will update pictures in real time while users type prompts, as Meta races to catch up to generative AI market leader OpenAI.

Headline
Meta Platforms released early versions of its latest large language model, Llama 3, with new computer coding capabilities and the ability to process image commands. The equipped image generator will update pictures in real time while users type prompts, as Meta races to catch up to generative AI market leader OpenAI.
Context
Meta Platforms released early versions of its latest large language model, Llama 3, with new computer coding capabilities and the ability to process image commands. The equipped image generator will update pictures in real time while users type prompts, as Meta races to catch up to generative AI market leader OpenAI. Versions of Llama 3 planned for release in the coming months will also be capable of “multimodality,” meaning they can generate both text and images, though for now the model will output only text, Meta chief product officer Chris Cox said in an interview.
Evidence
Pending intelligence enrichment.
Analysis
The models will be integrated into the virtual assistant Meta AI, which the company is pitching as the most sophisticated of its free-to-use peers. More advanced reasoning, such as the ability to craft longer multi-step plans, will follow in subsequent versions. The inclusion of images in the training of Llama 3 would enhance an update rolling out this year to the Ray-Ban Meta smart glasses, a partnership with glasses maker EssilorLuxottica, enabling Meta AI to identify objects seen by the wearer and answer questions about them, Cox said. The Llama 2 model was unable to understand basic context; Meta says it reduces these problems in Llama 3 by using “high-quality data” to help the model recognise nuances. Rival Google has run into similar issues and recently suspended the use of its Gemini AI image-generating tool after it was criticised for inaccurate depictions of historical figures.
Key Points
- Meta Platforms released early versions of its latest large language model, Llama 3, with new computer coding capabilities and the ability to process image commands. The models will be integrated into the virtual assistant Meta AI, which the company is pitching as the most sophisticated of its free-to-use peers.
- Versions of Llama 3 planned for release in the coming months will also be capable of “multimodality,” meaning they can generate both text and images, as Meta races to catch up to generative AI market leader OpenAI.
- The Llama 2 model was unable to understand basic context; Meta says it reduces these problems in Llama 3 by using “high-quality data” to help the model recognise nuances. The demand for data for generative AI models has become a major source of tension in the development of the…
Actions
Pending intelligence enrichment.
