• The hallucination index used Galileo's proprietary evaluation metric, context adherence, to assess output inaccuracies across varying input lengths (a simplified sketch of the idea follows this list).
• Closed-source models such as Claude 3.5 Sonnet and Gemini 1.5 Flash lead the index, an edge attributed to their proprietary training data.
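
Galileo's context adherence metric is proprietary, so the sketch below is only a loose illustration of the underlying idea: checking whether a response stays grounded in the input it was given. All names here are hypothetical, and simple lexical overlap stands in for the model-based judgment a production metric would use.

```python
import re

# Minimal stopword list; a real implementation would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in",
             "is", "are", "was", "were", "it", "that", "this"}

def content_words(text: str) -> set[str]:
    """Lowercase, tokenize, and drop common stopwords."""
    return {w for w in re.findall(r"[a-z0-9']+", text.lower())
            if w not in STOPWORDS}

def adherence_score(context: str, response: str) -> float:
    """Fraction of the response's content words that also appear in the
    supplied context. 1.0 means fully grounded; low scores suggest the
    model introduced material absent from its input."""
    resp = content_words(response)
    if not resp:
        return 1.0  # an empty response asserts nothing
    return len(resp & content_words(context)) / len(resp)

# Toy check: the second response invents figures not in the context.
context = "Galileo's index evaluated 22 language models across short and long inputs."
grounded = "The index evaluated 22 language models."
hallucinated = "The index evaluated 95 models and cost 4 million dollars."

print(f"grounded:     {adherence_score(context, grounded):.2f}")      # 1.00
print(f"hallucinated: {adherence_score(context, hallucinated):.2f}")  # ~0.38
```

A real grounding check would rely on an LLM judge or an entailment model rather than word overlap, but the shape of the signal is the same: responses that introduce unsupported material score lower.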

OUR TAKE
The AI industry continues to face hallucinations as a significant hurdle for production-ready generative AI products. The hallucination index released by Galileo offers a comprehensive evaluation of generative AI models, focusing on how well they resist hallucinating, and gives enterprises the insight needed to select a model suited to their specific needs and budget constraints.
-Lia XU, BTW reporter

What happened

Galileo, a company specializing in generative AI evaluation, released its latest hallucination index, which assesses 22 prominent large language models (LLMs) from major players including OpenAI, Anthropic, Google, and Meta. This year's index expanded to include 11 new models, reflecting the rapid growth of both open-source and closed-source LLMs over the past eight months.

The index found that Anthropic's Claude 3.5 Sonnet was the best overall performer. Google's results were mixed: its open-source Gemma-7b model performed poorly, while its closed-source Gemini 1.5 Flash consistently ranked near the top.

For enterprises looking to adopt the right model for their needs and budget, the index offers a practical guide, and its results illustrate both the dynamic landscape of generative AI and the industry's ongoing efforts to rein in hallucinations.

Also read: BNP Paribas partners with Mistral AI to implement LLMs

Also read: 10 AI-powered apps for self-diagnosing health conditions

Why it’s important

AI hallucinations produce incorrect or misleading information that undermines the reliability of AI systems. By benchmarking how prone each model is to hallucinating, Galileo's index helps developers evaluate and improve their models, and in turn build more trustworthy applications that enterprises can rely on for critical tasks.

Evaluating models on both performance and cost-effectiveness is essential for enterprises implementing generative AI, and striking that balance matters most for organizations operating under tight budget constraints. One simple way to frame the trade-off is sketched below.
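As a purely hypothetical illustration (the scores and prices below are invented placeholders, not figures from Galileo's index), an enterprise shortlist might filter models by a cost ceiling and then take the best adherence score among the affordable candidates:

```python
# Hypothetical shortlist: scores and prices are invented placeholders,
# not figures from Galileo's hallucination index.
candidates = [
    # (model name, adherence score 0-1, USD per million output tokens)
    ("model-a-closed", 0.97, 15.00),
    ("model-b-closed", 0.94, 0.30),
    ("model-c-open",   0.89, 0.20),
]

budget_per_million = 1.00  # assumed enterprise cost ceiling

affordable = [c for c in candidates if c[2] <= budget_per_million]
name, score, price = max(affordable, key=lambda c: c[1])
print(f"pick: {name} (score {score:.2f}, ${price:.2f}/M tokens)")
# -> pick: model-b-closed (score 0.94, $0.30/M tokens)
```

The point of the exercise is that the top-scoring model overall is not always the right pick; the best model within budget often is.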

The hallucination index therefore serves as a vital resource for understanding the competitive landscape of generative AI, highlighting the strengths and weaknesses of individual models while tracking the industry's progress against one of its most persistent production challenges.