Anthropic to fund the creation of more reliable AI benchmarks

Anthropic announces a program aimed at funding the development of new benchmarks for evaluating the performance and impact of AI models.
Anthropic believes that developing high-quality, safety-related assessments remains challenging, and that demand exceeds supply.

OUR TAKE
Given the company’s commercial interests, the impartiality of Anthropic-funded projects may be open to question. Moreover, some experts believe that the focus on the “catastrophic” and “deceptive” AI risks Anthropic mentions could distract from more pressing current AI regulatory issues.
–Zora Lin, BTW reporter

What happened

Anthropic announced on Monday the launch of a new initiative to fund new benchmarks for evaluating the performance and impact of AI models, including generative models such as Claude.

According to Anthropic’s official blog post, the company will provide financial support to third-party organizations to develop tools to “effectively measure the advanced capabilities of artificial intelligence models.” Interested organizations can submit applications, which will be evaluated on a rolling basis.

Anthropic’s initiative stems from growing criticism of existing benchmarks for AI models, such as the MLPerf evaluation conducted twice a year by the nonprofit MLCommons. It is widely believed that the most popular benchmarks used to rate AI models do a poor job of assessing how ordinary people actually use AI systems on a daily basis.

Anthropic hopes to encourage the AI research community to come up with more challenging benchmarks that focus on the social impact and safety of AI models, and it calls for an overhaul of existing methods.

Why it’s important

Anthropic’s investment aims to elevate the entire field of AI safety, providing valuable tools that benefit the whole ecosystem.

The new benchmarks emphasise not only a model’s technical performance but also its social impact and safety. With them, researchers can better assess the social and safety issues of AI, support the building of more reliable AI systems, and help increase public trust in AI technology.

By providing financial support, Anthropic encourages third-party organizations to take part in developing new benchmarking tools, which could attract more innovators and entrepreneurs to the field of artificial intelligence and help it grow.
