Debates Surrounding Grok 3 Benchmarks: Did xAI Release Misleading Data?


OpenAI researchers have accused xAI of publishing misleading performance data for its Grok 3 model. They question the credibility of the published test results, particularly the scores reported for the AIME 2025 mathematics test.

The dispute centres on the graphs in xAI's blog post, which omitted the score that OpenAI's o3-mini-high model achieves under cons@64 (consensus@64), a test mode in which the model gets 64 attempts at each problem and the most frequent answer is taken as its final answer. Because this mode tends to raise scores considerably, leaving it out for one model makes the comparison uneven. Judged on first-attempt (@1) answers, Grok 3 Reasoning Beta and Grok 3 mini Reasoning both score below o3-mini-high. The discrepancy stands out all the more against Elon Musk's remarks at the World Government Summit in Dubai on 13 February, where he called Grok 3 "scarily smart" and claimed that it surpasses every model released to date.

Source: https://x.com/nrehiew_/status/1891710589115715847/photo/1
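To make the difference between the two scoring modes concrete, here is a minimal Python sketch. The function names (`consensus_answer`, `score_at_1`, `score_cons_at_k`) and the toy data are hypothetical and invented purely for illustration; they are not real AIME results and not the evaluation code used by xAI or OpenAI.

```python
from collections import Counter


def consensus_answer(samples):
    """Return the most frequent answer among the sampled attempts."""
    return Counter(samples).most_common(1)[0][0]


def score_at_1(problems):
    """Fraction of problems whose first sampled attempt is correct (@1)."""
    hits = sum(p["samples"][0] == p["truth"] for p in problems)
    return hits / len(problems)


def score_cons_at_k(problems, k):
    """Fraction of problems whose majority answer over the first k attempts is correct (cons@k)."""
    hits = sum(consensus_answer(p["samples"][:k]) == p["truth"] for p in problems)
    return hits / len(problems)


# Hypothetical toy data: 4 problems with 5 sampled answers each (assumption, not real results).
problems = [
    {"truth": 204, "samples": [113, 204, 204, 57, 204]},
    {"truth": 16, "samples": [16, 16, 81, 16, 16]},
    {"truth": 70, "samples": [70, 12, 12, 12, 70]},
    {"truth": 588, "samples": [468, 588, 588, 588, 468]},
]

print("@1:", score_at_1(problems))              # 0.5
print("cons@5:", score_cons_at_k(problems, 5))  # 0.75
```

On this toy data the same set of sampled answers scores 0.5 at @1 and 0.75 under majority voting, which is why comparing one model's cons@64 figure against another model's @1 figure can be misleading. Note that majority voting does not always help: in the third toy problem the first attempt is right but the consensus is wrong.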

The dispute between xAI and OpenAI highlights how difficult it is to measure AI performance reliably. Igor Babuschkin, co-founder of xAI, defended the company by arguing that OpenAI had itself published similarly misleading comparison graphs in the past. According to experts, the computational and financial cost of achieving each model's best score remains unknown, although this information would be crucial for assessing actual performance.

Sources:

1. Did xAI lie about Grok 3's benchmarks? | TechCrunch
OpenAI researchers accused xAI of publishing misleading Grok 3 benchmarks. The truth is a little more nuanced.

2. Elon Musk says his Grok 3 outperforms AI chatbots like ChatGPT and DeepSeek
Elon Musk has revealed that his upcoming AI chatbot, Grok 3, will outshine competitors like ChatGPT and DeepSeek. Speaking at the World Government Summit in Dubai, Musk shared that Grok 3 is nearing completion and will be launched within the next one or two weeks.