Google's Gemini 3 Deep Think Achieves Record Results on Scientific Benchmarks

Feb 13, 2026

On 12 February 2026, Google announced a major update to Gemini 3 Deep Think, a specialised reasoning mode designed to tackle complex scientific, research and engineering challenges. Google says the model was developed in close partnership with scientists and researchers. It has set new records on several leading benchmarks, outperforming OpenAI's GPT-5.2 and Anthropic's Claude Opus 4.6.

The updated Deep Think scored 48.4% on Humanity's Last Exam without tools and 84.6% on ARC-AGI-2, a result verified by the ARC Prize Foundation. The ARC-AGI-2 score is particularly notable: humans average approximately 60% on these tasks, while previous AI models often struggled to surpass 20%. On the Codeforces competitive programming platform, the model attained an Elo rating of 3,455, placing it in the Legendary Grandmaster tier. In the natural sciences, it demonstrated gold-medal-level performance on the written sections of the 2025 International Physics, Chemistry and Mathematics Olympiads, and scored 50.5% on the CMT-Benchmark for advanced theoretical physics. Among practical applications, Google highlighted the model's ability to convert hand-drawn sketches into 3D-printable files.

Gemini 3 Deep Think is now available to Google AI Ultra subscribers in the Gemini app. For the first time, Google is also making the model accessible via the Gemini API through an early access programme. The results point to a significant advance in AI-driven scientific reasoning, both on abstract benchmarks and in practical engineering applications.

Sources: