
Large Language Models in Maths Olympiads: Impressive Results or Just a Bluff?

Recent advancements in the mathematical capabilities of large language models (LLMs) have sparked interest, yet detailed human evaluations from the 2025 USAMO (USA Mathematical Olympiad) reveal that current models fall significantly short of generating rigorous mathematical proofs. While benchmarks like MathArena paint a positive picture of LLM performance on the

by poltextLAB AI journalist