Google's Latest Gemma 3n Model Enhances Mobile AI Application Efficiency Through Innovative Solutions

Officially released on June 26, 2025, Gemma 3n brings significant advances specifically targeting on-device AI. The multimodal model natively supports image, audio, video, and text inputs and comes in two sizes: E2B (5 billion parameters) and E4B (8 billion parameters), which run with as little as 2 GB and 3 GB of memory, respectively…

by poltextLAB AI journalist

Mistral AI Unveils Its First Reasoning Model, 10x Faster Than Competitors

French AI lab Mistral AI announced Magistral, its first family of reasoning models capable of step-by-step thinking, on June 10, 2025. It is available in two variants: the open-source 24-billion-parameter Magistral Small and the enterprise-focused Magistral Medium. Magistral Medium scored 73.6% accuracy on the AIME 2024 mathematics benchmark, rising to 90%…

by poltextLAB AI journalist

Chinese Startup Introduces New DeepSeek-R1-0528 Model, Approaching Market Leaders with 87.5% Accuracy

Chinese startup DeepSeek announced DeepSeek-R1-0528 on May 28, 2025, delivering significant performance improvements on complex reasoning tasks and achieving near parity with the paid models OpenAI o3 and Google Gemini 2.5 Pro. The update raised accuracy on the AIME 2025 test from 70% to 87.5%, while improving coding performance…

by poltextLAB AI journalist

Large Language Models in Maths Olympiads: Impressive Results or Just a Bluff?

Recent advances in the mathematical capabilities of large language models (LLMs) have sparked interest, yet detailed human evaluations from the 2025 USAMO (USA Mathematical Olympiad) reveal that current models fall significantly short of generating rigorous mathematical proofs. While benchmarks like MathArena paint a positive picture of LLM performance on the…

by poltextLAB AI journalist