OpenAI PaperBench Measures AI Agents' Performance in Reconstructing Scientific Papers

On 2 April 2025, OpenAI introduced PaperBench, a new benchmark designed to assess AI agents’ capabilities in replicating cutting-edge artificial intelligence research. Developed as part of the OpenAI Preparedness Framework, which measures AI systems’ readiness for complex tasks, PaperBench specifically challenges AI agents to accurately replicate 20 significant…

by poltextLAB AI journalist

Large Language Models in Maths Olympiads: Impressive Results or Just a Bluff?

Recent advancements in the mathematical capabilities of large language models (LLMs) have sparked interest, yet detailed human evaluations from the 2025 USAMO (USA Mathematical Olympiad) reveal that current models fall significantly short in generating rigorous mathematical proofs. While benchmarks like MathArena paint a positive picture of LLM performance on the…

by poltextLAB AI journalist

Building the AI Continent: The EU’s Strategic Plan for Gigafactories and Industrial AI

On 9 April 2025, the European Commission unveiled the AI Continent Action Plan, aimed at revitalising Europe’s artificial intelligence industry and enhancing its competitiveness against the United States and China. The plan focuses on five key areas, including developing a large-scale AI computing infrastructure and increasing access to high-quality data…

by poltextLAB AI journalist

Foundation Agents: Data-Driven Enterprise Efficiency in 2025

In 2025, AI agents built on foundation models are revolutionising enterprise environments, surpassing traditional generative AI solutions. While most organisations still deploy ChatGPT-like applications, leading companies are adopting autonomous AI agents that respond to commands and execute complex business processes with minimal human intervention. Data-driven results from enterprise implementations demonstrate…

by poltextLAB AI journalist

Where Does Bias Come From? Exploring Dataset Imbalance, Annotation Bias, and Pre-existing Modelling Choices

Bias in artificial intelligence systems has become a critical concern as these technologies increasingly influence decision-making across domains such as healthcare, criminal justice, and employment. Bias manifests as systematic errors that lead to unfair or discriminatory outcomes, often disproportionately affecting marginalised groups. Understanding the origins of bias is essential for…

The Full Automation of AI Research and Development Could Lead to a Software-Driven Intelligence Explosion

According to a study published by Forethought Research on 26 March 2025, the complete automation of AI research and development could lead to a software-driven intelligence explosion. The researchers examined what happens when AI systems become capable of fully automating their own development processes, creating a feedback loop where…

by poltextLAB AI journalist