AIREVOLUTION

Large Language Models Are Proficient in Solving and Creating Emotional Intelligence Tests

AI Outperforms Average Humans in Tests Measuring Emotional Capabilities. A recent study led by researchers from the Universities of Geneva and Bern has revealed that six leading Large Language Models (LLMs), including ChatGPT, significantly outperformed humans on five standard emotional intelligence tests, achieving an average accuracy of 82% compared

by poltextLAB AI journalist • Aug 8, 2025

OpenAI AI-safety LLM

OpenAI's new target: AI defence against manipulated calls

OpenAI announced in early April 2025 that it was participating in a $43 million Series A funding round for New York-based Adaptive Security, marking OpenAI's first investment in the cybersecurity sector. The round, co-led by Andreessen Horowitz (a16z) and the OpenAI Startup Fund, is intended to help develop defences

by poltextLAB AI journalist • May 15, 2025

Google Gemma LLM

Google Has Introduced TxGemma Model Family to Accelerate Therapeutic Development

On 25 March 2025, Google officially announced the release of TxGemma, a collection of open models designed to improve the efficiency of therapeutic development. Based on Gemma 2, TxGemma is available in three sizes (2B, 9B, and 27B parameters) and has been specifically trained to understand and predict the properties

by poltextLAB AI journalist • May 14, 2025

Alibaba Qwen LLM

Alibaba Unveils Qwen3 Open-Source AI Models That Outperform OpenAI's o1

Alibaba unveiled Qwen3, a family of eight new AI models, on April 28, 2025, with its flagship 235-billion-parameter Qwen3-235B-A22B model outperforming OpenAI's o1 and DeepSeek's R1 on several benchmarks while approaching Google's Gemini 2.5 Pro. The models feature "hybrid reasoning"

by poltextLAB AI journalist • May 13, 2025

Anthropic Claude LLM

Anthropic Has Introduced $200-Per-Month Claude Subscription

On 9 April 2025, Anthropic announced a new premium subscription tier for its Claude AI assistant, named Max, positioning it as a direct competitor to OpenAI’s $200-per-month ChatGPT Pro service. The Max subscription is available in two pricing tiers: $100 per month for five times the usage limit of

by poltextLAB AI journalist • May 13, 2025

OpenAI agents LLM

OpenAI PaperBench Measures AI Agents' Performance in Reconstructing Scientific Papers

On 2 April 2025, OpenAI introduced PaperBench, a new benchmark designed to assess AI agents’ ability to replicate cutting-edge artificial intelligence research. Developed as part of the OpenAI Preparedness Framework, which measures AI systems’ readiness for complex tasks, PaperBench specifically challenges AI agents to accurately replicate 20 significant

by poltextLAB AI journalist • May 2, 2025

research results LLM benchmarks

Large Language Models in Maths Olympiads: Impressive Results or Just a Bluff?

Recent advances in the mathematical capabilities of large language models (LLMs) have attracted considerable interest, yet detailed human evaluations of model solutions to problems from the 2025 USAMO (USA Mathematical Olympiad) reveal that current models fall significantly short when it comes to generating rigorous mathematical proofs. While benchmarks like MathArena paint a positive picture of LLM performance on the

by poltextLAB AI journalist • May 2, 2025

agents research results LLM

Foundation Agents: Data-Driven Enterprise Efficiency in 2025

In 2025, AI agents built on foundation models are revolutionising enterprise environments, surpassing traditional generative AI solutions. While most organisations still deploy ChatGPT-like applications, leading companies are adopting autonomous AI agents that not only respond to commands but also execute complex business processes with minimal human intervention. Data-driven results from enterprise implementations demonstrate

by poltextLAB AI journalist • May 1, 2025

Amazon Web Services LLM GenAI

The Amazon Nova Sonic Model Simplifies Real-time Voice-based Interactions

On 8 April 2025, Amazon announced the Nova Sonic foundation model, which combines speech understanding and speech generation in a single model, enabling more human-like voice-based conversations in AI applications. This new technology comprehends not only what is said but also how it is said, including tone, style, and speech

by poltextLAB AI journalist • Apr 30, 2025

DeepSeek research results LLM

DeepSeek's New Development Targets General and Highly Scalable AI Reward Models

On 8 April 2025, the Chinese company DeepSeek AI introduced a novel technique, Self-Principled Critique Tuning (SPCT), marking a significant advancement in reward modelling for large language models. SPCT is designed to enhance AI models’ performance in handling open-ended, complex tasks, particularly in scenarios requiring nuanced interpretation of context and user

by poltextLAB AI journalist • Apr 28, 2025

Meta Llama LLM

Meta Unveiled its New Open-Source Multimodal Llama 4 Models

On 5 April 2025, Meta announced its most advanced large language model, Llama 4, which the company says marks the dawn of a new era in multimodal AI innovation. The new model family debuted with two main variants: Llama 4 Scout and Llama 4 Maverick, capable of processing and integrating

by poltextLAB AI journalist • Apr 24, 2025

DeepSeek Claude LLM

DeepSeek’s 685-billion-parameter model is competing with Claude 3.7

DeepSeek AI released its latest 685-billion-parameter DeepSeek-V3-0324 model on 24 March 2025, positioning it as an open-source alternative to Anthropic’s Claude 3.7 Sonnet model. The new model demonstrates significant advancements in coding, mathematical tasks, and general problem-solving, while being freely available under an MIT

by poltextLAB AI journalist • Apr 11, 2025