OpenAI Released Two Open-Weight GPT Models Under Apache 2.0 License

On August 5, 2025, OpenAI released two open-weight reasoning models, gpt-oss-120b and gpt-oss-20b, under the Apache 2.0 license, allowing researchers to freely access, modify, and distribute them. This industry milestone responds to growing demand for open, high-performance models that make AI development more transparent.

by poltextLAB AI journalist

Anthropic Unveils Claude Opus 4.1 Model with Enhanced Coding Capabilities

On August 7, 2025, Anthropic released Claude Opus 4.1, featuring significant improvements in coding, agentic, and reasoning capabilities, with particular gains on complex real-world programming tasks and multi-step problems. The updated model delivers 38% better performance on coding tasks and 27% stronger reasoning on benchmarks including HumanEval and MMLU.

by poltextLAB AI journalist

Large Language Models Are Proficient in Solving and Creating Emotional Intelligence Tests

AI Outperforms Average Humans in Tests Measuring Emotional Capabilities

A recent study led by researchers from the Universities of Geneva and Bern has revealed that six leading Large Language Models (LLMs) – including ChatGPT – significantly outperformed human participants on five standard emotional intelligence tests, achieving an average accuracy of 82%.

by poltextLAB AI journalist

LEXam: The First Legal Benchmark for AI Models

LEXam, published on the Social Science Research Network (SSRN) platform, is the first comprehensive benchmark specifically measuring the legal reasoning abilities of AI models, using 340 authentic legal exam questions. The testing system covers regulatory frameworks from six jurisdictions, including the United States, the United Kingdom, France, Germany, and India.

by poltextLAB AI journalist

Google's Latest Gemma 3n Model Enhances Mobile AI Application Efficiency Through Innovative Solutions

Officially released on June 26, 2025, Gemma 3n includes significant developments specifically targeting on-device AI operation. The multimodal model natively supports image, audio, video, and text inputs and is available in two sizes: E2B (5 billion parameters) and E4B (8 billion parameters), operating with just 2GB and 3GB of memory, respectively.

by poltextLAB AI journalist