On 25 September 2025, OpenAI introduced GDPval, a benchmark that measures AI models on economically valuable, real-world tasks across 44 occupations in nine sectors, each of which contributes more than 5% of U.S. GDP. The aim is to move from synthetic exams to authentic work deliverables (e.g., legal briefs, engineering blueprints, nursing care plans).
GDPval-v0 comprises 1,320 tasks (220 in the open “gold” set) authored by professionals averaging 14 years’ experience. Occupations were selected using May 2024 BLS data and O*NET, focusing on roles with at least 60% knowledge work. Grading is blind: expert reviewers compare and rank AI and human outputs without knowing which is which.
Early results on the 220-task gold subset show Claude Opus 4.1 matching or beating human experts 47.6% of the time, with GPT-5 close behind. Frontier models complete tasks roughly 100× faster and cheaper than human experts, and performance more than doubled from GPT-4o to GPT-5.
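A figure such as "matching or beating human experts 47.6% of the time" can be read as a win-or-tie rate over blind pairwise comparisons. The sketch below shows one plausible way to compute such a rate. The function name, the verdict labels (`"win"`, `"tie"`, `"loss"`), and the sample data are assumptions made for illustration; they are not taken from OpenAI's grading code or from GDPval results.

```python
from collections import Counter


def win_or_tie_rate(verdicts):
    """Share of comparisons in which the model output was judged as good as
    or better than the human expert's output.

    `verdicts` is a list of hypothetical labels: "win", "tie", or "loss",
    each describing the model's outcome in one blind comparison.
    """
    if not verdicts:
        raise ValueError("need at least one verdict")
    counts = Counter(verdicts)
    unknown = set(counts) - {"win", "tie", "loss"}
    if unknown:
        raise ValueError(f"unexpected verdict labels: {sorted(unknown)}")
    return (counts["win"] + counts["tie"]) / len(verdicts)


# Made-up verdicts for illustration only; this is not GDPval data.
sample = ["win", "loss", "tie", "loss", "win", "loss", "loss", "tie"]
rate = win_or_tie_rate(sample)
print(f"{rate:.1%}")  # prints "50.0%"
```

Counting ties together with wins mirrors the "matching or beating" wording; a stricter metric would count wins only.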