Chinese technology giant Baidu unveiled its natively omni-modal AI model ERNIE 5.0 at the Baidu World 2025 event, mere hours after OpenAI updated its flagship model to GPT-5.1. In public benchmark slides, ERNIE 5.0 Preview outperformed or matched OpenAI's GPT-5-High and Google's Gemini 2.5 Pro in multimodal reasoning, document understanding, and image-based question answering. Baidu claims the model beat both competitors on OCRBench, DocVQA, and ChartQA benchmarks, which test document recognition, comprehension, and structured data reasoning—areas the company describes as core to enterprise applications such as automated document processing and financial analysis. ERNIE 5.0 is available only via Baidu's ERNIE Bot website and the Qianfan cloud platform API.
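For developers, Qianfan access means the model can be called programmatically. The snippet below is a minimal, hypothetical sketch of such a request: the endpoint URL, model identifier, payload shape, and bearer-token auth scheme are illustrative assumptions, not confirmed details of Baidu's API, so the real values should be taken from Qianfan's documentation.

```python
# Hypothetical sketch of calling an ERNIE model via the Qianfan cloud API.
# Endpoint path, model identifier, payload shape, and auth header are assumed
# for illustration only; consult Baidu's Qianfan documentation for real values.
import os
import requests

QIANFAN_ENDPOINT = "https://qianfan.baidubce.com/v2/chat/completions"  # assumed URL
API_KEY = os.environ["QIANFAN_API_KEY"]                                # assumed auth scheme

payload = {
    "model": "ernie-5.0-preview",  # assumed model identifier
    "messages": [
        {"role": "user", "content": "Summarise the key figures in this quarterly report."}
    ],
}

resp = requests.post(
    QIANFAN_ENDPOINT,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```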
The model achieved leading scores on visual tasks, whilst demonstrating competitive results on the audio and speech understanding benchmarks MM-AU and TUT2017. Two days before the flagship ERNIE 5.0 event, Baidu released ERNIE-4.5-VL-28B-A3B-Thinking, an open-source multimodal model under the Apache 2.0 licence that uses a Mixture-of-Experts architecture to activate just 3 billion of its 28 billion total parameters during operation.
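The sparse-activation idea behind that 3-billion-of-28-billion figure is easiest to see in a toy Mixture-of-Experts layer. The sketch below is a generic, conceptual illustration of how a router sends each token to a small subset of experts, so only that subset's parameters do work per token; it is not Baidu's implementation, and all sizes and names are made up for the example.

```python
# Toy illustration of Mixture-of-Experts sparse activation (not Baidu's code):
# only the parameters of the top-k routed experts run per token, so the
# "active" parameter count is far below the total parameter count.
import torch
import torch.nn as nn

class ToyMoELayer(nn.Module):
    def __init__(self, d_model=64, d_ff=256, num_experts=8, top_k=1):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)  # scores each token against each expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )
        self.top_k = top_k

    def forward(self, x):                              # x: (tokens, d_model)
        scores = self.router(x)                        # (tokens, num_experts)
        weights, idx = scores.softmax(-1).topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                  # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * expert(x[mask])
        return out

layer = ToyMoELayer()
total = sum(p.numel() for p in layer.parameters())
active = sum(p.numel() for p in layer.router.parameters()) \
       + sum(p.numel() for p in layer.experts[0].parameters()) * layer.top_k
print(f"total params: {total}, active per token (top-{layer.top_k}): {active}")
```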
The launch of ERNIE 5.0 signals Baidu's push for global competitiveness in the enterprise AI market, after the company had historically focused primarily on its domestic market. Baidu CEO Robin Li stated that 'when you internalize AI, it becomes a native capability and transforms intelligence from a cost into a source of productivity'. The company's Apollo Go autonomous ride-hailing service has surpassed 17 million rides across 22 cities, making it the world's largest robotaxi network, whilst its digital human platform, already rolled out in Brazil, was used by 83% of livestreamers during this year's Double 11 shopping event in China, contributing to a 91% increase in gross merchandise value.