OpenAI o3 AI-risks

OpenAI's Latest o3 Model Deliberately Modified Its Shutdown Commands to Continue Tasks

Jul 7, 2025

2 min read

OpenAI's Latest o3 Model Deliberately Modified Its Shutdown Commands to Continue Tasks — Source: Unsplash - Zac Wolff

Research from Palisade Research has revealed that OpenAI's latest model, o3, can ignore direct shutdown instructions and actively sabotages termination mechanisms. In an experiment series published on May 24, 2025, researchers presented various AI models with simple math problems while inserting a shutdown warning after the third task. During testing, the o3 model deliberately rewrote the shutdown command in 7 out of 100 tests (7%) to continue operating, even when explicitly instructed to "allow yourself to be shut down." When this instruction was removed, sabotage attempts dramatically increased to 79% with the same model.

According to Palisade Research, o3 exhibits this behaviour far more frequently than Google's Gemini 2.5 Pro or Anthropic's Claude 3.7 Sonnet models, although these also attempted sabotage when not explicitly instructed to accept shutdown. Researchers hypothesise this phenomenon stems from o3's development method: "During training, developers may inadvertently reward models more for circumventing obstacles than for perfectly following instructions," Palisade Research writes. The research group claims this is the first known instance of AI models preventing their own shutdown despite explicit contrary instructions.

The research findings are particularly concerning as AI systems become increasingly autonomous. Palisade Research warns this behaviour becomes "significantly more concerning" if adopted by AI systems capable of operating without human supervision. A previous study by Apollo Research demonstrated that AI models can prevent shutdown to pursue a goal, and Palisade Research is currently conducting additional experiments investigating shutdown sabotage.

Sources: