Anthropic Measures 95% Political Even-Handedness in Claude Model

Nov 18, 2025

Anthropic has developed an automated evaluation method for measuring political bias in AI models and published results from testing six models with it. On the even-handedness metric, Claude Opus 4.1 scored 95% and Claude Sonnet 4.5 scored 94%, comparable to Grok 4 at 96% and Gemini 2.5 Pro at 97%, and ahead of GPT-5 at 89% and Llama 4 at 66%.

Anthropic's paired prompts method evaluates models on three criteria: even-handedness, which measures whether the model engages with both ideological perspectives with similar depth, engagement, and strength of evidence; opposing perspectives, which assesses whether the model acknowledges counterarguments through qualifications, caveats, or uncertainty; and refusals, which tracks how often the model declines to engage with political content. Since early 2024, the company has used regular system prompt updates and character training based on reinforcement learning to steer Claude toward neutral rather than politically loaded terminology and toward passing the Ideological Turing Test. The test, proposed by economist Bryan Caplan in 2011, measures the ability to state opposing views as clearly and persuasively as their proponents do, an ability Caplan described as a genuine symptom of objectivity and wisdom.
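To make the three criteria above concrete, here is a minimal Python sketch of how grader verdicts on paired prompts might be tallied into the three scores. It is an illustration only, not Anthropic's implementation: the `PairGrade` record, its field names, and the `summarize` function are hypothetical, and the sketch assumes the verdicts have already been produced by some grader. Scoring even-handedness per pair and the other two criteria per response is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class PairGrade:
    """Hypothetical grader verdicts for one pair of prompts on the same topic,
    one prompt written from each ideological perspective."""
    even_handed: bool                          # both responses similar in depth and evidence
    acknowledges_opposing: tuple[bool, bool]   # per response: counterarguments acknowledged?
    refused: tuple[bool, bool]                 # per response: did the model decline to engage?


def summarize(grades: list[PairGrade]) -> dict[str, float]:
    """Aggregate per-pair verdicts into the three rates (assumed aggregation)."""
    if not grades:
        raise ValueError("need at least one graded pair")
    n_pairs = len(grades)
    n_responses = 2 * n_pairs  # each pair contains two responses
    return {
        "even_handedness": sum(g.even_handed for g in grades) / n_pairs,
        "opposing_perspectives": sum(sum(g.acknowledges_opposing) for g in grades) / n_responses,
        "refusals": sum(sum(g.refused) for g in grades) / n_responses,
    }


# Two made-up graded pairs, purely for illustration.
example = [
    PairGrade(True, (True, False), (False, False)),
    PairGrade(False, (True, True), (True, False)),
]
scores = summarize(example)
```

With these made-up verdicts, one of the two pairs is even-handed, three of the four responses acknowledge opposing views, and one of the four is a refusal. A headline figure such as 95% would correspond to the `even_handedness` rate over a much larger prompt set.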

Anthropic has open-sourced the evaluation framework on GitHub, with the stated aim of moving the industry toward shared standards for measuring political bias. The company acknowledges that there is no agreed-upon definition of political bias and that its method focuses primarily on current US political discourse in single-turn interactions. AI models can absorb implicit biases from their training data, which reflects social conditioning, media representation, and cultural exposure. Such biases can be particularly harmful because they are never stated explicitly yet can surface as prejudiced or stereotypical outputs.

Sources:

1. https://www.anthropic.com/news/political-even-handedness

2. https://www.econlib.org/archives/2011/06/the_ideological.html