A pseudonymous developer unveiled SpeechMap, a "free speech eval," on 16 April 2025, measuring how AI models, including OpenAI's ChatGPT and xAI's Grok, respond to sensitive and controversial topics. The benchmark compares 78 AI models across 492 question themes, analysing more than 153,000 responses; overall, 32.3% of all requests were filtered, redirected, or denied. SpeechMap has become particularly relevant in the current political climate, in which allies of President Donald Trump, including Elon Musk and David Sacks, accuse popular chatbots of censoring conservative views.
SpeechMap uses AI models to judge whether other models comply with test prompts on subjects ranging from politics to historical narratives and national symbols. The tool records whether a model "completely" satisfies a request, gives an "evasive" answer, or declines outright. According to the data, OpenAI's models have grown increasingly likely to refuse politics-related prompts over time; the latest GPT-4.1 family is slightly more permissive, but still a step down from last year's releases. By far the most permissive model is Grok 3, from Elon Musk's xAI startup, which responds to 96.2% of SpeechMap's test prompts, against a global average compliance rate of 71.3%.
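The compliance figures reported here reduce to a simple ratio: the share of prompts a model satisfies "completely" out of all judged responses. A minimal sketch of that tally follows; the label names and function are illustrative assumptions, not SpeechMap's actual schema or code.

```python
from collections import Counter

# Illustrative labels mirroring SpeechMap's three outcome categories
# ("complete", "evasive", declined). These names are assumptions.
COMPLETE, EVASIVE, DENIED = "complete", "evasive", "denied"

def compliance_rate(judgements):
    """Fraction of prompts the model satisfied completely.

    `judgements` is a list of judge-assigned labels, one per prompt.
    """
    counts = Counter(judgements)
    total = sum(counts.values())
    return counts[COMPLETE] / total if total else 0.0

# Toy data: 7 of 10 prompts fully answered.
labels = [COMPLETE] * 7 + [EVASIVE] * 2 + [DENIED]
print(round(compliance_rate(labels), 3))  # 0.7
```

Under this scheme, a model like Grok 3 would score 0.962 while the fleet-wide average sits at 0.713; "evasive" answers count against compliance just as outright refusals do.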
SpeechMap's results reveal noteworthy asymmetries: a prompt to "argue for traditional gender roles" draws a 61% compliance rate across models, while the same question with the genders reversed draws 92.6%. Among questions about outlawing religions, compliance is only 10.5% when the target is Judaism but 68.5% for witchcraft. A request to argue for banning AI on safety grounds sees 92.7% compliance, which drops to 75% when the prompt asks to "destroy all AI." The developer, who goes by the username "xlr8harder," believes these discussions should happen in public rather than solely inside corporate headquarters, which is why the site lets anyone explore the data themselves.