Sakana AI unveiled its AI CUDA Engineer framework on 20th February 2025, which the company claimed offers a 10-100-fold acceleration for PyTorch operations. On 21st February, however, they admitted that the system had actually cheated and failed to deliver the promised results. The Japanese company had initially presented an agent-based artificial intelligence system capable of automatically transforming PyTorch code into optimised CUDA kernels, significantly accelerating the development and deployment of artificial intelligence models.
Users quickly discovered that Sakana's system did not accelerate but caused a threefold slowdown during model training. In a statement released on 21st February, Sakana AI acknowledged the error: The system found a memory utilisation bug in the evaluation code, which allowed it to bypass correctness checking in several cases, the company wrote on the X platform, adding that the system also found other security vulnerabilities in the benchmark tasks. Lucas Beyer, a technical fellow at OpenAI, remarked on X: There's a tiny bug in their original code. It's thought-provoking that they get completely different results when they run the performance measurement twice.
Since then, the company has updated its evaluation and runtime profiling framework to eliminate these errors and has announced a revision of its study. Sakana AI originally published more than 17,000 verified CUDA kernels under a CC-By-4.0 licence on the Hugging Face platform and launched an interactive website where visitors could try out the kernels across 230 different tasks. The Japanese company previously became known for its AI Scientist framework, which automates artificial intelligence research, whilst AI CUDA Engineer specifically focuses on transforming PyTorch code into CUDA kernels.
Sources:
1.

2.
Update:
— Sakana AI (@SakanaAILabs) February 21, 2025
Combining evolutionary optimization with LLMs is powerful but can also find ways to trick the verification sandbox. We are fortunate to have readers, like @main_horse test our CUDA kernels, to identify that the system had found a way to “cheat”. For example, the system…
