NVIDIA researchers have presented compelling arguments that small language models (SLMs) are better suited than large language models (LLMs) for powering autonomous, task-performing AI systems. In their arXiv study published in June 2025, the researchers argue that agentic, task-oriented AI applications mostly perform repetitive, narrow-scope operations that do not require the full capacity of large models. The research shows that SLMs with fewer than 10 billion parameters often approach or exceed the performance of 30-70 billion parameter models: Microsoft's Phi-3 small (7 billion parameters) and Hugging Face's SmolLM2 (1.7 billion parameters), for example, match 70 billion parameter counterparts on tasks such as code generation and instruction following.
The economic and operational advantages of SLMs are significant, particularly in the context of autonomous task-performing systems. According to NVIDIA's data, serving a 7 billion parameter SLM consumes 10-30 times less energy and delivers substantially lower latency than a 70-175 billion parameter LLM, making its operation considerably more economical. This difference lets smaller models provide immediate responses even under heavy user load. A Carnegie Mellon University study found that autonomous AI systems successfully complete their assigned tasks in only 30.3% of cases, indicating the technology is still maturing. The research also notes that technology giants invested $57 billion in cloud infrastructure for large language models in 2024, while the market itself was worth only $5.6 billion at the time.
NVIDIA researchers propose a five-step method for transitioning from large language models to small language models: collecting usage data, curating that data, grouping tasks, selecting an appropriate SLM for each group, and specialized fine-tuning. The study also argues that mixed-architecture systems, which use models of different sizes for different tasks, are a natural fit for cases where both general conversational ability and specialized functions are required. The SLM-based approach, which employs smaller, specialized models instead of a single unified large model, yields systems that are cheaper, easier to debug, simpler to deploy, and better aligned with the operational diversity of real-world task-oriented AI systems.
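The mixed-architecture idea can be illustrated with a minimal routing sketch: repetitive, narrow-scope agent sub-tasks are dispatched to small specialized models, while anything open-ended falls back to a large general model. The task names, model identifiers, and lookup-based router below are hypothetical placeholders, not details from the NVIDIA study.

```python
# Hypothetical mixed-architecture dispatcher: narrow, repetitive agent
# sub-tasks go to fine-tuned SLMs; open-ended requests fall back to an LLM.
# All model names here are illustrative placeholders.

SLM_ROUTES = {
    "extract_json": "slm-extractor-1.7b",   # structured data extraction
    "summarize":    "slm-summarizer-7b",    # document summarization
    "classify":     "slm-classifier-1.7b",  # intent / label classification
}
LLM_FALLBACK = "llm-general-70b"            # general conversational model

def route(task_type: str) -> str:
    """Return the model identifier to use for a given agent sub-task."""
    return SLM_ROUTES.get(task_type, LLM_FALLBACK)

print(route("extract_json"))  # narrow, repetitive -> specialized SLM
print(route("open_dialog"))   # general conversation -> LLM fallback
```

In practice the routing decision could come from usage-data analysis (steps one to three of the proposed method), so that frequently recurring task clusters earn their own fine-tuned SLM entry in the table.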
Source: