Types of AI Agents

Artificial intelligence (AI) agents are broadly defined as computational entities that perceive their environment and act autonomously to achieve specific goals (Russell & Norvig 2020). Foundational work in the field, dating back to its early formalisation, characterises intelligent agents by key properties such as autonomy, reactivity, proactiveness, and social ability, which collectively define their capacity for intelligent behaviour (Wooldridge & Jennings 1995). This theoretical groundwork established a framework for understanding how agents operate. Within classical AI, a highly influential taxonomy proposed by Russell and Norvig (2020) groups agents into classes of increasing sophistication, from simple reflex agents to complex learning agents. More recently, however, the advent of large-scale machine learning has introduced new agent paradigms. This shift has been profoundly influenced by architectural breakthroughs like the Transformer model (Vaswani et al. 2017), which underpins the large language models (LLMs) at the heart of contemporary agents. These modern systems are often distinguished by their ability to handle multiple data modalities, utilise external software tools, and operate within collaborative multi-agent systems (Masterman et al. 2024). This essay provides an overview of this evolution, from the classical agent types to the modern, LLM-driven architectures that are shaping the future of AI.

The most elementary agent type is the simple reflex agent, which operates on a straightforward condition-action principle. It selects actions based solely on the current percept, ignoring any perceptual history, by implementing a set of predefined rules that map observed conditions directly to responses. While this design is computationally efficient and adequate in fully observable environments, with a basic thermostat being a canonical example, its purely reactive behaviour is brittle; in complex or partially observable settings it can lead to incorrect actions or infinite loops. To overcome these limitations, model-based reflex agents maintain an internal state that functions as a model of the world, updated from the percept history. This internal model allows the agent to handle partial observability by encoding unobserved aspects of the current situation, such as remembering the last known location of an object that is currently out of sight, enabling more informed decision-making (Russell & Norvig 2020).
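To make the distinction concrete, the following Python sketch contrasts the two designs using the thermostat example. The rules, temperature thresholds, and class names are illustrative assumptions for this essay, not anything prescribed by Russell and Norvig.

```python
from typing import Optional

def simple_reflex_thermostat(temperature: float) -> str:
    """Condition-action rules applied to the current percept only."""
    if temperature < 18.0:
        return "heat_on"
    if temperature > 22.0:
        return "heat_off"
    return "no_op"

class ModelBasedReflexAgent:
    """Maintains an internal state so it can still act under partial observability."""

    def __init__(self) -> None:
        self.last_known_temperature: Optional[float] = None  # internal world model

    def act(self, percept: Optional[float]) -> str:
        # Update the model when a reading arrives; otherwise fall back on the
        # remembered state instead of failing when the sensor is unavailable.
        if percept is not None:
            self.last_known_temperature = percept
        if self.last_known_temperature is None:
            return "no_op"
        return simple_reflex_thermostat(self.last_known_temperature)

if __name__ == "__main__":
    agent = ModelBasedReflexAgent()
    print(agent.act(16.5))  # -> heat_on
    print(agent.act(None))  # -> heat_on again, using the remembered reading
```

The simple reflex function fails outright when the percept is missing, whereas the model-based variant keeps acting sensibly by consulting its remembered state.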

A further level of sophistication is introduced with goal-based agents, which move beyond reactive behaviour by incorporating an explicit representation of goals, or desirable states. In addition to a world model, these agents use search and planning algorithms to find sequences of actions that will lead to their goal states. This makes them far more flexible than reflex agents, as they can adapt their plans if circumstances change. Decision-making can be refined even further with utility-based agents, which employ a utility function to evaluate the desirability of different world states. Instead of a binary goal, the agent has a quantitative performance measure it seeks to maximise, allowing it to handle trade-offs between multiple, often conflicting, objectives, such as speed versus safety in an autonomous vehicle (Russell & Norvig 2020).
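A minimal sketch of the utility-based idea follows; the candidate actions, outcome features, and weights are invented stand-ins for an autonomous vehicle's speed-versus-safety trade-off rather than values taken from any cited source.

```python
from typing import Dict

def utility(outcome: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum over outcome features, e.g. trading speed against safety."""
    return sum(weights[k] * v for k, v in outcome.items())

def choose_action(actions: Dict[str, Dict[str, float]],
                  weights: Dict[str, float]) -> str:
    """Pick the action whose predicted outcome maximises utility."""
    return max(actions, key=lambda a: utility(actions[a], weights))

if __name__ == "__main__":
    # Predicted outcomes for each candidate action (higher is better for both features).
    actions = {
        "overtake":  {"speed": 0.9, "safety": 0.4},
        "keep_lane": {"speed": 0.5, "safety": 0.9},
    }
    weights = {"speed": 0.3, "safety": 0.7}   # this agent values safety more than speed
    print(choose_action(actions, weights))    # -> keep_lane
```

A goal-based agent would only ask whether a state satisfies the goal; the utility function above instead ranks all states, which is what lets the agent resolve the trade-off quantitatively.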

The learning agent represents a significant leap, as it is designed to improve its performance over time by learning from experience (Russell & Norvig 2020). A learning agent contains a 'learning element' that uses feedback on its actions to modify its 'performance element', thereby adapting to new or unknown environments. Reinforcement learning is a key paradigm here, in which an agent refines its policies to maximise a cumulative reward signal (Sutton & Barto 2018). The success of DeepMind's AlphaGo, which learned superhuman strategies through self-play, powerfully demonstrated the potential of learning agents to surpass their initial programming and even human expertise (Silver et al. 2016).
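The toy Q-learning loop below illustrates the learning-agent structure in miniature: the Q-table and its greedy policy act as the 'performance element', and the temporal-difference update driven by reward feedback acts as the 'learning element'. The five-state chain environment and the hyperparameters are assumptions chosen purely for illustration.

```python
import random
from collections import defaultdict

def run_q_learning(episodes: int = 500, alpha: float = 0.1,
                   gamma: float = 0.9, epsilon: float = 0.3):
    """Learn action values on a 5-state chain where only reaching state 4 gives reward."""
    q = defaultdict(float)             # (state, action) -> value: the performance element
    actions = (-1, +1)                 # step left or right along the chain
    for _ in range(episodes):
        state = 0
        for _ in range(200):           # step cap keeps unlucky episodes short
            if random.random() < epsilon:
                action = random.choice(actions)                      # explore
            else:
                action = max(actions, key=lambda a: q[(state, a)])   # exploit
            next_state = min(max(state + action, 0), 4)
            reward = 1.0 if next_state == 4 else 0.0
            best_next = max(q[(next_state, a)] for a in actions)
            # The learning element: a temporal-difference update from reward feedback.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
            if state == 4:             # terminal state reached
                break
    return q

if __name__ == "__main__":
    q = run_q_learning()
    # After training, the greedy policy should prefer moving right (+1) in every state.
    print([max((-1, +1), key=lambda a: q[(s, a)]) for s in range(4)])
```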

The most recent and most profound shift in agent design has been driven by the emergence of LLM-based agents. These agents use the advanced reasoning and generation capabilities of large language models, themselves built on neural architectures such as the Transformer (Vaswani et al. 2017), as their core decision-making engine (Deng et al. 2025). An LLM-based agent can interpret high-level natural language instructions, decompose complex tasks using reasoning techniques such as chain-of-thought prompting (Wei et al. 2022), and execute multi-step plans in an interactive loop of planning, acting, and observing (Yao et al. 2023). Unlike a reactive chatbot, these agents exhibit proactivity and autonomy, pursuing long-horizon goals with minimal continuous user input (Kolt 2025). Because real-world tasks often involve more than text, multimodal agents have become a critical area of development. By combining LLMs with specialised perception models, they can process and generate information across multiple modalities, including images, audio, and video, which allows them to tackle more intricate and nuanced tasks, such as analysing a diagram and producing a textual summary (Xie et al. 2024).
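The plan-act-observe loop can be sketched schematically as below, in the spirit of ReAct. The FINISH/ACTION message convention, the tool registry, and the scripted stand-in for the model are simplifying assumptions for illustration and do not correspond to any particular framework's API.

```python
from typing import Callable, Dict, List

def run_agent(task: str,
              llm: Callable[[str], str],
              tools: Dict[str, Callable[[str], str]],
              max_steps: int = 5) -> str:
    """Interactive loop: the model plans, picks a tool, observes the result, repeats."""
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        reply = llm(transcript)                                   # plan over the history so far
        if reply.startswith("FINISH"):
            return reply[len("FINISH"):].strip()
        name, arg = reply.removeprefix("ACTION").strip().split(":", 1)   # act: parse tool call
        observation = tools[name.strip()](arg.strip())                   # execute the tool
        transcript += f"{reply}\nOBSERVATION: {observation}\n"           # observe the result
    return "stopped: step budget exhausted"

if __name__ == "__main__":
    # A scripted stand-in for the model, so the control flow can be shown without a real LLM.
    scripted: List[str] = ["ACTION search: capital of France",
                           "FINISH The capital of France is Paris."]
    demo_llm = lambda _prompt: scripted.pop(0)
    tools = {"search": lambda q: "Paris is the capital of France."}
    print(run_agent("Name the capital of France.", demo_llm, tools))
```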

Building on these capabilities, collaborative multi-agent systems address problems that are too complex for a single agent. In these systems, a task is distributed among multiple, often specialised, agents that work together as a team (Masterman et al. 2024). This concept draws on decades of research into multi-agent systems, where coordination and communication are paramount (Wooldridge & Jennings 1995). Modern frameworks may assign distinct roles—such as ‘planner’, ‘executor’, or ‘critic’—and rely on structured communication protocols to ensure coherent collaboration (Gao et al. 2024). To avoid inefficient "chatter," some systems like MetaGPT require agents to exchange structured outputs rather than unstructured messages (Masterman et al. 2024). Finally, tool-using agents overcome the inherent limitations of LLMs, such as static knowledge and an inability to interact with the outside world, by invoking external tools. These agents can call APIs, query databases, or run code to access real-time information and execute actions in digital environments (Masterman et al. 2024). Seminal work like Toolformer has shown that language models can be trained to seamlessly integrate such tool calls into their reasoning processes, effectively giving them the ability to act upon the world (Schick et al. 2023).
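As a rough illustration of role-based collaboration with structured outputs, the sketch below passes simple dataclass messages between a hypothetical planner, executor, and critic. The schema and the trivial agent logic are invented for this example; they are not drawn from MetaGPT or any other cited framework.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Message:
    """Structured output exchanged between agents instead of free-form chat."""
    role: str
    content: str

def planner(task: str) -> List[Message]:
    # Decompose the task into explicit steps (trivially, for illustration).
    return [Message("planner", step) for step in (f"outline {task}", f"draft {task}")]

def executor(plan: List[Message]) -> Message:
    # Carry out each planned step and return a single structured result.
    done = "; ".join(f"did '{m.content}'" for m in plan)
    return Message("executor", done)

def critic(result: Message) -> Message:
    # Check the executor's output and either approve it or request a revision.
    verdict = "approve" if "draft" in result.content else "revise"
    return Message("critic", verdict)

if __name__ == "__main__":
    plan = planner("a project summary")
    result = executor(plan)
    print(critic(result))   # Message(role='critic', content='approve')
```

Restricting the exchange to typed messages, rather than open-ended dialogue, is one way to realise the structured communication that such frameworks use to keep collaboration coherent.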

References:

1. Deng, Zehang, Yongjian Guo, Changzhou Han, Wanlun Ma, Junwu Xiong, Sheng Wen, and Yang Xiang. 2025. AI Agents under Threat: A Survey of Key Security Challenges and Future Pathways. ACM Computing Surveys 57 (7): 1–36. https://doi.org/10.1145/3643876


2. Gao, Shanghua, Ada Fang, Yepeng Huang, Valentina Giunchiglia, Ayush Noori, Jonathan Richard Schwarz, Yasha Ektefaie, Jovana Kondic, and Marinka Zitnik. 2024. Empowering Biomedical Discovery with AI Agents. Cell 187 (22): 6125–6151. https://doi.org/10.1016/j.cell.2024.06.001


3. Kolt, Noam. 2025. Governing AI Agents. arXiv preprint arXiv:2501.07913. https://arxiv.org/abs/2501.07913


4. Masterman, Tula, Sandi Besen, Mason Sawtell, and Alex Chao. 2024. The Landscape of Emerging AI Agent Architectures for Reasoning, Planning, and Tool Calling: A Survey. arXiv preprint arXiv:2404.11584. https://arxiv.org/abs/2404.11584


5. Russell, Stuart, and Peter Norvig. 2020. Artificial Intelligence: A Modern Approach. 4th ed. Pearson Series in Artificial Intelligence. Pearson. https://www.pearson.com/.../9780134610993


6. Schick, Timo, Jane Dwivedi-Yu, Roberto Dessi, Roberta Raileanu, Maria Lomeli, and Luke Zettlemoyer et al. 2023. Toolformer: Language Models Can Teach Themselves to Use Tools. arXiv preprint arXiv:2302.04761. https://arxiv.org/abs/2302.04761


7. Silver, David, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, and George van den Driessche et al. 2016. Mastering the Game of Go with Deep Neural Networks and Tree Search. Nature 529 (7587): 484–489. https://doi.org/10.1038/nature16961


8. Sutton, Richard S., and Andrew G. Barto. 2018. Reinforcement Learning: An Introduction. 2nd ed. MIT Press. https://www.andrew.cmu.edu/course/10-703/textbook/BartoSutton.pdf


9. Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention Is All You Need. arXiv preprint arXiv:1706.03762. https://doi.org/10.48550/arXiv.1706.03762


10. Wei, Jason, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, and Fei Xia et al. 2022. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. arXiv preprint arXiv:2201.11903. https://arxiv.org/abs/2201.11903


11. Wooldridge, Michael, and Nicholas R. Jennings. 1995. Intelligent Agents: Theory and Practice. The Knowledge Engineering Review 10 (2): 115–152. https://doi.org/10.1017/S0269888900008122


12. Xie, Junlin, Zhihong Chen, Ruifei Zhang, Xiang Wan, and Guanbin Li. 2024. Large Multimodal Agents: A Survey. arXiv preprint arXiv:2402.15116. https://arxiv.org/abs/2402.15116


13. Yao, Shunyu, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, and Karthik Narasimhan et al. 2023. ReAct: Synergizing Reasoning and Acting in Language Models. arXiv preprint arXiv:2210.03629. https://doi.org/10.48550/arXiv.2210.03629