Artificial intelligence and large language models (LLMs) are reshaping scientific research by enabling autonomous research assistants that can perform complex tasks such as literature review, hypothesis generation, and experimental design. A key emerging approach is the multi-agent system, in which specialised AI agents collaborate to achieve research goals. Multi-agent systems, which grew out of distributed AI and game theory in the late 20th century, involve autonomous entities with differing information or goals that must interact, whether cooperatively or competitively. Shoham and Leyton-Brown (2008) emphasise reasoning tools drawn from game theory, logic, and economics for managing coordination and resource use, while Wooldridge and Jennings (1995) outline the field’s core strands: agent theory, architectures, and languages. Foundational principles such as autonomy and communication are now being revisited in LLM-based systems. Early research showed how agents with partial knowledge could collaborate or compete, a useful analogy for scientific research involving specialised contributors, and this heritage makes multi-agent frameworks well suited to modelling collaborative AI-driven inquiry.
In recent years, LLMs have enabled more advanced forms of scientific assistance, moving from single-agent “co-pilot” tools toward multi-agent systems in which specialised agents handle distinct research tasks. A key milestone was Sakana AI’s AI Scientist (Lu et al. 2024), which used a single LLM to autonomously generate and refine full research papers at minimal cost. Powerful as it was, its main contribution lay in paving the way for more collaborative, role-based systems. One such system is Agent Laboratory, which assigns different LLM agents to roles such as literature reviewer, experimenter, and paper writer (Schmidgall et al. 2025). These agents operate in sequence or with human guidance, improving both structure and output quality, especially when paired with human feedback, and early trials show promising results in human–AI co-creation. Further examples include Robin, which autonomously proposed and validated a novel treatment for macular degeneration (Ghareeb et al. 2025), and AgentRxiv, a collaborative platform that allows agents to share and reuse research outputs. AgentRxiv improved agent performance significantly by enabling inter-agent knowledge transfer, demonstrating that collaboration among AI agents can enhance scientific discovery (Schmidgall & Moor 2025).
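The role-based pipeline behind such systems is straightforward to sketch. The Python below is a minimal illustration of the pattern only, not Agent Laboratory’s actual code: `query_llm` is a hypothetical stand-in for a real chat-completion call, and the roles, prompts, and optional human check are simplified assumptions.

```python
from dataclasses import dataclass

# Hypothetical stand-in for a real chat-completion call; not any system's API.
def query_llm(role: str, prompt: str) -> str:
    return f"[{role}] draft for: {prompt[:50]}..."

@dataclass
class RoleAgent:
    """One specialised agent: a role name plus a standing instruction."""
    role: str
    instruction: str

    def run(self, context: str) -> str:
        return query_llm(self.role, f"{self.instruction}\n\nContext:\n{context}")

@dataclass
class Pipeline:
    """Agents run in sequence; each consumes the previous agent's output,
    with an optional human check between stages."""
    agents: list[RoleAgent]
    human_in_loop: bool = False

    def run(self, research_goal: str) -> str:
        context = research_goal
        for agent in self.agents:
            context = agent.run(context)
            if self.human_in_loop:
                note = input(f"Feedback on {agent.role} output (blank to accept): ")
                if note:
                    context = agent.run(f"{context}\n\nRevise per feedback: {note}")
        return context

pipeline = Pipeline(agents=[
    RoleAgent("literature_reviewer", "Summarise prior work relevant to the goal."),
    RoleAgent("experimenter", "Propose experiments that test the leading idea."),
    RoleAgent("paper_writer", "Draft a results section from the findings."),
])
print(pipeline.run("Do role-based LLM agents improve research output quality?"))
```

The sequential hand-off mirrors the reported workflow at a very coarse grain; the `human_in_loop` flag is meant to echo the human-guidance mode that paired human feedback with agent output.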
Surveys of LLM-based multi-agent systems reveal both their strengths and current limitations. Guo et al. (2024) highlight that such agents can plan, reason, and collaborate effectively, enabling progress in complex problem solving and world simulation. These systems benefit from diverse designs, such as synchronous versus asynchronous communication, and require systematic evaluation frameworks. However, full autonomy remains contested. Zamprogno et al. (2025), reviewing 47 studies, emphasise that while agents can support tasks like literature review or data analysis, challenges around reliability, ethics, and alignment persist. They advocate hybrid intelligence, in which AI complements human creativity and judgment rather than replacing it. This hybrid model is exemplified by AI Co-Scientist (Gottweis et al. 2025), a multi-agent system built on Gemini 2.0. It applies a generate–debate–evolve strategy to propose biomedical hypotheses, combining asynchronous task execution with iterative evaluation. Although capable of autonomous operation, the system performs best with human oversight, particularly for interpreting results and selecting hypotheses for validation.
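The generate–debate–evolve strategy can likewise be sketched as a tournament over candidate hypotheses. This is a toy reconstruction from the published description rather than Gottweis et al.’s implementation: the generation, debate-scoring, and refinement functions are placeholders where a real system would invoke LLM agents, and the pool sizes and round count are arbitrary.

```python
import random

# Placeholders for steps a real system would delegate to LLM agents.
def generate_hypotheses(goal: str, n: int) -> list[str]:
    return [f"hypothesis {random.randrange(10_000)} for '{goal}'" for _ in range(n)]

def debate_score(hypothesis: str) -> float:
    # Stand-in for a pairwise debate / review tournament between agents.
    return random.random()

def evolve(hypothesis: str, round_no: int) -> str:
    # Stand-in for an agent refining a surviving hypothesis.
    return f"{hypothesis} (refined in round {round_no})"

def generate_debate_evolve(goal: str, rounds: int = 3, pool: int = 8, keep: int = 3) -> str:
    candidates = generate_hypotheses(goal, pool)
    for r in range(1, rounds + 1):
        ranked = sorted(candidates, key=debate_score, reverse=True)  # debate
        candidates = [evolve(h, r) for h in ranked[:keep]]           # evolve survivors
        candidates += generate_hypotheses(goal, pool - keep)         # refill the pool
    return max(candidates, key=debate_score)

print(generate_debate_evolve("a biomedical research question"))
```

In the real system these stages run asynchronously across many agents; the serial loop here captures only the selection pressure that iterative evaluation applies to the hypothesis pool.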
The evolution of multi-agent research assistants points to several key future directions. First, increasing specialisation and modularity will demand standardised interfaces and shared knowledge protocols that enable effective collaboration among domain-specific agents (see the sketch after this paragraph). Systems like Robin and Agent Laboratory illustrate the benefits of such coordination, while platforms like AgentRxiv may become foundational for AI-led research communities. Second, evaluating these systems remains a challenge. Beyond narrow benchmarks, the field needs metrics for creativity, reproducibility, and scientific impact, especially in settings where AI collaborates with humans. Agent Laboratory’s integration of human feedback highlights the value of measuring human–AI synergy. Third, ethical and epistemic risks must be addressed: multi-agent systems may produce convincing but flawed outputs or reinforce biases, so governance mechanisms and continued human oversight will be essential, particularly at key decision points such as hypothesis selection. Finally, as AI agents contribute more substantially to research outputs, questions of credit, authorship, and peer review arise. Tools like AgentRxiv support AI–AI collaboration but also underline the need for shared norms and interdisciplinary governance.
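To make the first direction concrete, a standardised agent interface plus a shared knowledge store might look something like the following. This is a speculative sketch: `ResearchAgent`, `SharedArchive`, and the keyword search are invented names that do not reflect AgentRxiv’s actual API; the point is only the pattern of agents depositing findings that later agents retrieve and build on.

```python
from typing import Protocol

class SharedArchive:
    """Toy in-memory stand-in for an AgentRxiv-like preprint store."""
    def __init__(self) -> None:
        self._papers: list[tuple[str, str]] = []  # (author agent, finding)

    def deposit(self, author: str, finding: str) -> None:
        self._papers.append((author, finding))

    def search(self, keyword: str) -> list[str]:
        return [finding for _, finding in self._papers if keyword in finding]

class ResearchAgent(Protocol):
    """Minimal shared interface any domain-specific agent would implement."""
    name: str
    def contribute(self, archive: SharedArchive) -> str: ...

class PromptingAgent:
    def __init__(self, name: str) -> None:
        self.name = name

    def contribute(self, archive: SharedArchive) -> str:
        prior = archive.search("prompting")  # reuse earlier agents' results
        finding = f"{self.name}: prompting strategy building on {len(prior)} prior findings"
        archive.deposit(self.name, finding)
        return finding

archive = SharedArchive()
for agent in (PromptingAgent("agent_a"), PromptingAgent("agent_b")):
    print(agent.contribute(archive))
```

Each successive agent sees more prior findings than the last, loosely mirroring the inter-agent knowledge transfer credited for AgentRxiv’s performance gains.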
References:
1. Ghareeb, Ali Essam, Benjamin Chang, Ludovico Mitchener, Angela Yiu, Caralyn J. Szostkiewicz, Jon M. Laurent, Muhammed T. Razzak, Andrew D. White, Michaela M. Hinks, and Samuel G. Rodriques. 2025. Robin: A Multi-Agent System for Automating Scientific Discovery. arXiv preprint arXiv:2505.13400. Available at: https://arxiv.org/abs/2505.13400
2. Gottweis, Juraj, et al. 2025. Towards an AI Co-Scientist. arXiv preprint arXiv:2502.18864. Available at: https://arxiv.org/abs/2502.18864
3. Guo, Taicheng, et al. 2024. Large Language Model Based Multi-Agents: A Survey of Progress and Challenges. arXiv preprint arXiv:2402.01680. Available at: https://arxiv.org/abs/2402.01680
4. Lu, Chris, Cong Lu, Robert Tjarko Lange, Jakob Foerster, Jeff Clune, and David Ha. 2024. The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery. arXiv preprint arXiv:2408.06292. Available at: https://arxiv.org/abs/2408.06292
5. Schmidgall, Samuel, and Michael Moor. 2025. AgentRxiv: Towards Collaborative Autonomous Research. arXiv preprint arXiv:2503.18102. Available at: https://arxiv.org/abs/2503.18102
6. Schmidgall, Samuel, Yusheng Su, Ze Wang, Ximeng Sun, Jialian Wu, Xiaodong Yu, Jiang Liu, Zicheng Liu, and Emad Barsoum. 2025. Agent Laboratory: Using LLM Agents as Research Assistants. arXiv preprint arXiv:2501.04227. Available at: https://arxiv.org/abs/2501.04227
7. Shoham, Yoav, and Kevin Leyton-Brown. 2008. Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations. Cambridge: Cambridge University Press.
8. Wooldridge, Michael, and Nicholas R. Jennings. 1995. ‘Intelligent Agents: Theory and Practice’. The Knowledge Engineering Review 10 (2): 115–152.
9. Zamprogno, Giacomo, Ilaria Tiddi, and Bart Verheij. 2025. ‘Autonomous Research Assistants for Hybrid Intelligence: Landscape and Challenges’. In Proceedings of the AAAI Symposium Series, 350–358.