
SwarmAgentic: Towards Fully Automated Agentic
System Generation via Swarm Intelligence

1 LMU Munich, 2 Technical University of Munich, 3 Munich Center for Machine Learning (MCML)

Abstract

The rapid progress of Large Language Models has advanced agentic systems in decision-making, coordination, and task execution. Yet, existing agentic system generation frameworks lack full autonomy, missing from-scratch agent generation, self-optimizing agent functionality, and collaboration, limiting adaptability and scalability. We propose SwarmAgentic, a framework for fully automated agentic system generation that constructs agentic systems from scratch and jointly optimizes agent functionality and collaboration as interdependent components through language-driven exploration. To enable efficient search over system-level structures, SwarmAgentic maintains a population of candidate systems and evolves them via feedback-guided updates, drawing inspiration from Particle Swarm Optimization (PSO). We evaluate our method on six real-world, open-ended, and exploratory tasks involving high-level planning, system-level coordination, and creative reasoning. Given only a task description and an objective function, SwarmAgentic outperforms all baselines, achieving a +261.8% relative improvement over ADAS on the TravelPlanner benchmark, highlighting the effectiveness of full automation in structurally unconstrained tasks. This framework marks a significant step toward scalable and autonomous agentic system design, bridging swarm intelligence with fully automated multi-agent generation.

Key Contributions

🛠️ Autonomous, from scratch

Builds complete multi-agent systems directly from a task description and an objective function, jointly optimizing agent functionality and collaboration without predefined templates or seed workflows.

🐝 Language-driven PSO

Reformulates particle swarm optimization into interpretable text–symbol updates over agent roles and coordination structures, enabling language-mediated exploration of the system design space.

🔁 Failure-aware refinement

Iteratively refines agent skills and coordination protocols using execution feedback and natural-language critiques. LLM-guided flaw detection and a failure memory prevent repeated suboptimal adjustments and focus updates on meaningful improvements.

🚀 Robust generalization

Demonstrates strong performance and transferability across diverse real-world, open-ended tasks requiring high-level planning, multi-agent coordination, and complex reasoning, while maintaining robust generalization across multiple LLM backbones.

Table 1: Comparison between SwarmAgentic and existing frameworks along three dimensions of agentic system autonomy. SwarmAgentic is the only framework satisfying all three, enabling fully automated and scalable agentic system generation without human intervention. See Appendix A for definitions and capability assessments.

Method Overview

Figure 1: Overview of the SwarmAgentic pipeline. Starting from only a task description and objective, the system autonomously generates and optimizes multi-agent architectures through language-driven particle swarm evolution. Each iteration involves execution, LLM-based failure analysis, and targeted refinement, yielding a fully functional and collaboratively optimized system built from scratch.

Pipeline

SwarmAgentic automates agentic system design through a four-stage, language-driven swarm optimization loop:

1. Task-Conditioned Initialization

The process begins with only a natural-language task description and objective function. From this input, the system generates a diverse set of candidate multi-agent configurations, each expressed in structured language that defines agent roles, responsibilities, and coordination strategies. A temperature-layered initialization ensures structural diversity and broad exploration of potential designs.
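One way to picture temperature-layered initialization is to give each candidate its own sampling temperature, from conservative to exploratory. The sketch below is illustrative only: the temperature range, the even spacing, and the `generate` callable (a stand-in for an LLM call) are assumptions, not details taken from the paper.

```python
from typing import Callable, Dict, List


def layered_temperatures(n_particles: int, t_min: float = 0.2, t_max: float = 1.0) -> List[float]:
    """Spread sampling temperatures evenly from t_min to t_max (assumed schedule)."""
    if n_particles < 1:
        raise ValueError("n_particles must be positive")
    if n_particles == 1:
        return [t_min]
    step = (t_max - t_min) / (n_particles - 1)
    return [round(t_min + i * step, 3) for i in range(n_particles)]


def initialize_swarm(task: str, generate: Callable[[str, float], str], n_particles: int = 5) -> List[Dict]:
    """Create one candidate system description per temperature layer."""
    return [
        {"temperature": t, "system": generate(task, t)}
        for t in layered_temperatures(n_particles)
    ]


def fake_generate(task: str, temperature: float) -> str:
    """Hypothetical stand-in for an LLM that writes a team description."""
    return f"[T={temperature}] team for: {task}"


swarm = initialize_swarm("Plan a 3-day trip", fake_generate, n_particles=3)
for particle in swarm:
    print(particle["system"])
```

Low-temperature particles would tend toward conventional team layouts, while high-temperature ones explore less obvious role splits.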

2. Language-Driven PSO Evolution

The optimization proceeds entirely in language space, where each particle represents a candidate system encoded as text.

  • Failure-Aware Velocity: Updates combine insights from personal bests, global bests, and failure-driven adjustments derived from previous evaluations.
  • Position Update: These updates are materialized as textual edits (adding, removing, or modifying agents and their coordination patterns), making the search interpretable and adaptable. This reformulation allows the system to continuously refine both functionality and collaboration.
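The two bullets above can be sketched with a toy representation in which a system is a mapping from role name to responsibility, a velocity is a list of edits, and a position update applies those edits. Everything here is a simplification for illustration: in SwarmAgentic the proposals come from an LLM, whereas `borrow_from` is a hypothetical deterministic diff, and the rule that failure-driven edits take priority per role is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

System = Dict[str, str]  # role name -> responsibility text (toy encoding)


@dataclass(frozen=True)
class Edit:
    op: str  # "add", "remove", or "modify"
    role: str
    text: str = ""


def borrow_from(best: System, current: System) -> List[Edit]:
    """Propose edits that move `current` toward `best` (stand-in for an LLM comparison)."""
    edits = []
    for role, text in best.items():
        if role not in current:
            edits.append(Edit("add", role, text))
        elif current[role] != text:
            edits.append(Edit("modify", role, text))
    return edits


def velocity(current: System, personal_best: System, global_best: System,
             flawed_roles: Set[str], failed_edits: Set[Edit]) -> List[Edit]:
    """Combine failure-driven, personal-best, and global-best proposals."""
    proposals = [Edit("remove", r) for r in sorted(flawed_roles) if r in current]
    proposals += borrow_from(personal_best, current)
    proposals += borrow_from(global_best, current)
    chosen, touched = [], set()
    for edit in proposals:
        # Failure memory: skip edits that already failed; keep one edit per role.
        if edit in failed_edits or edit.role in touched:
            continue
        touched.add(edit.role)
        chosen.append(edit)
    return chosen


def apply_edits(current: System, edits: List[Edit]) -> System:
    """Position update: materialize the velocity as edits on a copy of the system."""
    updated = dict(current)
    for edit in edits:
        if edit.op == "remove":
            updated.pop(edit.role, None)
        else:
            updated[edit.role] = edit.text
    return updated


current = {"Planner": "Draft itinerary", "Booker": "Book everything at once"}
personal_best = {"Planner": "Draft itinerary", "Booker": "Book hotels, then flights"}
global_best = {"Planner": "Draft itinerary",
               "Quality Assurance Specialist": "Check budget constraints"}

v = velocity(current, personal_best, global_best,
             flawed_roles={"Planner"},
             failed_edits={Edit("remove", "Planner")})
new_system = apply_edits(current, v)
print(new_system)
```

In this example the failure memory blocks a previously unsuccessful removal of the Planner, so the update only rewrites the Booker and adds a Quality Assurance Specialist borrowed from the global best.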

3. Execution & Evaluation

Each candidate configuration is instantiated and executed in the target environment to assess its performance relative to the defined objective.

The system collects outcome feedback and conducts LLM-based analysis to identify causes of failure, categorize weaknesses at the agent and coordination levels, and generate targeted improvement signals for the next iteration.
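A minimal sketch of this stage, assuming a per-task score in [0, 1] and a diagnosis step that labels each failure as agent-level or coordination-level. The `run`, `objective`, and `diagnose` callables are hypothetical placeholders for the target environment, the task's objective function, and the LLM-based analysis; their signatures are assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Feedback:
    score: float = 0.0
    agent_flaws: List[str] = field(default_factory=list)
    coordination_flaws: List[str] = field(default_factory=list)


def evaluate(system: str, tasks: List[str],
             run: Callable[[str, str], str],
             objective: Callable[[str, str], float],
             diagnose: Callable[[str, str, str], Tuple[str, str]]) -> Feedback:
    """Execute a candidate on each task, score it, and collect categorized flaws."""
    feedback = Feedback()
    scores = []
    for task in tasks:
        output = run(system, task)
        score = objective(task, output)
        scores.append(score)
        if score < 1.0:  # treat anything short of full credit as a failure to analyze
            level, note = diagnose(system, task, output)
            bucket = feedback.agent_flaws if level == "agent" else feedback.coordination_flaws
            bucket.append(note)
    feedback.score = sum(scores) / len(scores) if scores else 0.0
    return feedback


def run(system: str, task: str) -> str:
    """Toy environment: only 'trip A' ends up within budget."""
    return f"{task}: plan" + (" within budget" if task == "trip A" else "")


def objective(task: str, output: str) -> float:
    return 1.0 if "within budget" in output else 0.0


def diagnose(system: str, task: str, output: str) -> Tuple[str, str]:
    """Stand-in for LLM failure analysis."""
    return "coordination", f"{task}: no budget check between planner and booker"


feedback = evaluate("Planner + Booker", ["trip A", "trip B"], run, objective, diagnose)
print(feedback)
```

The categorized flaws are what the next velocity update would consume as improvement signals.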

4. Iterative Refinement & Selection

The full loop of execution, diagnosis, and update is repeated until the search stabilizes. The process yields a robust, high-performing multi-agent system that emerges entirely from autonomous optimization without any predefined templates or manual design.
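The outer loop can be sketched as follows. The page does not say how stabilization is detected, so the patience-based stopping rule is an assumption, as are the toy `score` and `update` functions, which stand in for execution against the objective and for the LLM-driven update.

```python
from typing import Callable, FrozenSet, List, Tuple

Particle = FrozenSet[str]  # toy system: just a set of role names

REQUIRED = {"Planner", "Booker", "Quality Assurance Specialist"}
CANDIDATES = sorted(REQUIRED)  # hypothetical pool of roles the "LLM" may propose


def score(particle: Particle) -> float:
    """Toy objective: fraction of required roles present."""
    return len(particle & REQUIRED) / len(REQUIRED)


def update(particle: Particle, personal_best: Particle, global_best: Particle) -> Particle:
    """Borrow one missing role from the global best, then personal best, then the pool."""
    for source in (global_best, personal_best, CANDIDATES):
        for role in sorted(source):
            if role not in particle:
                return particle | {role}
    return particle


def optimize(swarm: List[Particle],
             score: Callable[[Particle], float],
             update: Callable[[Particle, Particle, Particle], Particle],
             max_iters: int = 20, patience: int = 3) -> Tuple[Particle, float]:
    """Repeat update and evaluation until the global best stops improving."""
    swarm = list(swarm)
    personal_best = list(swarm)
    personal_scores = [score(p) for p in swarm]
    best_i = max(range(len(swarm)), key=personal_scores.__getitem__)
    global_best, global_score = personal_best[best_i], personal_scores[best_i]
    stale = 0
    for _ in range(max_iters):
        improved = False
        for i, particle in enumerate(swarm):
            swarm[i] = update(particle, personal_best[i], global_best)
            s = score(swarm[i])
            if s > personal_scores[i]:
                personal_best[i], personal_scores[i] = swarm[i], s
            if s > global_score:
                global_best, global_score = swarm[i], s
                improved = True
        stale = 0 if improved else stale + 1
        if stale >= patience:  # assumed stabilization criterion
            break
    return global_best, global_score


best_system, best_score = optimize(
    [frozenset({"Planner"}), frozenset({"Booker"})], score, update
)
print(sorted(best_system), best_score)
```

The returned global best plays the role of the final selected system; a fixed iteration budget would be an equally plausible stopping rule.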

Main Results

Real-World Tasks

Table 2: Performance on the TravelPlanner benchmark. Each cell shows results in the format GPT-3.5 / GPT-4o. SwarmAgentic outperforms all baseline methods, highlighting its effectiveness in automated agentic system generation.
Table 3: Performance on Natural Plan, Creative Writing, and MGSM. Results are shown as GPT-3.5 / GPT-4o. SwarmAgentic achieves the highest performance across all tasks, significantly outperforming baseline methods.

Cross-Model Transferability

Table 4: Performance on Creative Writing when transferring the best agentic system discovered by GPT-4o-mini to other LLMs. SwarmAgentic consistently outperforms all baselines across different LLMs, demonstrating strong cross-model transferability. Details of the best-discovered system are provided in Appendix F. * indicates results where the agent is both trained on Gemini-1.5-flash (Subramanya, 2024) and tested on Gemini-1.5-Pro.

Discovered Agentic Systems

Figure 2: Agentic systems discovered by SwarmAgentic.
Figure 3: Agentic systems discovered by ADAS.

Optimization Trajectories

SwarmAgentic surfaces interpretable evolution traces that reveal how the swarm refines agent teams over time through language-driven optimization.

Figure 4: Optimization trajectories during swarm search. Success rate over iterations on TravelPlanner. The swarm introduces roles (e.g., Quality Assurance Specialist), adds verification steps, and iteratively improves system performance, achieving significant improvements at key iteration points.

BibTeX

@misc{zhang2025swarmagenticfullyautomatedagentic,
  title={SwarmAgentic: Towards Fully Automated Agentic System Generation via Swarm Intelligence},
  author={Yao Zhang and Chenyang Lin and Shijie Tang and Haokun Chen and Shijie Zhou and Yunpu Ma and Volker Tresp},
  year={2025},
  eprint={2506.15672},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2506.15672},
}