Publications
2025
- SwarmAgentic: Towards Fully Automated Agentic System Generation via Swarm Intelligence. Yao Zhang, Chenyang Lin, Shijie Tang, and 4 more authors. EMNLP 2025 (Main)
The rapid progress of Large Language Models has advanced agentic systems in decision-making, coordination, and task execution. Yet, existing agentic system generation frameworks lack full autonomy, missing from-scratch agent generation, self-optimizing agent functionality, and collaboration, limiting adaptability and scalability. We propose SwarmAgentic, a framework for fully automated agentic system generation that constructs agentic systems from scratch and jointly optimizes agent functionality and collaboration as interdependent components through language-driven exploration. To enable efficient search over system-level structures, SwarmAgentic maintains a population of candidate systems and evolves them via feedback-guided updates, drawing inspiration from Particle Swarm Optimization (PSO). We evaluate our method on six real-world, open-ended, and exploratory tasks involving high-level planning, system-level coordination, and creative reasoning. Given only a task description and an objective function, SwarmAgentic outperforms all baselines, achieving a +261.8% relative improvement over ADAS on the TravelPlanner benchmark, highlighting the effectiveness of full automation in structurally unconstrained tasks. This framework marks a significant step toward scalable and autonomous agentic system design, bridging swarm intelligence with fully automated multi-agent generation.
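As a rough illustration of the PSO-inspired search loop described above, the sketch below evolves a population of textual candidate systems toward personal and global bests; `objective` and `propose_update` are hypothetical stand-ins for the paper's LLM-driven evaluation and rewriting steps, not its actual interfaces.

```python
from dataclasses import dataclass

@dataclass
class Particle:
    """One candidate agentic system plus its search state (hypothetical structure)."""
    system: str                        # textual spec of agents and their collaboration
    score: float = float("-inf")
    personal_best: str = ""
    personal_best_score: float = float("-inf")

def evolve_population(task, objective, propose_update, num_particles=8, iterations=10):
    """PSO-style search over textual candidate systems.

    `objective(system, task) -> float` scores a candidate end-to-end;
    `propose_update(system, personal_best, global_best) -> str` asks an LLM to
    rewrite a candidate using feedback from its own best and the global best.
    Both are assumed, illustrative interfaces.
    """
    population = [Particle(system=f"initial candidate {i} for: {task}") for i in range(num_particles)]
    global_best, global_best_score = None, float("-inf")

    for _ in range(iterations):
        for p in population:
            p.score = objective(p.system, task)
            if p.score > p.personal_best_score:
                p.personal_best, p.personal_best_score = p.system, p.score
            if p.score > global_best_score:
                global_best, global_best_score = p.system, p.score
        # Language-driven analogue of a velocity update: move each candidate
        # toward its personal best and the global best via guided rewriting.
        for p in population:
            p.system = propose_update(p.system, p.personal_best, global_best)

    return global_best, global_best_score
```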
@misc{zhang2025swarmagenticfullyautomatedagentic, title = {SwarmAgentic: Towards Fully Automated Agentic System Generation via Swarm Intelligence}, author = {Zhang, Yao and Lin, Chenyang and Tang, Shijie and Chen, Haokun and Zhou, Shijie and Ma, Yunpu and Tresp, Volker}, year = {2025}, eprint = {2506.15672}, archiveprefix = {arXiv}, primaryclass = {cs.AI}, url = {https://arxiv.org/abs/2506.15672}, topic = {system}, }
- GroundedPRM: Tree-Guided and Fidelity-Aware Process Reward Modeling for Step-Level Reasoning. Yao Zhang, Yu Wu, Haowei Zhang, and 6 more authors. NeurIPS 2025 Workshop LAW
Process Reward Models (PRMs) aim to improve multi-step reasoning in Large Language Models (LLMs) by supervising intermediate steps and identifying errors throughout the reasoning process. However, building effective PRMs remains challenging due to the lack of scalable, high-quality annotations. Existing approaches rely on costly human labeling, LLM-based self-evaluation that is prone to hallucination, or Monte Carlo (MC) estimation, which infers step quality solely from rollout outcomes and often introduces noisy, misaligned supervision due to credit misattribution. These issues result in three core limitations: noisy rewards, low factual fidelity, and misalignment with step-level reasoning objectives. To address these challenges, we introduce GroundedPRM, a tree-guided and fidelity-aware framework for automatic process supervision. To reduce reward noise and enable fine-grained credit assignment, we construct structured reasoning paths via Monte Carlo Tree Search (MCTS). To eliminate hallucinated supervision, we validate each intermediate step using an external tool, providing precise, execution-grounded correctness signals. To combine both step-level validation and global outcome assessment, we design a hybrid reward aggregation mechanism that fuses tool-based verification with MCTS-derived feedback. Finally, we format the reward signal into a rationale-enhanced, generative structure to promote interpretability and compatibility with instruction-tuned LLMs. GroundedPRM is trained on only 40K automatically labeled samples, amounting to just 10% of the data used by the best-performing PRM trained with auto-labeled supervision. Nevertheless, it achieves up to a 26% relative improvement in average performance on ProcessBench. When used for reward-guided greedy search, GroundedPRM outperforms even PRMs trained with human-labeled supervision, offering a scalable and verifiable path toward high-quality process-level reasoning.
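To make the hybrid reward aggregation concrete, here is a minimal sketch of fusing a tool-based verification signal with an MCTS-derived value; the function name and the weighting `alpha` are assumptions for illustration, not values from the paper.

```python
def hybrid_step_reward(tool_verified: bool, mcts_value: float, alpha: float = 0.5) -> float:
    """Illustrative fusion of execution-grounded verification with MCTS-derived feedback.

    `tool_verified` is the binary outcome of checking a reasoning step with an
    external tool; `mcts_value` is the step's estimated value from tree-search
    rollouts (assumed to lie in [0, 1]). The weight `alpha` is a placeholder.
    """
    tool_signal = 1.0 if tool_verified else 0.0
    return alpha * tool_signal + (1.0 - alpha) * mcts_value


# Example: a step a calculator confirms, but whose rollouts rarely reach a
# correct final answer, still receives partial credit.
print(hybrid_step_reward(tool_verified=True, mcts_value=0.2))   # 0.6
print(hybrid_step_reward(tool_verified=False, mcts_value=0.9))  # 0.45
```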
@misc{zhang2025groundedprmtreeguidedfidelityawareprocess, title = {GroundedPRM: Tree-Guided and Fidelity-Aware Process Reward Modeling for Step-Level Reasoning}, author = {Zhang, Yao and Wu, Yu and Zhang, Haowei and Li, Weiguo and Chen, Haokun and Wu, Jingpei and Li, Guohao and Han, Zhen and Tresp, Volker}, year = {2025}, eprint = {2510.14942}, archiveprefix = {arXiv}, primaryclass = {cs.AI}, url = {https://arxiv.org/abs/2510.14942}, topic = {reasoning}, }
- WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration. Yao Zhang, Zijian Ma, Yunpu Ma, and 3 more authors. AAAI 2025
LLM-based autonomous agents often fail in executing complex web tasks that require dynamic interaction, largely due to the inherent uncertainty and complexity of these environments. Existing LLM-based web agents typically rely on rigid, expert-designed policies specific to certain states and actions, which lack the flexibility and generalizability needed to adapt to unseen tasks. In contrast, humans excel by exploring unknowns, continuously adapting strategies and actions based on new observations, and resolving ambiguities through exploration. To emulate human-like adaptability, web agents need strategic exploration and complex decision-making. Monte Carlo Tree Search (MCTS) is well-suited for this, but classical MCTS struggles with the vast action spaces, unpredictable state transitions, and incomplete information in web tasks. In light of this, we develop WebPilot, a multi-agent system with a dual optimization strategy that improves MCTS to better handle complex web environments. Specifically, the Global Optimization phase involves generating a high-level plan by breaking down tasks into manageable subtasks, continuously refining this plan through reflective analysis of new observations and previous subtask attempts, thereby focusing the search process and mitigating the challenges posed by vast action spaces in classical MCTS. Subsequently, the Local Optimization phase executes each subtask using a tailored MCTS designed for complex environments, effectively addressing uncertainties and managing incomplete information by iteratively refining decisions based on new observations. Experimental results on WebArena and MiniWoB++ demonstrate the effectiveness of WebPilot. Notably, on WebArena, WebPilot achieves SOTA performance with GPT-4, achieving a 93% relative increase in success rate over the concurrent tree search-based method. WebPilot marks a significant advancement in general autonomous agent capabilities, paving the way for more advanced and reliable decision-making in practical environments.
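A minimal sketch of the two-phase structure the abstract describes, under assumed interfaces: `decompose`, `reflect`, and `mcts_subtask_search` stand in for the LLM-driven planning, reflection, and tailored-MCTS components, and `env` for the browser environment. This illustrates the control flow only, not WebPilot's implementation.

```python
def webpilot_style_execute(task, decompose, reflect, mcts_subtask_search, env):
    """Hedged sketch of a global-plan / local-MCTS loop for web tasks.

    Global phase: break the task into subtasks and keep refining the plan from
    new observations. Local phase: solve each subtask with an MCTS adapted to
    the web environment. All callables here are assumed interfaces.
    """
    plan = decompose(task)                                     # global optimization: high-level plan
    history = []
    while plan:
        subtask = plan.pop(0)
        observation, success = mcts_subtask_search(subtask, env)  # local optimization per subtask
        history.append((subtask, observation, success))
        # Reflective analysis of the new observation may rewrite the remaining plan.
        plan = reflect(task, plan, history)
    return history
```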
@misc{zhang2024webpilotversatileautonomousmultiagent, title = {WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration}, author = {Zhang, Yao and Ma, Zijian and Ma, Yunpu and Han, Zhen and Wu, Yu and Tresp, Volker}, year = {2025}, eprint = {2408.15978}, archiveprefix = {arXiv}, primaryclass = {cs.AI}, url = {https://arxiv.org/abs/2408.15978}, topic = {system}, }
- Agentic Neural Networks: Self-Evolving Multi-Agent Systems via Textual Backpropagation. Xiaolong Ma, Chenyang Lin, Yao Zhang, and 1 more author. Under Review, 2025
We propose Agentic Neural Networks, a framework that enables neural networks to evolve their own architectures through textual backpropagation. By reformulating network updates as text-based operations, we enable self-evolving multi-agent systems that can adaptively modify their structure and coordination mechanisms.
@misc{ma2025agenticneuralnetworksselfevolving, title = {Agentic Neural Networks: Self-Evolving Multi-Agent Systems via Textual Backpropagation}, author = {Ma, Xiaolong and Lin, Chenyang and Zhang, Yao and Tresp, Volker}, year = {2025}, eprint = {2506.09046}, archiveprefix = {arXiv}, primaryclass = {cs.AI}, url = {https://arxiv.org/abs/2506.09046}, }
- CL-CrossVQA: A Continual Learning Benchmark for Cross-Domain Visual Question Answering. Yao Zhang, Haokun Chen, Aymen Frikha, and 1 more author. WACV 2025
We introduce CL-CrossVQA, a benchmark for continual learning in cross-domain visual question answering. The benchmark evaluates the ability of vision-language models to retain knowledge while adapting to new domains, highlighting challenges in representation retention and cross-domain generalization.
@misc{zhang2025clcrossvqacontinuallearningbenchmark, title = {CL-CrossVQA: A Continual Learning Benchmark for Cross-Domain Visual Question Answering}, author = {Zhang, Yao and Chen, Haokun and Frikha, Aymen and Tresp, Volker}, year = {2025}, topic = {learning} }
- FedNano: Toward Lightweight Federated Tuning for Pretrained Multimodal Large Language Models. Yao Zhang, Hewei Gao, Haokun Chen, and 1 more author. Under Review, 2025
FedNano proposes a lightweight federated tuning framework for pretrained multimodal large language models. The method drastically reduces client-side computational cost while maintaining strong reasoning and adaptation performance, enabling scalable and privacy-preserving multimodal intelligence.
@misc{zhang2025fednanotowardlightweightfederated, title = {FedNano: Toward Lightweight Federated Tuning for Pretrained Multimodal Large Language Models}, author = {Zhang, Yao and Gao, Hewei and Chen, Haokun and Tresp, Volker}, year = {2025}, eprint = {2506.14824}, archiveprefix = {arXiv}, primaryclass = {cs.AI}, url = {https://arxiv.org/abs/2506.14824}, topic = {learning} }
- FedBiP: Heterogeneous One-Shot Federated Learning with Personalized Latent Diffusion Models. Haokun Chen, Haokun Li, Yao Zhang, and 1 more author. CVPR 2025
FedBiP introduces a heterogeneous one-shot federated learning framework that leverages personalized latent diffusion models. The approach enables efficient model aggregation in federated settings with heterogeneous data distributions while preserving client-specific personalization.
@misc{chen2025fedbipheterogeneousoneshotfederated, title = {FedBiP: Heterogeneous One-Shot Federated Learning with Personalized Latent Diffusion Models}, author = {Chen, Haokun and Li, Haokun and Zhang, Yao and Tresp, Volker}, year = {2025}, }
- AUVIC: Adversarial Unlearning of Visual Concepts for Multi-modal Large Language Models. Haokun Chen, Jingpei Li, Yao Zhang, and 1 more author. Under Review, 2025
AUVIC introduces an adversarial unlearning framework for removing specific visual concepts from multi-modal large language models. The approach enables selective concept removal while maintaining model performance on other tasks.
@misc{chen2025auvicadversarialunlearningvisual, title = {AUVIC: Adversarial Unlearning of Visual Concepts for Multi-modal Large Language Models}, author = {Chen, Haokun and Li, Jingpei and Zhang, Yao and Tresp, Volker}, year = {2025}, }
- Memory Helps, but Confabulation Misleads: Understanding Streaming Events in Videos with MLLMs. Guangyi Zhang, Mingshi Ding, Tianli Liu, Yao Zhang, and 1 more author. ICLR 2025 Workshop World Models
This work investigates how multimodal large language models handle streaming events in videos. We find that while memory mechanisms help with temporal understanding, confabulation can mislead model predictions, highlighting the need for improved temporal reasoning in video-language models.
@misc{zhang2025memoryhelpsconfabulationmisleads, title = {Memory Helps, but Confabulation Misleads: Understanding Streaming Events in Videos with MLLMs}, author = {Zhang, Guangyi and Ding, Mingshi and Liu, Tianli and Zhang, Yao and Tresp, Volker}, year = {2025}, eprint = {2502.15457}, archiveprefix = {arXiv}, primaryclass = {cs.AI}, url = {https://arxiv.org/abs/2502.15457} }
- Does Machine Unlearning Truly Remove Model Knowledge? A Framework for Auditing Unlearning in LLMs. Haokun Chen, Yao Zhang, Yubo Bi, and 2 more authors. NeurIPS 2025 Workshop LockLLM
We propose a framework for auditing machine unlearning in large language models. The framework evaluates whether unlearning truly removes target knowledge while maintaining model utility, providing tools to assess the effectiveness of unlearning methods.
@misc{chen2025doesmachineunlearningtrulyremove, title = {Does Machine Unlearning Truly Remove Model Knowledge? A Framework for Auditing Unlearning in LLMs}, author = {Chen, Haokun and Zhang, Yao and Bi, Yubo and Tresp, Volker}, year = {2025}, eprint = {2505.23270}, archiveprefix = {arXiv}, primaryclass = {cs.AI}, url = {https://arxiv.org/abs/2505.23270}, }
2024
- FedDAT: An Approach for Foundation Model Finetuning in Multi-Modal Heterogeneous Federated Learning. Haokun Chen, Yao Zhang, Daniel Krompass, and 1 more author. AAAI 2024
FedDAT presents an approach for foundation model finetuning in multi-modal heterogeneous federated learning. The method addresses challenges in federated adaptation of large foundation models across diverse data modalities and client distributions.
@misc{chen2024feddatapproachfoundationmodel, title = {FedDAT: An Approach for Foundation Model Finetuning in Multi-Modal Heterogeneous Federated Learning}, author = {Chen, Haokun and Zhang, Yao and Krompass, Daniel and Tresp, Volker}, year = {2024}, topic = {learning} }
2023
- Adaptive Multi-Resolution Attention with Linear Complexity. Yao Zhang, Yunpu Ma, Thomas Seidl, and 1 more author. In International Joint Conference on Neural Networks, IJCNN 2023 (Oral)
We propose an adaptive multi-resolution attention mechanism that achieves linear complexity. The method dynamically adjusts resolution levels based on input characteristics, enabling efficient processing of long sequences while maintaining attention quality.
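The summary above does not spell out where the linear complexity comes from, so the toy sketch below shows one standard way to obtain it: attending over keys and values pooled to a fixed coarse resolution, so the cost grows linearly with sequence length. The adaptive, input-dependent choice of resolution that the paper describes is deliberately omitted; all names and defaults here are illustrative, not the paper's method.

```python
import numpy as np

def pooled_attention(q, k, v, num_buckets=64):
    """Toy linear-cost attention: average-pool keys/values into a fixed number
    of buckets before attending, giving O(n * num_buckets) cost for length n."""
    n, d = k.shape
    bucket = max(1, n // num_buckets)
    # Coarsen keys and values by averaging consecutive positions.
    k_coarse = np.stack([k[i:i + bucket].mean(axis=0) for i in range(0, n, bucket)])
    v_coarse = np.stack([v[i:i + bucket].mean(axis=0) for i in range(0, n, bucket)])
    scores = q @ k_coarse.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v_coarse

rng = np.random.default_rng(0)
q = k = v = rng.normal(size=(4096, 64))
out = pooled_attention(q, k, v)          # shape (4096, 64); cost linear in 4096
```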
@inproceedings{zhang2023adaptivemultiresolutionattention, title = {Adaptive Multi-Resolution Attention with Linear Complexity}, author = {Zhang, Yao and Ma, Yunpu and Seidl, Thomas and Tresp, Volker}, year = {2023}, booktitle = {International Joint Conference on Neural Networks}, }
2022
- ECOLA: Enhanced Temporal Knowledge Embeddings with Contextualized Language Representations. Zhen Han, Renke Liao, Jie Gu, Yao Zhang, and 1 more author. In Proceedings of the Association for Computational Linguistics, ACL 2022
ECOLA enhances temporal knowledge embeddings by incorporating contextualized language representations. The approach improves temporal reasoning in knowledge graphs by leveraging pre-trained language models to capture temporal context.
@inproceedings{han2022ecolaenhancedtemporalknowledge, title = {ECOLA: Enhanced Temporal Knowledge Embeddings with Contextualized Language Representations}, author = {Han, Zhen and Liao, Renke and Gu, Jie and Zhang, Yao and Tresp, Volker}, year = {2022}, booktitle = {Proceedings of the Association for Computational Linguistics} }
2021
- Argument Mining Driven Analysis of Peer-Reviews. Maximilian Fromm, Evgeniy Faerman, Max Berrendorf, Shrestha Bhargava, Ruoyu Qi, Yao Zhang, and 1 more author. In Proceedings of the AAAI Conference on Artificial Intelligence, AAAI 2021
We present an argument mining approach for analyzing peer-review texts. The method extracts and structures arguments from review comments, enabling automated analysis of review quality and feedback patterns.
@inproceedings{fromm2021argumentminingdrivenanalysis, title = {Argument Mining Driven Analysis of Peer-Reviews}, author = {Fromm, Maximilian and Faerman, Evgeniy and Berrendorf, Max and Bhargava, Shrestha and Qi, Ruoyu and Zhang, Yao and Tresp, Volker}, year = {2021}, booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence} }
2020
- KNNAC: An Efficient k Nearest Neighbor Based Clustering with Active Core Detection. Yao Zhang, Yunpu Lu, and Thomas Seidl. In International Conference on Information Integration and Web-based Applications and Services, IIWAS 2020
KNNAC introduces an efficient k-nearest neighbor based clustering algorithm with active core detection. The method identifies cluster cores through active learning, improving clustering efficiency and accuracy.
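The two-sentence summary leaves the mechanics implicit, so here is a generic kNN-density clustering sketch in the same spirit: cores are detected from k-th neighbor distances, cores in each other's neighborhoods are merged, and remaining points attach to their nearest core. It is an illustration under assumed defaults, not the KNNAC algorithm itself.

```python
import numpy as np

def knn_core_clustering(X, k=10, core_quantile=0.5):
    """Illustrative kNN-density clustering (not the paper's algorithm)."""
    n = len(X)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    knn_idx = np.argsort(dists, axis=1)[:, 1:k + 1]            # skip self at index 0
    kth_dist = dists[np.arange(n), knn_idx[:, -1]]
    # Points with small k-th neighbor distance (high local density) become cores.
    cores = np.where(kth_dist <= np.quantile(kth_dist, core_quantile))[0]

    # Union-find over cores that appear in each other's kNN lists.
    parent = {c: c for c in cores}
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    core_set = set(cores.tolist())
    for c in cores:
        for nb in knn_idx[c]:
            if nb in core_set:
                parent[find(c)] = find(nb)

    labels = np.full(n, -1)
    roots = {find(c) for c in cores}
    root_to_label = {r: i for i, r in enumerate(sorted(roots))}
    for c in cores:
        labels[c] = root_to_label[find(c)]
    for i in range(n):
        if labels[i] == -1:                                    # non-core: join nearest core's cluster
            labels[i] = labels[cores[np.argmin(dists[i, cores])]]
    return labels
```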
@inproceedings{zhang2020knnacefficientknearest, title = {KNNAC: An Efficient k Nearest Neighbor Based Clustering with Active Core Detection}, author = {Zhang, Yao and Lu, Yunpu and Seidl, Thomas}, year = {2020}, booktitle = {International Conference on Information Integration and Web-based Applications and Services}, }
- k-Nearest Neighbor Based Clustering with Shape Alternation Adaptivity. Yunpu Lu, Yao Zhang, Florian Richter, and 1 more author. In International Joint Conference on Neural Networks, IJCNN 2020 (Oral)
We propose a k-nearest neighbor based clustering method with shape alternation adaptivity. The approach adapts to varying cluster shapes and densities, improving clustering performance on diverse datasets.
@inproceedings{lu2020knearestneighborbasedclustering, title = {k-Nearest Neighbor Based Clustering with Shape Alternation Adaptivity}, author = {Lu, Yunpu and Zhang, Yao and Richter, Florian and Seidl, Thomas}, year = {2020}, booktitle = {International Joint Conference on Neural Networks}, }