Publications
2025
- SwarmAgentic: Towards Fully Automated Agentic System Generation via Swarm Intelligence. Yao Zhang, Chenyang Lin, Shijie Tang, and 4 more authors. EMNLP 2025 (Main)
The rapid progress of Large Language Models has advanced agentic systems in decision-making, coordination, and task execution. Yet, existing agentic system generation frameworks lack full autonomy, missing from-scratch agent generation, self-optimizing agent functionality, and collaboration, limiting adaptability and scalability. We propose SwarmAgentic, a framework for fully automated agentic system generation that constructs agentic systems from scratch and jointly optimizes agent functionality and collaboration as interdependent components through language-driven exploration. To enable efficient search over system-level structures, SwarmAgentic maintains a population of candidate systems and evolves them via feedback-guided updates, drawing inspiration from Particle Swarm Optimization (PSO). We evaluate our method on six real-world, open-ended, and exploratory tasks involving high-level planning, system-level coordination, and creative reasoning. Given only a task description and an objective function, SwarmAgentic outperforms all baselines, achieving a +261.8% relative improvement over ADAS on the TravelPlanner benchmark, highlighting the effectiveness of full automation in structurally unconstrained tasks. This framework marks a significant step toward scalable and autonomous agentic system design, bridging swarm intelligence with fully automated multi-agent generation.
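As a rough illustration of the PSO-inspired search loop described above, the sketch below evolves a population of textual candidate systems toward personal and global bests; `objective` and `propose_update` are hypothetical stand-ins for the paper's LLM-driven evaluation and rewriting steps, not its actual interfaces.

```python
from dataclasses import dataclass

@dataclass
class Particle:
    """One candidate agentic system plus its search state (hypothetical structure)."""
    system: str                        # textual spec of agents and their collaboration
    score: float = float("-inf")
    personal_best: str = ""
    personal_best_score: float = float("-inf")

def evolve_population(task, objective, propose_update, num_particles=8, iterations=10):
    """PSO-style search over textual candidate systems.

    `objective(system, task) -> float` scores a candidate end-to-end;
    `propose_update(system, personal_best, global_best) -> str` asks an LLM to
    rewrite a candidate using feedback from its own best and the global best.
    Both are assumed, illustrative interfaces.
    """
    population = [Particle(system=f"initial candidate {i} for: {task}") for i in range(num_particles)]
    global_best, global_best_score = None, float("-inf")

    for _ in range(iterations):
        for p in population:
            p.score = objective(p.system, task)
            if p.score > p.personal_best_score:
                p.personal_best, p.personal_best_score = p.system, p.score
            if p.score > global_best_score:
                global_best, global_best_score = p.system, p.score
        # Language-driven analogue of a velocity update: move each candidate
        # toward its personal best and the global best via guided rewriting.
        for p in population:
            p.system = propose_update(p.system, p.personal_best, global_best)

    return global_best, global_best_score
```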
@misc{zhang2025swarmagenticfullyautomatedagentic, title = {SwarmAgentic: Towards Fully Automated Agentic System Generation via Swarm Intelligence}, author = {Zhang, Yao and Lin, Chenyang and Tang, Shijie and Chen, Haokun and Zhou, Shijie and Ma, Yunpu and Tresp, Volker}, year = {2025}, eprint = {2506.15672}, archiveprefix = {arXiv}, primaryclass = {cs.AI}, url = {https://arxiv.org/abs/2506.15672}, topic = {system}, }
- GroundedPRM: Tree-Guided and Fidelity-Aware Process Reward Modeling for Step-Level Reasoning. Yao Zhang, Yu Wu, Haowei Zhang, and 6 more authors. NeurIPS 2025 Workshop LAW
Process Reward Models (PRMs) aim to improve multi-step reasoning in Large Language Models (LLMs) by supervising intermediate steps and identifying errors throughout the reasoning process. However, building effective PRMs remains challenging due to the lack of scalable, high-quality annotations. Existing approaches rely on costly human labeling, LLM-based self-evaluation that is prone to hallucination, or Monte Carlo (MC) estimation, which infers step quality solely from rollout outcomes and often introduces noisy, misaligned supervision due to credit misattribution. These issues result in three core limitations: noisy rewards, low factual fidelity, and misalignment with step-level reasoning objectives. To address these challenges, we introduce GroundedPRM, a tree-guided and fidelity-aware framework for automatic process supervision. To reduce reward noise and enable fine-grained credit assignment, we construct structured reasoning paths via Monte Carlo Tree Search (MCTS). To eliminate hallucinated supervision, we validate each intermediate step using an external tool, providing precise, execution-grounded correctness signals. To combine both step-level validation and global outcome assessment, we design a hybrid reward aggregation mechanism that fuses tool-based verification with MCTS-derived feedback. Finally, we format the reward signal into a rationale-enhanced, generative structure to promote interpretability and compatibility with instruction-tuned LLMs. GroundedPRM is trained on only 40K automatically labeled samples, amounting to just 10% of the data used by the best-performing PRM trained with auto-labeled supervision. Nevertheless, it achieves up to a 26% relative improvement in average performance on ProcessBench. When used for reward-guided greedy search, GroundedPRM outperforms even PRMs trained with human-labeled supervision, offering a scalable and verifiable path toward high-quality process-level reasoning.
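To make the hybrid reward aggregation concrete, here is a minimal sketch of fusing a tool-based verification signal with an MCTS-derived value; the function name and the weighting `alpha` are assumptions for illustration, not values from the paper.

```python
def hybrid_step_reward(tool_verified: bool, mcts_value: float, alpha: float = 0.5) -> float:
    """Illustrative fusion of execution-grounded verification with MCTS-derived feedback.

    `tool_verified` is the binary outcome of checking a reasoning step with an
    external tool; `mcts_value` is the step's estimated value from tree-search
    rollouts (assumed to lie in [0, 1]). The weight `alpha` is a placeholder.
    """
    tool_signal = 1.0 if tool_verified else 0.0
    return alpha * tool_signal + (1.0 - alpha) * mcts_value


# Example: a step a calculator confirms, but whose rollouts rarely reach a
# correct final answer, still receives partial credit.
print(hybrid_step_reward(tool_verified=True, mcts_value=0.2))   # 0.6
print(hybrid_step_reward(tool_verified=False, mcts_value=0.9))  # 0.45
```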
@misc{zhang2025groundedprmtreeguidedfidelityawareprocess, title = {GroundedPRM: Tree-Guided and Fidelity-Aware Process Reward Modeling for Step-Level Reasoning}, author = {Zhang, Yao and Wu, Yu and Zhang, Haowei and Li, Weiguo and Chen, Haokun and Wu, Jingpei and Li, Guohao and Han, Zhen and Tresp, Volker}, year = {2025}, eprint = {2510.14942}, archiveprefix = {arXiv}, primaryclass = {cs.AI}, url = {https://arxiv.org/abs/2510.14942}, topic = {reasoning}, }
- WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration. Yao Zhang, Zijian Ma, Yunpu Ma, and 3 more authors. AAAI 2025
LLM-based autonomous agents often fail in executing complex web tasks that require dynamic interaction, largely due to the inherent uncertainty and complexity of these environments. Existing LLM-based web agents typically rely on rigid, expert-designed policies specific to certain states and actions, which lack the flexibility and generalizability needed to adapt to unseen tasks. In contrast, humans excel by exploring unknowns, continuously adapting strategies and actions based on new observations, and resolving ambiguities through exploration. To emulate human-like adaptability, web agents need strategic exploration and complex decision-making. Monte Carlo Tree Search (MCTS) is well-suited for this, but classical MCTS struggles with the vast action spaces, unpredictable state transitions, and incomplete information in web tasks. In light of this, we develop WebPilot, a multi-agent system with a dual optimization strategy that improves MCTS to better handle complex web environments. Specifically, the Global Optimization phase involves generating a high-level plan by breaking down tasks into manageable subtasks, continuously refining this plan through reflective analysis of new observations and previous subtask attempts, thereby focusing the search process and mitigating the challenges posed by vast action spaces in classical MCTS. Subsequently, the Local Optimization phase executes each subtask using a tailored MCTS designed for complex environments, effectively addressing uncertainties and managing incomplete information by iteratively refining decisions based on new observations. Experimental results on WebArena and MiniWoB++ demonstrate the effectiveness of WebPilot. Notably, on WebArena, WebPilot achieves SOTA performance with GPT-4, achieving a 93% relative increase in success rate over the concurrent tree search-based method. WebPilot marks a significant advancement in general autonomous agent capabilities, paving the way for more advanced and reliable decision-making in practical environments.
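A minimal sketch of the two-phase structure the abstract describes, under assumed interfaces: `decompose`, `reflect`, and `mcts_subtask_search` stand in for the LLM-driven planning, reflection, and tailored-MCTS components, and `env` for the browser environment. This illustrates the control flow only, not WebPilot's implementation.

```python
def webpilot_style_execute(task, decompose, reflect, mcts_subtask_search, env):
    """Hedged sketch of a global-plan / local-MCTS loop for web tasks.

    Global phase: break the task into subtasks and keep refining the plan from
    new observations. Local phase: solve each subtask with an MCTS adapted to
    the web environment. All callables here are assumed interfaces.
    """
    plan = decompose(task)                                     # global optimization: high-level plan
    history = []
    while plan:
        subtask = plan.pop(0)
        observation, success = mcts_subtask_search(subtask, env)  # local optimization per subtask
        history.append((subtask, observation, success))
        # Reflective analysis of the new observation may rewrite the remaining plan.
        plan = reflect(task, plan, history)
    return history
```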
@misc{zhang2024webpilotversatileautonomousmultiagent, title = {WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration}, author = {Zhang, Yao and Ma, Zijian and Ma, Yunpu and Han, Zhen and Wu, Yu and Tresp, Volker}, year = {2025}, eprint = {2408.15978}, archiveprefix = {arXiv}, primaryclass = {cs.AI}, url = {https://arxiv.org/abs/2408.15978}, topic = {system}, }
- Agentic Neural Networks: Self-Evolving Multi-Agent Systems via Textual Backpropagation. Xiaolong Ma, Chenyang Lin, Yao Zhang, and 1 more author. Under Review, 2025
We propose Agentic Neural Networks, a framework that enables neural networks to evolve their own architectures through textual backpropagation. By reformulating network updates as text-based operations, we enable self-evolving multi-agent systems that can adaptively modify their structure and coordination mechanisms.
@misc{ma2025agenticneuralnetworksselfevolving, title = {Agentic Neural Networks: Self-Evolving Multi-Agent Systems via Textual Backpropagation}, author = {Ma, Xiaolong and Lin, Chenyang and Zhang, Yao and Tresp, Volker}, year = {2025}, eprint = {2506.09046}, archiveprefix = {arXiv}, primaryclass = {cs.AI}, url = {https://arxiv.org/abs/2506.09046}, }
- CL-CrossVQA: A Continual Learning Benchmark for Cross-Domain Visual Question Answering. Yao Zhang, Haokun Chen, Aymen Frikha, and 1 more author. WACV 2025
We introduce CL-CrossVQA, a benchmark for continual learning in cross-domain visual question answering. The benchmark evaluates the ability of vision-language models to retain knowledge while adapting to new domains, highlighting challenges in representation retention and cross-domain generalization.
@misc{zhang2025clcrossvqacontinuallearningbenchmark, title = {CL-CrossVQA: A Continual Learning Benchmark for Cross-Domain Visual Question Answering}, author = {Zhang, Yao and Chen, Haokun and Frikha, Aymen and Tresp, Volker}, year = {2025}, topic = {learning} }
- FedNano: Toward Lightweight Federated Tuning for Pretrained Multimodal Large Language Models. Yao Zhang, Hewei Gao, Haokun Chen, and 1 more author. Under Review, 2025
FedNano proposes a lightweight federated tuning framework for pretrained multimodal large language models. The method drastically reduces client-side computational cost while maintaining strong reasoning and adaptation performance, enabling scalable and privacy-preserving multimodal intelligence.
@misc{zhang2025fednanotowardlightweightfederated, title = {FedNano: Toward Lightweight Federated Tuning for Pretrained Multimodal Large Language Models}, author = {Zhang, Yao and Gao, Hewei and Chen, Haokun and Tresp, Volker}, year = {2025}, eprint = {2506.14824}, archiveprefix = {arXiv}, primaryclass = {cs.AI}, url = {https://arxiv.org/abs/2506.14824}, topic = {learning} }
- FedBiP: Heterogeneous One-Shot Federated Learning with Personalized Latent Diffusion Models. Haokun Chen, Haokun Li, Yao Zhang, and 1 more author. CVPR 2025
FedBiP introduces a heterogeneous one-shot federated learning framework that leverages personalized latent diffusion models. The approach enables efficient model aggregation in federated settings with heterogeneous data distributions while preserving client-specific personalization.
@misc{chen2025fedbipheterogeneousoneshotfederated, title = {FedBiP: Heterogeneous One-Shot Federated Learning with Personalized Latent Diffusion Models}, author = {Chen, Haokun and Li, Haokun and Zhang, Yao and Tresp, Volker}, year = {2025}, }
- AUVIC: Adversarial Unlearning of Visual Concepts for Multi-modal Large Language Models. Haokun Chen, Jingpei Li, Yao Zhang, and 1 more author. Under Review, 2025
AUVIC introduces an adversarial unlearning framework for removing specific visual concepts from multi-modal large language models. The approach enables selective concept removal while maintaining model performance on other tasks.
@misc{chen2025auvicadversarialunlearningvisual, title = {AUVIC: Adversarial Unlearning of Visual Concepts for Multi-modal Large Language Models}, author = {Chen, Haokun and Li, Jingpei and Zhang, Yao and Tresp, Volker}, year = {2025}, }
- Memory Helps, but Confabulation Misleads: Understanding Streaming Events in Videos with MLLMs. Guangyi Zhang, Mingshi Ding, Tianli Liu, Yao Zhang, and 1 more author. ICLR 2025 Workshop World Models
This work investigates how multimodal large language models handle streaming events in videos. We find that while memory mechanisms help with temporal understanding, confabulation can mislead model predictions, highlighting the need for improved temporal reasoning in video-language models.
@misc{zhang2025memoryhelpsconfabulationmisleads, title = {Memory Helps, but Confabulation Misleads: Understanding Streaming Events in Videos with MLLMs}, author = {Zhang, Guangyi and Ding, Mingshi and Liu, Tianli and Zhang, Yao and Tresp, Volker}, year = {2025}, eprint = {2502.15457}, archiveprefix = {arXiv}, primaryclass = {cs.AI}, url = {https://arxiv.org/abs/2502.15457} }
- Does Machine Unlearning Truly Remove Model Knowledge? A Framework for Auditing Unlearning in LLMs. Haokun Chen, Yao Zhang, Yubo Bi, and 2 more authors. NeurIPS 2025 Workshop LockLLM
We propose a framework for auditing machine unlearning in large language models. The framework evaluates whether unlearning truly removes target knowledge while maintaining model utility, providing tools to assess the effectiveness of unlearning methods.
@misc{chen2025doesmachineunlearningtrulyremove, title = {Does Machine Unlearning Truly Remove Model Knowledge? A Framework for Auditing Unlearning in LLMs}, author = {Chen, Haokun and Zhang, Yao and Bi, Yubo and Tresp, Volker}, year = {2025}, eprint = {2505.23270}, archiveprefix = {arXiv}, primaryclass = {cs.AI}, url = {https://arxiv.org/abs/2505.23270}, }
2024
- FedDAT: An Approach for Foundation Model Finetuning in Multi-Modal Heterogeneous Federated Learning. Haokun Chen, Yao Zhang, Daniel Krompass, and 1 more author. AAAI 2024
FedDAT presents an approach for foundation model finetuning in multi-modal heterogeneous federated learning. The method addresses challenges in federated adaptation of large foundation models across diverse data modalities and client distributions.
@misc{chen2024feddatapproachfoundationmodel, title = {FedDAT: An Approach for Foundation Model Finetuning in Multi-Modal Heterogeneous Federated Learning}, author = {Chen, Haokun and Zhang, Yao and Krompass, Daniel and Tresp, Volker}, year = {2024}, topic = {learning} }
2023
- Adaptive Multi-Resolution Attention with Linear Complexity. Yao Zhang, Yunpu Ma, Thomas Seidl, and 1 more author. In International Joint Conference on Neural Networks, IJCNN 2023 (Oral)
We propose an adaptive multi-resolution attention mechanism that achieves linear complexity. The method dynamically adjusts resolution levels based on input characteristics, enabling efficient processing of long sequences while maintaining attention quality.
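The summary above does not spell out where the linear complexity comes from, so the toy sketch below shows one standard way to obtain it: attending over keys and values pooled to a fixed coarse resolution, so the cost grows linearly with sequence length. The adaptive, input-dependent choice of resolution that the paper describes is deliberately omitted; all names and defaults here are illustrative, not the paper's method.

```python
import numpy as np

def pooled_attention(q, k, v, num_buckets=64):
    """Toy linear-cost attention: average-pool keys/values into a fixed number
    of buckets before attending, giving O(n * num_buckets) cost for length n."""
    n, d = k.shape
    bucket = max(1, n // num_buckets)
    # Coarsen keys and values by averaging consecutive positions.
    k_coarse = np.stack([k[i:i + bucket].mean(axis=0) for i in range(0, n, bucket)])
    v_coarse = np.stack([v[i:i + bucket].mean(axis=0) for i in range(0, n, bucket)])
    scores = q @ k_coarse.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v_coarse

rng = np.random.default_rng(0)
q = k = v = rng.normal(size=(4096, 64))
out = pooled_attention(q, k, v)          # shape (4096, 64); cost linear in 4096
```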
@inproceedings{zhang2023adaptivemultiresolutionattention, title = {Adaptive Multi-Resolution Attention with Linear Complexity}, author = {Zhang, Yao and Ma, Yunpu and Seidl, Thomas and Tresp, Volker}, year = {2023}, booktitle = {International Joint Conference on Neural Networks}, }
2022
- ECOLA: Enhanced Temporal Knowledge Embeddings with Contextualized Language Representations. Zhen Han, Renke Liao, Jie Gu, Yao Zhang, and 1 more author. In Proceedings of the Association for Computational Linguistics, ACL 2022
ECOLA enhances temporal knowledge embeddings by incorporating contextualized language representations. The approach improves temporal reasoning in knowledge graphs by leveraging pre-trained language models to capture temporal context.
@inproceedings{han2022ecolaenhancedtemporalknowledge, title = {ECOLA: Enhanced Temporal Knowledge Embeddings with Contextualized Language Representations}, author = {Han, Zhen and Liao, Renke and Gu, Jie and Zhang, Yao and Tresp, Volker}, year = {2022}, booktitle = {Proceedings of the Association for Computational Linguistics} }
2021
- Argument Mining Driven Analysis of Peer-Reviews. Maximilian Fromm, Evgeniy Faerman, Max Berrendorf, Shrestha Bhargava, Ruoyu Qi, Yao Zhang, and 1 more author. In Proceedings of the AAAI Conference on Artificial Intelligence, AAAI 2021
We present an argument mining approach for analyzing peer-review texts. The method extracts and structures arguments from review comments, enabling automated analysis of review quality and feedback patterns.
@inproceedings{fromm2021argumentminingdrivenanalysis, title = {Argument Mining Driven Analysis of Peer-Reviews}, author = {Fromm, Maximilian and Faerman, Evgeniy and Berrendorf, Max and Bhargava, Shrestha and Qi, Ruoyu and Zhang, Yao and Tresp, Volker}, year = {2021}, booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence} }
2020
- KNNAC: An Efficient k Nearest Neighbor Based Clustering with Active Core Detection. Yao Zhang, Yunpu Lu, and Thomas Seidl. In International Conference on Information Integration and Web-based Applications and Services, IIWAS 2020
KNNAC introduces an efficient k-nearest neighbor based clustering algorithm with active core detection. The method identifies cluster cores through active learning, improving clustering efficiency and accuracy.
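The two-sentence summary leaves the mechanics implicit, so here is a generic kNN-density clustering sketch in the same spirit: cores are detected from k-th neighbor distances, cores in each other's neighborhoods are merged, and remaining points attach to their nearest core. It is an illustration under assumed defaults, not the KNNAC algorithm itself.

```python
import numpy as np

def knn_core_clustering(X, k=10, core_quantile=0.5):
    """Illustrative kNN-density clustering (not the paper's algorithm)."""
    n = len(X)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    knn_idx = np.argsort(dists, axis=1)[:, 1:k + 1]            # skip self at index 0
    kth_dist = dists[np.arange(n), knn_idx[:, -1]]
    # Points with small k-th neighbor distance (high local density) become cores.
    cores = np.where(kth_dist <= np.quantile(kth_dist, core_quantile))[0]

    # Union-find over cores that appear in each other's kNN lists.
    parent = {c: c for c in cores}
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    core_set = set(cores.tolist())
    for c in cores:
        for nb in knn_idx[c]:
            if nb in core_set:
                parent[find(c)] = find(nb)

    labels = np.full(n, -1)
    roots = {find(c) for c in cores}
    root_to_label = {r: i for i, r in enumerate(sorted(roots))}
    for c in cores:
        labels[c] = root_to_label[find(c)]
    for i in range(n):
        if labels[i] == -1:                                    # non-core: join nearest core's cluster
            labels[i] = labels[cores[np.argmin(dists[i, cores])]]
    return labels
```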
@inproceedings{zhang2020knnacefficientknearest, title = {KNNAC: An Efficient k Nearest Neighbor Based Clustering with Active Core Detection}, author = {Zhang, Yao and Lu, Yunpu and Seidl, Thomas}, year = {2020}, booktitle = {International Conference on Information Integration and Web-based Applications and Services}, }
- k-Nearest Neighbor Based Clustering with Shape Alternation Adaptivity. Yunpu Lu, Yao Zhang, Florian Richter, and 1 more author. In International Joint Conference on Neural Networks, IJCNN 2020 (Oral)
We propose a k-nearest neighbor based clustering method with shape alternation adaptivity. The approach adapts to varying cluster shapes and densities, improving clustering performance on diverse datasets.
@inproceedings{lu2020knearestneighborbasedclustering, title = {k-Nearest Neighbor Based Clustering with Shape Alternation Adaptivity}, author = {Lu, Yunpu and Zhang, Yao and Richter, Florian and Seidl, Thomas}, year = {2020}, booktitle = {International Joint Conference on Neural Networks}, }