Publications
2026
- WebArbiter: A Principle-Guided Reasoning Process Reward Model for Web Agents. Yao Zhang, Shijie Tang, Zeyu Li, and 2 more authors. ICLR 2026
Web agents hold great potential for automating complex computer tasks, yet their interactions involve long-horizon, sequential decision-making with irreversible actions. In such settings, outcome-based supervision is sparse and delayed, often rewarding incorrect trajectories and failing to support inference-time scaling. This motivates the use of Process Reward Models (WebPRMs) for web navigation, but existing approaches remain limited: scalar WebPRMs collapse progress into coarse, weakly grounded signals, while checklist-based WebPRMs rely on brittle template matching that fails under layout or semantic changes and often mislabels superficially correct actions as successful, providing little insight or interpretability. To address these challenges, we introduce WebArbiter, a reasoning-first, principle-inducing WebPRM that formulates reward modeling as text generation, producing structured justifications that conclude with a preference verdict and identify the action most conducive to task completion under the current context. Training follows a two-stage pipeline: reasoning distillation equips the model with coherent principle-guided reasoning, and reinforcement learning corrects teacher biases by directly aligning verdicts with correctness, enabling stronger generalization. To support systematic evaluation, we release WEBPRMBENCH, a comprehensive benchmark spanning four diverse web environments with rich tasks and high-quality preference annotations. On WEBPRMBENCH, WebArbiter-7B outperforms the strongest baseline, GPT-5, by 9.1 points. In reward-guided trajectory search on WebArena-Lite, it surpasses the best prior WebPRM by up to 7.2 points, underscoring its robustness and practical value in real-world complex web tasks. An illustrative sketch of the judge interface follows the BibTeX entry below.
@misc{zhang2026webarbiterprincipleguidedreasoningprocess, title = {WebArbiter: A Principle-Guided Reasoning Process Reward Model for Web Agents}, author = {Zhang, Yao and Tang, Shijie and Li, Zeyu and Han, Zhen and Tresp, Volker}, year = {2026}, eprint = {2601.21872}, archiveprefix = {arXiv}, primaryclass = {cs.AI}, url = {https://arxiv.org/abs/2601.21872}, }
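To make the reasoning-first formulation concrete, here is a minimal sketch of how a generative WebPRM in the spirit of WebArbiter could be queried and its verdict parsed. The prompt layout, the `judge_model.generate` interface, and the `Verdict:` output convention are illustrative assumptions, not the paper's actual schema.

```python
# Minimal sketch of a generative, principle-inducing PRM interface for web agents.
# `judge_model` is any object exposing generate(prompt) -> str; all names here
# are hypothetical stand-ins for the paper's prompts and output format.
import re

JUDGE_TEMPLATE = """Task: {task}
Current observation: {observation}
Candidate action A: {action_a}
Candidate action B: {action_b}

Derive the principles relevant to this state, reason step by step,
then end with exactly one line: Verdict: A or Verdict: B."""

def judge_step(judge_model, task, observation, action_a, action_b):
    """Return the preferred action plus the textual justification."""
    prompt = JUDGE_TEMPLATE.format(task=task, observation=observation,
                                   action_a=action_a, action_b=action_b)
    rationale = judge_model.generate(prompt)            # structured justification
    match = re.search(r"Verdict:\s*([AB])", rationale)  # final preference verdict
    preferred = action_a if (match and match.group(1) == "A") else action_b
    return preferred, rationale
```

Because the verdict is plain text at the end of a justification, one call yields both a usable preference signal for trajectory search and an inspectable rationale.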
- AUVIC: Adversarial Unlearning of Visual Concepts for Multi-modal Large Language Models. Haokun Chen, Jianing Li, Yao Zhang, and 4 more authors. AAAI 2026
AUVIC introduces an adversarial unlearning framework for removing specific visual concepts from multi-modal large language models. The approach enables selective concept removal while maintaining model performance on other tasks.
@misc{chen2026auvicadversarialunlearningvisual, title = {AUVIC: Adversarial Unlearning of Visual Concepts for Multi-modal Large Language Models}, author = {Chen, Haokun and Li, Jianing and Zhang, Yao and Bi, Jinhe and Xia, Yan and Gu, Jindong and Tresp, Volker}, year = {2026}, }
2025
- SwarmAgentic: Towards Fully Automated Agentic System Generation via Swarm Intelligence. Yao Zhang, Chenyang Lin, Shijie Tang, and 4 more authors. EMNLP 2025 (Main)
The rapid progress of Large Language Models has advanced agentic systems in decision-making, coordination, and task execution. Yet, existing agentic system generation frameworks lack full autonomy, missing from-scratch agent generation, self-optimizing agent functionality, and collaboration, limiting adaptability and scalability. We propose SwarmAgentic, a framework for fully automated agentic system generation that constructs agentic systems from scratch and jointly optimizes agent functionality and collaboration as interdependent components through language-driven exploration. To enable efficient search over system-level structures, SwarmAgentic maintains a population of candidate systems and evolves them via feedback-guided updates, drawing inspiration from Particle Swarm Optimization (PSO). We evaluate our method on six real-world, open-ended, and exploratory tasks involving high-level planning, system-level coordination, and creative reasoning. Given only a task description and an objective function, SwarmAgentic outperforms all baselines, achieving a +261.8% relative improvement over ADAS on the TravelPlanner benchmark, highlighting the effectiveness of full automation in structurally unconstrained tasks. This framework marks a significant step toward scalable and autonomous agentic system design, bridging swarm intelligence with fully automated multi-agent generation. A rough sketch of the PSO-style search loop follows the BibTeX entry below.
@misc{zhang2025swarmagenticfullyautomatedagentic, title = {SwarmAgentic: Towards Fully Automated Agentic System Generation via Swarm Intelligence}, author = {Zhang, Yao and Lin, Chenyang and Tang, Shijie and Chen, Haokun and Zhou, Shijie and Ma, Yunpu and Tresp, Volker}, year = {2025}, eprint = {2506.15672}, archiveprefix = {arXiv}, primaryclass = {cs.AI}, url = {https://arxiv.org/abs/2506.15672}, }
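As a rough illustration of the PSO-inspired search, the sketch below evolves a population of text-encoded agentic systems via feedback-guided updates. `evaluate` and `llm_update` are hypothetical stand-ins for the paper's objective function and language-driven rewrite step; the loop structure mirrors PSO, not SwarmAgentic's exact procedure.

```python
# Minimal sketch of PSO-style, feedback-guided evolution over candidate agentic
# systems represented as strings. All interfaces here are illustrative assumptions.
def evolve(population, evaluate, llm_update, iterations=10):
    """population: list of system descriptions (strings); evaluate: str -> float."""
    personal_best = {i: (sys, evaluate(sys)) for i, sys in enumerate(population)}
    global_best = max(personal_best.values(), key=lambda p: p[1])
    for _ in range(iterations):
        for i, sys in enumerate(population):
            # Analogue of a PSO velocity step: an LLM rewrites the candidate
            # toward its personal best and the global best, guided by feedback.
            new_sys = llm_update(current=sys,
                                 personal_best=personal_best[i][0],
                                 global_best=global_best[0])
            score = evaluate(new_sys)
            population[i] = new_sys
            if score > personal_best[i][1]:
                personal_best[i] = (new_sys, score)
            if score > global_best[1]:
                global_best = (new_sys, score)
    return global_best  # best system description found and its score
```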
- GroundedPRM: Tree-Guided and Fidelity-Aware Process Reward Modeling for Step-Level Reasoning. Yao Zhang, Yu Wu, Haowei Zhang, and 6 more authors. NeurIPS 2025 Workshop LAW
Process Reward Models (PRMs) aim to improve multi-step reasoning in Large Language Models (LLMs) by supervising intermediate steps and identifying errors throughout the reasoning process. However, building effective PRMs remains challenging due to the lack of scalable, high-quality annotations. Existing approaches rely on costly human labeling, LLM-based self-evaluation that is prone to hallucination, or Monte Carlo (MC) estimation, which infers step quality solely from rollout outcomes and often introduces noisy, misaligned supervision due to credit misattribution. These issues result in three core limitations: noisy rewards, low factual fidelity, and misalignment with step-level reasoning objectives. To address these challenges, we introduce GroundedPRM, a tree-guided and fidelity-aware framework for automatic process supervision. To reduce reward noise and enable fine-grained credit assignment, we construct structured reasoning paths via Monte Carlo Tree Search (MCTS). To eliminate hallucinated supervision, we validate each intermediate step using an external tool, providing precise, execution-grounded correctness signals. To combine both step-level validation and global outcome assessment, we design a hybrid reward aggregation mechanism that fuses tool-based verification with MCTS-derived feedback. Finally, we format the reward signal into a rationale-enhanced, generative structure to promote interpretability and compatibility with instruction-tuned LLMs. GroundedPRM is trained on only 40K automatically labeled samples, amounting to just 10% of the data used by the best-performing PRM trained with auto-labeled supervision. Nevertheless, it achieves up to a 26% relative improvement in average performance on ProcessBench. When used for reward-guided greedy search, GroundedPRM outperforms even PRMs trained with human-labeled supervision, offering a scalable and verifiable path toward high-quality process-level reasoning. An illustrative sketch of the hybrid reward fusion follows the BibTeX entry below.
@misc{zhang2025groundedprmtreeguidedfidelityawareprocess, title = {GroundedPRM: Tree-Guided and Fidelity-Aware Process Reward Modeling for Step-Level Reasoning}, author = {Zhang, Yao and Wu, Yu and Zhang, Haowei and Li, Weiguo and Chen, Haokun and Wu, Jingpei and Li, Guohao and Han, Zhen and Tresp, Volker}, year = {2025}, eprint = {2510.14942}, archiveprefix = {arXiv}, primaryclass = {cs.AI}, url = {https://arxiv.org/abs/2510.14942}, }
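The hybrid reward aggregation can be pictured as a weighted fusion of the two signals described above. The linear combination and the weight `alpha` below are assumptions made for illustration, not the paper's exact aggregation rule.

```python
# Minimal sketch of the hybrid reward idea: fuse a binary, tool-verified step
# signal with an MCTS-derived value estimate for the same step.
def hybrid_step_reward(tool_verified: bool, mcts_value: float, alpha: float = 0.5) -> float:
    """tool_verified: execution-grounded correctness of the step (e.g., from a
    math checker). mcts_value: mean rollout return of the step's subtree, in [0, 1]."""
    grounded = 1.0 if tool_verified else 0.0
    return alpha * grounded + (1.0 - alpha) * mcts_value

# Example: a step the tool verifies, sitting on a weak subtree, scores 0.65.
print(hybrid_step_reward(True, 0.3, alpha=0.5))
```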
- WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration. Yao Zhang, Zijian Ma, Yunpu Ma, and 3 more authors. AAAI 2025
LLM-based autonomous agents often fail in executing complex web tasks that require dynamic interaction, largely due to the inherent uncertainty and complexity of these environments. Existing LLM-based web agents typically rely on rigid, expert-designed policies specific to certain states and actions, which lack the flexibility and generalizability needed to adapt to unseen tasks. In contrast, humans excel by exploring unknowns, continuously adapting strategies and actions based on new observations, and resolving ambiguities through exploration. To emulate human-like adaptability, web agents need strategic exploration and complex decision-making. Monte Carlo Tree Search (MCTS) is well-suited for this, but classical MCTS struggles with the vast action spaces, unpredictable state transitions, and incomplete information in web tasks. In light of this, we develop WebPilot, a multi-agent system with a dual optimization strategy that improves MCTS to better handle complex web environments. Specifically, the Global Optimization phase involves generating a high-level plan by breaking down tasks into manageable subtasks, continuously refining this plan through reflective analysis of new observations and previous subtask attempts, thereby focusing the search process and mitigating the challenges posed by vast action spaces in classical MCTS. Subsequently, the Local Optimization phase executes each subtask using a tailored MCTS designed for complex environments, effectively addressing uncertainties and managing incomplete information by iteratively refining decisions based on new observations. Experimental results on WebArena and MiniWoB++ demonstrate the effectiveness of WebPilot. Notably, on WebArena, WebPilot achieves SOTA performance with GPT-4, achieving a 93% relative increase in success rate over the concurrent tree search-based method. WebPilot marks a significant advancement in general autonomous agent capabilities, paving the way for more advanced and reliable decision-making in practical environments. A sketch of the classical UCT selection rule that the tailored MCTS builds on follows the BibTeX entry below.
@misc{zhang2024webpilotversatileautonomousmultiagent, title = {WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration}, author = {Zhang, Yao and Ma, Zijian and Ma, Yunpu and Han, Zhen and Wu, Yu and Tresp, Volker}, year = {2025}, eprint = {2408.15978}, archiveprefix = {arXiv}, primaryclass = {cs.AI}, url = {https://arxiv.org/abs/2408.15978}, }
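For reference, the sketch below shows classical UCT node selection, the MCTS core that WebPilot's Local Optimization phase extends with reflection and uncertainty handling. The node attributes (`visits`, `value`) are illustrative, not WebPilot's data structures.

```python
# Minimal sketch of UCT-based selection in MCTS: pick the child that maximizes
# mean value plus an exploration bonus favoring rarely visited actions.
import math

def uct_select(children, exploration=1.4):
    """children: nodes with integer .visits and cumulative float .value."""
    total_visits = sum(c.visits for c in children)
    def uct(c):
        if c.visits == 0:
            return float("inf")  # always try unvisited actions first
        return c.value / c.visits + exploration * math.sqrt(
            math.log(total_visits) / c.visits)
    return max(children, key=uct)
```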
- Agentic Neural Networks: Self-Evolving Multi-Agent Systems via Textual Backpropagation. Xiaowen Ma, Chenyang Lin, Yao Zhang, and 2 more authors. Under Review, 2025
We propose Agentic Neural Networks, a framework that organizes a multi-agent system like a neural network and evolves its structure through textual backpropagation. By reformulating network updates as text-based operations, the framework yields self-evolving multi-agent systems that adaptively modify their structure and coordination mechanisms. A rough sketch of one textual-backpropagation step follows the BibTeX entry below.
@misc{ma2025agenticneuralnetworksselfevolving, title = {Agentic Neural Networks: Self-Evolving Multi-Agent Systems via Textual Backpropagation}, author = {Ma, Xiaowen and Lin, Chenyang and Zhang, Yao and Tresp, Volker and Ma, Yunpu}, year = {2025}, eprint = {2506.09046}, archiveprefix = {arXiv}, primaryclass = {cs.AI}, url = {https://arxiv.org/abs/2506.09046}, }
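A minimal sketch of one textual-backpropagation step, under assumed interfaces: a critic LLM turns task feedback into natural-language "gradients" that rewrite each agent's prompt from the last agent backward. `run_pipeline`, `critic`, and `rewrite` are hypothetical, not the paper's API.

```python
# Minimal sketch of a textual-backpropagation step over an ordered agent pipeline.
def textual_backprop_step(agents, task, run_pipeline, critic, rewrite):
    """agents: ordered list of prompt strings forming the multi-agent pipeline.
    run_pipeline returns (final_output, per-agent intermediate trace)."""
    output, trace = run_pipeline(agents, task)
    feedback = critic(task, output)          # textual "loss" on the final output
    for i in reversed(range(len(agents))):   # propagate feedback backward
        agents[i] = rewrite(agents[i], feedback, trace[i])
        feedback = critic(task, trace[i])    # local feedback for the next agent up
    return agents
```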
- CL-CrossVQA: A Continual Learning Benchmark for Cross-Domain Visual Question Answering. Yao Zhang, Haokun Chen, Ahmed Frikha, and 5 more authors. WACV 2025
We introduce CL-CrossVQA, a benchmark for continual learning in cross-domain visual question answering. The benchmark evaluates the ability of vision-language models to retain knowledge while adapting to new domains, highlighting challenges in representation retention and cross-domain generalization.
@misc{zhang2025clcrossvqacontinuallearningbenchmark, title = {CL-CrossVQA: A Continual Learning Benchmark for Cross-Domain Visual Question Answering}, author = {Zhang, Yao and Chen, Haokun and Frikha, Ahmed and Yang, Yezi and Krompass, Denis and Zhang, Gengyuan and Gu, Jindong and Tresp, Volker}, year = {2025}, }
- FedNano: Toward Lightweight Federated Tuning for Pretrained Multimodal Large Language Models. Yao Zhang, Hewei Gao, Haokun Chen, and 3 more authors. Under Review, 2025
FedNano proposes a lightweight federated tuning framework for pretrained multimodal large language models. The method drastically reduces client-side computational cost while maintaining strong reasoning and adaptation performance, enabling scalable and privacy-preserving multimodal intelligence. An illustrative adapter-aggregation sketch follows the BibTeX entry below.
@misc{zhang2025fednanotowardlightweightfederated, title = {FedNano: Toward Lightweight Federated Tuning for Pretrained Multimodal Large Language Models}, author = {Zhang, Yao and Gao, Hewei and Chen, Haokun and Li, Weiguo and Ma, Yunpu and Tresp, Volker}, year = {2025}, eprint = {2506.14824}, archiveprefix = {arXiv}, primaryclass = {cs.AI}, url = {https://arxiv.org/abs/2506.14824}, }
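The lightweight-tuning idea can be illustrated by a server that averages only small adapter weights while the pretrained backbone stays frozen. This FedAvg-style sketch is an assumption about the aggregation, not FedNano's exact procedure.

```python
# Minimal sketch of server-side aggregation for lightweight federated tuning:
# only adapter parameters travel; the frozen MLLM backbone never leaves clients.
def aggregate_adapters(client_adapters, client_sizes):
    """client_adapters: list of state_dicts (name -> tensor) holding ONLY the
    small adapter weights; client_sizes: local dataset sizes for weighting."""
    total = sum(client_sizes)
    return {
        k: sum(sd[k] * (n / total) for sd, n in zip(client_adapters, client_sizes))
        for k in client_adapters[0]
    }
```

Since each state_dict contains only adapter tensors, per-round communication scales with the adapter size rather than the backbone size.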
- FedBiP: Heterogeneous One-Shot Federated Learning with Personalized Latent Diffusion Models. Haokun Chen, Hang Li, Yao Zhang, and 7 more authors. CVPR 2025
FedBiP introduces a heterogeneous one-shot federated learning framework that leverages personalized latent diffusion models. The approach enables efficient model aggregation in federated settings with heterogeneous data distributions while preserving client-specific personalization.
@misc{chen2025fedbipheterogeneousoneshotfederated, title = {FedBiP: Heterogeneous One-Shot Federated Learning with Personalized Latent Diffusion Models}, author = {Chen, Haokun and Li, Hang and Zhang, Yao and Bi, Jinhe and Zhang, Gengyuan and Zhang, Yueqi and Torr, Philip and Gu, Jindong and Krompass, Denis and Tresp, Volker}, year = {2025}, }
- Memory Helps, but Confabulation Misleads: Understanding Streaming Events in Videos with MLLMs. Gengyuan Zhang, Mingcong Ding, Tong Liu, Yao Zhang, and 1 more author. ICLR 2025 Workshop World Models
This work investigates how multimodal large language models handle streaming events in videos. We find that while memory mechanisms help with temporal understanding, confabulation can mislead model predictions, highlighting the need for improved temporal reasoning in video-language models.
@misc{zhang2025memoryhelpsconfabulationmisleads, title = {Memory Helps, but Confabulation Misleads: Understanding Streaming Events in Videos with MLLMs}, author = {Zhang, Gengyuan and Ding, Mingcong and Liu, Tong and Zhang, Yao and Tresp, Volker}, year = {2025}, eprint = {2502.15457}, archiveprefix = {arXiv}, primaryclass = {cs.AI}, url = {https://arxiv.org/abs/2502.15457}, }
- Does Machine Unlearning Truly Remove Model Knowledge? A Framework for Auditing Unlearning in LLMs. Haokun Chen, Yueqi Zhang, Yuan Bi, Yao Zhang, and 8 more authors. NeurIPS 2025 Workshop LockLLM
We propose a framework for auditing machine unlearning in large language models. The framework evaluates whether unlearning truly removes target knowledge while maintaining model utility, providing tools to assess the effectiveness of unlearning methods. A simplified audit-probe sketch follows the BibTeX entry below.
@misc{chen2025doesmachineunlearningtrulyremove, title = {Does Machine Unlearning Truly Remove Model Knowledge? A Framework for Auditing Unlearning in LLMs}, author = {Chen, Haokun and Zhang, Yueqi and Bi, Yuan and Zhang, Yao and Liu, Tong and Bi, Jinhe and Lan, Jian and Gu, Jindong and Grosser, Claudia and Krompass, Denis and Navab, Nassir and Tresp, Volker}, year = {2025}, eprint = {2505.23270}, archiveprefix = {arXiv}, primaryclass = {cs.AI}, url = {https://arxiv.org/abs/2505.23270}, }
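One simple audit probe, sketched under assumed interfaces: query the same target-knowledge questions before and after unlearning and measure how often the supposedly forgotten answer still surfaces. The probe format and the substring-based scoring are deliberate simplifications of what a full auditing framework would use.

```python
# Minimal sketch of a leak-rate probe for auditing unlearning. `model.generate`
# is a hypothetical text-generation interface, not a specific library call.
def audit_unlearning(model_before, model_after, probes):
    """probes: list of (question, forgotten_answer) string pairs."""
    def leak_rate(model):
        hits = sum(ans.lower() in model.generate(q).lower() for q, ans in probes)
        return hits / len(probes)
    return {"leak_before": leak_rate(model_before),
            "leak_after": leak_rate(model_after)}
```

A large gap between the two rates suggests the target knowledge was suppressed; a small gap flags unlearning that only appears to succeed.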
2024
- FedDAT: An Approach for Foundation Model Finetuning in Multi-Modal Heterogeneous Federated Learning. Haokun Chen, Yao Zhang, Denis Krompass, and 2 more authors. AAAI 2024
FedDAT presents an approach for foundation model finetuning in multi-modal heterogeneous federated learning. The method addresses challenges in federated adaptation of large foundation models across diverse data modalities and client distributions.
@misc{chen2024feddatapproachfoundationmodel, title = {FedDAT: An Approach for Foundation Model Finetuning in Multi-Modal Heterogeneous Federated Learning}, author = {Chen, Haokun and Zhang, Yao and Krompass, Denis and Gu, Jindong and Tresp, Volker}, year = {2024}, }
2023
- Adaptive Multi-Resolution Attention with Linear Complexity. Yao Zhang, Yunpu Ma, Thomas Seidl, and 1 more author. IJCNN 2023 (Oral)
We propose an adaptive multi-resolution attention mechanism that achieves linear complexity. The method dynamically adjusts resolution levels based on input characteristics, enabling efficient processing of long sequences while maintaining attention quality. An illustrative pooling-based sketch follows the BibTeX entry below.
@inproceedings{zhang2023adaptivemultiresolutionattention, title = {Adaptive Multi-Resolution Attention with Linear Complexity}, author = {Zhang, Yao and Ma, Yunpu and Seidl, Thomas and Tresp, Volker}, year = {2023}, }
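A minimal sketch of the multi-resolution idea: attend over a fixed number of pooled key/value landmarks at several resolutions, so cost grows linearly with sequence length rather than quadratically. Average pooling and the landmark counts are assumptions for illustration, not the paper's exact operator or its adaptive resolution selection.

```python
# Minimal sketch of multi-resolution attention with linear cost: queries attend
# to a fixed-size, coarse-to-fine summary of the sequence instead of all keys.
import torch
import torch.nn.functional as F

def multires_attention(q, k, v, landmark_counts=(8, 32, 128)):
    """q, k, v: (batch, seq_len, dim). k/v are pooled to a fixed number of
    landmarks per resolution, so attention costs O(seq_len * num_landmarks)."""
    pooled_k, pooled_v = [], []
    for m in landmark_counts:
        # (batch, dim, seq_len) -> m pooled positions -> back to (batch, m, dim)
        pooled_k.append(F.adaptive_avg_pool1d(k.transpose(1, 2), m).transpose(1, 2))
        pooled_v.append(F.adaptive_avg_pool1d(v.transpose(1, 2), m).transpose(1, 2))
    k_mr = torch.cat(pooled_k, dim=1)  # coarse-to-fine summaries of the sequence
    v_mr = torch.cat(pooled_v, dim=1)
    attn = torch.softmax(q @ k_mr.transpose(1, 2) / q.shape[-1] ** 0.5, dim=-1)
    return attn @ v_mr
```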
2022
- ECOLA: Enhanced Temporal Knowledge Embeddings with Contextualized Language Representations. Zhen Han, Ruotong Liao, Jindong Gu, Yao Zhang, and 5 more authors. ACL 2022
ECOLA enhances temporal knowledge embeddings by incorporating contextualized language representations. The approach improves temporal reasoning in knowledge graphs by leveraging pre-trained language models to capture temporal context.
@inproceedings{han2022ecolaenhancedtemporalknowledge, title = {ECOLA: Enhanced Temporal Knowledge Embeddings with Contextualized Language Representations}, author = {Han, Zhen and Liao, Ruotong and Gu, Jindong and Zhang, Yao and Ding, Zifeng and Gu, Yujia and Koeppl, Heinz and Schütze, Hinrich and Tresp, Volker}, year = {2022}, }
2021
- Argument Mining Driven Analysis of Peer-Reviews. Michael Fromm, Evgeniy Faerman, Max Berrendorf, Siddharth Bhargava, Ruoxia Qi, Yao Zhang, and 4 more authors. AAAI 2021
We present an argument mining approach for analyzing peer-review texts. The method extracts and structures arguments from review comments, enabling automated analysis of review quality and feedback patterns.
@inproceedings{fromm2021argumentminingdrivenanalysis, title = {Argument Mining Driven Analysis of Peer-Reviews}, author = {Fromm, Michael and Faerman, Evgeniy and Berrendorf, Max and Bhargava, Siddharth and Qi, Ruoxia and Zhang, Yao and Dennert, Lukas and Selle, Sophia and Mao, Yang and Seidl, Thomas}, year = {2021}, }
2020
- KNNAC: An Efficient k Nearest Neighbor Based Clustering with Active Core Detection. Yao Zhang, Yifeng Lu, and Thomas Seidl. IIWAS 2020
KNNAC introduces an efficient k-nearest-neighbor-based clustering algorithm with active core detection. The method identifies cluster cores through active learning, improving clustering efficiency and accuracy. A simplified core-detection sketch follows the BibTeX entry below.
@inproceedings{zhang2020knnacefficientknearest, title = {KNNAC: An Efficient k Nearest Neighbor Based Clustering with Active Core Detection}, author = {Zhang, Yao and Lu, Yifeng and Seidl, Thomas}, year = {2020}, }
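A minimal sketch of the intuition behind kNN-based core detection: points whose k-th neighbor is unusually close lie in dense regions and become candidate cluster cores. The quantile threshold below is an assumption for illustration; KNNAC's active detection criterion is more refined.

```python
# Minimal sketch of density-based core detection via k-nearest-neighbor distances.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def detect_cores(X, k=10, quantile=0.3):
    """Return a boolean mask marking the densest points as candidate cores."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)  # +1: each point is its own neighbor
    dist, _ = nn.kneighbors(X)
    kth_dist = dist[:, -1]                           # distance to the k-th true neighbor
    return kth_dist <= np.quantile(kth_dist, quantile)
```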
- k-Nearest Neighbor Based Clustering with Shape Alternation Adaptivity. Yifeng Lu, Yao Zhang, Florian Richter, and 1 more author. IJCNN 2020 (Oral)
We propose a k-nearest-neighbor-based clustering method with shape alternation adaptivity. The approach adapts to varying cluster shapes and densities, improving clustering performance on diverse datasets.
@inproceedings{lu2020knearestneighborbasedclustering, title = {k-Nearest Neighbor Based Clustering with Shape Alternation Adaptivity}, author = {Lu, Yifeng and Zhang, Yao and Richter, Florian and Seidl, Thomas}, year = {2020}, }