Zongzhang Zhang @ NJU-AI

Zongzhang Zhang

Ph.D., Professor
LAMDA Group
School of Artificial Intelligence
National Key Laboratory for Novel Software Technology
Nanjing University, P. R. China

Office: Room A412, Big Data and Artificial Intelligence Research Building, Xianlin Campus
Email: zzzhang@nju.edu.cn

Short Bio

I am a Professor at the School of Artificial Intelligence, Nanjing University, and a member of the LAMDA group led by Prof. Zhi-Hua Zhou. I served as an Associate Professor at the School of Artificial Intelligence, Nanjing University (Jul. 2019 - Dec. 2024) and at the School of Computer Science and Technology, Soochow University (Jul. 2014 - Jun. 2019). I received my bachelor's degree in mathematics from Central South University in 2007 and my Ph.D. degree in computer science from the University of Science and Technology of China in 2012, under the supervision of Prof. Xiaoping Chen.

My research experience includes appointments as a Visiting Scholar at the Stanford Intelligent Systems Laboratory (SISL) with Prof. Mykel J. Kochenderfer (Sept. 2018 – Mar. 2019), and as a Research Fellow at the School of Computing, National University of Singapore (Nov. 2012 – Jun. 2014), working with Prof. David Hsu and Prof. Wee Sun Lee. Earlier, I was a Visiting Student at the Rutgers Laboratory for Real-Life Reinforcement Learning (RL³) directed by Prof. Michael L. Littman (Oct. 2010 – Oct. 2011). I also briefly worked as a Research Engineer at Huawei's Noah's Ark Lab in 2012.

[Curriculum Vitae] [CV in Chinese]

Research Interests

My research interests lie mainly in artificial intelligence and machine learning. I am currently working on:

Reinforcement Learning (RL), including deep RL, transfer RL, data-driven RL, visual RL, safe RL, and RL for large models
Multi-agent systems, e.g., multi-agent RL, multi-agent communication, and multi-agent coordination
Probabilistic planning, particularly in partially observable Markov decision processes
Imitation Learning (IL), including IL via generative models, adversarial IL, non-adversarial IL, and multi-agent IL

Selected Publications

(* indicates corresponding author)

Reward Model Evaluation via Automatically-Ranked Policy Alignment [Paper] [Appendix] [Video/Poster/Slides]
Aoran Wang, Lei Ou, Yang Yu, and Zongzhang Zhang*
In: Proceedings of the 40th AAAI Conference on Artificial Intelligence (AAAI-2026), pages 26124-26132, Singapore, 2026.
Multi-Agent In-Context Coordination via Decentralized Memory Retrieval [Paper] [Code & Appendix] [Video/Poster/Slides]
Tao Jiang, Zichuan Lin*, Lihe Li, Yi-Chen Li, Cong Guan, Lei Yuan, Zongzhang Zhang*, Yang Yu, and Deheng Ye
In: Proceedings of the 40th AAAI Conference on Artificial Intelligence (AAAI-2026), pages 22363-22371, Singapore, 2026.
Generalizable Multi-modal Adversarial Imitation Learning for Non-stationary Dynamics [Paper]
Yi-Chen Li, Ningjing Chao, Zongzhang Zhang*, Fuxiang Zhang, Lei Yuan, and Yang Yu
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025, 47(7): 5600-5612.
Learning to Coordinate with Different Teammates via Team Probing [Paper]
Hao Ding, Chengxing Jia, Zongzhang Zhang*, Cong Guan, Feng Chen, Lei Yuan, and Yang Yu
IEEE Transactions on Neural Networks and Learning Systems, 2025, 36(9): 15807-15821.
Improving Sample Efficiency of Reinforcement Learning with Background Knowledge from Large Language Models [Paper]
Fuxiang Zhang, Junyou Li, Yi-Chen Li, Zongzhang Zhang*, Yang Yu, and Deheng Ye*
IEEE Transactions on Neural Networks and Learning Systems, 2025, 36(11): 19681-19692.
Efficient Multi-Agent Cooperation Learning through Teammate Lookahead [Paper]
Feng Chen, Xinwei Chen, Rong-Jun Qin, Cong Guan, Lei Yuan, Zongzhang Zhang*, and Yang Yu
Transactions on Machine Learning Research, 2025, 03: 1-27.
Behavior-Regularized Diffusion Policy Optimization for Offline Reinforcement Learning [Paper] [Code] [Project Page]
Chen-Xiao Gao, Chenyang Wu, Mingjun Cao, Chenjun Xiao, Yang Yu, and Zongzhang Zhang*
In: Proceedings of the 42nd International Conference on Machine Learning (ICML-2025), pages 18630-18657, Vancouver, Canada, 2025.
Focus-Then-Reuse: Fast Adaptation in Visual Perturbation Environments [Paper] [Code]
Jiahui Wang, Chao Chen, Jiacheng Xu, Zongzhang Zhang*, and Yang Yu
In: Advances in Neural Information Processing Systems 38 (NeurIPS-2025), pages 27860-27891, San Diego, USA, 2025.
Q-Adapter: Customizing Pre-trained LLMs to New Preferences with Forgetting Mitigation [Paper] [Code]
Yi-Chen Li, Fuxiang Zhang, Wenjie Qiu, Lei Yuan, Chengxing Jia, Zongzhang Zhang*, Yang Yu, and Bo An
In: Proceedings of the 13th International Conference on Learning Representations (ICLR-2025), Singapore, 2025.
Multi-Agent Domain Calibration with a Handful of Offline Data [Paper] [Code]
Tao Jiang, Lei Yuan, Lihe Li, Cong Guan, Zongzhang Zhang*, and Yang Yu
In: Advances in Neural Information Processing Systems 37 (NeurIPS-2024), pages 69607-69636, Vancouver, Canada, 2024.
Debiased Offline Representation Learning for Fast Online Adaptation in Non-stationary Dynamics [Paper] [Code]
Xinyu Zhang, Wenjie Qiu, Yi-Chen Li, Lei Yuan, Chengxing Jia, Zongzhang Zhang*, and Yang Yu
In: Proceedings of the 41st International Conference on Machine Learning (ICML-2024), pages 59741-59758, Vienna, Austria, 2024.
Efficient and Stable Offline-to-Online Reinforcement Learning via Continual Policy Revitalization [Paper] [Appendix] [Code]
Rui Kong, Chenyang Wu, Chen-Xiao Gao, Zongzhang Zhang*, and Ming Li
In: Proceedings of the 33rd International Joint Conference on Artificial Intelligence (IJCAI-2024), pages 4317-4325, Jeju Island, South Korea, 2024.
Focus-Then-Decide: Segmentation-Assisted Reinforcement Learning [Paper] [Appendix] [Code] [Project Page]
Chao Chen, Jiacheng Xu, Weijian Liao, Hao Ding, Zongzhang Zhang*, Yang Yu, and Rui Zhao
In: Proceedings of the 38th AAAI Conference on Artificial Intelligence (AAAI-2024), pages 11240-11248, Vancouver, Canada, 2024.
ACT: Empowering Decision Transformer with Dynamic Programming via Advantage Conditioning [Paper] [Appendix] [Code]
Chen-Xiao Gao, Chenyang Wu, Mingjun Cao, Rui Kong, Zongzhang Zhang*, and Yang Yu
In: Proceedings of the 38th AAAI Conference on Artificial Intelligence (AAAI-2024), pages 12127-12135, Vancouver, Canada, 2024.
Generalizable Task Representation Learning for Offline Meta-Reinforcement Learning with Data Limitations [Paper] [Appendix] [Code]
Renzhe Zhou, Chen-Xiao Gao, Zongzhang Zhang*, and Yang Yu
In: Proceedings of the 38th AAAI Conference on Artificial Intelligence (AAAI-2024), pages 17132-17140, Vancouver, Canada, 2024.
Deep Anomaly Detection via Active Anomaly Search [Paper] [Appendix] [Code]
Chao Chen, Dawei Wang, Feng Mao, Jiacheng Xu, Zongzhang Zhang*, and Yang Yu
In: Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems (AAMAS-2024), pages 308-316, Auckland, New Zealand, 2024.
Surfing Information: The Challenge of Intelligent Decision-Making [Paper]
Chenyang Wu and Zongzhang Zhang*
Intelligent Computing, 2023, 2: Article 0041.
Policy Regularization with Dataset Constraint for Offline Reinforcement Learning [Paper] [Code]
Yuhang Ran, Yi-Chen Li, Fuxiang Zhang, Zongzhang Zhang*, and Yang Yu
In: Proceedings of the 40th International Conference on Machine Learning (ICML-2023), pages 28701-28717, Honolulu, Hawaii, USA, 2023.
Discovering Generalizable Multi-Agent Coordination Skills from Multi-Task Offline Data [Paper] [Code]
Fuxiang Zhang, Chengxing Jia, Yi-Chen Li, Lei Yuan, Yang Yu, and Zongzhang Zhang*
In: Proceedings of the 11th International Conference on Learning Representations (ICLR-2023), Kigali, Rwanda, 2023.
Internal Logical Induction for Pixel-Symbolic Reinforcement Learning [Paper] [Code]
Jiacheng Xu, Chao Chen, Fuxiang Zhang, Lei Yuan, Zongzhang Zhang*, and Yang Yu
In: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD-2023), pages 2825-2837, Long Beach, CA, USA, 2023.
Policy-Independent Behavioral Metric-Based Representation for Deep Reinforcement Learning [Paper] [Appendix]
Weijian Liao, Zongzhang Zhang*, and Yang Yu
In: Proceedings of the 37th AAAI Conference on Artificial Intelligence (AAAI-2023), pages 8746-8754, Washington, DC, USA, 2023.
Bayesian Optimistic Optimization: Optimistic Exploration for Model-Based Reinforcement Learning [Paper] [Appendix]
Chenyang Wu, Tianci Li, Zongzhang Zhang*, and Yang Yu
In: Advances in Neural Information Processing Systems 35 (NeurIPS-2022), pages 14210-14223, New Orleans, USA, 2022.
Efficient Multi-Agent Communication via Shapley Message Value [Paper] [Code] [Demo]
Di Xue, Lei Yuan, Zongzhang Zhang*, and Yang Yu
In: Proceedings of the 31st International Joint Conference on Artificial Intelligence (IJCAI-2022), pages 578-584, Vienna, Austria, 2022.
Multi-Agent Incentive Communication via Decentralized Teammate Modeling [Paper] [Code] [Demo]
Lei Yuan, Jianhao Wang, Fuxiang Zhang, Chenghe Wang, Zongzhang Zhang*, Yang Yu, and Chongjie Zhang*
In: Proceedings of the 36th AAAI Conference on Artificial Intelligence (AAAI-2022), pages 9466-9474, Virtual Conference, 2022.
Adaptive Online Packing-Guided Search for POMDPs [Paper] [Appendix] [Code]
Chenyang Wu, Guoyu Yang, Zongzhang Zhang*, Yang Yu, Dong Li, Wulong Liu, and Jianye Hao
In: Advances in Neural Information Processing Systems 34 (NeurIPS-2021), pages 28419-28430, Virtual Conference, 2021.
Cross-Modal Domain Adaptation for Cost-Efficient Visual Reinforcement Learning [Paper] [Appendix] [Code]
Xiong-Hui Chen, Shengyi Jiang, Feng Xu, Zongzhang Zhang*, and Yang Yu
In: Advances in Neural Information Processing Systems 34 (NeurIPS-2021), pages 12520-12532, Virtual Conference, 2021.

[Full List of Publications] [DBLP] [Google Scholar] [Code Repositories]

Selected Patents

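章宗长, 黎铭, 俞扬, 周志华, 孔锐, 吴晨阳, 高辰潇. Generalizable Offline-to-Online Reinforcement Learning Method and Apparatus Based on Continual Policy Revitalization. Chinese invention patent, 2026, Patent No.: ZL202410569772.0
章宗长, 俞扬, 周志华, 郭东宇. Autonomous Driving Decision-Making Method and System Based on Maximum-Entropy Hierarchical Reinforcement Learning. Chinese invention patent, 2026, Patent No.: ZL202310384039.7
章宗长, 俞扬, 周志华, 张福翔, 袁雷, 王铖鹤, 秦熔均. Training System and Method for Multi-Agent Cooperative Communication Policies Based on Teammate Awareness. Chinese invention patent, 2025, Patent No.: ZL202210297894.X
章宗长, 俞扬, 周志华, 王铖鹤, 袁雷, 张福翔, 秦熔均. Multi-Agent Cooperation Method and Apparatus Based on Task Representation and Teammate Awareness. Chinese invention patent, 2025, Patent No.: ZL202210624473.3
章宗长, 俞扬, 周志华, 周韧哲. Offline Training Method for Control Policies Based on Model Uncertainty and Behavior Priors. Chinese invention patent, 2025, Patent No.: ZL202310064893.5
章宗长, 俞扬, 孔祥瀚. Robot Navigation Control Method and System Based on Partially Observable Reinforcement Learning. Chinese invention patent, 2025, Patent No.: ZL202210366719.1
章宗长, 陈浩然, 王艺深, 沈永亮. Recognition System for Security Check and Control Method Thereof. U.S. invention patent, 2023, Patent No.: US11574152 B2
章宗长, 潘致远, 王辉. Large Area Surveillance Method and Surveillance Robot Based on Weighted Double Deep Q-learning. U.S. invention patent, 2022, Patent No.: US11224970 B2
章宗长, 俞扬, 周志华, 吴晨阳, 杨国钰. Partially Observable Driving Planning Method Based on Adaptive Particles and Belief Packing. Chinese invention patent, 2022, Patent No.: ZL202110410291.1
章宗长, 俞扬, 周志华, 胡亚飞, 徐峰. Vehicle-Adaptive Autonomous Driving Decision-Making Method and System Based on Meta-Reinforcement Learning. Chinese invention patent, 2022, Patent No.: ZL202110356309.4
章宗长, 廖沩健, 俞扬, 黎铭, 周志华. Autonomous Merging Method for Partially Observed Intersections Based on Particle-Attention Deep Q-Learning. Chinese invention patent, 2022, Patent No.: ZL202110337809.3
章宗长, 俞扬, 姜冲. Robotic Arm Action Learning Method and System Based on Third-Person Imitation Learning. Chinese invention patent, 2022, Patent No.: ZL202010040178.4
章宗长, 俞扬, 周志华, 王艺深, 蒋俊鹏. Autonomous Driving Decision-Making Method and System Based on Partially Observable Transfer Reinforcement Learning. Chinese invention patent, 2021, Patent No.: ZL201911373375.1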

[Full List of Patents]

Ongoing Projects

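National Natural Science Foundation of China (NSFC) General Program "Research on Cooperative Multi-Agent Deep Reinforcement Learning Based on Knowledge Transfer", No. 62276126, 2023.1-2026.12
Shenzhen Yinwang Intelligent Technology Co., Ltd. "Autonomous Driving Technology Based on Self-Play Online Reinforcement Learning", 2025.10-2026.10
Huawei Technologies Co., Ltd. "Collaborative Project on Algorithms for Improving RLHF Efficiency and Stability", 2025.11-2026.11
Meituan "Key Technologies for Large-Model Multi-Agent Collaboration on Long-Horizon Complex Tasks in Advertising Scenarios", 2026.6-2027.6
Tencent Technology (Shenzhen) Co., Ltd. "Research on Key Reinforcement Learning Technologies for Improving the Quality of Risk-Control Strategy Generation", 2025.7-2026.6
Baidu Songguo Program Open Topic Project "Research on an Agentic RL Optimization Framework for Long-Sequence Tasks", 2026.4-2027.4
CCF-Baidu Songguo Fund Project "Research on Key Reinforcement Learning Technologies for Improving Large-Model Alignment", 2025.11-2026.11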

[Full List of Projects]

Professional Services

Editorial Board Member: Intelligent Computing (AAAS/Science Partner Journal, 2022-2025)
Young Associate Editor: Frontiers of Computer Science (2020-2025)
Area Chair: NeurIPS 2024-2025; IJCAI 2025; ICLR 2026; KDD 2026
Senior Program Committee Member: AAMAS 2024; IJCAI 2020-2021; AAAI 2019; ICAPS 2021; ECAI 2020, 2024, 2025
Program Committee Member/Reviewer: AAAI; ICML; IJCAI; NeurIPS; ICLR [Full List]
Journal Reviewer: Transactions on Pattern Analysis and Machine Intelligence; Artificial Intelligence; Journal of Artificial Intelligence Research [Full List]
Workshop Co-chair: Asian Workshop on Reinforcement Learning (AWRL) 2016-2018, PRICAI 2018's Workshop on Methods and Applications of Reinforcement Learning
Local Organizing Committee Chair: DAI 2020; MLA 2020, 2022
Professional Organization Membership: CCF Distinguished Member; AAAI Member; IEEE Member
Reviewer Awards: Outstanding Reviewer, ICLR 2021; Top Reviewer, NeurIPS 2019 and 2022
Consultant/Visiting Scholar: Polixir Technologies; Alibaba Group (2021-2022); Netease (2017-2020)

Teaching

Multi-Agent Systems (for undergraduate students, Spring 2021-2026) [textbook]
Control Theory and Methods (for undergraduate students, Fall 2020-2024, Spring 2026) [textbook]
Introduction to Artificial Intelligence (for undergraduate students, Fall 2021-2025, with Prof. Yang Yu) [textbook][course home]
Big Data, Large Model, and Decision Intelligence (for undergraduate students, Spring 2024-2025)
Reinforcement Learning (for undergraduate and graduate students, Fall 2020-2023, with Prof. Yang Yu) [textbook][course home]
Intelligent Systems: Design and Application (for undergraduate and graduate students, Spring 2020-2021) [textbook]
Intelligent Application Modeling (for undergraduate students, July 2019) [a summer course co-constructed with Tencent]

Students

Ph.D. Students:

2022 - : Weijian Liao 廖沩健 (co-supervised with Prof. Ming Li)
2023 - : Chenyang Wu 吴晨阳, Di Xue 薛迪
2024 - : Aoran Wang 王傲然 (co-supervised with Prof. Yang Yu)
2025 - : Tao Jiang 江涛 (co-supervised with Prof. Yang Yu)

Master's Students:

[More Information on Current Students and Alumni]

To prospective students:

I am part of LAMDA's reinforcement learning team (LAMDA RL Lab), together with Prof. Yang Yu.

I am looking for self-driven, diligent, adaptable, and resourceful students to work on exciting research in machine learning, on topics such as reinforcement learning, multi-agent systems, probabilistic planning, and imitation learning. If you are passionate about research, you are welcome to contact me.

Mail:
National Key Laboratory for Novel Software Technology, Nanjing University, Xianlin Campus Mailbox 603, 163 Xianlin Avenue, Qixia District, Nanjing 210023, China
(In Chinese:) 南京市栖霞区仙林大道163号，南京大学仙林校区603信箱，计算机软件新技术全国重点实验室，210023。

Created on September 11, 2019