Hi there! My name is Chunyang Li. I’m a second-year MPhil student in the Department of Computer Science and Engineering at the Hong Kong University of Science and Technology, advised by Prof. Yangqiu Song. I received my Bachelor of Engineering degree from the Department of Computer Science and Technology at Tsinghua University in June 2024. During my undergraduate years, I was fortunate to work under the guidance of Prof. Juanzi Li and Prof. Lei Hou.
My research interests primarily lie in the field of natural language processing, with a particular focus on language model evaluation, including:
- cognitive capabilities of language models: understanding their fundamental abilities, such as knowledge acquisition, adaptive evolution, and reasoning.
- LLM-as-a-judge: investigating the use of large language models or language-based agents for automated and nuanced assessment.
Thanks for dropping by! Have a nice day!
🔥 News
- 2025.10: We released “WebDevJudge”, a meta-evaluation benchmark for assessing the judging capabilities of LLMs on web development tasks. Check it out!
- 2025.05: Our paper “Patterns Over Principles: The Fragility of Inductive Reasoning in LLMs under Noisy Observations” has been accepted to Findings of ACL 2025!
- 2024.09: 🎉 Our paper “MAVEN-FACT: A Large-scale Event Factuality Detection Dataset” has been accepted to Findings of EMNLP 2024!
- 2024.06: Graduated from Tsinghua University with a Bachelor of Engineering degree in Computer Science and Technology!
📝 Publications
Selected Publications

WebDevJudge: Evaluating (M)LLMs as Critiques for Web Development Quality
Chunyang Li*, Yilun Zheng*, Xinting Huang, Tianqing Fang, Jiahao Xu, Yangqiu Song, Lihui Chen, Han Hu
- WebDevJudge is a systematic benchmark for assessing LLM-as-a-judge performance in web development.
- It supports both non-interactive evaluation based on static observations and interactive evaluation with a dynamic web environment.
- We also provide WebDevJudge-Unit, a diagnostic dataset specifically designed to evaluate task-level feasibility verification capabilities.

Patterns Over Principles: The Fragility of Inductive Reasoning in LLMs under Noisy Observations
Chunyang Li, Weiqi Wang, Tianshi Zheng, Yangqiu Song
In Findings of the Association for Computational Linguistics: ACL 2025.

MAVEN-FACT: A Large-scale Event Factuality Detection Dataset
Chunyang Li*, Hao Peng*, Xiaozhi Wang, Yunjia Qi, Lei Hou, Bin Xu, Juanzi Li
In Findings of the Association for Computational Linguistics: EMNLP 2024.

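Towards Multi-Agent Reasoning Systems for Collaborative Expertise Delegation: An Exploratory Design Study
Baixuan Xu*, Chunyang Li*, Weiqi Wang*, Wei Fan, Tianshi Zheng, Haochen Shi, Tao Fan, Yangqiu Song, Qiang Yang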

ChatLog: Carefully Evaluating the Evolution of ChatGPT Across Time
Shangqing Tu*, Chunyang Li*, Jifan Yu, Xiaozhi Wang, Lei Hou, Juanzi Li
* indicates equal contributions.
Full Publications
You can also find my latest publications on Google Scholar.
Journals & Conference Proceedings
- [Findings of ACL 2025] Patterns Over Principles: The Fragility of Inductive Reasoning in LLMs under Noisy Observations, Chunyang Li, Weiqi Wang, Tianshi Zheng, Yangqiu Song. In Findings of the Association for Computational Linguistics: ACL 2025. 2025.
- [Findings of EMNLP 2024] MAVEN-FACT: A Large-scale Event Factuality Detection Dataset, Chunyang Li*, Hao Peng*, Xiaozhi Wang, Yunjia Qi, Lei Hou, Bin Xu, Juanzi Li. In Findings of the Association for Computational Linguistics: EMNLP 2024. 2024.
- [TMLR] The Curse of CoT: On the Limitations of Chain-of-Thought in In-Context Learning, Tianshi Zheng*, Yixiang Chen*, Chengxi Li*, Chunyang Li, Qing Zong, Haochen Shi, Baixuan Xu, Yangqiu Song, Ginny Y. Wong, Simon See. Transactions on Machine Learning Research (TMLR). 2025.
- [EMNLP 2025] LogiDynamics: Unraveling the Dynamics of Logical Inference in Large Language Model Reasoning, Tianshi Zheng, Jiayang Cheng, Chunyang Li, Haochen Shi, Zihao Wang, Jiaxin Bai, Yangqiu Song, Ginny Y. Wong, Simon See. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025.
- [ACL 2024] CANDLE: Iterative Conceptualization and Instantiation Distillation from Large Language Models for Commonsense Reasoning, Weiqi Wang, Tianqing Fang, Chunyang Li, Haochen Shi, Wenxuan Ding, Baixuan Xu, Zhaowei Wang, Jiaxin Bai, Xin Liu, Cheng Jiayang, Chunkit Chan, Yangqiu Song. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2024.
- [ICLR 2024] KoLA: Carefully Benchmarking World Knowledge of Large Language Models, Jifan Yu*, Xiaozhi Wang*, Shangqing Tu*, Shulin Cao, Daniel Zhang-Li, Xin Lv, Hao Peng, Zijun Yao, Xiaohan Zhang, Hanming Li, Chunyang Li, Zheyuan Zhang, Yushi Bai, Yantao Liu, Amy Xin, Nianyi Lin, Kaifeng Yun, Linlu Gong, Jianhui Chen, Zhili Wu, Yunjia Qi, Weikai Li, Yong Guan, Kaisheng Zeng, Ji Qi, Hailong Jin, Jinxin Liu, Yu Gu, Yuan Yao, Ning Ding, Lei Hou, Zhiyuan Liu, Bin Xu, Jie Tang, Juanzi Li. The Twelfth International Conference on Learning Representations. 2024.
- [CIKM 2023] LittleMu: Deploying an Online Virtual Teaching Assistant via Heterogeneous Sources Integration and Chain of Teach Prompts, Shangqing Tu*, Zheyuan Zhang*, Jifan Yu, Chunyang Li, Siyu Zhang, Zijun Yao, Lei Hou, Juanzi Li. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management. 2023.
Arxiv Preprints
- [Arxiv] WebDevJudge: Evaluating (M)LLMs as Critiques for Web Development Quality, Chunyang Li*, Yilun Zheng*, Xinting Huang, Tianqing Fang, Jiahao Xu, Yangqiu Song, Lihui Chen, Han Hu. Arxiv preprint. 2025.
- [Arxiv] AutoSchemaKG: Autonomous Knowledge Graph Construction through Dynamic Schema Induction from Web-Scale Corpora, Jiaxin Bai*, Wei Fan*, Qi Hu*, Qing Zong, Chunyang Li, Hong Ting Tsang, Hongyu Luo, Yauwai Yim, Haoyu Huang, Xiao Zhou, Feng Qin, Tianshi Zheng, Xi Peng, Xin Yao, Huiwen Yang, Leijie Wu, Yi Ji, Gong Zhang, Renhai Chen, Yangqiu Song. Arxiv preprint. 2025.
- [Arxiv] INFERENCEDYNAMICS: Efficient Routing Across LLMs through Structured Capability and Knowledge Profiling, Haochen Shi*, Tianshi Zheng*, Weiqi Wang*, Baixuan Xu, Chunyang Li, Chunkit Chan, Tao Fan, Yangqiu Song, Qiang Yang. Arxiv preprint. 2025.
- [Arxiv] Legal Rule Induction: Towards Generalizable Principle Discovery from Analogous Judicial Precedents, Wei Fan, Tianshi Zheng, Yiran Hu, Zheye Deng, Weiqi Wang, Baixuan Xu, Chunyang Li, Haoran Li, Weixing Shen, Yangqiu Song. Arxiv preprint. 2025.
- [Arxiv] Towards Multi-Agent Reasoning Systems for Collaborative Expertise Delegation: An Exploratory Design Study, Baixuan Xu*, Chunyang Li*, Weiqi Wang*, Wei Fan, Tianshi Zheng, Haochen Shi, Tao Fan, Yangqiu Song, Qiang Yang. Arxiv preprint. 2025.
- [Arxiv] Event-level Knowledge Editing, Hao Peng*, Xiaozhi Wang*, Chunyang Li, Kaisheng Zeng, Jiangshan Duo, Yixin Cao, Lei Hou, Juanzi Li. Arxiv preprint. 2024.
- [Arxiv] ChatLog: Carefully Evaluating the Evolution of ChatGPT Across Time, Shangqing Tu*, Chunyang Li*, Jifan Yu, Xiaozhi Wang, Lei Hou, Juanzi Li. Arxiv preprint. 2023.
* indicates equal contributions.
🎖 Honors and Awards
- 2024.06 Outstanding Graduate, Department of Computer Science and Technology, Tsinghua University
- 2023.09 Academic Excellence Scholarship, Tsinghua University
- 2022.09 Academic Excellence Scholarship, Tsinghua University
- 2022.09 Social Practice Scholarship, Tsinghua University
- 2020.09 Freshman Scholarship, Tsinghua University
📖 Education
- 2024.09 - 2026.06 (expected), MPhil in Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong.
- 2020.09 - 2024.06, B.Eng in Computer Science and Technology, Tsinghua University, Beijing.
💻 Internships
- 2025.06 - Present, Tencent AI Lab, Shenzhen.
- 2023.06 - 2023.07, Zhipu AI, Beijing.
- 2022.06 - 2023.06, THUKEG, Beijing.
📄 CV
You can find my CV here (updated 2025.10).