Hi there! My name is Chunyang Li. I’m a second-year MPhil student in the Department of Computer Science and Engineering at the Hong Kong University of Science and Technology, advised by Prof. Yangqiu Song. I received my Bachelor of Engineering degree from the Department of Computer Science and Technology at Tsinghua University in June 2024. During my undergraduate years, I was fortunate to work under the guidance of Prof. Juanzi Li and Prof. Lei Hou.

My research interests lie primarily in natural language processing, with a particular focus on language model evaluation, including:

  • Cognitive capabilities of language models: understanding their fundamental abilities, such as knowledge acquisition, adaptive evolution, and reasoning.
  • LLM-as-a-judge: investigating the use of large language models or language-based agents for automated and nuanced assessment.

Thanks for dropping by! Have a nice day!

🔥 News

  • 2025.10: We released “WebDevJudge”, a meta-evaluation benchmark for assessing the judging capabilities of LLMs on web development tasks. Check it out!
  • 2025.05: Our paper “Patterns Over Principles: The Fragility of Inductive Reasoning in LLMs under Noisy Observations” has been accepted to Findings of ACL 2025!
  • 2024.09: 🎉 Our paper “MAVEN-Fact: A Large-scale Event Factuality Detection Dataset” has been accepted to Findings of EMNLP 2024!
  • 2024.06: Graduated from Tsinghua University with a Bachelor of Engineering degree in Computer Science and Technology!

📝 Publications

Selected Publications

Arxiv

WebDevJudge: Evaluating (M)LLMs as Critiques for Web Development Quality

Chunyang Li*, Yilun Zheng*, Xinting Huang, Tianqing Fang, Jiahao Xu, Yangqiu Song, Lihui Chen, Han Hu

Code

  • WebDevJudge is a systematic benchmark for assessing LLM-as-a-judge performance in web development.
  • It supports both non-interactive evaluation based on static observations and interactive evaluation with a dynamic web environment.
  • We also provide WebDevJudge-Unit, a diagnostic dataset specifically designed to evaluate task-level feasibility verification capabilities.

Findings of ACL

Patterns Over Principles: The Fragility of Inductive Reasoning in LLMs under Noisy Observations

Chunyang Li, Weiqi Wang, Tianshi Zheng, Yangqiu Song

Code

In Findings of the Association for Computational Linguistics: ACL 2025.

Findings of EMNLP

MAVEN-Fact: A Large-scale Event Factuality Detection Dataset

Chunyang Li*, Hao Peng*, Xiaozhi Wang, Yunjia Qi, Lei Hou, Bin Xu, Juanzi Li

Code

In Findings of the Association for Computational Linguistics: EMNLP 2024.

Arxiv

Towards Multi-Agent Reasoning Systems for Collaborative Expertise Delegation: An Exploratory Design Study

Baixuan Xu*, Chunyang Li*, Weiqi Wang*, Wei Fan, Tianshi Zheng, Haochen Shi, Tao Fan, Yangqiu Song, Qiang Yang

Arxiv

ChatLog: Carefully Evaluating the Evolution of ChatGPT Across Time

Shangqing Tu*, Chunyang Li*, Jifan Yu, Xiaozhi Wang, Lei Hou, Juanzi Li

Code

* indicates equal contribution.

Full Publications

You can also find my latest publications on Google Scholar.

Journals & Conference Proceedings

Arxiv Preprints


🎖 Honors and Awards

  • 2024.06 Outstanding Graduate, Department of Computer Science and Technology, Tsinghua University
  • 2023.09 Academic Excellence Scholarship, Tsinghua University
  • 2022.09 Academic Excellence Scholarship, Tsinghua University
  • 2022.09 Social Practice Scholarship, Tsinghua University
  • 2020.09 Freshman Scholarship, Tsinghua University

📖 Education

  • 2024.09 - 2026.06 (expected), MPhil in Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong.
  • 2020.09 - 2024.06, B.Eng in Computer Science and Technology, Tsinghua University, Beijing.

💻 Internships

  • 2025.06 - Present, Tencent AI Lab, Shenzhen.
  • 2023.06 - 2023.07, Zhipu AI, Beijing.
  • 2022.06 - 2023.06, THUKEG, Beijing.

📄 CV

You can find my CV here (updated 2025.10).