
Pinjia He
@PinjiaHE
Followers: 997
Following: 811
Media: 12
Statuses: 246
Assistant Professor at The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen) @cuhksz.
Shenzhen, China
Joined March 2015
SWE-bench Verified is the gold standard for evaluating coding agents: 500 real-world issues + tests by OpenAI. Sounds bullet-proof? Not quite. We show passing its unit tests != matching ground truth. In our ACL paper, we fixed buggy evals: 24% of agents moved up or down the
11
36
200
Are you interested in serving on the Program Committee for @issta_conf 2026? Please let us know by filling out this form:
docs.google.com
Until 30th June 2025, please indicate your interest to serve on the ISSTA 2026 program committee through filling out this form. Please include as much information as possible. After submitting the...
1
6
9
My student Xiaoyuan Liu's @xyliu_cs collaboration work with Tencent. #ACL2025NLP
When eyes and memory clash, who wins? Introducing a comprehensive study on vision-knowledge conflicts in MLLMs, where visual input contradicts the model's internal commonsense knowledge, and the results might surprise you. #ACL2025NLP We developed an automated framework
0
0
4
I'll be launching the Formal Methods Engineering Lab (https://t.co/9pjKYVa89h), and I am hiring! If you're interested in working with me, feel free to reach out.
Super excited to share that I will be joining The University of Manchester (@OfficialUoM) as a Lecturer (Assistant Professor) in Cyber Security! The Systems and Software Security group at Manchester is already incredibly impressive, and I'm honored to help further strengthen it.
1
11
28
Check out my student Xiaoyuan Liu's @xyliu_cs collaboration work with Tencent: RISE (Reinforcing Reasoning with Self-Verification), enabling LLMs to simultaneously level-up BOTH their problem-solving AND self-checking skills.
Trust your AI, but can it trust itself? Introducing an online reinforcement learning framework, RISE (Reinforcing Reasoning with Self-Verification), enabling LLMs to simultaneously level-up BOTH their problem-solving AND self-checking skills! Problems tackled:
0
1
15
Truly humbled and honored to receive the IEEE CS TCSE Rising Star Award. Thanks a lot for the help along the way from my supervisors, referees, students, and co-authors. Will continue to focus on impactful projects about AI4SE and SE4AI. https://t.co/cUX83bzcoQ
10
6
58
We invite you to nominate yourself to serve on the Program Committee for FSE'26. Please use the following link to access the nomination form: https://t.co/wCGjjsdCVv
docs.google.com
Please use this form to nominate yourself for the program committee of the ACM International Conference on the Foundations of Software Engineering (FSE 2026) by March 14, 2025. While we cannot select...
1
12
32
Agree DeepSeek is not as good as o1-pro and o3, but I think we need to look at the trends. This is what happened during the last nine months. What will happen in the next nine months if we do not change anything in the structure of the AI ecosystem in the US?
@istoica05 We are definitely not doing a good job in the USA given our resources. However, I am not entirely sure about the last claim. DeepSeek might be as good as o1 if not better, but I don't think it is as good as o1 pro or o3 (based on deep research). Additionally, one little detail
9
29
170
Today, we are publishing the first-ever International AI Safety Report, backed by 30 countries and the OECD, UN, and EU. It summarises the state of the science on AI capabilities and risks, and how to mitigate those risks. Link to full Report: https://t.co/k9ggxL7i66 1/16
49
527
1K
Nominate yourself for the ASE'25 PC! The 40th IEEE/ACM International Conference on Automated Software Engineering (@ASE_conf) is looking for PC nominations to maximize diversity of perspectives. https://t.co/Th6wpRfuX7 https://t.co/M0qPBkfwqo w/ @LingmingZhang
docs.google.com
Please indicate your interest to serve on the ASE 2025 program committee through filling out this form. Please include as much information as possible. After submitting the form you will receive a...
0
10
28
Dominik's research is solid and highly impactful! He is also very easy to get along with.
I am on the academic job market! My research focuses on advancing Formal Methods, Programming Languages, and Software Engineering. Website: https://t.co/ypqj71vafu Research Statement:
0
0
1
I am on the academic job market! My research focuses on advancing Formal Methods, Programming Languages, and Software Engineering. Website: https://t.co/ypqj71vafu Research Statement:
3
23
67
At the end of Thanksgiving holidays, I finally finished the piece on reward hacking. Not an easy one to write, phew. Reward hacking occurs when an RL agent exploits flaws in the reward function or env to maximize rewards without learning the intended behavior. This is imo a
68
225
2K
FSE'25 will be buzzing with 14 co-located workshops. Congratulations to the organizers for their hard work! More details will be posted in the next few days. #FSE25 #Workshops
0
6
20