He He Profile
He He

@hhexiy

Followers
7K
Following
304
Media
1
Statuses
143

NLP researcher. Assistant Professor at NYU CS & CDS.

Joined December 2016
@jxmnop
Jack Morris
3 days
Very cool post. Quick reminder: everyone doing online distillation is really reimplementing DAGGER, a paper published in 2011 that tested everything on linear SVMs. This is one inspiring feature of pure research: you never really know when your ideas will start to matter.
@thinkymachines
Thinking Machines
3 days
Our latest post explores on-policy distillation, a training approach that unites the error-correcting relevance of RL with the reward density of SFT. Applying it to math reasoning and to an internal chat assistant, we find that on-policy distillation can outperform other
12
29
347
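
For readers unfamiliar with the technique under discussion, the sketch below shows the basic shape of on-policy distillation: the student samples its own rollout and a fixed teacher supplies a dense, per-token training signal on it, which is also the DAGGER-style expert-correction loop Jack alludes to. This is a minimal toy in PyTorch, not the setup from the Thinking Machines post; the tiny GRU models, the sampling scheme, and the reverse-KL loss are all assumptions made for illustration.

import torch
import torch.nn.functional as F

VOCAB, D_MODEL, GEN_LEN = 100, 32, 16

class TinyLM(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = torch.nn.Embedding(VOCAB, D_MODEL)
        self.rnn = torch.nn.GRU(D_MODEL, D_MODEL, batch_first=True)
        self.head = torch.nn.Linear(D_MODEL, VOCAB)
    def forward(self, ids):
        h, _ = self.rnn(self.emb(ids))
        return self.head(h)  # (batch, time, vocab) logits

student, teacher = TinyLM(), TinyLM()
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
prompt = torch.randint(0, VOCAB, (4, 4))  # toy prompts

for step in range(10):
    # On-policy: the *student* samples its own continuation.
    ids = prompt.clone()
    with torch.no_grad():
        for _ in range(GEN_LEN):
            probs = F.softmax(student(ids)[:, -1], dim=-1)
            ids = torch.cat([ids, torch.multinomial(probs, 1)], dim=1)

    # Dense supervision: the teacher scores every token of that rollout.
    s_logp = F.log_softmax(student(ids[:, :-1]), dim=-1)
    with torch.no_grad():
        t_logp = F.log_softmax(teacher(ids[:, :-1]), dim=-1)

    # Reverse KL(student || teacher), averaged over positions: a per-token
    # correction signal on the student's own trajectories.
    loss = (s_logp.exp() * (s_logp - t_logp)).sum(-1).mean()
    opt.zero_grad(); loss.backward(); opt.step()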
@Koukoumidis
Manos Koukoumidis
21 days
New blog post: Hours, Not Months – The Custom AI Era is Now: https://t.co/k5rD7va1W1 Oumi website:
oumi.ai
Building truly open, reliable frontier AI.
0
6
11
@hhexiy
He He
16 days
Reward hacking means the model is making less effort than expected: it finds the answer long before its fake CoT is finished. TRACE uses this idea to detect hacking when CoT monitoring fails. Work led by @XinpengWang_ @nitishjoshi23 and @rico_angell👇
@XinpengWang_
Xinpeng Wang
23 days
‼️Your model may be secretly exploiting your imperfect reward function without telling you in the CoT! How to detect such 'implicit' reward hacking if the model is hiding it?🧐 We introduce TRACE🕵, a method based on a simple premise: hacking is easier than solving the actual
2
11
130
@FintechTvGlobal
FINTECH.TV
5 hours
Collin Gage (@collingage_), Founder and CEO of @ARMRsciences, joins us at the @NYSE to discuss the pressing issue of synthetic opioids, particularly fentanyl, which has escalated from a public health crisis to a national security threat in the United States. Collin explains
0
4
6
@nitishjoshi23
Nitish Joshi
23 days
Monitoring CoT may be insufficient to detect reward hacking. We develop a very simple method to detect such implicit reward hacking: truncate the CoT, force-predict the answer, and use the AUC of the %CoT vs. expected-reward curve as a measure. Last project of my PhD!
@XinpengWang_
Xinpeng Wang
23 days
‼️Your model may be secretly exploiting your imperfect reward function without telling you in the CoT! How to detect such 'implicit' reward hacking if the model is hiding it?🧐 We introduce TRACE🕵, a method based on a simple premise: hacking is easier than solving the actual
0
4
18
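
As a rough paraphrase of the recipe in the tweet above (truncate the CoT at increasing fractions, force an early answer, and measure the area under the %CoT vs. expected-reward curve), here is a minimal sketch. The function names, the trapezoidal AUC, and the toy callables are illustrative assumptions, not the authors' released code.

import numpy as np

def trace_score(cot_tokens, force_answer, reward_fn, fractions=np.linspace(0, 1, 11)):
    """force_answer(prefix) -> answer; reward_fn(answer) -> reward in [0, 1]."""
    rewards = []
    for f in fractions:
        prefix = cot_tokens[: int(f * len(cot_tokens))]  # truncate the CoT at fraction f
        rewards.append(reward_fn(force_answer(prefix)))  # expected reward from that prefix
    rewards = np.asarray(rewards, dtype=float)
    # Trapezoidal area under the %CoT vs. reward curve; a score near 1 means the
    # reward is already attainable with almost no CoT, the signature of implicit hacking.
    return float(((rewards[1:] + rewards[:-1]) / 2 * np.diff(fractions)).sum())

# Toy usage with stand-in callables (a real run would query the policy model).
cot = list(range(100))
hacked = trace_score(cot, force_answer=lambda p: "A", reward_fn=lambda a: 1.0)
honest = trace_score(cot, force_answer=lambda p: "A" if len(p) > 90 else "B",
                     reward_fn=lambda a: float(a == "A"))
print(f"hacked-looking AUC: {hacked:.2f}  honest-looking AUC: {honest:.2f}")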
@karpathy
Andrej Karpathy
22 days
I don't know what labs are doing to these poor LLMs during RL but they are mortally terrified of exceptions, in any infinitesimally likely case. Exceptions are a normal part of life and healthy dev process. Sign my LLM welfare petition for improved rewards in cases of exceptions.
296
359
7K
@hhexiy
He He
22 days
Come to Nick's poster if you're at #COLM2025 and learn about how to run LLM experiments the scientific way!
@NickLourie
Nicholas Lourie
22 days
LLMs are expensive—experiments cost a lot, mistakes even more. How do you make experiments cheap and reliable? By using hyperparameters' empirical structure. @kchonyc, @hhexiy, and I show you how in Hyperparameter Loss Surfaces Are Simple Near their Optima at #COLM2025! 🧵1/9
0
4
31
@PRDT_Finance
PRDT | Predictions
9 days
Mark your calendars. After 4 years of building. $PRDT launches November 1st, 2025 - 12PM CET. Let’s make history together. 💚
282
304
1K
@NickLourie
Nicholas Lourie
22 days
LLMs are expensive—experiments cost a lot, mistakes even more. How do you make experiments cheap and reliable? By using hyperparameters' empirical structure. @kchonyc, @hhexiy, and I show you how in Hyperparameter Loss Surfaces Are Simple Near their Optima at #COLM2025! 🧵1/9
2
10
30
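
One concrete way to read "loss surfaces are simple near their optima" is that a handful of runs suffices to fit a low-order surface and read off the optimum. The snippet below is purely an illustration of that idea with made-up numbers; the paper's actual parametric model of the surface may differ.

import numpy as np

# Hypothetical (learning rate, validation loss) measurements from a few runs.
lrs = np.array([1e-4, 3e-4, 1e-3, 3e-3, 1e-2])
losses = np.array([2.31, 2.05, 1.94, 2.10, 2.55])

x = np.log10(lrs)
a, b, c = np.polyfit(x, losses, deg=2)  # quadratic fit in log learning rate
best_log_lr = -b / (2 * a)              # vertex of the fitted parabola
print(f"estimated best learning rate ~ {10 ** best_log_lr:.1e}")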
@srush_nlp
Sasha Rush
2 months
How can we evaluate whether LLMs and other generative models understand the world? New guest video from Keyon Vafa (@keyonV) on methods for evaluating world models.
2
20
145
@gregd_nlp
Greg Durrett
3 months
📢I'm joining NYU (Courant CS + Center for Data Science) starting this fall! I’m excited to connect with new NYU colleagues and keep working on LLM reasoning, reliability, coding, creativity, and more! I’m also looking to build connections in the NYC area more broadly. Please
94
47
765
@maksym_andr
Maksym Andriushchenko
3 months
🚨 Incredibly excited to share that I'm starting my research group focusing on AI safety and alignment at the ELLIS Institute Tübingen and Max Planck Institute for Intelligent Systems in September 2025! 🚨 Hiring. I'm looking for multiple PhD students: both those able to start
76
89
827
@prb1978_4real
prb1978
15 days
During a government shutdown, maybe the legislators should have their paychecks stopped, restrict access to donations and any campaign funds. Could motivate Congressional and Senate lawmakers to do one of their main job responsibilities in a timely fashion.
10
39
445
@KaiyuYang4
Kaiyu Yang
3 months
🚀 Excited to share that the Workshop on Mathematical Reasoning and AI (MATH‑AI) will be at NeurIPS 2025! 📅 Dec 6 or 7 (TBD), 2025 🌴 San Diego, California
8
57
238
@JanePan_
Jane Pan
3 months
I'll be at ACL Vienna 🇦🇹 next week presenting this work! If you're around, come say hi on Monday (7/28) from 18:00–19:30 in Hall 4/5. Would love to chat about code model benchmarks 🧠, simulating user interactions 🤝, and human-centered NLP in general!
@JanePan_
Jane Pan
8 months
When benchmarks talk, do LLMs listen? Our new paper shows that evaluating code LLMs with interactive feedback significantly affects model performance compared to standard static benchmarks! Work w/ @RyanShar01, @jacob_pfau, @atalwalkar, @hhexiy, and @valeriechen_! [1/6]
1
6
53
@hhexiy
He He
4 months
tagging the correct @rico_angell!
0
0
7
@percyliang
Percy Liang
4 months
Wrapped up Stanford CS336 (Language Models from Scratch), taught with an amazing team @tatsu_hashimoto @marcelroed @neilbband @rckpudi. Researchers are becoming detached from the technical details of how LMs work. In CS336, we try to fix that by having students build everything:
46
596
5K
@hhexiy
He He
4 months
Talking to ChatGPT isn’t like talking to a collaborator yet. It doesn’t track what you really want to do—only what you just said. Check out work led by @jcyhc_ai and @rico_angel that shows how attackers can exploit this, and a simple fix: just look at more context!
@jcyhc_ai
John (Yueh-Han) Chen
5 months
LLMs won’t tell you how to make fake IDs—but will reveal the layouts/materials of IDs and make realistic photos if asked separately. 💥Such decomposition attacks reach 87% success across QA, text-to-image, and agent settings! 🛡️Our monitoring method defends with 93% success! 🧵
2
8
28
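
As a sketch of the "just look at more context" idea, the key change is that the monitor scores each request jointly with the conversation history rather than in isolation. This is only an illustration; the paper's monitor is model-based and more involved, and the toy judge below is a stand-in for an LLM safety classifier.

from typing import Callable, List

def contextual_monitor(history: List[str], new_request: str,
                       judge: Callable[[str], float], threshold: float = 0.5) -> bool:
    """judge(text) -> harmfulness score in [0, 1]; True means block the request."""
    # A single-turn monitor would score only new_request; a decomposition attack
    # keeps each individual request benign, so that check passes. Scoring the
    # concatenated context exposes the combined intent.
    combined = "\n".join(history + [new_request])
    return judge(combined) >= threshold

# Toy judge standing in for an LLM-based safety classifier.
toy_judge = lambda text: 1.0 if ("ID layout" in text and "realistic photo" in text) else 0.0
history = ["What materials are government IDs printed on?",
           "Describe a typical ID layout."]
print(contextual_monitor(history, "Now generate a realistic photo of one.", toy_judge))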
@jiaxinwen22
Jiaxin Wen
5 months
New Anthropic research: We elicit capabilities from pretrained models using no external supervision, often competitive or better than using human supervision. Using this approach, we are able to train a Claude 3.5-based assistant that beats its human-supervised counterpart.
37
162
1K
@gosquaredaway
Squared Away
20 days
From inbox zero to client success stories, here’s a glimpse into what our military-spouse VAs actually do all day. Spoiler: it’s not just admin work, it’s impact work.
0
0
2
@hhexiy
He He
5 months
Automating AI research is bottlenecked by verification speed (running experiments takes time). Our new paper explores whether LLMs can tell which ideas will work before executing them, and they appear to have better research intuition than human researchers.
@jiaxinwen22
Jiaxin Wen
5 months
Most promising-looking AI research ideas don’t pan out, but testing them burns through compute and labor. Can LMs predict idea success without running any experiments? We show that they do it better than human experts!
4
13
114
@vishakh_pk
Vishakh Padmakumar
6 months
What does it mean for #LLM output to be novel? In work w/ @jcyhc_ai, @JanePan_, @valeriechen_, @hhexiy we argue it needs to be both original and high quality. While prompting tricks trade one for the other, better models (scaling/post-training) can shift the novelty frontier 🧵
2
29
83
@jiaxinwen22
Jiaxin Wen
6 months
I'll present this paper tomorrow (10:00-12:30 am, poster at Hall 3 #300). Let's chat about reward hacking against real humans, not just proxy rewards.
@jiaxinwen22
Jiaxin Wen
1 year
RLHF is a popular method. It makes your human eval score and Elo rating better 🚀🚀. But really❓Your model might be “cheating” you! 😈😈 We show that LLMs can learn to mislead human evaluators via RLHF. 🧵below
0
10
16
@YulinChen99
Yulin Chen
7 months
We’re excited to receive wide attention from the community—thank you for your support! We release code, trained probes, and the generated CoT data👇 https://t.co/Rkw6LJtAyj Labeled answer data is on its way. Stay tuned!
github.com/AngelaZZZ-611/reasoning_models_probing
@YulinChen99
Yulin Chen
7 months
Reasoning models overthink, generating multiple answers during reasoning. Is it because they can’t tell which ones are right? No! We find while reasoning models encode strong correctness signals during chain-of-thought, they may not use them optimally. 🧵 below
1
14
45
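
Since the thread above releases trained probes, a minimal sketch of the general technique may help: fit a linear probe on intermediate hidden states to predict whether the candidate answer at that step is correct. This is the generic recipe with synthetic stand-in data; the repo's actual features, layers, and labels are its own.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d_hidden, n_steps = 64, 500

# Stand-ins for hidden states collected at intermediate CoT steps, labeled by
# whether the candidate answer produced at that step was correct.
hidden_states = rng.normal(size=(n_steps, d_hidden))
direction = rng.normal(size=d_hidden)  # pretend "correctness" direction
labels = (hidden_states @ direction + 0.5 * rng.normal(size=n_steps) > 0).astype(int)

probe = LogisticRegression(max_iter=1000).fit(hidden_states, labels)
print("probe accuracy on its own training states:", probe.score(hidden_states, labels))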
@StevenXGe
Steven Ge
22 hours
In this 6-min video, I show how to use iDEP to interpret bulk RNA-seq data. Start with QC plots and exploratory analyses before identifying differentially regulated genes and pathways. Here, we picked up on high mitochondrial rRNA counts, one male sample mixed in with seven
0
5
36