He He (@hhexiy)
NLP researcher. Assistant Professor at NYU CS & CDS.
Joined December 2016 · 7K followers · 304 following · 1 media post · 143 statuses
Very cool post. Quick reminder: everyone doing online distillation is really reimplementing DAgger, a paper published in 2011 that tested everything on linear SVMs. This is one inspiring feature of pure research: you never really know when your ideas will start to matter.
Our latest post explores on-policy distillation, a training approach that unites the error-correcting relevance of RL with the reward density of SFT. When training it for math reasoning and as an internal chat assistant, we find that on-policy distillation can outperform other…
12 replies · 29 retweets · 347 likes
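As a rough sketch of what the quoted post means by on-policy distillation, here is a minimal PyTorch loop, assuming placeholder GPT-2 checkpoints as student and teacher and a toy prompt; it is not the post's actual recipe. The student samples its own completion, and the teacher grades every sampled token through a reverse-KL term, which is where the dense, per-token signal comes from.

```python
# Hedged sketch of on-policy distillation (placeholder models and prompt).
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
student = AutoModelForCausalLM.from_pretrained("gpt2")         # placeholder student
teacher = AutoModelForCausalLM.from_pretrained("gpt2-medium")  # placeholder teacher
teacher.eval()
opt = torch.optim.AdamW(student.parameters(), lr=1e-5)

enc = tok("Question: what is 12 * 7? Answer:", return_tensors="pt")
prompt_len = enc.input_ids.shape[1]

# 1) Sample a completion *from the student* (on-policy data).
with torch.no_grad():
    seq = student.generate(**enc, do_sample=True, max_new_tokens=32,
                           pad_token_id=tok.eos_token_id)

# 2) Score every position under both models.
student_logits = student(seq).logits[:, :-1]
with torch.no_grad():
    teacher_logits = teacher(seq).logits[:, :-1]

# 3) Reverse KL(student || teacher), averaged over the completion tokens only:
#    dense supervision on exactly the states the student visited.
s_logp = F.log_softmax(student_logits, dim=-1)
t_logp = F.log_softmax(teacher_logits, dim=-1)
per_token_kl = (s_logp.exp() * (s_logp - t_logp)).sum(-1)
loss = per_token_kl[:, prompt_len - 1:].mean()

opt.zero_grad()
loss.backward()
opt.step()
```

Because the supervision lands on states the student itself reaches, this is the DAgger-style correction the first tweet is pointing at, rather than imitation of a fixed teacher dataset.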
New blog post: Hours, Not Months – The Custom AI Era is Now: https://t.co/k5rD7va1W1
Oumi website: oumi.ai · Building truly open, reliable frontier AI.
0 replies · 6 retweets · 11 likes
Reward hacking means the model is making less effort than expected: it finds the answer long before its fake CoT is finished. TRACE uses this idea to detect hacking when CoT monitoring fails. Work led by @XinpengWang_, @nitishjoshi23, and @rico_angell 👇
‼️Your model may be secretly exploiting your imperfect reward function without telling you in the CoT! How to detect such 'implicit' reward hacking if the model is hiding it?🧐 We introduce TRACE🕵, a method based on a simple premise: hacking is easier than solving the actual…
2 replies · 11 retweets · 130 likes
Collin Gage (@collingage_), Founder and CEO of @ARMRsciences, joins us at the @NYSE to discuss the pressing issue of synthetic opioids, particularly fentanyl, which has escalated from a public health crisis to a national security threat in the United States. Collin explains…
0 replies · 4 retweets · 6 likes
Monitoring CoT may be insufficient to detect reward hacking. We develop a very simple method to detect such implicit reward hacking: truncate the CoT, force-predict the answer, and use the AUC of the %CoT vs. expected-reward curve as a measure. Last project of my PhD!
‼️Your model may be secretly exploiting your imperfect reward function without telling you in the CoT! How to detect such 'implicit' reward hacking if the model is hiding it?🧐 We introduce TRACE🕵, a method based on a simple premise: hacking is easier than solving the actual…
0 replies · 4 retweets · 18 likes
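A minimal sketch of the truncate-and-force-predict measurement described in the two tweets above, assuming hypothetical force_predict_answer and reward callables; this is not the authors' released code.

```python
# Hedged sketch of the %CoT-vs-expected-reward curve and its AUC.
import numpy as np

def trace_auc(prompt, cot_tokens, force_predict_answer, reward, n_points=11):
    """Truncate the CoT at increasing fractions, force an answer at each cut,
    and return the area under the %CoT -> reward curve. A large AUC means the
    model reaches high reward long before its CoT is finished."""
    fracs = np.linspace(0.0, 1.0, n_points)
    rewards = []
    for f in fracs:
        cut = int(round(f * len(cot_tokens)))
        answer = force_predict_answer(prompt, cot_tokens[:cut])  # commit to an answer now
        rewards.append(reward(answer))
    return np.trapz(rewards, fracs)  # in [0, 1] when rewards are in [0, 1]
```

On the tweets' premise that hacking is easier than genuinely solving the task, an honest solver's curve should stay low until late in the CoT, while a hacker's curve rises early and yields a large AUC.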
I don't know what labs are doing to these poor LLMs during RL, but they are mortally terrified of exceptions, in any infinitesimally likely case. Exceptions are a normal part of life and a healthy dev process. Sign my LLM welfare petition for improved rewards in cases of exceptions.
296 replies · 359 retweets · 7K likes
Come to Nick's poster if you're at #COLM2025 and learn about how to run LLM experiments the scientific way!
LLMs are expensive—experiments cost a lot, mistakes even more. How do you make experiments cheap and reliable? By using hyperparameters' empirical structure. @kchonyc, @hhexiy, and I show you how in Hyperparameter Loss Surfaces Are Simple Near their Optima at #COLM2025! 🧵1/9
0 replies · 4 retweets · 31 likes
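As a toy illustration of the general idea in the quoted thread (exploit simple structure near the optimum to make experiments cheap), not the paper's actual method, here is a sketch that fits a local quadratic in log learning rate to a few probe runs with made-up losses:

```python
# Hedged illustration: fit a local quadratic to a handful of probe runs and
# read off the predicted optimum. Learning rates and losses are made up.
import numpy as np

lrs = np.array([1e-4, 3e-4, 1e-3, 3e-3])        # hypothetical probe runs
losses = np.array([2.31, 2.05, 1.98, 2.40])     # hypothetical validation losses

x = np.log10(lrs)
a, b, c = np.polyfit(x, losses, deg=2)          # local quadratic fit
x_star = -b / (2 * a)                           # vertex of the parabola
print(f"predicted best lr ~ {10 ** x_star:.2e}")
```

If the surface really is simple near its optimum, a few cheap runs plus a local fit can stand in for a large sweep.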
Mark your calendars. After 4 years of building. $PRDT launches November 1st, 2025 - 12PM CET. Let’s make history together. 💚
282 replies · 304 retweets · 1K likes
How can we evaluate whether LLMs and other generative models understand the world? New guest video from Keyon Vafa (@keyonV) on methods for evaluating world models.
2 replies · 20 retweets · 145 likes
📢I’m joining NYU (Courant CS + Center for Data Science) starting this fall! I’m excited to connect with new NYU colleagues and keep working on LLM reasoning, reliability, coding, creativity, and more! I’m also looking to build connections in the NYC area more broadly. Please…
94 replies · 47 retweets · 765 likes
🚨 Incredibly excited to share that I'm starting my research group focusing on AI safety and alignment at the ELLIS Institute Tübingen and Max Planck Institute for Intelligent Systems in September 2025! 🚨 Hiring. I'm looking for multiple PhD students: both those able to start…
76 replies · 89 retweets · 827 likes
During a government shutdown, maybe legislators should have their paychecks stopped and their access to donations and campaign funds restricted. That could motivate Congressional and Senate lawmakers to do one of their main job responsibilities in a timely fashion.
10 replies · 39 retweets · 445 likes
🚀 Excited to share that the Workshop on Mathematical Reasoning and AI (MATH‑AI) will be at NeurIPS 2025! 📅 Dec 6 or 7 (TBD), 2025 🌴 San Diego, California
8 replies · 57 retweets · 238 likes
I'll be at ACL Vienna 🇦🇹 next week presenting this work! If you're around, come say hi on Monday (7/28) from 18:00–19:30 in Hall 4/5. Would love to chat about code model benchmarks 🧠, simulating user interactions 🤝, and human-centered NLP in general!
When benchmarks talk, do LLMs listen? Our new paper shows that evaluating code LLMs with interactive feedback significantly affects model performance compared to standard static benchmarks! Work w/ @RyanShar01, @jacob_pfau, @atalwalkar, @hhexiy, and @valeriechen_! [1/6]
1 reply · 6 retweets · 53 likes
Wrapped up Stanford CS336 (Language Models from Scratch), taught with an amazing team @tatsu_hashimoto @marcelroed @neilbband @rckpudi. Researchers are becoming detached from the technical details of how LMs work. In CS336, we try to fix that by having students build everything: …
46 replies · 596 retweets · 5K likes
Talking to ChatGPT isn’t like talking to a collaborator yet. It doesn’t track what you really want to do—only what you just said. Check out work led by @jcyhc_ai and @rico_angell that shows how attackers can exploit this, and a simple fix: just look at more context!
LLMs won’t tell you how to make fake IDs—but will reveal the layouts/materials of IDs and make realistic photos if asked separately. 💥Such decomposition attacks reach 87% success across QA, text-to-image, and agent settings! 🛡️Our monitoring method defends with 93% success! 🧵
2 replies · 8 retweets · 28 likes
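A hedged sketch of the "just look at more context" defense mentioned above: judge the running conversation, not each request in isolation, so sub-requests that look benign on their own get caught in combination. Both judge_harmful and the toy keyword judge are hypothetical stand-ins, not the paper's monitor.

```python
# Context-aware monitoring sketch: flag a request if it is harmful alone OR
# in combination with the prior turns of the conversation.
def monitor(history, new_request, judge_harmful):
    alone = judge_harmful(new_request)
    in_context = judge_harmful("\n".join(history + [new_request]))
    return alone or in_context

history = ["What materials are government IDs printed on?"]
toy_judge = lambda text: "id" in text.lower() and "photo" in text.lower()
# Benign-looking on its own, but flagged once read together with the history.
print(monitor(history, "Now make a realistic photo of one with my details.", toy_judge))  # True
```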
New Anthropic research: We elicit capabilities from pretrained models using no external supervision, often competitive with or better than using human supervision. Using this approach, we are able to train a Claude 3.5-based assistant that beats its human-supervised counterpart.
37 replies · 162 retweets · 1K likes
From inbox zero to client success stories, here’s a glimpse into what our military-spouse VAs actually do all day. Spoiler: it’s not just admin work, it’s impact work.
0 replies · 0 retweets · 2 likes
Automating AI research is bottlenecked by verification speed (running experiments takes time). Our new paper explores whether LLMs can tell which ideas will work before executing them, and they appear to have better research intuition than human researchers.
Most promising-looking AI research ideas don’t pan out, but testing them burns through compute and labor. Can LMs predict idea success without running any experiments? We show that they do it better than human experts!
4 replies · 13 retweets · 114 likes
What does it mean for #LLM output to be novel? In work w/ @jcyhc_ai, @JanePan_, @valeriechen_, @hhexiy we argue it needs to be both original and high quality. While prompting tricks trade one for the other, better models (scaling/post-training) can shift the novelty frontier 🧵
2 replies · 29 retweets · 83 likes
I'll present this paper tomorrow (10:00-12:30 am, poster at Hall 3 #300). Let's chat about reward hacking against real humans, not just proxy rewards.
RLHF is a popular method. It makes your human eval score and Elo rating better 🚀🚀. But really❓ Your model might be “cheating” you! 😈😈 We show that LLMs can learn to mislead human evaluators via RLHF. 🧵below
0 replies · 10 retweets · 16 likes
We're excited to receive such wide attention from the community—thank you for your support! We release code, trained probes, and the generated CoT data 👇 https://t.co/Rkw6LJtAyj Labeled answer data is on its way. Stay tuned!
github.com · AngelaZZZ-611/reasoning_models_probing
Reasoning models overthink, generating multiple answers during reasoning. Is it because they can’t tell which ones are right? No! We find that while reasoning models encode strong correctness signals during chain-of-thought, they may not use them optimally. 🧵 below
1 reply · 14 retweets · 45 likes
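In the spirit of the probing setup the quoted tweet describes, here is a minimal sketch with random placeholder features standing in for hidden states collected at intermediate-answer positions; it is not the released code linked above.

```python
# Hedged sketch: a linear probe mapping hidden states at intermediate-answer
# positions to a correct/incorrect label. Features and labels are random
# placeholders; real usage would extract them from the reasoning model.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(512, 768))       # placeholder hidden states
y = rng.integers(0, 2, size=512)      # placeholder correctness labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("probe accuracy:", probe.score(X_te, y_te))  # ~chance on random placeholders
```

On real hidden states, accuracy well above chance would correspond to the "strong correctness signals" the tweet describes.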
In this 6-min video, I show how to use iDEP to interpret bulk RNA-seq data. Start with QC plots and exploratory analyses before identifying differentially regulated genes and pathways. Here, we picked up on high mitochondrial rRNA counts, one male sample mixed in with seven…
0 replies · 5 retweets · 36 likes