Harman Singh
@Harman26Singh
Followers
1K
Following
7K
Media
28
Statuses
690
PhD student @berkeley_ai, Prev: Gemini @GoogleDeepMind, AI Resident @MetaAI. Creating intelligence.
Joined May 2019
🚨 New @GoogleDeepMind paper: Robust Reward Modeling via Causal Rubrics 🚨 https://t.co/oCk5jGNYlj We tackle reward hacking: when RMs latch onto spurious cues (e.g. length, style) instead of true quality. #RLAIF #CausalInference 🧵⬇️
4
32
124
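A toy illustration of the failure mode this tweet describes, not the paper's method: a reward model that has latched onto length prefers verbose answers regardless of quality, while a rubric-based check scores only the criteria that matter. The rubric format and both scoring functions below are made up for illustration.

```python
def length_biased_reward(response: str) -> float:
    # Spurious cue: longer responses score higher, capped at 1.0.
    return min(len(response.split()) / 100.0, 1.0)

def rubric_reward(response: str, rubric: list[str]) -> float:
    # Hypothetical rubric check: fraction of required items the response covers.
    hits = sum(1 for item in rubric if item.lower() in response.lower())
    return hits / len(rubric)

rubric = ["boiling point", "100", "celsius"]
concise = "Water's boiling point is 100 degrees Celsius at sea level."
padded = "Great question! " * 30 + "Water is a liquid."

print(length_biased_reward(concise), rubric_reward(concise, rubric))  # short but correct
print(length_biased_reward(padded), rubric_reward(padded, rubric))    # long but empty
```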
GEPA is featured in @OpenAI and @BainandCompany's new cookbook tutorial, showing how to build self-evolving agents that move beyond static prompts. See how GEPA enables agents to autonomously reflect, learn from feedback, and evolve their own instructions.
21
77
559
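A minimal sketch of the loop the tweet describes, assuming nothing about GEPA's actual API: `run_agent()` and `reflect()` are hypothetical stand-ins for LLM calls, and the feedback stream would come from a grader or user in practice.

```python
def run_agent(instructions: str, task: str) -> str:
    # Stand-in for an LLM call that follows `instructions` on `task`.
    return f"[answer to {task!r} under: {instructions}]"

def reflect(instructions: str, feedback: str) -> str:
    # Stand-in for an LLM reflection step: fold feedback into the instructions.
    return f"{instructions} Also: {feedback}."

instructions = "Answer concisely."
feedback_stream = ["cite the section you used", "keep it under 50 words"]

for round_num, feedback in enumerate(feedback_stream):
    answer = run_agent(instructions, "summarize the report")
    instructions = reflect(instructions, feedback)  # the prompt evolves
    print(f"round {round_num}: {instructions}")
```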
(1/n) 🚨 BERTs that chat: turn any BERT into a chatbot with diffusion. Hi @karpathy, we just trained a few BERTs to chat with diffusion, and we are releasing all the model checkpoints, training curves, and recipes! Hopefully this spares you the side quest into training nanochat with
Nice, short post illustrating how simple text (discrete) diffusion can be. Diffusion (i.e. parallel, iterated denoising; top) is the pervasive generative paradigm in image/video, but autoregression (i.e. going left to right; bottom) is the dominant paradigm in text. For audio I've
16
90
680
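A toy contrast of the two paradigms in the tweets above: parallel, iterated denoising versus left-to-right generation. The `toy_denoiser` here is a random stand-in for a trained model, so the outputs are gibberish; only the control flow is the point.

```python
import random

VOCAB = ["the", "cat", "sat", "on", "mat"]
MASK = "<mask>"

def toy_denoiser(tokens):
    # Random stand-in for a trained model: propose a token for every mask.
    return [random.choice(VOCAB) if t == MASK else t for t in tokens]

def diffusion_generate(length=6, steps=3, commit_per_step=2):
    # Parallel, iterated denoising: start fully masked, propose everything
    # at once each step, and commit a few positions per step.
    tokens = [MASK] * length
    for _ in range(steps):
        proposal = toy_denoiser(tokens)
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        for i in random.sample(masked, min(commit_per_step, len(masked))):
            tokens[i] = proposal[i]
    return toy_denoiser(tokens)  # final pass fills any leftover masks

def autoregressive_generate(length=6):
    # Left to right: one token at a time.
    tokens = []
    for _ in range(length):
        tokens.append(random.choice(VOCAB))  # a real LM would condition on `tokens`
    return tokens

print("diffusion:     ", diffusion_generate())
print("autoregressive:", autoregressive_generate())
```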
Two ideas here for scaling up RL for reasoning: 1. Procedurally generating (verifiable) problems lets us adapt difficulty to the model, making training more efficient. 2. Teaching the model to reason by hand (e.g., sort numbers w/o code) generalizes to realistic reasoning tasks!
RL is bounded by finite data 💣? Introducing RLVE: RL with Adaptive Verifiable Environments. We scale RL with data procedurally generated from 400 envs dynamically adapting to the trained model. 💡 Find supervision signals right at the LM capability frontier + scale them 📈. More in 🧵
2
6
96
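A minimal sketch of the adaptive-difficulty idea from the two tweets above. RLVE's 400 environments and its actual curriculum rule are not reproduced here; the sorting task and the success-rate thresholds below are assumptions for illustration.

```python
import random

def make_sorting_problem(difficulty: int):
    # Procedurally generated and verifiable: the answer is computable exactly.
    nums = [random.randint(0, 10 ** difficulty) for _ in range(3 + difficulty)]
    return nums, sorted(nums)

def adapt_difficulty(difficulty: int, success_rate: float) -> int:
    # Push harder when the model is cruising, back off when it is stuck,
    # keeping problems near the capability frontier.
    if success_rate > 0.8:
        return difficulty + 1
    if success_rate < 0.3:
        return max(1, difficulty - 1)
    return difficulty

difficulty = 1
for step in range(5):
    problem, answer = make_sorting_problem(difficulty)
    success_rate = random.random()  # stand-in for the policy's measured accuracy
    difficulty = adapt_difficulty(difficulty, success_rate)
    print(f"step {step}: difficulty -> {difficulty}, problem = {problem}")
```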
here's a sneak peek into my life. one full year, filled with a lot of moments. (comes with its own hot takes) thanks to P and the lifeguard, I get to write this today! read the full article (in 🧵)
4
1
18
In many ways, (continuous) diffusion models are in-place reasoners whose quality improves with more denoising steps. Lately, we have been extending this to language, combining RLVR with discrete diffusion, resulting in d1 (https://t.co/it22yjfjhd, NeurIPS 2025 spotlight).
arxiv.org
Recent large language models (LLMs) have demonstrated strong reasoning capabilities that benefit from online reinforcement learning (RL). These capabilities have primarily been demonstrated...
Naive question, so please roast me. Why don't we have diffusion reasoning models? The way humans think looks a lot more like diffusion than autoregression.
6
17
227
Can we run RL to train LLMs on hard-to-verify or open-ended tasks? Even when tasks are verifiable, it is often impossible to check every design detail or catch all mistakes. We could prompt-tune LLM judges, but is that really the answer? Our new paper introduces RLAC: a
9
53
327
Online rubrics for non-verifiable domains reduce reward hacking. Cool simultaneous co-training of the generator and critic! In our previous work, we showed that rubrics help in building more robust reward models:
Online generation of rubrics: the critic generates a rubric given the generator's outputs, and each generation is validated against that rubric. If the response satisfies the rubric, the generator gets a reward; if it doesn't, the critic gets the reward.
0
1
4
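The reward assignment described in the quoted tweet, written out as a toy zero-sum game. The keyword-matching `validate()` is an illustrative stand-in for an LLM critic judging a response against its own rubric.

```python
def validate(response: str, rubric: list[str]) -> bool:
    # A response "satisfies" the rubric if it covers every item.
    return all(item.lower() in response.lower() for item in rubric)

def assign_rewards(response: str, rubric: list[str]) -> dict:
    satisfied = validate(response, rubric)
    return {
        "generator": 1.0 if satisfied else 0.0,  # rewarded for passing
        "critic": 0.0 if satisfied else 1.0,     # rewarded for catching a miss
    }

rubric = ["definition", "source"]  # in the real setup, written by the critic
print(assign_rewards("A definition: X. Source: Y.", rubric))  # generator wins
print(assign_rewards("It depends.", rubric))                  # critic wins
```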
COLM Keynotes: Luke Zettlemoyer, Mixed-modal Language Modeling https://t.co/8FdhhrfOnG
0
19
148
IndQA is a new benchmark designed to evaluate how well AI systems understand culture, context and history to answer questions that matter to people in India. With 2278 questions created in partnership with 250+ experts, IndQA dives deep into reasoning about everyday life,
openai.com
A new benchmark for evaluating AI systems on Indian culture and languages.
4
29
214
Exciting to see much-needed progress on evaluating Indic language/culture understanding! IndicGenBench shared these motivations and is one of the first generative evals for 29 Indic languages! https://t.co/hY3tmJez6G
@partha_p_t @nitish_gup
0
3
6
> be me > come across a paper with interesting premise. > excitedly start reading > claims are on mnist/cifar > all excitement gone, reduced to atoms
10
4
141
Transformers are great for sequences, but most business-critical predictions (e.g. product sales, customer churn, ad CTR, in-hospital mortality) rely on highly structured relational data, where the signal is scattered across rows, columns, linked tables, and time. Excited to finally
4
38
131
A new, tractable approach to studying scaling laws for larger data mixtures than prior art covers. We achieve a significantly better fit (R² = 0.98) on multilingual data mixtures with ~50 languages.
1
7
9
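Not the paper's parameterization, but a sketch of what "fitting a scaling law and reporting R²" means, using a generic power-law-plus-constant form on synthetic data. All constants below are invented.

```python
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(n_tokens, a, b, c):
    # Generic form: loss falls as a power law in data, down to a floor c.
    return a * n_tokens ** (-b) + c

tokens = np.logspace(8, 11, 20)                                # 1e8 .. 1e11 tokens
loss = scaling_law(tokens, 2.0e3, 0.3, 1.8)                    # synthetic ground truth
loss += np.random.default_rng(0).normal(0, 0.02, loss.shape)   # observation noise

params, _ = curve_fit(scaling_law, tokens, loss, p0=[1e3, 0.3, 1.0], maxfev=10000)
pred = scaling_law(tokens, *params)
r2 = 1 - np.sum((loss - pred) ** 2) / np.sum((loss - loss.mean()) ** 2)
print(f"a={params[0]:.0f} b={params[1]:.2f} c={params[2]:.2f} R^2={r2:.3f}")
```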
Seems like the whole "RL just surfaces intelligence, it doesn't increase it" series of papers is just an artifact of RL still being a small fraction of compute in most LM contexts, no? AlphaGo (etc.) shows quite clearly that there is nothing to this as a general matter.
16
1
116
This is really crazy. Yet another work showing that in-context learning on SOTA MLLMs (Gemini 2.5 Pro) not only does not help, but even hurts results! ICL on MLLMs is very much an open problem, and the biggest differentiator between LLMs and MLLMs. [1/3]
While SOTA VLMs seem to be the solution to everything, there are two things they're really bad at: 1) in-context learning (easy for LLMs, impossible for VLMs), and 2) anomaly detection. This ICCV paper highlights these two issues while benchmarking visual defect detection. [1/3]
11
35
364
Grateful to be named a recipient of the Google PhD Fellowship 2025 under the NLP track! Thanks to @Google and my wonderful @ai4bharat family for making this journey so special.
4
3
37
Excited to share one of the first projects from my PhD! We find that Adam (often seen as approximately second-order) can actually outperform Gauss-Newton (truly second-order) in certain cases! Our 2x2 comparison across basis choice and gradient noise is revealing! Thread by Sham:
(1/9) Diagonal preconditioners such as Adam typically use empirical gradient information rather than true second-order curvature. Is this merely a computational compromise or can it be advantageous? Our work confirms the latter: Adam can outperform Gauss-Newton in certain cases.
2
13
109
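A toy contrast of the two update rules being compared, on linear least squares f(w) = 0.5·||Xw − y||²: Gauss-Newton preconditions with the full curvature XᵀX, while Adam builds a diagonal preconditioner from gradient history. This shows the updates only; it is not the paper's 2x2 basis/noise experiment.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=100)

def grad(w):
    # Gradient of 0.5 * ||Xw - y||^2.
    return X.T @ (X @ w - y)

# Gauss-Newton: solve (X^T X) dw = grad at each step (full curvature).
w_gn = np.zeros(5)
for _ in range(5):
    w_gn -= np.linalg.solve(X.T @ X, grad(w_gn))

# Adam: first/second moment estimates of the gradient, diagonal scaling.
w_ad = np.zeros(5)
m, v = np.zeros(5), np.zeros(5)
lr, b1, b2, eps = 0.1, 0.9, 0.999, 1e-8
for t in range(1, 501):
    g = grad(w_ad)
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g**2
    m_hat, v_hat = m / (1 - b1**t), v / (1 - b2**t)
    w_ad -= lr * m_hat / (np.sqrt(v_hat) + eps)

print("Gauss-Newton loss:", 0.5 * np.sum((X @ w_gn - y) ** 2))
print("Adam loss:        ", 0.5 * np.sum((X @ w_ad - y) ** 2))
```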
@thawani_avijit Haha. I am afraid people interpreted my "delete tokenizer" as "use bytes directly without BPE"; the issue is you *still* need to deal with the arbitrariness of bytes encoding even for that! Pixels are the only way. Just like humans. It is written. If GPT-10 uses utf8 at the input I will eat a shoe.
41
41
945
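A concrete instance of the byte-encoding arbitrariness referenced above: Unicode normalization yields two different byte strings for the same visible text, so a bytes-level model sees two different inputs for identical content.

```python
import unicodedata

nfc = unicodedata.normalize("NFC", "café")  # 'é' as a single codepoint
nfd = unicodedata.normalize("NFD", "café")  # 'e' plus a combining accent

print(nfc.encode("utf-8"))  # b'caf\xc3\xa9'   (5 bytes)
print(nfd.encode("utf-8"))  # b'cafe\xcc\x81'  (6 bytes)
print(nfc == nfd)           # False, despite rendering identically
```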
1/ Really looking forward to #PytorchConf this week in SF! I've spent the last couple of months at @datologyai immersed in the DataLoader ecosystem (especially for our VLM stack), and I have a few topics I would love to discuss with folks (DMs are open, say hi if you see me, etc.
2
14
69