Harman Singh
@Harman26Singh
Followers
1K
Following
7K
Media
28
Statuses
690
PhD student @berkeley_ai, Prev: Gemini @GoogleDeepMind, AI Resident @MetaAI. Creating intelligence.
Joined May 2019
🚨 New @GoogleDeepMind paper: Robust Reward Modeling via Causal Rubrics 🚨 https://t.co/oCk5jGNYlj We tackle reward hacking: when RMs latch onto spurious cues (e.g. length, style) instead of true quality. #RLAIF #CausalInference 🧵⬇️
4
32
124
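A toy illustration of the failure mode this tweet describes, not the paper's method: a reward model that has latched onto length prefers verbose answers regardless of quality, while a rubric-based check scores only the criteria that matter. The rubric format and both scoring functions below are made up for illustration.

```python
def length_biased_reward(response: str) -> float:
    # Spurious cue: longer responses score higher, capped at 1.0.
    return min(len(response.split()) / 100.0, 1.0)

def rubric_reward(response: str, rubric: list[str]) -> float:
    # Hypothetical rubric check: fraction of required items the response covers.
    hits = sum(1 for item in rubric if item.lower() in response.lower())
    return hits / len(rubric)

rubric = ["boiling point", "100", "celsius"]
concise = "Water's boiling point is 100 degrees Celsius at sea level."
padded = "Great question! " * 30 + "Water is a liquid."

print(length_biased_reward(concise), rubric_reward(concise, rubric))  # short but correct
print(length_biased_reward(padded), rubric_reward(padded, rubric))    # long but empty
```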
GEPA is featured in @OpenAI and @BainandCompany's new cookbook tutorial, showing how to build self-evolving agents that move beyond static prompts. See how GEPA enables agents to autonomously reflect, learn from feedback, and evolve their own instructions.
21
77
559
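A minimal sketch of the loop the tweet describes, assuming nothing about GEPA's actual API: `run_agent()` and `reflect()` are hypothetical stand-ins for LLM calls, and the feedback stream would come from a grader or user in practice.

```python
def run_agent(instructions: str, task: str) -> str:
    # Stand-in for an LLM call that follows `instructions` on `task`.
    return f"[answer to {task!r} under: {instructions}]"

def reflect(instructions: str, feedback: str) -> str:
    # Stand-in for an LLM reflection step: fold feedback into the instructions.
    return f"{instructions} Also: {feedback}."

instructions = "Answer concisely."
feedback_stream = ["cite the section you used", "keep it under 50 words"]

for round_num, feedback in enumerate(feedback_stream):
    answer = run_agent(instructions, "summarize the report")
    instructions = reflect(instructions, feedback)  # the prompt evolves
    print(f"round {round_num}: {instructions}")
```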
(1/n) 🚨 BERTs that chat: turn any BERT into a chatbot with diffusion. Hi @karpathy, we just trained a few BERTs to chat with diffusion, and we are releasing all the model checkpoints, training curves, and recipes! Hopefully this spares you the side quest into training nanochat with
Nice, short post illustrating how simple text (discrete) diffusion can be. Diffusion (i.e. parallel, iterated denoising; top) is the pervasive generative paradigm in image/video, but autoregression (i.e. going left to right; bottom) is the dominant paradigm in text. For audio I've
16
90
680
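A toy contrast of the two paradigms in the tweets above: parallel, iterated denoising versus left-to-right generation. The `toy_denoiser` here is a random stand-in for a trained model, so the outputs are gibberish; only the control flow is the point.

```python
import random

VOCAB = ["the", "cat", "sat", "on", "mat"]
MASK = "<mask>"

def toy_denoiser(tokens):
    # Random stand-in for a trained model: propose a token for every mask.
    return [random.choice(VOCAB) if t == MASK else t for t in tokens]

def diffusion_generate(length=6, steps=3, commit_per_step=2):
    # Parallel, iterated denoising: start fully masked, propose everything
    # at once each step, and commit a few positions per step.
    tokens = [MASK] * length
    for _ in range(steps):
        proposal = toy_denoiser(tokens)
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        for i in random.sample(masked, min(commit_per_step, len(masked))):
            tokens[i] = proposal[i]
    return toy_denoiser(tokens)  # final pass fills any leftover masks

def autoregressive_generate(length=6):
    # Left to right: one token at a time.
    tokens = []
    for _ in range(length):
        tokens.append(random.choice(VOCAB))  # a real LM would condition on `tokens`
    return tokens

print("diffusion:     ", diffusion_generate())
print("autoregressive:", autoregressive_generate())
```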
Two ideas here for scaling up RL for reasoning: 1. Procedurally generating (verifiable) problems lets us adapt difficulty to the model, making training more efficient. 2. Teaching the model to reason by hand (e.g., sort numbers w/o code) generalizes to realistic reasoning tasks!
RL is bounded by finite data 💣? Introducing RLVE: RL with Adaptive Verifiable Environments. We scale RL with data procedurally generated from 400 envs dynamically adapting to the trained model. 💡 Find supervision signals right at the LM capability frontier + scale them 📈. More in 🧵
2
6
96
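A minimal sketch of the adaptive-difficulty idea from the two tweets above. RLVE's 400 environments and its actual curriculum rule are not reproduced here; the sorting task and the success-rate thresholds below are assumptions for illustration.

```python
import random

def make_sorting_problem(difficulty: int):
    # Procedurally generated and verifiable: the answer is computable exactly.
    nums = [random.randint(0, 10 ** difficulty) for _ in range(3 + difficulty)]
    return nums, sorted(nums)

def adapt_difficulty(difficulty: int, success_rate: float) -> int:
    # Push harder when the model is cruising, back off when it is stuck,
    # keeping problems near the capability frontier.
    if success_rate > 0.8:
        return difficulty + 1
    if success_rate < 0.3:
        return max(1, difficulty - 1)
    return difficulty

difficulty = 1
for step in range(5):
    problem, answer = make_sorting_problem(difficulty)
    success_rate = random.random()  # stand-in for the policy's measured accuracy
    difficulty = adapt_difficulty(difficulty, success_rate)
    print(f"step {step}: difficulty -> {difficulty}, problem = {problem}")
```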
here's a sneak peek into my life. one full year, filled with a lot of moments. (comes with its own hot takes) thanks to P and the lifeguard, I get to write this today! read the full article (in 🧵)
4
1
18
In many ways, (continuous) diffusion models are in-place reasoners whose quality improves with more denoising steps. Lately, we have been extending this to language, combining RLVR with discrete diffusion, resulting in d1 (https://t.co/it22yjfjhd, NeurIPS 2025 spotlight).
arxiv.org
Recent large language models (LLMs) have demonstrated strong reasoning capabilities that benefit from online reinforcement learning (RL). These capabilities have primarily been demonstrated...
Naive question, so please roast me. Why don't we have diffusion reasoning models? The way humans think looks a lot more like diffusion than autoregression.
6
17
227
Can we run RL to train LLMs on hard-to-verify or open-ended tasks? Even when tasks are verifiable, it is often impossible to check every design detail or catch all mistakes. We could prompt-tune LLM judges, but is that really the answer? Our new paper introduces RLAC: a
9
53
327
Online rubrics for non-verifiable domains reduce reward hacking. Cool simultaneous co-training of the generator and critic! In our previous work, we showed that rubrics help in building more robust reward models:
Online generation of rubrics: the critic generates a rubric given the generator's outputs, and each generation is validated against that rubric. If the response satisfies the rubric, the generator gets a reward; if it doesn't, the critic gets the reward.
0
1
4
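The reward assignment described in the quoted tweet, written out as a toy zero-sum game. The keyword-matching `validate()` is an illustrative stand-in for an LLM critic judging a response against its own rubric.

```python
def validate(response: str, rubric: list[str]) -> bool:
    # A response "satisfies" the rubric if it covers every item.
    return all(item.lower() in response.lower() for item in rubric)

def assign_rewards(response: str, rubric: list[str]) -> dict:
    satisfied = validate(response, rubric)
    return {
        "generator": 1.0 if satisfied else 0.0,  # rewarded for passing
        "critic": 0.0 if satisfied else 1.0,     # rewarded for catching a miss
    }

rubric = ["definition", "source"]  # in the real setup, written by the critic
print(assign_rewards("A definition: X. Source: Y.", rubric))  # generator wins
print(assign_rewards("It depends.", rubric))                  # critic wins
```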
COLM Keynotes: Luke Zettlemoyer, Mixed-modal Language Modeling https://t.co/8FdhhrfOnG
0
19
148
IndQA is a new benchmark designed to evaluate how well AI systems understand culture, context and history to answer questions that matter to people in India. With 2278 questions created in partnership with 250+ experts, IndQA dives deep into reasoning about everyday life,
openai.com
A new benchmark for evaluating AI systems on Indian culture and languages.
4
29
214
Exciting to see much-needed progress on evaluating Indic language/culture understanding! IndicGenBench shared these motivations and is one of the first generative evals for 29 Indic languages! https://t.co/hY3tmJez6G
@partha_p_t @nitish_gup
0
3
6
> be me > come across a paper with interesting premise. > excitedly start reading > claims are on mnist/cifar > all excitement gone, reduced to atoms
10
4
141
Transformers are great for sequences, but most business-critical predictions (e.g. product sales, customer churn, ad CTR, in-hospital mortality) rely on highly structured relational data, where the signal is scattered across rows, columns, linked tables, and time. Excited to finally
4
38
131
A new, tractable approach to studying scaling laws for larger data mixtures than prior art covers. We achieve a significantly better fit (R² = 0.98) on multilingual data mixtures with ~50 languages.
1
7
9
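Not the paper's parameterization, but a sketch of what "fitting a scaling law and reporting R²" means, using a generic power-law-plus-constant form on synthetic data. All constants below are invented.

```python
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(n_tokens, a, b, c):
    # Generic form: loss falls as a power law in data, down to a floor c.
    return a * n_tokens ** (-b) + c

tokens = np.logspace(8, 11, 20)                                # 1e8 .. 1e11 tokens
loss = scaling_law(tokens, 2.0e3, 0.3, 1.8)                    # synthetic ground truth
loss += np.random.default_rng(0).normal(0, 0.02, loss.shape)   # observation noise

params, _ = curve_fit(scaling_law, tokens, loss, p0=[1e3, 0.3, 1.0], maxfev=10000)
pred = scaling_law(tokens, *params)
r2 = 1 - np.sum((loss - pred) ** 2) / np.sum((loss - loss.mean()) ** 2)
print(f"a={params[0]:.0f} b={params[1]:.2f} c={params[2]:.2f} R^2={r2:.3f}")
```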
Seems like the whole "RL just surfaces intelligence, it doesn't increase it" series of papers is just an artifact of RL still being a small fraction of compute in most LM contexts, no? AlphaGo (etc.) shows quite clearly that there is nothing to this as a general matter.
16
1
116
This is really crazy. Yet another work showing that in-context learning on SOTA MLLMs (Gemini 2.5 Pro) not only does not help, but even hurts results! ICL on MLLMs is very much an open problem, and the biggest differentiator between LLMs and MLLMs. [1/3]
While SOTA VLMs seem to be the solution to everything, there are two things they're really bad at: 1) in-context learning (easy for LLMs, impossible for VLMs), and 2) anomaly detection. This ICCV paper highlights these two issues while benchmarking visual defect detection. [1/3]
11
35
364
Grateful to be named a recipient of the Google PhD Fellowship 2025 under the NLP track! Thanks to @Google and my wonderful @ai4bharat family for making this journey so special.
4
3
37
Excited to share one of the first projects from my PhD! We find that Adam (often seen as approximately second-order) can actually outperform Gauss-Newton (truly second-order) in certain cases! Our 2x2 comparison across basis choice and gradient noise is revealing! Thread by Sham:
(1/9) Diagonal preconditioners such as Adam typically use empirical gradient information rather than true second-order curvature. Is this merely a computational compromise or can it be advantageous? Our work confirms the latter: Adam can outperform Gauss-Newton in certain cases.
2
13
109
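A toy contrast of the two update rules being compared, on linear least squares f(w) = 0.5·||Xw − y||²: Gauss-Newton preconditions with the full curvature XᵀX, while Adam builds a diagonal preconditioner from gradient history. This shows the updates only; it is not the paper's 2x2 basis/noise experiment.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=100)

def grad(w):
    # Gradient of 0.5 * ||Xw - y||^2.
    return X.T @ (X @ w - y)

# Gauss-Newton: solve (X^T X) dw = grad at each step (full curvature).
w_gn = np.zeros(5)
for _ in range(5):
    w_gn -= np.linalg.solve(X.T @ X, grad(w_gn))

# Adam: first/second moment estimates of the gradient, diagonal scaling.
w_ad = np.zeros(5)
m, v = np.zeros(5), np.zeros(5)
lr, b1, b2, eps = 0.1, 0.9, 0.999, 1e-8
for t in range(1, 501):
    g = grad(w_ad)
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g**2
    m_hat, v_hat = m / (1 - b1**t), v / (1 - b2**t)
    w_ad -= lr * m_hat / (np.sqrt(v_hat) + eps)

print("Gauss-Newton loss:", 0.5 * np.sum((X @ w_gn - y) ** 2))
print("Adam loss:        ", 0.5 * np.sum((X @ w_ad - y) ** 2))
```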
@thawani_avijit Haha. I am afraid people interpreted my "delete tokenizer" as "use bytes directly without BPE"; the issue is you *still* need to deal with the arbitrariness of bytes encoding even for that! Pixels are the only way. Just like humans. It is written. If GPT-10 uses utf8 at the input I will eat a shoe.
41
41
945
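A concrete instance of the byte-encoding arbitrariness referenced above: Unicode normalization yields two different byte strings for the same visible text, so a bytes-level model sees two different inputs for identical content.

```python
import unicodedata

nfc = unicodedata.normalize("NFC", "café")  # 'é' as a single codepoint
nfd = unicodedata.normalize("NFD", "café")  # 'e' plus a combining accent

print(nfc.encode("utf-8"))  # b'caf\xc3\xa9'   (5 bytes)
print(nfd.encode("utf-8"))  # b'cafe\xcc\x81'  (6 bytes)
print(nfc == nfd)           # False, despite rendering identically
```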
1/ Really looking forward to #PytorchConf this week in SF! I've spent the last couple of months at @datologyai immersed in the DataLoader ecosystem (especially for our VLM stack), and I have a few topics I would love to discuss with folks (DMs are open, say hi if you see me, etc.
2
14
69