Harman Singh

@Harman26Singh

Followers
1K
Following
7K
Media
28
Statuses
690

PhD student @berkeley_ai, Prev: Gemini @GoogleDeepMind, AI Resident @MetaAI. Creating intelligence.

Joined May 2019
@Harman26Singh
Harman Singh
5 months
🚨 New @GoogleDeepMind paper: Robust Reward Modeling via Causal Rubrics 📑 👉 https://t.co/oCk5jGNYlj We tackle reward hacking: when RMs latch onto spurious cues (e.g. length, style) instead of true quality. #RLAIF #CausalInference 🧵⬇️
@_akhaliq
AK
5 months
Robust Reward Modeling via Causal Rubrics
4
32
124
@LakshyAAAgrawal
Lakshya A Agrawal
2 days
GEPA is featured in @OpenAI and @BainandCompany's new cookbook tutorial, showing how to build self-evolving agents that move beyond static prompts. See how GEPA enables agents to autonomously reflect, learn from feedback, and evolve their own instructions.
21
77
559
@asapzzhou
Zhanhui Zhou
1 day
(1/n) 🚨 BERTs that chat: turn any BERT into a chatbot with diffusion. Hi @karpathy, we just trained a few BERTs to chat with diffusion, and we are releasing all the model checkpoints, training curves, and recipes! Hopefully this spares you the side quest into training nanochat with
@karpathy
Andrej Karpathy
23 days
Nice, short post illustrating how simple text (discrete) diffusion can be. Diffusion (i.e. parallel, iterated denoising, top) is the pervasive generative paradigm in image/video, but autoregression (i.e. go left to right, bottom) is the dominant paradigm in text. For audio I've
16
90
680
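The contrast above (parallel iterated denoising vs. left-to-right generation) can be made concrete with a toy sketch. This is not the post's code: the "denoiser" below is a hypothetical oracle that simply reveals masked tokens, standing in for a learned model, to show the shape of the parallel, iterated decoding loop.

```python
import random

MASK = "_"

def denoise_step(seq, target, k, rng):
    """Reveal up to k masked positions in parallel (oracle stand-in for a learned denoiser)."""
    masked = [i for i, t in enumerate(seq) if t == MASK]
    for i in rng.sample(masked, min(k, len(masked))):
        seq[i] = target[i]
    return seq

def diffusion_decode(target, steps=4, rng=None):
    """Start from all-mask 'noise' and denoise iteratively, several tokens per step."""
    rng = rng or random.Random(0)
    seq = [MASK] * len(target)
    per_step = -(-len(target) // steps)  # ceil division: tokens revealed per step
    for _ in range(steps):
        seq = denoise_step(seq, target, per_step, rng)
    return seq

print(diffusion_decode(list("diffusion")))
```

An autoregressive decoder would instead emit one token per step, strictly left to right; here every step commits several positions anywhere in the sequence.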
@PangWeiKoh
Pang Wei Koh
1 day
Two ideas here for scaling up RL for reasoning: 1. Procedurally generating (verifiable) problems lets us adapt difficulty to the model, making training more efficient 2. Teaching the model to reason by hand (e.g., sort numbers w/o code) generalizes to realistic reasoning tasks!
@ZhiyuanZeng_
Zhiyuan Zeng
1 day
RL is bounded by finite data 😣? Introducing RLVE: RL with Adaptive Verifiable Environments. We scale RL with data procedurally generated from 400 envs dynamically adapting to the trained model. 💡 Find supervision signals right at the LM capability frontier and scale them. 🔗 in 🧵
2
6
96
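The two ideas above (procedural generation of verifiable problems, difficulty adapted to the model) can be sketched with a toy curriculum. This is an illustrative assumption, not RLVE itself: the task generator, the sorting task, and the `capped` stand-in model are all hypothetical.

```python
import random

def make_problem(difficulty, rng):
    """Procedurally generate a verifiable task: sort a random list of given length."""
    xs = [rng.randrange(100) for _ in range(difficulty)]
    return xs, sorted(xs)

def adapt(difficulty, solve_rate, target_rate=0.7):
    """Nudge difficulty toward the model's capability frontier."""
    return difficulty + 1 if solve_rate > target_rate else max(2, difficulty - 1)

def run_curriculum(model, rounds=10, batch=20, seed=0):
    rng = random.Random(seed)
    difficulty = 2
    for _ in range(rounds):
        wins = 0
        for _ in range(batch):
            prob, answer = make_problem(difficulty, rng)
            wins += model(prob) == answer  # verifiable reward: exact match
        difficulty = adapt(difficulty, wins / batch)
    return difficulty

# Hypothetical "model" that only sorts lists up to length 6 correctly.
capped = lambda xs: sorted(xs) if len(xs) <= 6 else xs

print(run_curriculum(capped))
```

The generator ramps difficulty while the model succeeds, then hovers around the length where it starts failing, keeping the supervision signal at the frontier.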
@sumanthd17
Sumanth
3 days
here's a sneak peek into my life. one full year filled with a lot of moments. (Comes with its own hot takes.) thanks to P and the lifeguard, I get to write this today! read the full article (in 🧵👇)
4
1
18
@adityagrover_
Aditya Grover
3 days
In many ways, (continuous) diffusion models are in-place reasoners where the quality improves with more denoising steps. Lately, we have been extending this to language, combining RLVR with discrete diffusion, resulting in d1 ( https://t.co/it22yjfjhd, NeurIPS2025 spotlight).
arxiv.org
Recent large language models (LLMs) have demonstrated strong reasoning capabilities that benefit from online reinforcement learning (RL). These capabilities have primarily been demonstrated...
@hyhieu226
Hieu Pham
3 days
Naive question, so please roast me. Why don't we have diffusion reasoning models? The way humans think looks a lot more like diffusion than autoregression.
6
17
227
@MerlinNoth79247
Mian Wu
5 days
Can we run RL to train LLMs on hard-to-verify or open-ended tasks? Even when tasks are verifiable, it is often impossible to check every design detail or catch all mistakes. We can prompt-tune LLM judges, but is that really the answer? Our new paper introduces RLAC: a
9
53
327
@Harman26Singh
Harman Singh
7 days
Online rubrics for non-verifiable domains reduce reward hacking. Cool co-training of the generator and critic! In our previous work, we showed rubrics help make reward models more robust:
@rosinality
Rosinality
8 days
Online generation of rubrics: the critic generates a rubric given the generator's outputs, and each generation is validated against it. If the response satisfies the rubric, the generator gets a reward; if not, the critic gets a reward.
0
1
4
@COLM_conf
Conference on Language Modeling
8 days
COLM Keynotes: Luke Zettlemoyer Mixed-modal Language Modeling https://t.co/8FdhhrfOnG
0
19
148
@snsf
Srinivas Narayanan
8 days
IndQA is a new benchmark designed to evaluate how well AI systems understand culture, context and history to answer questions that matter to people in India. With 2278 questions created in partnership with 250+ experts, IndQA dives deep into reasoning about everyday life,
openai.com
A new benchmark for evaluating AI systems on Indian culture and languages.
4
29
214
@Harman26Singh
Harman Singh
8 days
Exciting to see much-needed progress on evaluating Indic language/culture understanding! IndicGenBench shared these motivations and is one of the first generative evals for 29 Indic languages! https://t.co/hY3tmJez6G @partha_p_t @nitish_gup
@snsf
Srinivas Narayanan
8 days
IndQA is a new benchmark designed to evaluate how well AI systems understand culture, context and history to answer questions that matter to people in India. With 2278 questions created in partnership with 250+ experts, IndQA dives deep into reasoning about everyday life,
0
3
6
@gowthami_s
Gowthami
9 days
> be me > come across a paper with interesting premise. > excitedly start reading > claims are on mnist/cifar > all excitement gone, reduced to atoms
10
4
141
@_rishabhranjan_
rishabh ranjan
14 days
Transformers are great for sequences, but most business-critical predictions (e.g. product sales, customer churn, ad CTR, in-hospital mortality) rely on highly-structured relational data where signal is scattered across rows, columns, linked tables and time. Excited to finally
4
38
131
@snehaark
Sneha Kudugunta
15 days
A new, tractable approach to study scaling laws for larger data mixtures compared to prior art. We achieve significantly better fit ($R^2=0.98$) on multilingual data mixtures with ~50 languages.
1
7
9
@Miles_Brundage
Miles Brundage
15 days
Seems like the whole "RL just surfaces intelligence, it doesn't increase it" series of papers is just an artifact of RL being a small fraction of compute in most LM contexts still, no? AlphaGo (etc.) shows quite clearly that there is nothing to this as a general matter
16
1
116
@agarwl_
Rishabh Agarwal
15 days
Yeah, easy to implement on-policy distillation in any existing RL framework
@shxf0072
Joey (e/ฮป)
16 days
wow, if only there were RL algorithms that had a (self) distillation term for reverse KLD, the one everyone keeps trying to remove. tl;dr: replace pi_ref with pi_teacher and you get on-policy distillation
5
18
286
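The "replace pi_ref with pi_teacher" trick above can be written down directly. This is a minimal sketch of the idea, assuming a standard KL-regularized RL objective with a reverse-KL penalty; the toy per-token distributions and the `beta` coefficient are illustrative, not from either tweet.

```python
import math

def reverse_kl(p_student, p_teacher):
    """KL(student || teacher): the reverse-KL term used as the regularizer."""
    return sum(s * math.log(s / t) for s, t in zip(p_student, p_teacher) if s > 0)

def rl_objective(reward, p_student, p_ref, beta=0.1):
    """KL-regularized RL objective: reward minus beta * KL(pi || pi_ref)."""
    return reward - beta * reverse_kl(p_student, p_ref)

student = [0.7, 0.2, 0.1]
teacher = [0.5, 0.3, 0.2]

# "Replace pi_ref with pi_teacher": the same objective now distills on-policy
# from the teacher instead of anchoring to the pretrained reference policy.
print(rl_objective(reward=1.0, p_student=student, p_ref=teacher))
```

Nothing in the training loop changes except which distribution sits in the KL term, which is why this drops into any existing RL framework.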
@gabriberton
Gabriele Berton
18 days
This is really crazy. Yet another work showing that in-context learning on SOTA MLLMs (Gemini 2.5 Pro) not only does not help, but even hurts results! ICL on MLLMs is very much an open problem, and the biggest differentiator between LLMs and MLLMs [1/3]
@gabriberton
Gabriele Berton
25 days
While SOTA VLMs seem to be the solution to everything, there are 2 things they're really bad at: 1) in-context learning: easy for LLMs, impossible for VLMs; 2) anomaly detection. This ICCV paper highlights these two issues while benchmarking visual defect detection. [1/3]
11
35
364
@SafiKhan2k
Mohammed Safi Ur Rahman Khan
19 days
Grateful to be named a recipient of the Google PhD Fellowship 2025 under the NLP track! Thanks to @Google and my wonderful @ai4bharat family for making this journey so special.
4
3
37
@rach_it_
Rachit Bansal
21 days
Excited to share one of the first projects from my PhD! We find that Adam (often seen as approximate second-order) can actually outperform Gauss-Newton (true second-order) in certain cases! Our 2x2 comparison across basis choice and gradient noise is revealing! Thread by Sham:
@ShamKakade6
Sham Kakade
22 days
(1/9) Diagonal preconditioners such as Adam typically use empirical gradient information rather than true second-order curvature. Is this merely a computational compromise or can it be advantageous? Our work confirms the latter: Adam can outperform Gauss-Newton in certain cases.
2
13
109
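The "diagonal preconditioner from empirical gradients" point in the thread above is just the standard Adam update. The sketch below is a plain-Python version to make the contrast concrete: the second-moment EMA `v` (squared gradients, not Hessian entries) is what preconditions the step; the quadratic test function is an illustrative choice.

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update. The EMA of squared gradients v acts as a diagonal
    preconditioner built from empirical gradients, not true curvature."""
    new_theta, new_m, new_v = [], [], []
    for th, g, mi, vi in zip(theta, grad, m, v):
        mi = b1 * mi + (1 - b1) * g        # first-moment EMA
        vi = b2 * vi + (1 - b2) * g * g    # second-moment EMA (empirical)
        m_hat = mi / (1 - b1 ** t)         # bias correction
        v_hat = vi / (1 - b2 ** t)
        th -= lr * m_hat / (math.sqrt(v_hat) + eps)
        new_theta.append(th); new_m.append(mi); new_v.append(vi)
    return new_theta, new_m, new_v

theta, m, v = [1.0, -2.0], [0.0, 0.0], [0.0, 0.0]
grad_f = lambda th: [2 * x for x in th]    # gradient of f(x) = sum x^2
for t in range(1, 201):
    theta, m, v = adam_step(theta, grad_f(theta), m, v, t)
print(theta)  # both coordinates are driven toward the minimum at 0
```

A Gauss-Newton method would divide by (an approximation of) the true curvature instead of the gradient's running second moment; the thread's point is that the empirical choice is not merely a cheap stand-in.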
@karpathy
Andrej Karpathy
22 days
@thawani_avijit Haha. I am afraid people interpreted my "delete tokenizer" as "use bytes directly without BPE", the issue is you *still* need bytes encoding arbitrariness even for that! Pixels is the only way. Just like humans. It is written. If GPT-10 uses utf8 at the input I will eat a shoe.
41
41
945
@josh_wills
JosH100
22 days
1/ Really looking forward to #PytorchConf this week in SF-- I've spent the last couple of months at @datologyai immersed in the DataLoader ecosystem (especially for our VLM stack) and I have a few topics I would love to discuss with folks (DMs are open, say hi if you see me, etc.
2
14
69