Harman Singh
@Harman26Singh
Followers
1K
Following
7K
Media
27
Statuses
672
PhD student @berkeley_ai, Prev: Gemini @GoogleDeepMind, AI Resident @MetaAI. Creating intelligence.
Joined May 2019
New @GoogleDeepMind paper: Robust Reward Modeling via Causal Rubrics https://t.co/oCk5jGNYlj We tackle reward hacking: when RMs latch onto spurious cues (e.g. length, style) instead of true quality. #RLAIF #CausalInference
4
32
123
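The thread above is about reward models latching onto spurious cues such as length. As a toy illustration only (not the paper's causal-rubric method), one can sanity-check whether a reward model's scores track response length rather than quality; the data below is synthetic.

```python
# Toy check for a spurious length cue in a reward model's scores (synthetic data).
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: true quality and response length for 1,000 responses.
quality = rng.normal(size=1000)
length = rng.normal(size=1000)

# A "hacked" reward that partially tracks length instead of quality.
reward = 0.4 * quality + 0.6 * length + 0.1 * rng.normal(size=1000)

def pearson(x, y):
    """Pearson correlation between two 1-D arrays."""
    x, y = x - x.mean(), y - y.mean()
    return float((x @ y) / (np.linalg.norm(x) * np.linalg.norm(y)))

print("corr(reward, quality):", round(pearson(reward, quality), 2))
print("corr(reward, length): ", round(pearson(reward, length), 2))
# A reward that correlates more with length than with quality is a red flag for reward hacking.
```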
Grateful to be named a recipient of the Google PhD Fellowship 2025 under the NLP track! Thanks to @Google and my wonderful @ai4bharat family for making this journey so special.
4
3
36
Excited to share one of the first projects from my PhD! We find that Adam (often seen as approximate second-order) can actually outperform Gauss-Newton (true second-order) in certain cases! Our 2x2 comparison across basis choice and gradient noise is revealing! Thread by Sham:
(1/9) Diagonal preconditioners such as Adam typically use empirical gradient information rather than true second-order curvature. Is this merely a computational compromise or can it be advantageous? Our work confirms the latter: Adam can outperform Gauss-Newton in certain cases.
2
12
106
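The tweets above contrast Adam's diagonal, gradient-based preconditioner with Gauss-Newton's true curvature. A minimal sketch of the two update rules on a noisy toy least-squares problem (not the paper's experimental setup; hyperparameters and data are arbitrary):

```python
# Contrast of the two update rules: Adam's diagonal preconditioner built from
# gradient moments vs. preconditioning with the true Gauss-Newton curvature,
# both driven by noisy minibatch gradients.
import numpy as np

rng = np.random.default_rng(0)
n, d = 512, 20
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

def loss(w):
    return 0.5 * np.mean((X @ w - y) ** 2)

def minibatch_grad(w, batch=32):
    idx = rng.integers(0, n, size=batch)
    Xb, yb = X[idx], y[idx]
    return Xb.T @ (Xb @ w - yb) / batch

# Gauss-Newton: precondition the (noisy) gradient with the true curvature X^T X / n.
H = X.T @ X / n
w_gn = np.zeros(d)
for _ in range(200):
    w_gn -= np.linalg.solve(H + 1e-6 * np.eye(d), minibatch_grad(w_gn))

# Adam: diagonal preconditioner built from running first/second gradient moments.
w_adam = np.zeros(d)
m, v = np.zeros(d), np.zeros(d)
beta1, beta2, lr, eps = 0.9, 0.999, 0.05, 1e-8
for t in range(1, 201):
    g = minibatch_grad(w_adam)
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g ** 2
    m_hat, v_hat = m / (1 - beta1 ** t), v / (1 - beta2 ** t)
    w_adam -= lr * m_hat / (np.sqrt(v_hat) + eps)

print(f"Gauss-Newton loss: {loss(w_gn):.4f}   Adam loss: {loss(w_adam):.4f}")
```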
@thawani_avijit Haha. I am afraid people interpreted my "delete tokenizer" as "use bytes directly without BPE"; the issue is that you *still* inherit the arbitrariness of the byte encoding even for that! Pixels is the only way. Just like humans. It is written. If GPT-10 uses utf8 at the input I will eat a shoe.
41
41
922
1/ Really looking forward to #PytorchConf this week in SF-- I've spent the last couple of months at @datologyai immersed in the DataLoader ecosystem (especially for our VLM stack) and I have a few topics I would love to discuss with folks (DMs are open, say hi if you see me, etc.
1
15
69
(1/9) Diagonal preconditioners such as Adam typically use empirical gradient information rather than true second-order curvature. Is this merely a computational compromise or can it be advantageous? Our work confirms the latter: Adam can outperform Gauss-Newton in certain cases.
2
19
126
How can we equip LLMs with memory that allows them to continually learn new things? In our new paper with @AIatMeta, we show how sparsely finetuning memory layers enables targeted updates for continual learning, w/ minimal interference with existing knowledge. While full
48
281
2K
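A hypothetical sketch of the idea described above: keep the base model frozen and update only the memory slots that a batch of new data actually activates. The MemoryLayer class and training step here are illustrative stand-ins, not Meta's implementation.

```python
# Sparse memory-layer finetuning sketch: only the value rows retrieved for the
# new batch are updated, so existing knowledge in untouched slots is preserved.
import torch
import torch.nn as nn

class MemoryLayer(nn.Module):
    def __init__(self, num_slots=1024, dim=64, topk=4):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(num_slots, dim))
        self.values = nn.Parameter(torch.randn(num_slots, dim))
        self.topk = topk

    def forward(self, x):                        # x: (batch, dim)
        scores = x @ self.keys.T                 # (batch, num_slots)
        w, idx = scores.topk(self.topk, dim=-1)  # retrieve top-k slots per example
        w = w.softmax(dim=-1)
        return (w.unsqueeze(-1) * self.values[idx]).sum(dim=1), idx

mem = MemoryLayer()
x_new = torch.randn(8, 64)                       # a batch of "new knowledge"
out, idx = mem(x_new)

loss = out.pow(2).mean()                         # placeholder objective
loss.backward()

# Sparse finetuning step: update only the value rows this batch activated.
touched = idx.unique()
with torch.no_grad():
    mask = torch.zeros_like(mem.values)
    mask[touched] = 1.0
    mem.values -= 0.1 * mem.values.grad * mask
    mem.keys.grad = None
    mem.values.grad = None
```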
BERT is just a Single Text Diffusion Step! (1/n) When I first read about language diffusion models, I was surprised to find that their training objective was just a generalization of masked language modeling (MLM), something we've been doing since BERT from 2018. The first
12
97
826
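A rough sketch of the observation in that thread: the masked-diffusion training objective looks like BERT's masked-LM loss, except the corruption level is sampled every step instead of being fixed at ~15%. The toy model, vocabulary, and shapes below are illustrative only.

```python
# Masked-LM loss where the mask ratio is a parameter: BERT uses one fixed value,
# masked text diffusion samples it uniformly each training step.
import torch
import torch.nn as nn
import torch.nn.functional as F

def masked_lm_loss(model, tokens, mask_id, mask_ratio):
    """Mask a `mask_ratio` fraction of tokens and predict them from the rest."""
    mask = torch.rand(tokens.shape) < mask_ratio
    corrupted = torch.where(mask, torch.full_like(tokens, mask_id), tokens)
    logits = model(corrupted)                          # (batch, seq, vocab)
    return F.cross_entropy(logits[mask], tokens[mask])

vocab, mask_id = 100, 99
toy_model = nn.Sequential(nn.Embedding(vocab, 32), nn.Linear(32, vocab))  # stand-in for BERT / a diffusion LM
tokens = torch.randint(0, vocab - 1, (4, 16))

# BERT (2018): a single fixed corruption level (~15% of tokens).
bert_loss = masked_lm_loss(toy_model, tokens, mask_id, mask_ratio=0.15)

# Masked text diffusion: sample the corruption level t ~ U(0, 1) every step,
# so BERT's objective is recovered as one particular step of the schedule.
t = torch.rand(()).item()
diffusion_loss = masked_lm_loss(toy_model, tokens, mask_id, mask_ratio=t)
print(float(bert_loss), float(diffusion_loss))
```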
Our lab @Berkeley_EECS is recruiting PhD students! We develop ML methods for the health + social sciences in order to build a fairer, healthier world. Apply to @Berkeley_EECS or @UCJointCPH and mention my name in your application! More info: https://t.co/5zbN1G6RWE
10
109
406
Those interested in joining my lab for a PhD or Master's: please submit your application through this process.
Mila's annual supervision request process is now open to receive MSc and PhD applications for Fall 2026 admission! For more information, visit https://t.co/r01eLcY1P4
0
3
12
Models are quite bad at finding BibTeX for paper links, even when explicitly asked for a simple arXiv BibTeX entry. Can someone solve this, or point me to a reliable tool that does?
@m2saxon Thank you for bringing this to our attention. The paper was originally written in a Google Doc, and the correct links were incorrectly converted to BibTeX citations. We spent around an hour correcting them, and now it's fixed. https://t.co/n2L35dxUjX
1
0
3
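For the arXiv case in the tweet above, one deterministic workaround (a sketch, not a polished tool) is to skip the model entirely: fetch the metadata from the public arXiv Atom API and assemble the BibTeX entry yourself. The entry key format below is an arbitrary choice.

```python
# Build a BibTeX entry for an arXiv id from the export.arxiv.org Atom API.
import re
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

def arxiv_bibtex(arxiv_id: str) -> str:
    """Return a BibTeX entry for an arXiv id such as '1810.04805'."""
    url = f"https://export.arxiv.org/api/query?id_list={arxiv_id}"
    with urllib.request.urlopen(url) as r:
        entry = ET.fromstring(r.read()).find(f"{ATOM}entry")
    title = re.sub(r"\s+", " ", entry.find(f"{ATOM}title").text).strip()
    authors = " and ".join(a.find(f"{ATOM}name").text for a in entry.findall(f"{ATOM}author"))
    year = entry.find(f"{ATOM}published").text[:4]
    key = arxiv_id.replace(".", "")
    return (f"@article{{arxiv{key},\n"
            f"  title   = {{{title}}},\n"
            f"  author  = {{{authors}}},\n"
            f"  journal = {{arXiv preprint arXiv:{arxiv_id}}},\n"
            f"  year    = {{{year}}},\n"
            f"}}")

print(arxiv_bibtex("1810.04805"))  # the BERT paper
```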
I am incredibly excited to introduce rLLM v0.2. Zooming back to a year ago: @OpenAI's o1-preview just dropped, and RL + test-time scaling suddenly became the hype. But no one knew how they did it. @kylepmont and I had this idea - what if we built a solver-critique loop for
Introducing rLLM v0.2 - train arbitrary agentic programs with RL, with minimal code changes. Most RL training systems adopt the agent-environment abstraction. But what about complex workflows? Think solver-critique pairs collaborating, or planner agents orchestrating multiple
8
33
305
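A hypothetical sketch of the solver-critique workflow these tweets mention, to make the "agentic program" idea concrete: one rollout produces a trajectory plus a verifiable reward that an RL trainer could consume. The function names and structure are placeholders, not rLLM's actual API.

```python
# Toy solver-critique rollout: solver proposes, critic reviews, a verifier
# assigns the final reward for the whole program's trajectory.
from dataclasses import dataclass, field

@dataclass
class Trajectory:
    steps: list = field(default_factory=list)   # (role, text) pairs for credit assignment
    reward: float = 0.0

def run_solver_critique(problem, solve, critique, verify, max_rounds=3):
    """One rollout of the agentic program: solve, get critiqued, revise, repeat."""
    traj = Trajectory()
    answer = solve(problem)
    traj.steps.append(("solver", answer))
    for _ in range(max_rounds):
        feedback = critique(problem, answer)
        traj.steps.append(("critic", feedback))
        if feedback == "ok":
            break
        answer = solve(problem + "\nFeedback: " + feedback)
        traj.steps.append(("solver", answer))
    traj.reward = float(verify(problem, answer))  # verifiable reward for the whole rollout
    return traj

# Toy usage with stub callables; the real setting would call LLMs instead.
traj = run_solver_critique(
    "2 + 2 = ?",
    solve=lambda p: "4",
    critique=lambda p, a: "ok",
    verify=lambda p, a: a == "4",
)
print(traj.reward, traj.steps)
```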
LLMs solving math benchmarks with verifiable answers like AIME? ✅
LLMs solving math proofs? ❌ Still an open problem.
RL works great for final-answer problems, but proofs are different:
- Often no single checkable answer
- Correct answers can hide flawed reasoning
The key
9
37
186
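To make the contrast above concrete, here is a minimal sketch of why final-answer problems are easy to reward: the whole reward is one string comparison against a ground truth. The regex and boxed-answer convention are assumptions for illustration; no analogous one-line check exists for free-form proofs.

```python
# Verifiable final-answer reward: 1.0 iff the last \boxed{...} matches the ground truth.
import re

def final_answer_reward(response: str, ground_truth: str) -> float:
    matches = re.findall(r"\\boxed\{([^}]*)\}", response)
    return 1.0 if matches and matches[-1].strip() == ground_truth.strip() else 0.0

print(final_answer_reward(r"... so the answer is \boxed{204}.", "204"))  # 1.0
# For a proof, a correct-looking final statement can still hide flawed intermediate
# reasoning, so a single end-of-response check like this is not a valid reward.
```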
I have a message for grad school applicants looking for professors to accept them: a lot of professors are also looking for you, but you don't know because you haven't checked. In your preparation for applications, don't just look for professors working on your research interests..
10
21
249
New paper: Most powerful vision-language (VL) reasoning datasets remain proprietary, hindering efforts to study their principles and develop similarly effective datasets in the open. Thus, we introduce HoneyBee, a 2.5M-example dataset created through careful data
4
32
153
There is so much noise in the LLM RL space, so we sat down and ran everything at scale (so you don't have to) and present to you "The Art of Scaling RL". Give this a read before starting your next RL run. Led by the amazing @Devvrit_Khatri @lovish
Wish to build scaling laws for RL but not sure how to scale? Or what scales? Or whether RL even scales predictably? We introduce: The Art of Scaling Reinforcement Learning Compute for LLMs
3
19
222
Wish to build scaling laws for RL but not sure how to scale? Or what scales? Or whether RL even scales predictably? We introduce: The Art of Scaling Reinforcement Learning Compute for LLMs
10
103
545
The Art of Scaling Reinforcement Learning Compute for LLMs "We present the first large-scale systematic study, amounting to more than 400,000 GPU-hours, that defines a principled framework for analyzing and predicting RL scaling in LLMs." "we propose a best-practice recipe,
8
46
318
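The tweets above are about predicting RL performance from compute. Purely as an illustration of what such a prediction looks like in practice (the saturating functional form and every number below are assumptions, not the paper's results), one can fit a compute-vs-performance curve to small runs and extrapolate:

```python
# Fit a saturating performance-vs-compute curve to (made-up) small runs and
# use it to predict performance at a larger compute budget.
import numpy as np
from scipy.optimize import curve_fit

def saturating(c, A, B, c0):
    """Performance rises with compute c and saturates at asymptote A."""
    return A / (1.0 + (c0 / c) ** B)

# Hypothetical measurements: GPU-hours vs. pass rate.
compute = np.array([1e2, 3e2, 1e3, 3e3, 1e4, 3e4])
score = np.array([0.12, 0.20, 0.33, 0.45, 0.55, 0.60])

(A, B, c0), _ = curve_fit(
    saturating, compute, score,
    p0=[0.7, 0.5, 1e3],
    bounds=(1e-6, [1.0, 5.0, 1e6]),
)
print(f"fitted asymptote A={A:.2f}; predicted score at 4e5 GPU-hours: {saturating(4e5, A, B, c0):.2f}")
```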
Gemma is such an underrated model in its parameter range.
An exciting milestone for AI in science: Our C2S-Scale 27B foundation model, built with @Yale and based on Gemma, generated a novel hypothesis about cancer cellular behavior, which scientists experimentally validated in living cells. With more preclinical and clinical tests,
5
4
67
Just to recap: We found out today that an LLM that fits on a high-end consumer GPU, when trained on specific biological data, can discover a novel method to make cancer tumors more responsive to immunotherapy. Confirmed novel discovery (not present in existing literature).
Google and Yale scientists have trained an LLM that has generated a novel hypothesis about cancer cellular behavior. This prediction was confirmed multiple times in vitro. - "What made this prediction so exciting was that it was a novel idea. Although CK2 has been implicated in
135
845
9K
An exciting milestone for AI in science: Our C2S-Scale 27B foundation model, built with @Yale and based on Gemma, generated a novel hypothesis about cancer cellular behavior, which scientists experimentally validated in living cells. With more preclinical and clinical tests,
556
3K
22K