Hitesh Golchha Profile
Hitesh Golchha

@hitesh_golchha

Followers: 80
Following: 568
Media: 6
Statuses: 300

Applied Scientist @Amazon, ML Research @UMassAmherst

Massachusetts
Joined November 2017
@hitesh_golchha
Hitesh Golchha
1 year
🥳 Excited to announce that our work “Language Guided Exploration for RL Agents in Text Environments” was accepted to #NAACL2024 Findings! 🚀 Paper: 🧵(1/n)
@hitesh_golchha
Hitesh Golchha
6 months
RT @HannaHajishirzi: Excited to release our newest, largest, and best Tulu yet. Our RLVR recipe works at scale, outperforming Deepseek V3.….
@hitesh_golchha
Hitesh Golchha
6 months
RT @natolambert: The DeepSeek R1 recipe, what questions we need to answer to train an o1 replication ourselves at home, and what it means f….
@hitesh_golchha
Hitesh Golchha
6 months
RT @SonglinYang4: I've created slides for those curious about the recent rapid progress in linear attention: from linear attention to Light….
@hitesh_golchha
Hitesh Golchha
6 months
RT @cgarciae88: Google Cloud just recently released "The PyTorch developer's guide to JAX fundamentals". Contains a side-by-side implement….
@hitesh_golchha
Hitesh Golchha
7 months
RT @nrehiew_: How to train a 670B parameter model. Let's talk about the DeepSeek v3 report + some comparisons with what Meta did with Lla….
@hitesh_golchha
Hitesh Golchha
8 months
RT @cocoweixu: We wrapped up CS 8803 "Large Language Model" class at @GeorgiaTech for Fall 2024. Here is the reading list: • learning fr….
@hitesh_golchha
Hitesh Golchha
8 months
RT @natolambert: I've spent the last two years scouring all available resources on RLHF specifically and post training broadly. Today, with….
@hitesh_golchha
Hitesh Golchha
8 months
RT @simon_jegou: 🚀 Excited to announce KVPress: our open-source library for efficient #LLM KV cache compression! 👉 Check it out (and drop….
@hitesh_golchha
Hitesh Golchha
8 months
RT @currying: Around ten years ago, I started studying inverse problems in Topological Data Analysis (TDA). For decades people in computati….
@hitesh_golchha
Hitesh Golchha
8 months
RT @jxmnop: Top-rated papers from ICLR 2025. Scaling In-the-Wild Training for Diffusion-based Illumination Harmonization and Editing by Imp….
@hitesh_golchha
Hitesh Golchha
8 months
RT @abeirami: RLHF provably can't teach models any new knowledge. If you need to teach new skills, you need to look at pre-training and SFT….
@hitesh_golchha
Hitesh Golchha
8 months
RT @jaseweston: 🚨 Self-Consistency Preference Optimization (ScPO) 🚨 - New self-training method without human labels - learn to make the mode….
@hitesh_golchha
Hitesh Golchha
8 months
RT @kayembruno: Diffusion models are so ubiquitous, but it's difficult to find an introduction that is concise, simple and comprehensive.….
@hitesh_golchha
Hitesh Golchha
9 months
RT @wellingmax: “We are on the brink of an irreversible climate disaster. This is a global emergency beyond any doubt. Much of the very fab….
@hitesh_golchha
Hitesh Golchha
1 year
RT @srush_nlp: If you know Torch, I think you can code for GPU now with OpenAI's Triton language. We made some puzzles to help you rewire….
@hitesh_golchha
Hitesh Golchha
1 year
The work was done during my Master's degree with super amazing co-authors: @sahil_yerawar and @_dhruveshp from IESL Lab @UMass_NLP at UMass Amherst, and Soham Dan and @keerthi166 from @IBMResearch! Congratulations to all! 🎉 🧵(6/n)
@hitesh_golchha
Hitesh Golchha
1 year
The Guide is then used to prune ✂️ the action space while training the Explorer 🔭. Its invocation is stochastic 🎲, controlled by ε, which can be held constant or decayed over training (in curriculum-learning fashion). We used DRRN, but you can plug in your favorite RL/LLM Explorer. 🧵(5/n)
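A minimal sketch of the stochastic Guide invocation described in tweet 5/n, assuming ε is the probability that the Guide prunes the action set at a given step and that the decaying variant is linear (the thread does not specify the schedule). `epsilon_schedule`, `candidate_actions`, and `prune_fn` are hypothetical names, not the paper's code.

```python
import random
from typing import Callable, List, Sequence


def epsilon_schedule(step: int, eps_start: float, eps_end: float, decay_steps: int) -> float:
    """Linearly interpolate epsilon from eps_start to eps_end over decay_steps.

    Passing eps_start == eps_end gives the constant variant.
    Assumption: a linear decay; the thread only says "decaying".
    """
    if decay_steps <= 0:
        return eps_end
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)


def candidate_actions(
    valid_actions: Sequence[str],
    prune_fn: Callable[[Sequence[str]], List[str]],
    eps: float,
    rng: random.Random,
) -> List[str]:
    """With probability eps, restrict the Explorer to the Guide's pruned set."""
    if rng.random() < eps:
        pruned = prune_fn(valid_actions)
        # Assumed safeguard: fall back to the full set if the Guide keeps nothing.
        return pruned if pruned else list(valid_actions)
    return list(valid_actions)


if __name__ == "__main__":
    rng = random.Random(0)
    actions = ["open door", "pick up thermometer", "look around"]
    # Hypothetical Guide: keep only actions that mention a thermometer.
    keep_thermometer = lambda acts: [a for a in acts if "thermometer" in a]
    for step in (0, 500, 1000):
        eps = epsilon_schedule(step, eps_start=1.0, eps_end=0.0, decay_steps=1000)
        print(step, eps, candidate_actions(actions, keep_thermometer, eps, rng))
```

Under a decaying schedule the Explorer mostly chooses from the Guide's short list early in training and increasingly acts over the full action space as ε shrinks; the Explorer itself (e.g. DRRN) is not shown.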
@hitesh_golchha
Hitesh Golchha
1 year
The Guide 🔎 is first trained to learn an embedding space where relevant actions are close to task instructions. This is done using supervised contrastive learning, with task instructions as anchors ⚓️, gold actions as positives ➕, and the other available actions as hard negatives. 🧵(4/n)
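The objective in tweet 4/n can be sketched as a supervised contrastive (InfoNCE-style) loss for a single anchor. This assumes cosine similarity and a temperature `tau` of 0.1; the thread gives neither the exact loss nor its hyperparameters, so treat both as assumptions. Embeddings are passed in directly here; in practice they would come from the Guide's text encoder, which is not shown.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale vectors along the last axis to unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def supcon_loss(anchor: np.ndarray, positives: np.ndarray,
                negatives: np.ndarray, tau: float = 0.1) -> float:
    """Supervised contrastive loss for one anchor (assumed formulation).

    anchor:    (d,)   embedding of the task instruction
    positives: (P, d) embeddings of gold actions
    negatives: (N, d) embeddings of other available actions (hard negatives)
    Returns the mean over positives of -log softmax(sim / tau), where the
    softmax runs over all P + N candidate actions.
    """
    a = l2_normalize(anchor)
    cands = l2_normalize(np.concatenate([positives, negatives], axis=0))
    logits = cands @ a / tau  # cosine similarity / temperature, shape (P + N,)
    m = logits.max()  # subtract the max for a numerically stable log-sum-exp
    log_probs = logits - (m + np.log(np.exp(logits - m).sum()))
    return float(-log_probs[: positives.shape[0]].mean())


if __name__ == "__main__":
    instruction = np.array([1.0, 0.0])
    gold = np.array([[0.9, 0.1]])
    hard_negs = np.array([[0.0, 1.0], [-1.0, 0.2]])
    # Near zero: the gold action is the candidate closest to the instruction.
    print(supcon_loss(instruction, gold, hard_negs))
    # Much larger: a distant action is (wrongly) labelled as the positive.
    print(supcon_loss(instruction, hard_negs[:1], np.vstack([gold, hard_negs[1:]])))
```

Minimizing this pulls gold actions toward their instruction and pushes the other currently available actions away, which is what makes those negatives "hard": they are plausible in the same state but irrelevant to the task.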
@hitesh_golchha
Hitesh Golchha
1 year
We outperform all RL-based baselines, including DRRN, TDT, CALM, and Behavior Cloning, on the challenging ScienceWorld benchmark 🏆, which has 30 interactive environments that test scientific reasoning, with thousands of variations divided into train/dev/test sets. 🧵(3/n)
@hitesh_golchha
Hitesh Golchha
1 year
Text-based games are riddled with sparse rewards 🥕 and large action spaces 🕹️. We impart common sense 🧠 to the agent (called the Explorer 🔭) via a contrastively trained Language Prior (called the Guide 🔎), which uses task instructions to significantly prune ✂️ the action space. 🧵(2/n)
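Pruning with the Guide (tweet 2/n) can be illustrated by scoring each valid action against the task instruction and keeping the k most similar ones. This sketch assumes cosine similarity and a fixed top-k cutoff; the bag-of-words `toy_embed` is a hypothetical stand-in for the contrastively trained Guide encoder, and `guide_prune` is an illustrative name, not the paper's API.

```python
import math
from collections import Counter
from typing import List, Sequence


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for the learned Guide encoder: word counts."""
    return Counter(text.lower().split())


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def guide_prune(instruction: str, valid_actions: Sequence[str], k: int) -> List[str]:
    """Keep the k valid actions most similar to the instruction.

    sorted() is stable, so actions with equal scores keep their original order.
    """
    q = toy_embed(instruction)
    ranked = sorted(valid_actions, key=lambda a: cosine(q, toy_embed(a)), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    actions = ["open door", "heat water on stove", "look around", "pick up water"]
    print(guide_prune("boil water", actions, k=2))
```

With a learned encoder, semantically related actions could score highly even without word overlap with the instruction, which a bag-of-words stand-in cannot capture.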