Hitesh Golchha @hitesh_golchha X Profile

Followers: 80
Following: 568
Media: 6
Statuses: 300

Applied Scientist @Amazon, ML Research @UMassAmherst

Massachusetts

Joined November 2017

1 year

🥳 Excited to announce that our work “Language Guided Exploration for RL Agents in Text Environments” was accepted to #NAACL2024 Findings! 🚀 Paper: 🧵(1/n)

6 months

RT @HannaHajishirzi: Excited to release our newest, largest, and best Tulu yet. Our RLVR recipe works at scale, outperforming Deepseek V3.…

6 months

RT @natolambert: The DeepSeek R1 recipe, what questions we need to answer to train an o1 replication ourselves at home, and what it means f…

6 months

RT @SonglinYang4: I've created slides for those curious about the recent rapid progress in linear attention: from linear attention to Light…

6 months

RT @cgarciae88: Google Cloud just recently released "The PyTorch developer's guide to JAX fundamentals". Contains a side-by-side implement…

7 months

RT @nrehiew_: How to train a 670B parameter model. Let's talk about the DeepSeek v3 report + some comparisons with what Meta did with Lla…

8 months

RT @cocoweixu: We wrapped up CS 8803 "Large Language Model" class at @GeorgiaTech for Fall 2024. Here is the reading list: • learning fr…

8 months

RT @natolambert: I've spent the last two years scouring all available resources on RLHF specifically and post training broadly. Today, with…

8 months

RT @simon_jegou: 🚀 Excited to announce KVPress — our open-source library for efficient #LLM KV cache compression! 👉 Check it out (and drop…

8 months

RT @currying: Around ten years ago, I started studying inverse problems in Topological Data Analysis (TDA). For decades people in computati…

8 months

RT @jxmnop: Top-rated papers from ICLR 2025. Scaling In-the-Wild Training for Diffusion-based Illumination Harmonization and Editing by Imp…

8 months

RT @abeirami: RLHF provably can't teach models any new knowledge. If you need to teach new skills, you need to look at pre-training and SFT…

8 months

RT @jaseweston: 🚨 Self-Consistency Preference Optimization (ScPO)🚨 - New self-training method without human labels - learn to make the mode…

8 months

RT @kayembruno: Diffusion models are so ubiquitous, but it's difficult to find an introduction that is concise, simple and comprehensive.…

9 months

RT @wellingmax: “We are on the brink of an irreversible climate disaster. This is a global emergency beyond any doubt. Much of the very fab…

1 year

RT @srush_nlp: If you know Torch, I think you can code for GPU now with OpenAI's Triton language. We made some puzzles to help you rewire…

1 year

This work was done during my Master's degree with my amazing co-authors @sahil_yerawar and @_dhruveshp from the IESL Lab (@UMass_NLP) at UMass Amherst, and Soham Dan and @keerthi166 from @IBMResearch. Congratulations to all! 🎉 🧵(6/n)

1 year

The Guide is then used to prune ✂️ the action space while the Explorer 🔭 is trained. The Guide is invoked stochastically 🎲 (controlled by ε), and this can be held constant or decayed over training, in curriculum-learning fashion. We used DRRN as the Explorer, but you can plug in your favorite RL/LLM Explorer. 🧵(5/n)
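A minimal Python sketch of this stochastic pruning step, not the authors' code. It assumes (the thread does not say) that ε is the probability of invoking the Guide on a given step, that pruning keeps the top-k actions by Guide score, and that the decaying variant is linear. `epsilon_schedule`, `candidate_actions`, `guide_score`, and `k` are hypothetical names; the Explorer (e.g. DRRN) would then pick among the returned candidates.

```python
import random
from typing import Callable, List, Sequence


def epsilon_schedule(step: int, eps_start: float = 1.0, eps_end: float = 0.1,
                     decay_steps: int = 10_000, constant: bool = False) -> float:
    """Hypothetical schedule: constant eps, or linear decay from eps_start to eps_end."""
    if constant:
        return eps_start
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)


def candidate_actions(actions: Sequence[str],
                      guide_score: Callable[[str], float],
                      eps: float, k: int,
                      rng: random.Random) -> List[str]:
    """With probability eps, prune to the top-k actions ranked by the Guide;
    otherwise hand the full action set to the Explorer."""
    if rng.random() < eps:
        return sorted(actions, key=guide_score, reverse=True)[:k]
    return list(actions)


if __name__ == "__main__":
    # Toy Guide score (assumption): word overlap with the task instruction.
    instruction = set("boil water on the stove".split())

    def overlap_score(action: str) -> float:
        return float(len(instruction & set(action.split())))

    actions = ["open door", "move water to stove", "activate stove", "look around"]
    rng = random.Random(0)
    for step in (0, 5_000, 20_000):
        eps = epsilon_schedule(step)
        print(step, round(eps, 2), candidate_actions(actions, overlap_score, eps, k=2, rng=rng))
```

Early in training the Explorer mostly sees the pruned set; as ε decays it increasingly acts over the full action space, which is one way to read the curriculum framing above.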

1 year

The Guide 🔎 is first trained to learn an embedding space in which relevant actions lie close to task instructions. This is done with supervised contrastive learning, using task instructions as anchors ⚓️, gold actions as positives ➕, and available actions as hard negatives. 🧵(4/n)
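One way to write such an objective, as a minimal pure-Python sketch rather than the paper's implementation: a SupCon-style loss for a single instruction anchor, where each gold action is scored against all gold actions plus the hard negatives. Cosine similarity, `temperature=0.1`, and the function names are assumptions for illustration; in practice the vectors would come from a trained text encoder.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def supcon_loss(anchor: Vector,
                positives: Sequence[Vector],
                negatives: Sequence[Vector],
                temperature: float = 0.1) -> float:
    """Average over positives p of -log softmax(sim(anchor, p)), where the
    softmax runs over all positives and hard negatives for this anchor."""
    if not positives:
        raise ValueError("need at least one gold action")
    candidates = list(positives) + list(negatives)
    logits = [cosine(anchor, c) / temperature for c in candidates]
    # Numerically stable log-sum-exp over every candidate.
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
    pos_logits = logits[:len(positives)]
    return -sum(x - log_denom for x in pos_logits) / len(positives)


if __name__ == "__main__":
    instruction = [1.0, 0.0]        # toy instruction embedding (anchor)
    gold = [[0.9, 0.1]]             # toy gold-action embedding (positive)
    others = [[0.0, 1.0], [-1.0, 0.0]]  # toy hard negatives
    print(supcon_loss(instruction, gold, others))
```

The loss is small when gold actions are closer to the instruction than the hard negatives; after training, the same similarity could be used to rank actions for pruning.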

1 year

We outperform all RL-based baselines, including DRRN, TDT, CALM, and Behavior Cloning, on the challenging ScienceWorld benchmark 🏆, which has 30 interactive environments that test scientific reasoning abilities, with thousands of variations divided into train/dev/test sets. 🧵(3/n)

1 year

Text-based games are riddled with sparse rewards 🥕 and large action spaces 🕹️. We impart common sense 🧠 to the agent (called the Explorer 🔭) via a contrastively trained Language Prior (called the Guide 🔎), which uses task instructions to significantly prune ✂️ the action space. 🧵(2/n)
