Gowtham Ramesh
@gowtham_ramesh1
Followers
417
Following
7K
Media
0
Statuses
165
Applied RS GenAI @AMD | Ex-Student Researcher @GoogleAI NYC, @WisconsinCS, @ai4bharat, @iitmadras
San Jose, CA
Joined October 2016
We've just open-sourced our 1B language model with comprehensive details to make it easy to reproduce. Check out the model card and other details here:
huggingface.co
0
4
30
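(For readers who want to try the release: a minimal, hedged sketch of loading such a checkpoint with transformers. The repo id "org/model-1b" is a placeholder, not the actual model card linked above.)

```python
# Hypothetical repo id; substitute the one from the linked model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("org/model-1b")
model = AutoModelForCausalLM.from_pretrained("org/model-1b")

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```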
We present Olmo 3, our next family of fully open, leading language models. This family of 7B and 32B models represents: 1. The best 32B base model. 2. The best 7B Western thinking & instruct models. 3. The first 32B (or larger) fully open reasoning model. This is a big
94
357
2K
We are excited to release AgentEvolver, an open-source, self-evolving agent system from Tongyi Lab. AgentEvolver integrates three synergistic mechanisms (Self-Questioning, Self-Navigating, and Self-Attributing) to systematically address critical bottlenecks in Agent RL
15
120
829
Defeating the Training-Inference Mismatch via FP16. Quick summary: a big problem in RL training of LLMs is that typical policy gradient methods expect the model generating the rollouts and the model being trained to be exactly the same... but when you have a separate inference server
11
32
218
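(A minimal sketch of the mismatch being described, assuming per-token log-probs are available from both the inference server and the trainer; the function and variable names are illustrative, not from the post.)

```python
import torch

def mismatch_stats(logp_infer: torch.Tensor, logp_train: torch.Tensor):
    """Per-token log-probs of the SAME sampled tokens: one tensor from the
    inference server, one from the trainer's forward pass."""
    # Policy-gradient updates implicitly assume pi_train / pi_rollout == 1;
    # backend and precision differences make this ratio drift.
    ratio = (logp_train - logp_infer).exp()
    return ratio.mean().item(), (ratio - 1).abs().max().item()

# The FP16 remedy discussed in the post amounts to running both passes in the
# same float16 numerics (e.g. model.half() on the trainer side) so they agree.
```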
Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels. We at @SFResearch build an automated pipeline that converts raw web text into verifiable QA pairs, filtered and verified by LLMs, then use Group Relative Policy Optimization (GRPO) to train
1
13
84
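(For context, a minimal sketch of the group-relative advantage at the heart of GRPO; the rewards are made-up verifier scores for one prompt's group of completions.)

```python
import torch

rewards = torch.tensor([1.0, 0.0, 1.0, 0.0])  # illustrative verifier scores
adv = (rewards - rewards.mean()) / (rewards.std() + 1e-6)
# Each completion's advantage is its reward normalized against its own group,
# which is what makes the optimization "group relative".
```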
Several of my team members and I are impacted by this layoff today. Feel free to connect :)
472
278
6K
🚨 Update!! A few more on Rollout-Training Mismatch. 🧵👇 (1/3) In Part I, we discuss token-level and sequence-level policy gradients. While identical for the classic REINFORCE algorithm, they diverge for trust
Failing on large-scale RL with VeRL? ⚠️ Mixing inference backends (vLLM/SGLang) with training backends (FSDP/Megatron) secretly turns your RL into off-policy, even if they share the same weights! 🔗 Blog:
5
35
224
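(A hedged sketch of the token-level vs sequence-level distinction from the thread: the two losses differ only in how per-token terms are averaged, but that difference matters once per-token trust-region ratios enter. Names and normalizations are illustrative.)

```python
import torch

def token_level_loss(logp, adv, mask):
    # mask: 0/1 floats marking valid completion tokens, shape [batch, seq];
    # normalize by the total number of valid tokens in the batch
    return -(logp * adv * mask).sum() / mask.sum()

def sequence_level_loss(logp, adv, mask):
    # normalize each sequence by its own length first, then average sequences
    per_seq = (logp * adv * mask).sum(dim=-1) / mask.sum(dim=-1)
    return -per_seq.mean()
```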
We've always assumed stale and off-policy data hurts RL a lot, but our latest work shows the opposite. 🧠 M2PO (Second-Moment Trust Policy Optimization) reveals that even data stale by 256 model updates can train LLMs as effectively as on-policy RL, unlocking scalable and
m2po.notion.site
Haizhong Zheng, Jiawei Zhao, Beidi Chen
🤔 Can we train RL on LLMs with extremely stale data? 🚀 Our latest study says YES! Stale data can be as informative as on-policy data, unlocking more scalable, efficient asynchronous RL for LLMs. We introduce M2PO, an off-policy RL algorithm that keeps training stable and
3
23
132
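(A loose sketch of the second-moment idea as summarized above, not the paper's exact algorithm: instead of clipping each importance ratio, constrain the second moment of the log-ratios and mask the extreme tokens that dominate it. The threshold is made up.)

```python
import torch

def m2_mask(logp_new: torch.Tensor, logp_old: torch.Tensor,
            max_second_moment: float = 0.04) -> torch.Tensor:
    delta = logp_new - logp_old          # per-token log importance ratio (1D)
    mask = torch.ones_like(delta, dtype=torch.bool)
    # greedily drop the most extreme tokens until the constraint holds
    while mask.any() and (delta[mask] ** 2).mean() > max_second_moment:
        mask[(delta.abs() * mask).argmax()] = False
    return mask                          # tokens kept for the update
```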
We now know that LoRA can match full-parameter RL training (from https://t.co/pGxoMLFIGv and our Tina paper https://t.co/dkXdxV3eNj), but what about DoRA, QLoRA, and more? We are releasing a clean LoRA-for-RL repo to explore them all. https://t.co/AsWWG1kmKt
LoRA makes fine-tuning more accessible, but it's unclear how it compares to full fine-tuning. We find that the performance often matches closely, more often than you might expect. In our latest Connectionism post, we share our experimental results and recommendations for LoRA.
13
70
565
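(If you want to reproduce the baseline setup, a minimal sketch with the peft library; the model id and hyperparameters are placeholders. Newer peft releases also accept use_dora=True in LoraConfig, which is one way to run the DoRA comparison mentioned above.)

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("org/base-model")  # placeholder id
config = LoraConfig(r=16, lora_alpha=32,
                    target_modules=["q_proj", "v_proj"],
                    task_type="CAUSAL_LM")
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only a small fraction of weights train
```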
Today, we're announcing a multi-year, multi-generation strategic partnership with @OpenAI that puts AMD compute at the center of the global AI infrastructure buildout.
✅ 6GW of AI infrastructure
✅ Initial 1GW deployment of AMD Instinct MI450 series GPU capacity beginning 2H
287
1K
7K
As we mentioned back in April, AMD is in war mode and has developed a new sense of urgency. With this urgency, their software engineers are working harder and smarter than ever to fix bugs and improve the ROCm user experience, all with the goal of matching the CUDA experience.
34
39
452
🚨 Update!! A few additional findings for Rollout-Training Mismatch: ① Parallelism difference is a huge driver of the gap, with Sequence Parallelism (SP) causing most of the high max mismatch. ② Longer sequences
Failing on large-scale RL with VeRL? ⚠️ Mixing inference backends (vLLM/SGLang) with training backends (FSDP/Megatron) secretly turns your RL into off-policy, even if they share the same weights! 🔗 Blog:
2
28
173
(1/x) Ever had your #LLM-#RL training mysteriously collapse? 📉 You're not alone. We saw #agentic RL runs fail with exploding #gradients, and found the culprit: a fundamental "training-inference mismatch." Our new #blog post demystifies this vicious cycle.
11
52
313
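(One common remedy in this line of work is a truncated importance-sampling correction; a hedged sketch below, with an illustrative cap, not the blog's specific recipe.)

```python
import torch

def tis_weight(logp_train: torch.Tensor, logp_rollout: torch.Tensor,
               cap: float = 2.0) -> torch.Tensor:
    # ratio between the trainer's policy and the rollout policy, truncated so
    # that a large train/inference mismatch cannot blow up the gradient
    return (logp_train - logp_rollout).exp().clamp(max=cap).detach()

# usage sketch: loss = -(tis_weight(lp_t, lp_r) * adv * lp_t).mean()
```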
[LG] APRIL: Active Partial Rollouts in Reinforcement Learning to tame long-tail generation Y Zhou, J Li, Y Su, G Ramesh... [Advanced Micro Devices, Inc. (AMD) & CMU] (2025) https://t.co/gucf2SL6Pb
0
3
8
We trained the whole stack. Pretrain. SFT. RL. Open weights. Open methods. Open science. From tokens to traces. From guesses to grounded. Code World Model is here. 50+ page report. Ckpts. Code. https://t.co/e5dLMMJy3I
(🧵) Today, we release Meta Code World Model (CWM), a 32-billion-parameter dense LLM that enables novel research on improving code generation through agentic reasoning and planning with world models. https://t.co/BJSUCh2vtg
21
70
650
APRIL: Active Partial Rollouts in Reinforcement Learning to tame long-tail generation "we propose Active Partial Rollouts in Reinforcement Learning (APRIL), which mitigates long-tail inefficiency." "Experiments show that APRIL improves rollout throughput by at most 44% across
4
19
140
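(A toy sketch of the idea named in the abstract: rather than letting a batch wait on its longest generations, preempt unfinished rollouts at a step budget and resume them in a later batch. All names are illustrative, not the paper's implementation.)

```python
from collections import deque

def rollout_step(pending: deque, batch_size: int, step_budget: int) -> list:
    batch = [pending.popleft() for _ in range(min(batch_size, len(pending)))]
    finished = []
    for seq in batch:
        seq["done"] += min(step_budget, seq["total"] - seq["done"])
        if seq["done"] >= seq["total"]:
            finished.append(seq)   # completed within this step
        else:
            pending.append(seq)    # partial rollout, resumed in a later batch
    return finished

# e.g. pending = deque({"done": 0, "total": n} for n in [50, 60, 4000])
```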
@zephyr_z9 Although AMD is now working pretty well for small-to-medium-sized models
120
188
2K
We're live! 🚀 This is the official account for slime, an open-source, SGLang-native post-training framework for RL scaling. Kicking things off with our first milestone: the v0.1.0 release 🧪 Blog: https://t.co/ORH3J6UTYL Follow us to run RL faster ⚡️
16
8
30
New in-depth blog post - "Inside vLLM: Anatomy of a High-Throughput LLM Inference System". Probably the most in-depth explanation of how LLM inference engines, and vLLM in particular, work! It took me a while to reach this level of understanding of the codebase and then to write up
63
405
3K
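(To go with the post on vLLM internals, a minimal offline-inference example using vLLM's public API; the model id is a placeholder.)

```python
from vllm import LLM, SamplingParams

llm = LLM(model="org/model-7b")  # placeholder model id
params = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(["Explain KV-cache paging in one sentence."], params)
print(outputs[0].outputs[0].text)
```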