Gowtham Ramesh Profile
Gowtham Ramesh

@gowtham_ramesh1

Followers: 417
Following: 7K
Media: 0
Statuses: 165

Applied RS GenAI @AMD | Ex-Student Researcher @GoogleAI NYC, @WisconsinCS, @ai4bharat, @iitmadras

San Jose, CA
Joined October 2016
@gowtham_ramesh1
Gowtham Ramesh
1 year
We've just open-sourced our 1B language model with comprehensive details to make it easy to reproduce. Check out the model card and other details here:
huggingface.co
@EmadBarsoumPi
Emad Barsoum
1 year
AMD's first 1B LLM is released!!! Proud of the team. We released everything: training scripts, dataset details, weights, score card, and benchmark results. #AMD #LLM #ML #AI #HW #MI300X https://t.co/Mv5TTCV6Db
0
4
30
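Assuming the released checkpoint is published on the Hugging Face Hub as the model card suggests, a minimal loading sketch with transformers might look like the following; the repo id is a placeholder, not confirmed by the thread.

```python
# Minimal sketch: loading a released 1B checkpoint with Hugging Face transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "amd/AMD-OLMo-1B"  # placeholder id; substitute the id listed on the model card

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```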
@natolambert
Nathan Lambert
22 days
We present Olmo 3, our next family of fully open, leading language models. This family of 7B and 32B models represents: 1. The best 32B base model. 2. The best 7B Western thinking & instruct models. 3. The first 32B (or larger) fully open reasoning model. This is a big
94
357
2K
@Ali_TongyiLab
Tongyi Lab
24 days
We are excited to release AgentEvolver, an open-source, self-evolving agent system from Tongyi Lab. AgentEvolver integrates three synergistic mechanisms, Self-Questioning, Self-Navigating, and Self-Attributing, to systematically address critical bottlenecks in Agent RL
15
120
829
@iScienceLuvr
Tanishq Abraham @ NeurIPS
1 month
Defeating the Training-Inference Mismatch via FP16. Quick summary: A big problem in RL LLM training is that typical policy gradient methods expect the model generating the rollouts and the model being trained to be exactly the same... but when you have a separate inference server
11
32
218
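One common way this mismatch is handled (a hedged sketch, not necessarily what the linked work does beyond its FP16 fix) is to reweight the policy gradient by a truncated importance ratio between the trainer's and the inference server's per-token log-probs.

```python
import torch

def tis_corrected_pg_loss(trainer_logprobs, rollout_logprobs, advantages, clip_c=2.0):
    """Truncated importance-sampling correction for the rollout/trainer mismatch.

    trainer_logprobs: per-token log-probs recomputed by the training framework
    rollout_logprobs: per-token log-probs reported by the inference engine
    advantages:       per-token advantages
    All tensors share shape (batch, seq_len); padding is assumed already masked out.
    """
    # Ratio between the policy the gradient flows through and the policy that
    # actually generated the tokens; it is exactly 1.0 only if the two match.
    ratio = torch.exp(trainer_logprobs - rollout_logprobs).detach()
    ratio = torch.clamp(ratio, max=clip_c)  # truncate to bound variance
    return -(ratio * advantages * trainer_logprobs).mean()
```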
@CaimingXiong
Caiming Xiong
2 months
🚀🚀🚀 Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels. We at @SFResearch build an automated pipeline that converts raw web text into verifiable QA pairs, filtered and verified by LLMs, then use Group Relative Policy Optimization (GRPO) to train
1
13
84
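For reference, GRPO's core step is to standardize each answer's reward within its own sampling group instead of learning a value function; a minimal sketch of that advantage computation (variable names are illustrative):

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages as used in GRPO.

    rewards: shape (num_prompts, group_size), one scalar reward per sampled answer.
    Each answer's advantage is its reward standardized within its own group,
    so no learned critic is needed.
    """
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)
```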
@tydsh
Yuandong Tian
2 months
Several of my team members and I are impacted by this layoff today. Feel free to connect :)
472
278
6K
@fengyao1909
Feng Yao
2 months
🆕 Update!! A few more on Rollout-Training Mismatch. 🧵👇 (1/3) In Part I, we discuss token-level and sequence-level policy gradients. While identical for the classic REINFORCE algorithm, they diverge for trust
@fengyao1909
Feng Yao
4 months
Failing on large-scale RL with VeRL? ⚠️ Mixing inference backends (vLLM/SGLang) with training backends (FSDP/Megatron) secretly turns your RL into off-policy, even if they share the same weights! 📉 Blog:
5
35
224
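A small sketch of the two REINFORCE formulations the thread refers to: with a single return per sequence the two losses are algebraically equal, and they only start to differ once per-token ratios or clipping (trust-region-style updates) enter. Names below are illustrative.

```python
import torch

def reinforce_token_level(logprobs, mask, returns):
    # Token-level form: every token's log-prob is weighted by its sequence's return.
    # logprobs, mask: (batch, seq_len); returns: (batch,)
    return -(returns.unsqueeze(1) * logprobs * mask).sum() / mask.sum()

def reinforce_sequence_level(logprobs, mask, returns):
    # Sequence-level form: one log-prob per sequence (the masked sum), weighted by the return.
    seq_logprob = (logprobs * mask).sum(dim=1)
    return -(returns * seq_logprob).sum() / mask.sum()
```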
@jiawzhao
Jiawei Zhao
2 months
We've always assumed stale and off-policy data hurts RL a lot, but our latest work shows the opposite. 🧠 M2PO (Second-Moment Trust Policy Optimization) reveals that even data stale by 256 model updates can train LLMs as effectively as on-policy RL, unlocking scalable and
m2po.notion.site
Haizhong Zheng, Jiawei Zhao, Beidi Chen
@InfiniAILab
Infini-AI-Lab
2 months
🤔 Can we train RL on LLMs with extremely stale data? 🚀 Our latest study says YES! Stale data can be as informative as on-policy data, unlocking more scalable, efficient asynchronous RL for LLMs. We introduce M2PO, an off-policy RL algorithm that keeps training stable and
3
23
132
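The exact M2PO procedure is in the linked post; as a rough illustration only, one way to act on a "second-moment trust" idea is to keep the mean squared log importance ratio of the stale batch under a budget and mask the most off-policy tokens when it is exceeded. The budget value and masking rule below are assumptions, not the paper's algorithm.

```python
import torch

def m2_trust_masked_loss(trainer_logprobs, behavior_logprobs, advantages, m2_budget=0.04):
    # Per-token log importance ratio between the current policy and the (stale)
    # behavior policy that generated the data.
    log_ratio = (trainer_logprobs - behavior_logprobs).detach()
    sq = log_ratio.pow(2)

    mask = torch.ones_like(sq)
    if sq.mean() > m2_budget:
        # Keep the least off-policy tokens whose running mean squared log-ratio
        # still fits under the budget; mask the rest.
        sorted_sq, order = torch.sort(sq.flatten())
        running_mean = sorted_sq.cumsum(0) / torch.arange(
            1, sorted_sq.numel() + 1, device=sq.device
        )
        keep = int((running_mean <= m2_budget).sum())
        flat_mask = torch.zeros(sq.numel(), device=sq.device)
        flat_mask[order[:keep]] = 1.0
        mask = flat_mask.view_as(sq)

    ratio = torch.exp(log_ratio)
    denom = mask.sum().clamp(min=1.0)
    return -(mask * ratio * advantages * trainer_logprobs).sum() / denom
```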
@UpupWang
Shangshang Wang
2 months
We now know that LoRA can match full-parameter RL training (from https://t.co/pGxoMLFIGv and our Tina paper https://t.co/dkXdxV3eNj), but what about DoRA, QLoRA, and more? We are releasing a clean LoRA-for-RL repo to explore them all. https://t.co/AsWWG1kmKt
@thinkymachines
Thinking Machines
2 months
LoRA makes fine-tuning more accessible, but it's unclear how it compares to full fine-tuning. We find that the performance often matches closely, more often than you might expect. In our latest Connectionism post, we share our experimental results and recommendations for LoRA.
13
70
565
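A minimal sketch of attaching LoRA adapters to a policy model before RL fine-tuning, assuming the Hugging Face peft library; the base model id, rank, and target modules are illustrative and not taken from the repo above.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-1.5B")  # placeholder base model

config = LoraConfig(
    r=16,                       # adapter rank
    lora_alpha=32,              # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    task_type="CAUSAL_LM",
)

policy = get_peft_model(base, config)
policy.print_trainable_parameters()  # only the adapter weights receive RL gradients
```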
@LisaSu
Lisa Su
2 months
Exciting day today! Thrilled to partner with @OpenAI to deploy 6 GW of AMD Instinct GPUs. The world needs more AI compute. Together, we're bringing the best of both companies to accelerate the global AI infrastructure buildout. Thanks @sama @gdb for the trust and partnership.
166
371
4K
@AMD
AMD
2 months
Today, we're announcing a multi-year, multi-generation strategic partnership with @OpenAI that puts AMD compute at the center of the global AI infrastructure buildout. ✅ 6 GW of AI infrastructure ✅ Initial 1 GW deployment of AMD Instinct MI450 series GPU capacity beginning 2H
287
1K
7K
@SemiAnalysis_
SemiAnalysis
3 months
As we mentioned back in April, AMD is in war mode and has developed a new sense of urgency. With this urgency, their software engineers are working harder and smarter than ever to fix bugs and improve the ROCm user experience, all with the goal of matching the CUDA experience.
34
39
452
@fengyao1909
Feng Yao
3 months
🆕 Update!! A few additional findings for Rollout-Training Mismatch: ① Parallelism difference is a huge driver of the gap, with Sequence Parallelism (SP) causing most high max mismatch. ② Longer sequences
@fengyao1909
Feng Yao
4 months
Failing on large-scale RL with VeRL? ⚠️ Mixing inference backends (vLLM/SGLang) with training backends (FSDP/Megatron) secretly turns your RL into off-policy, even if they share the same weights! 📉 Blog:
2
28
173
@RichardYRLi
Yingru Li
3 months
(1/x) Ever had your #LLM-#RL training mysteriously collapse? 📉 You're not alone. We saw #agentic RL runs fail with exploding #gradients, and found the culprit: a fundamental "training-inference mismatch." Our new #blog post demystifies this vicious cycle.
11
52
313
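A hedged sketch of the kind of diagnostics such posts recommend watching (not the blog's exact recipe): compare the trainer's recomputed per-token log-probs against the ones reported by the inference engine, and track both the gap and an estimated KL between the two policies.

```python
import torch

def mismatch_metrics(trainer_logprobs, rollout_logprobs, mask):
    """Per-batch training-inference mismatch diagnostics (illustrative).

    trainer_logprobs / rollout_logprobs: (batch, seq_len) per-token log-probs
    mask: (batch, seq_len) float mask, 1.0 on real tokens, 0.0 on padding
    """
    gap = (trainer_logprobs - rollout_logprobs) * mask
    # k3 estimator of KL(rollout || trainer) on tokens sampled from the rollout policy.
    kl_est = (gap.exp() - 1.0 - gap) * mask
    n = mask.sum().clamp(min=1.0)
    return {
        "logprob_gap_mean": gap.abs().sum() / n,
        "logprob_gap_max": gap.abs().max(),
        "kl_rollout_vs_trainer": kl_est.sum() / n,
    }
```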
@fly51fly
fly51fly
3 months
[LG] APRIL: Active Partial Rollouts in Reinforcement Learning to tame long-tail generation. Y Zhou, J Li, Y Su, G Ramesh... [Advanced Micro Devices, Inc. (AMD) & CMU] (2025) https://t.co/gucf2SL6Pb
0
3
8
@KunhaoZ
Kunhao Zheng
3 months
We trained the whole stack. Pretrain. SFT. RL. Open weights. Open methods. Open science. From tokens to traces. From guesses to grounded. Code World Model is here. 50+ page report. Ckpts. Code. https://t.co/e5dLMMJy3I
@syhw
Gabriel Synnaeve
3 months
(🧵) Today, we release Meta Code World Model (CWM), a 32-billion-parameter dense LLM that enables novel research on improving code generation through agentic reasoning and planning with world models. https://t.co/BJSUCh2vtg
21
70
650
@iScienceLuvr
Tanishq Abraham @ NeurIPS
3 months
APRIL: Active Partial Rollouts in Reinforcement Learning to tame long-tail generation "we propose Active Partial Rollouts in Reinforcement Learning (APRIL), which mitigates long-tail inefficiency." "Experiments show that APRIL improves rollout throughput by at most 44% across
4
19
140
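A rough sketch of the partial-rollout idea (not the authors' implementation): cap the tokens generated per step and re-queue unfinished sequences, so a few long-tail generations don't stall the whole batch. `engine.generate` below is a hypothetical call standing in for whatever inference engine is used.

```python
from collections import deque

def partial_rollout_step(engine, pending, new_prompts, step_budget=1024, batch_size=64):
    # `engine.generate(batch, max_new_tokens)` is a hypothetical call that returns
    # (finished_sequences, unfinished_sequences) after at most `max_new_tokens` tokens.
    pending.extend(new_prompts)
    batch = [pending.popleft() for _ in range(min(batch_size, len(pending)))]

    finished, unfinished = engine.generate(batch, max_new_tokens=step_budget)
    pending.extend(unfinished)   # resume these from their current prefix next step
    return finished              # only complete trajectories feed the RL update

# usage sketch:
# pending = deque()
# done = partial_rollout_step(engine, pending, prompts)
```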
@AnushElangovan
Anush Elangovan
3 months
ROCm 7.0 is here
10
24
243
@elonmusk
Elon Musk
3 months
@zephyr_z9 😂 Although AMD is now working pretty well for small-to-medium-sized models
120
188
2K
@slime_framework
slime
3 months
We're live! 🎉 This is the official account for slime, an open-source, SGLang-native post-training framework for RL scaling. Kicking things off with our first milestone → v0.1.0 release 🧪 Blog: https://t.co/ORH3J6UTYL Follow us to run RL faster ⚡️
16
8
30
@gordic_aleksa
Aleksa Gordić (水平问题)
3 months
New in-depth blog post - "Inside vLLM: Anatomy of a High-Throughput LLM Inference System". Probably the most in-depth explanation of how LLM inference engines, and vLLM in particular, work! Took me a while to get this level of understanding of the codebase and then to write up
63
405
3K
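For orientation, the offline-inference entry point the post dissects looks roughly like this with vLLM's public Python API; the model id is a placeholder.

```python
from vllm import LLM, SamplingParams

# The LLM engine object owns the scheduler, paged KV cache, and model executor.
llm = LLM(model="Qwen/Qwen2.5-1.5B-Instruct")  # placeholder model id
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)

outputs = llm.generate(["Explain paged attention in one sentence."], params)
print(outputs[0].outputs[0].text)
```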