Tejesh Bhalla
@OG_tejeshbhalla
Followers
54
Following
7K
Media
38
Statuses
1K
The sky is falling, the wind is calling. Stand for something or die in the morning. @theagentic
New Delhi, India
Joined October 2019
Happy to finally share what I have been working on for some time now. Introducing »Ludic« – an LLM-RL library for the era of experience. There are now a lot of LLM-RL codebases, many of them good, but I want to share my own rather idiosyncratic way of thinking about LLM-RL.
14
29
198
Women hate "gym guys", you gotta build gym environments instead!!
0
0
1
In honor of Taylor Swift's 36th birthday today, here are 36 Taylor series
207
2K
16K
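For reference, the canonical first entry on any such list (my example, not one of the thread's 36) would be the exponential series:

```latex
e^{x} = \sum_{n=0}^{\infty} \frac{x^{n}}{n!} = 1 + x + \frac{x^{2}}{2!} + \frac{x^{3}}{3!} + \cdots
```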
vLLM was mentioned in about half of the PyTorch Conference 2025 talks (≈53/117)! Several months ago, when the @PyTorch conference agenda came out, we noticed that there would be 5 dedicated talks about vLLM. After the conference, we found that actually about half of the…
🔥 vLLM @ PyTorch Conference 2025 🔥 We’re excited to share that 5 talks at this year’s PyTorch Conference will feature vLLM! Topics include: • Easy & Fast LLM Serving • Open-Source Post-Training Stack • Scaling Online LLM Training • AMD GPU support via Triton • vllm-triton
7
25
243
an interesting update: the team is starting to move away from AI coding completely (devin/claude/etc) because it's so much harder to review AI-written code than to write it themselves
just found out that since this, i've become a top 50 user of Devin globally, now pushing ~60 PRs a day. AMA
192
229
4K
RL for reasoning often relies on verifiers, which are great for math but tricky for creative writing or open-ended research. Meet RARO: a new paradigm that teaches LLMs to reason via adversarial games instead of verification. No verifiers. No environments. Just demonstrations. 🧵👇
20
71
578
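To make "adversarial instead of verified" concrete, here is a minimal GAN-style toy in PyTorch: a judge learns to separate demonstrations from policy outputs, and the policy is rewarded for fooling it. All names and the Gaussian setup are my illustration of the general mechanism, not RARO's actual objective.

```python
import torch
import torch.nn as nn

# Toy stand-ins: "reasoning traces" are 16-dim vectors, the policy is a
# small generator, the judge a binary classifier. Illustrative only.
D = 16
policy = nn.Sequential(nn.Linear(D, D), nn.Tanh(), nn.Linear(D, D))
judge = nn.Sequential(nn.Linear(D, 32), nn.ReLU(), nn.Linear(32, 1))
opt_p = torch.optim.Adam(policy.parameters(), lr=1e-3)
opt_j = torch.optim.Adam(judge.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(200):
    demos = torch.randn(64, D) + 2.0        # "demonstration" traces
    traces = policy(torch.randn(64, D))     # policy's "reasoning traces"

    # The judge is trained to tell demonstrations apart from policy outputs.
    j_loss = bce(judge(demos), torch.ones(64, 1)) + \
             bce(judge(traces.detach()), torch.zeros(64, 1))
    opt_j.zero_grad(); j_loss.backward(); opt_j.step()

    # The policy is rewarded when the judge mistakes its traces for
    # demonstrations: an adversarial signal replacing a hand-built verifier.
    p_loss = bce(judge(traces), torch.ones(64, 1))
    opt_p.zero_grad(); p_loss.backward(); opt_p.step()
```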
A year ago, we verified a preview of an unreleased version of @OpenAI o3 (High) that scored 88% on ARC-AGI-1 at est. $4.5k/task. Today, we've verified a new GPT-5.2 Pro (X-High) SOTA score of 90.5% at $11.64/task. This represents a ~390X efficiency improvement in one year ($4,500 ÷ $11.64 ≈ 387×)
155
668
5K
poor cs grads fighting unemployment while mfs like this work at FAANG.
20
128
5K
🚀 We introduce Soft Adaptive Policy Optimization (SAPO) — a smooth, stable, and highly effective RL method for training large language models. Why SAPO? 🔹 Hard clipping is brittle — gradients vanish or explode 🔹 MoE models amplify variance, making training even more unstable
arxiv.org
Reinforcement learning (RL) plays an increasingly important role in enhancing the reasoning capabilities of large language models (LLMs), yet stable and performant policy optimization remains...
26
165
1K
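The contrast the thread is drawing can be sketched in a few lines: a standard PPO hard-clip surrogate next to a smooth, sigmoid-gated stand-in. The soft version is my own construction to illustrate "gradients taper instead of vanishing", not the actual SAPO objective from the paper.

```python
import torch

def hard_clip_surrogate(ratio, adv, eps=0.2):
    # PPO-style clipping: the gradient drops to exactly zero the moment
    # the importance ratio leaves [1 - eps, 1 + eps].
    return torch.min(ratio * adv, torch.clamp(ratio, 1 - eps, 1 + eps) * adv)

def soft_clip_surrogate(ratio, adv, eps=0.2, tau=0.05):
    # Hypothetical smooth alternative: a sigmoid gate tapers the weight
    # near the trust-region edges, so gradients shrink continuously
    # instead of being cut off. Not the paper's formula.
    gate = torch.sigmoid((ratio - (1 - eps)) / tau) * \
           torch.sigmoid(((1 + eps) - ratio) / tau)
    return gate * ratio * adv

ratio = torch.tensor([0.7, 1.0, 1.3], requires_grad=True)
loss = -soft_clip_surrogate(ratio, torch.ones(3)).sum()
loss.backward()
print(ratio.grad)  # small but nonzero outside the trust region
```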
Flex attention is amazing, I am gonna do some crazy experiments. So you are telling me I only have to write a kernel to approximate which tokens are important per token, then make a mask, and flex attention will take care of the memory loading from HBM!!!! (goooood)
0
0
1
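That is roughly the FlexAttention contract (PyTorch ≥ 2.5): supply a mask_mod over index tensors, build a BlockMask, and the kernel skips fully-masked blocks. A small sketch; the random "importance" flags stand in for whatever cheap scorer the tweet has in mind.

```python
import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask

B, H, S, D = 1, 8, 1024, 64
device = "cuda"  # the fast block-skipping path needs a GPU
q, k, v = (torch.randn(B, H, S, D, device=device) for _ in range(3))

# Stand-in importance flags: pretend a cheap kernel marked ~10% of tokens.
important = torch.rand(S, device=device) < 0.10

def mask_mod(b, h, q_idx, kv_idx):
    causal = q_idx >= kv_idx
    # Keep flagged tokens, plus a small local window as a fallback.
    keep = important[kv_idx] | (q_idx - kv_idx < 64)
    return causal & keep

# Blocks that are fully masked are never loaded from HBM at all.
block_mask = create_block_mask(mask_mod, B=None, H=None,
                               Q_LEN=S, KV_LEN=S, device=device)
out = flex_attention(q, k, v, block_mask=block_mask)
```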
Kimi K2 flew too close to the sun, upping its own temperature to 1.7 and losing coherence. Opus 4.5, who is often reluctant to edit its own system prompt, adds a quick note to remember: "the !prompt modifications, the temperature adjustments - we're all playing with our own…
I've also given the AIs the ability to adjust their own temperature setting. Coupled with the new tool for changing their own system prompt, things can get pretty weird.
16
27
482
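Why 1.7 melts coherence is just softmax flattening; a generic illustration, nothing specific to Kimi K2's setup:

```python
import torch

# A confidently peaked next-token distribution (toy logits).
logits = torch.tensor([4.0, 2.0, 1.0, 0.5])

for T in (0.7, 1.0, 1.7):
    probs = torch.softmax(logits / T, dim=-1)
    print(f"T={T}: {probs.round(decimals=3).tolist()}")
# As T rises, probability mass shifts from the top token to the tail,
# so unlikely (incoherent) continuations get sampled far more often.
```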
prime intellect focusing on post-training before pretraining is absolutely the right move, and anyone criticizing them for it is a fool. pretraining before you figure out what to do with models just means you're going to spend a few million dollars with nothing to show for it
16
23
502
I just read this paper called "Chain-of-Visual-Thought (COVT)" and it basically teaches VLMs to see and think at the same time: not in text, but in continuous visual tokens. Here's the wild part: instead of forcing models to reason through words (which destroys all the…
23
167
834
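A toy contrast of the bottleneck being described: decoding to a discrete token each step throws away everything off the argmax, while a continuous state keeps it. This is entirely my illustration of the general idea; COVT's actual architecture is in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d, vocab = 64, 1000
step = nn.Linear(d, d)                       # stand-in for one reasoning step
to_vocab = nn.Linear(d, vocab, bias=False)   # projection to a text vocabulary

h = torch.randn(1, d)                        # initial (visual) hidden state

# Text-bottlenecked chain of thought: quantize to one token per step,
# discarding everything the argmax throws away.
t = h
for _ in range(3):
    token = to_vocab(t).argmax(-1)                               # collapse to a symbol
    t = step(F.one_hot(token, vocab).float() @ to_vocab.weight)  # re-embed it

# Continuous "visual thought": carry the full hidden state between steps.
c = h
for _ in range(3):
    c = step(c)                              # no quantization bottleneck
```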
for most indian schools, ai tools like chatgpt or gemini have had basically zero impact compared to the full-blown panic in the west. it was rote learning before chatgpt, it's rote learning after chatgpt, and some students even told me their rote-learning "productivity" has actually…
21
42
670