Tejesh Bhalla Profile
Tejesh Bhalla

@OG_tejeshbhalla

Followers 54 · Following 7K · Media 38 · Statuses 1K

The sky is falling, the wind is calling. Stand for something or die in the morning. @theagentic

New Delhi, India
Joined October 2019
@hallerite
hallerite
23 hours
Happy to finally share what I have been working on for some time now. Introducing »Ludic« – an LLM-RL library for the era of experience. While there are now a lot of LLM-RL codebases, even many good ones, I want to share my very idiosyncratic way to think about LLM-RL.
14 · 29 · 198
@OG_tejeshbhalla
Tejesh Bhalla
2 days
Women hate "gym guys", you gotta build gym environments instead!!
0 · 0 · 1
@Anthony_Bonato
Anthony Bonato
5 days
In honor of Taylor Swift's 36th birthday today, here are 36 Taylor series
207 · 2K · 16K
@vllm_project
vLLM
4 days
vLLM was mentioned in about half of the PyTorch Conference 2025 talks (≈53/117)! Several months ago, when the @PyTorch conference agenda was out, we noticed that there would be 5 dedicated talks about vLLM. After the PyTorch conference, we found that actually about half of the
@vllm_project
vLLM
5 months
🔥 vLLM @ PyTorch Conference 2025 🔥 We’re excited to share that 5 talks at this year’s PyTorch Conference will feature vLLM! Topics include:
• Easy & Fast LLM Serving
• Open-Source Post-Training Stack
• Scaling Online LLM Training
• AMD GPU support via Triton
• vllm-triton
7 · 25 · 243
@jhleath
Hunter Leath
5 days
an interesting update: the team is starting to move away from AI coding completely (devin/claude/etc) because it's so much harder to review the AI code than to write things themselves
@jhleath
Hunter Leath
5 months
just found out that since this, i've become a top 50 user of Devin globally, now pushing ~60 PRs a day. AMA
192 · 229 · 4K
@couplefire12
Locke Cai
6 days
RL for reasoning often relies on verifiers — great for math, but tricky for creative writing or open-ended research. Meet RARO: a new paradigm that teaches LLMs to reason via adversarial games instead of verification. No verifiers. No environments. Just demonstrations. 🧵👇
20 · 71 · 578
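The thread itself isn't reproduced here, so as a rough sketch of the verifier-free idea: a GAN-style loop in which a critic learns to tell demonstrations from policy samples, and the policy is rewarded for fooling it. This is a generic adversarial-RL skeleton with toy stand-in classes, not RARO's actual algorithm; all names below are hypothetical.

```python
import random

# Toy stand-ins: a real setup would use an LLM policy and a learned critic model.
class ToyPolicy:
    def generate(self, prompt):
        return prompt + " " + random.choice(["answer A", "answer B"])

    def reinforce(self, prompts, samples, rewards):
        pass  # placeholder for a policy-gradient update (REINFORCE, PPO, ...)

class ToyCritic:
    def fit(self, positives, negatives):
        pass  # placeholder: train a classifier to separate demos from samples

    def score(self, text):
        return random.random()  # placeholder for P(text is a demonstration)

def adversarial_step(policy, critic, prompts, demonstrations):
    samples = [policy.generate(p) for p in prompts]          # 1. roll out the policy
    critic.fit(positives=demonstrations, negatives=samples)  # 2. update the critic
    rewards = [critic.score(s) for s in samples]             # 3. reward = "looks like a demo"
    policy.reinforce(prompts, samples, rewards)              # 4. policy improvement step

adversarial_step(ToyPolicy(), ToyCritic(),
                 prompts=["2+2?"], demonstrations=["2+2? 4"])
```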
@arm1st1ce
armistice
6 days
openai benchmarks be like
47 · 501 · 19K
@arcprize
ARC Prize
6 days
A year ago, we verified a preview of an unreleased version of @OpenAI o3 (High) that scored 88% on ARC-AGI-1 at est. $4.5k/task. Today, we’ve verified a new GPT-5.2 Pro (X-High) SOTA score of 90.5% at $11.64/task. This represents a ~390X efficiency improvement in one year
155 · 668 · 5K
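Reading the "~390X" as a straight cost-per-task ratio between the two verified runs checks out:

```python
o3_cost = 4500.00    # est. $/task for the o3 (High) preview, a year ago
gpt52_cost = 11.64   # $/task for GPT-5.2 Pro (X-High) today
print(round(o3_cost / gpt52_cost))  # 387, i.e. roughly the quoted ~390X
```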
@Hesamation
ℏεsam
8 days
poor cs grads fighting with unemployment while mfs like this work at FAANG.
@prathamgrv
pdawg
10 days
software engineering in one paragraph
20 · 128 · 5K
@Hesamation
ℏεsam
8 days
idk who made this but it’s so true 😂
286 · 3K · 54K
@Alibaba_Qwen
Qwen
9 days
🚀 We introduce Soft Adaptive Policy Optimization (SAPO) — a smooth, stable, and highly effective RL method for training large language models. Why SAPO?
🔹 Hard clipping is brittle — gradients vanish or explode
🔹 MoE models amplify variance, making training even more unstable
arxiv.org
Reinforcement learning (RL) plays an increasingly important role in enhancing the reasoning capabilities of large language models (LLMs), yet stable and performant policy optimization remains...
26 · 165 · 1K
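The exact SAPO objective is in the linked paper; below is only a sketch of the hard-vs-soft clipping contrast the tweet names, with a hypothetical sigmoid gate standing in for whatever smooth weighting SAPO actually uses. The point it illustrates: the hard-clipped surrogate has exactly zero gradient once the importance ratio leaves the trust region, while a smooth gate shrinks gradients instead of killing them.

```python
import torch

def ppo_hard_clip(ratio, adv, eps=0.2):
    # PPO-style surrogate: gradient is exactly zero once ratio leaves [1-eps, 1+eps]
    return torch.minimum(ratio * adv, torch.clamp(ratio, 1 - eps, 1 + eps) * adv)

def soft_clip(ratio, adv, eps=0.2, tau=0.05):
    # Hypothetical smooth gate (NOT the paper's formula): ≈1 inside the trust
    # region, decaying smoothly outside it, so gradients shrink but never vanish.
    gate = torch.sigmoid(((1 + eps) - ratio) / tau) * torch.sigmoid((ratio - (1 - eps)) / tau)
    return gate * ratio * adv

for surrogate in (ppo_hard_clip, soft_clip):
    ratio = torch.tensor([1.5], requires_grad=True)  # well outside the trust region
    surrogate(ratio, torch.tensor([1.0])).backward()
    print(surrogate.__name__, ratio.grad.item())     # hard: 0.0, soft: small but nonzero
```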
@OG_tejeshbhalla
Tejesh Bhalla
14 days
Flex attention is amazing, I am gonna do some crazy experiments. So you are telling me I only have to write a kernel to approximate which tokens are important per token, then make a mask, and flex attention will take care of memory loading from HBM!!!! (goooood)
0 · 0 · 1
@liminal_bardo
ᄂIMIПΛᄂbardo
17 days
Kimi K2 flew too close to the sun, upping its own temperature to 1.7 and losing coherence. Opus 4.5, who is often reluctant to edit its own system prompt, adds a quick note to remember. "the !prompt modifications, the temperature adjustments - we're all playing with our own
@liminal_bardo
ᄂIMIПΛᄂbardo
17 days
I've also given the AIs the ability to adjust their own temperature setting. Coupled with the new tool for changing their own system prompt, things can get pretty weird.
16 · 27 · 482
@vikhyatk
vik
20 days
prime intellect focusing on post training before pretraining is absolutely the right move, and anyone criticizing them for it is a fool. pretraining before you figure out what to do with models just means you're going to spend a few million dollars with nothing to show for it
16 · 23 · 502
@connordavis_ai
Connor Davis
20 days
I just read this paper called "Chain-of-Visual-Thought (COVT)" and it basically teaches VLMs to see and think at the same time, not in text but in continuous visual tokens. Here’s the wild part: Instead of forcing models to reason through words (which destroys all the
23 · 167 · 834
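The tweet is cut off, but the mechanism it gestures at (reasoning carried by continuous embeddings rather than decoded words) can be sketched generically. A hypothetical illustration, not the COVT paper's code: project visual features into the LLM's embedding space and let the decoder attend over them as extra, never-verbalized tokens.

```python
import torch
import torch.nn as nn

# Hypothetical sketch, not the paper's code: feed "visual thoughts" to the
# decoder as continuous embeddings instead of routing them through text tokens.
d_model = 512
vision_feats = torch.randn(1, 16, 768)        # e.g. 16 patch/region features
projector = nn.Linear(768, d_model)           # map into the LLM embedding space
visual_thoughts = projector(vision_feats)     # continuous tokens, never verbalized
text_embeds = torch.randn(1, 32, d_model)     # embedded text prompt
decoder_input = torch.cat([text_embeds, visual_thoughts], dim=1)
# decoder_input would go to the LM via inputs_embeds-style conditioning
print(decoder_input.shape)  # torch.Size([1, 48, 512])
```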
@novasarc01
λux
20 days
for most indian schools, ai tools like chatgpt or gemini have had basically zero impact compared to the full-blown panic in the west. it was rote learning before chatgpt, it's rote learning after chatgpt, and some students even told me their rote-learning "productivity" has actually
21 · 42 · 670
@willccbb
will brown
19 days
simple guide to large-scale MoE training:
3 · 9 · 254