
Costa Huang (@vwxyzjn)
7K Followers · 9K Following · 403 Media · 2K Statuses
Prev: RL @allen_ai @huggingface. Built @cleanrl_lib.
Philadelphia, PA · Joined March 2013
🚀 Happy to share Tülu 3! We trained the model with actual RL: the model only receives rewards if its generations are verified to be correct (e.g., a correct math solution). ❤️ Check out our beautiful RL curves. Code is also available: ~single-file PPO that scales to 70B.
13 · 81 · 488
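A minimal sketch of the verifiable-reward idea from the tweet (reward granted only when the generation checks out against a reference answer); `extract_final_answer` below is a hypothetical stand-in, not the actual Tülu 3 verifier:

```python
import re

def extract_final_answer(generation: str) -> str | None:
    # Hypothetical extractor: grab the last number in the generation.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", generation)
    return numbers[-1] if numbers else None

def rlvr_reward(generation: str, ground_truth: str) -> float:
    # Verifiable reward: 1.0 only if the extracted answer matches the
    # reference exactly, 0.0 otherwise -- no learned reward model involved.
    return 1.0 if extract_final_answer(generation) == ground_truth else 0.0

print(rlvr_reward("... so the answer is 42", "42"))  # 1.0
```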
🤩 @astral `uv` installing deep-ep and deep-gemm together! Thanks @charliermarsh! Code in the thread.
1 · 9 · 102
Nice! If you don't own the PyPI-level server infrastructure, there isn't much you can do about its limitations.
With pyx, we can solve these problems. And for me, that's the most exciting thing about it. By providing our own end-to-end infrastructure we can solve _so_ many more problems for users that used to be out-of-scope.
0 · 0 · 5
Check out Jason’s new work on RL tricks. Lots of ablation studies comparing popular techniques like Clip Higher 🤩.
Excited to share our #RL_for_LLM paper: "Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning". We conducted a comprehensive analysis of RL techniques in the LLM domain! 🥳 Surprisingly, we found that using only 2 techniques can unlock the learning capability of LLMs. 😮
1 · 5 · 79
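For context, "Clip Higher" decouples PPO's clip range so the upper bound is looser than the lower one. A minimal PyTorch sketch, with illustrative epsilon values rather than the paper's:

```python
import torch

def clip_higher_loss(logprobs, old_logprobs, advantages,
                     eps_low: float = 0.2, eps_high: float = 0.28):
    # Standard PPO surrogate, except the clip interval is asymmetric:
    # a higher upper bound lets low-probability tokens grow faster.
    ratio = torch.exp(logprobs - old_logprobs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high) * advantages
    return -torch.min(unclipped, clipped).mean()
```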
To demonstrate that @vllm_project is a hackable for-loop: you *can* add requests in the middle of generation while still batching properly.
4 · 15 · 243
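A rough sketch of what that for-loop looks like with vLLM's `LLMEngine` (API names as of recent vLLM releases; the model, request ids, and the step at which the second request joins are all arbitrary):

```python
from vllm import EngineArgs, LLMEngine, SamplingParams

engine = LLMEngine.from_engine_args(EngineArgs(model="facebook/opt-125m"))
params = SamplingParams(max_tokens=64)

engine.add_request("req-0", "Hello, my name is", params)
step = 0
while engine.has_unfinished_requests():
    outputs = engine.step()  # one decode step over the current batch
    step += 1
    if step == 5:
        # Join a new request mid-generation; the engine batches it in.
        engine.add_request("req-1", "The capital of France is", params)
    for out in outputs:
        if out.finished:
            print(out.request_id, out.outputs[0].text)
```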
RT @ChangJonathanC: While we wait for GPT-5 to drop, here is a FlexAttention tutorial for building a <1000 LoC vLLM from scratch. https://…
jonathanc.net
PyTorch FlexAttention tutorial: Building a minimal vLLM-style inference engine from scratch with paged attention
0 · 37 · 0
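Not from the tutorial itself, but a minimal taste of the FlexAttention API it builds on (assumes PyTorch 2.5+; this sketch puts tensors on CUDA since `create_block_mask` defaults to it):

```python
import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask

def causal(b, h, q_idx, kv_idx):
    # Each query position may only attend to itself and earlier keys.
    return q_idx >= kv_idx

B, H, S, D = 1, 8, 256, 64
q, k, v = (torch.randn(B, H, S, D, device="cuda") for _ in range(3))
block_mask = create_block_mask(causal, B, H, S, S)
out = flex_attention(q, k, v, block_mask=block_mask)
print(out.shape)  # torch.Size([1, 8, 256, 64])
```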
RT @iScienceLuvr: Announcement 📢 We're hiring at @SophontAI for a variety of positions! We're looking for exceptional, high-agency ML res…
0 · 23 · 0
RT @valentina__py: 💡Beyond math/code, instruction following with verifiable constraints is suitable to be learned with RLVR. But the set of…
0 · 94 · 0
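To make that concrete, one way a verifiable instruction-following constraint can be scored (the constraint schema below is made up for illustration, in the spirit of IFEval-style checks):

```python
def constraint_reward(response: str, constraints: dict) -> float:
    # Binary RLVR-style reward: 1.0 only if every verifiable constraint
    # holds. The constraint names here are hypothetical.
    checks = []
    if "max_words" in constraints:
        checks.append(len(response.split()) <= constraints["max_words"])
    if "must_include" in constraints:
        checks.append(all(s in response for s in constraints["must_include"]))
    if "forbidden" in constraints:
        checks.append(not any(s in response for s in constraints["forbidden"]))
    return 1.0 if all(checks) else 0.0

print(constraint_reward("short and sweet", {"max_words": 5}))  # 1.0
```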
I am pretty amazed at pure Python code matching vLLM's performance 🤯.
github.com
Nano vLLM. Contribute to GeeeekExplorer/nano-vllm development by creating an account on GitHub.
0 · 3 · 35
Weixun's team at Alibaba is presenting a new RL framework for LLM training. It's super nice that they included training curves on Qwen3-30B-A3B-base! The roll looks yummy 😋
🚀 Introducing ROLL: An Efficient and User-Friendly RL Training Framework for Large-Scale Learning! 🔥 Efficient, Scalable & Flexible – Train 200B+ models with 5D parallelism (TP/PP/CP/EP/DP), seamless vLLM/SGLang switching, async multi-env rollout for maximum RL throughput!
5 · 15 · 118
RT @interconnectsai: How I Write. And therein how I think. And how AI impacts it.
interconnects.ai
Therein how I think, how AI impacts it, and how writing reflects upon AI progress.
0 · 3 · 0
RT @cursor_ai: A conversation on the optimal reward for coding agents, infinite context models, and real-time RL
0 · 142 · 0
RT @PeterHndrsn: The next ~1-4 years will be taking the 2017-2020 years of Deep RL and scaling up: exploration, generalization, long-horizo…
0 · 37 · 0