Costa Huang

@vwxyzjn

Followers: 7K · Following: 9K · Media: 403 · Statuses: 2K

Prev: RL @allen_ai @huggingface. Built @cleanrl_lib.

Philadelphia, PA
Joined March 2013
@vwxyzjn
Costa Huang
9 months
🚀 Happy to share Tülu 3! We trained the model with actual RL: the model only receives rewards if its generations are verified to be correct (e.g., a correct math solution). ❤️ Check out our beautiful RL curves. Code is also available: a ~single-file PPO that scales to 70B.
13 replies · 81 reposts · 488 likes
@vwxyzjn
Costa Huang
4 days
🤩 @astral `uv` installing deep-ep and deep-gemm together! Thanks @charliermarsh! Code in the thread.
1 reply · 9 reposts · 102 likes
@vwxyzjn
Costa Huang
5 days
Nice! If you don't own the PyPI-level server infra, there's not much you can do about its limitations.
@charliermarsh
Charlie Marsh
7 days
With pyx, we can solve these problems. And for me, that's the most exciting thing about it. By providing our own end-to-end infrastructure we can solve _so_ many more problems for users that used to be out-of-scope.
0 replies · 0 reposts · 5 likes
@vwxyzjn
Costa Huang
6 days
Congrats to friends at Ai2! This is amazing 🤩
@allen_ai
Ai2
6 days
With fresh support of $75M from @NSF and $77M from @NVIDIA, we’re set to scale our open model ecosystem, bolster the infrastructure behind it, and fast‑track reproducible AI research to unlock the next wave of scientific discovery. 💡
1 reply · 0 reposts · 39 likes
@vwxyzjn
Costa Huang
6 days
Check out Jason’s new work on RL tricks. Lots of ablation studies comparing popular techniques like Clip Higher 🤩.
@JasonLiu106968
Jason Liu
7 days
Excited to share our #RL_for_LLM paper: "Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning". We conducted a comprehensive analysis of RL techniques in the LLM domain! 🥳 Surprisingly, we found that using only 2 techniques can unlock the learning capability of LLMs. 😮
1 reply · 5 reposts · 79 likes
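For context on one of the compared techniques: "Clip Higher" decouples PPO's symmetric clipping range, raising only the upper bound so that positive-advantage, low-probability tokens can grow faster without loosening the lower bound. A toy per-token sketch of the asymmetric clipped surrogate (the ε values are illustrative defaults, not the paper's settings):

```python
def clip_higher_objective(ratio: float, advantage: float,
                          eps_low: float = 0.2,
                          eps_high: float = 0.28) -> float:
    # PPO-style clipped surrogate, but with eps_high > eps_low
    # ("Clip Higher"): the importance ratio may rise to
    # 1 + eps_high before clipping, while the lower bound
    # stays at the usual 1 - eps_low.
    clipped = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    return min(ratio * advantage, clipped * advantage)
```

With `ratio = 1.5` and `advantage = 1.0`, vanilla PPO (ε = 0.2 both sides) would clip the objective at 1.2, while this version allows 1.28 — a small change that matters most for rare tokens the policy is trying to promote.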
@vwxyzjn
Costa Huang
11 days
To demonstrate that @vllm_project is a hackable for-loop: you *can* add requests in the middle of generation while still doing batching properly.
4 replies · 15 reposts · 243 likes
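The "hackable for-loop" being demonstrated is the continuous-batching pattern: a loop that advances every active request by one token per step, so new requests can join the batch between steps instead of waiting for the batch to drain. A toy illustration with a stub token generator — none of the names below are vLLM's actual API:

```python
from collections import deque

def run_engine(initial_prompts, injected, max_tokens=4):
    # Toy continuous-batching loop. `injected` maps a step index to
    # prompts that arrive mid-generation (stand-ins for requests
    # added while other requests are still decoding).
    waiting = deque(initial_prompts)
    active, finished, step = {}, {}, 0
    while waiting or active:
        # Admit any requests that arrived before this step.
        for prompt in injected.get(step, []):
            waiting.append(prompt)
        while waiting:
            active[waiting.popleft()] = []
        # One "forward pass": every active request emits one token.
        for prompt, tokens in list(active.items()):
            tokens.append(f"tok{len(tokens)}")  # stub for real decoding
            if len(tokens) >= max_tokens:
                finished[prompt] = tokens
                del active[prompt]
        step += 1
    return finished

# "b" joins at step 2 while "a" is mid-generation; both are batched
# together for the remaining steps and both finish.
out = run_engine(["a"], {2: ["b"]})
```

The design point is that because batching happens per step rather than per batch-of-requests, admission control is just an `append` before the forward pass.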
@vwxyzjn
Costa Huang
11 days
RT @ChangJonathanC: while we wait for gpt-5 to drop. Here is a flex attention tutorial for building a < 1000 LoC vllm from scratch. https://….
jonathanc.net
PyTorch FlexAttention tutorial: Building a minimal vLLM-style inference engine from scratch with paged attention
0 replies · 37 reposts · 0 likes
@vwxyzjn
Costa Huang
14 days
RT @_rockt: Harder, Better, Faster, Stronger, Real-time! We are excited to reveal Genie 3, our most capable real-time foundational world mo….
0 replies · 188 reposts · 0 likes
@vwxyzjn
Costa Huang
1 month
RT @iScienceLuvr: Announcement 📢. We're hiring at @SophontAI for a variety of positions!. We're looking for exceptional, high-agency ML res….
0 replies · 23 reposts · 0 likes
@vwxyzjn
Costa Huang
2 months
RT @valentina__py: 💡Beyond math/code, instruction following with verifiable constraints is suitable to be learned with RLVR. But the set of….
0 replies · 94 reposts · 0 likes
@vwxyzjn
Costa Huang
2 months
RT @allen_ai: New updates for olmOCR, our fully open toolkit for transforming documents (PDFs & images) into clean markdown. We released:….
0 replies · 43 reposts · 0 likes
@vwxyzjn
Costa Huang
2 months
🤯 impressive.
@OriolVinyalsML
Oriol Vinyals
2 months
Hello Gemini 2.5 Flash-Lite! So fast, it codes *each screen* on the fly (Neural OS concept 👇). The frontier isn't always about large models and beating benchmarks. In this case, a super fast & good model can unlock drastic use cases. Read more:
0 replies · 1 repost · 13 likes
@vwxyzjn
Costa Huang
2 months
I am pretty amazed at pure Python code matching vLLM's performance 🤯
github.com
Nano vLLM. Contribute to GeeeekExplorer/nano-vllm development by creating an account on GitHub.
0 replies · 3 reposts · 35 likes
@vwxyzjn
Costa Huang
2 months
RT @ericjang11: Here’s our latest RL update: Natural Mogging. (thread below!)
0 replies · 37 reposts · 0 likes
@vwxyzjn
Costa Huang
2 months
Weixun's team at Alibaba is presenting a new RL framework for LLM training. It's super nice that they included training curves on Qwen3-30B-A3B-base! The roll looks yummy 😋
@weixunwang
wang
2 months
🚀 Introducing ROLL: An Efficient and User-Friendly RL Training Framework for Large-Scale Learning! 🔥 Efficient, Scalable & Flexible – Train 200B+ models with 5D parallelism (TP/PP/CP/EP/DP), seamless vLLM/SGLang switching, async multi-env rollout for maximum RL throughput!
5 replies · 15 reposts · 118 likes
@vwxyzjn
Costa Huang
3 months
RT @interconnectsai: How I Write. And therein how I think. And how AI impacts it.
interconnects.ai
Therein how I think, how AI impacts it, and how writing reflects upon AI progress.
0 replies · 3 reposts · 0 likes
@vwxyzjn
Costa Huang
3 months
RT @cursor_ai: A conversation on the optimal reward for coding agents, infinite context models, and real-time RL
0 replies · 142 reposts · 0 likes
@vwxyzjn
Costa Huang
3 months
RT @a1zhang: Can GPT, Claude, and Gemini play video games like Zelda, Civ, and Doom II?. 𝗩𝗶𝗱𝗲𝗼𝗚𝗮𝗺𝗲𝗕𝗲𝗻𝗰𝗵 evaluates VLMs on Game Boy & MS-DOS….
0 replies · 77 reposts · 0 likes
@vwxyzjn
Costa Huang
3 months
RT @PeterHndrsn: The next ~1-4 years will be taking the 2017-2020 years of Deep RL and scaling up: exploration, generalization, long-horizo….
0 replies · 37 reposts · 0 likes