Alex L Zhang

@a1zhang

Followers: 16K
Following: 2K
Media: 109
Statuses: 587

phd student @MIT_CSAIL advised by @lateinteraction, ugrad @Princeton 🫵🏻 go participate in the @GPU_MODE kernel competitions!

USA
Joined December 2015
@a1zhang
Alex L Zhang
2 months
What if scaling the context windows of frontier LLMs is much easier than it sounds? We’re excited to share our work on Recursive Language Models (RLMs), a new inference strategy where LLMs can decompose and recursively interact with input prompts of seemingly unbounded length…
127
355
3K
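(For readers skimming the thread: the announced method has the model interact with an oversized prompt through a REPL rather than reading it in one pass. Below is a deliberately simplified sketch of the recursive idea, with a hypothetical `llm_call` helper and a hard-coded chunking rule; the actual RLM work lets the model itself decide how to decompose the context inside the REPL, so treat this as an illustration, not the authors' implementation.)

```python
# Hypothetical sketch of a recursive language model (RLM)-style query.
# `llm_call` stands in for any bounded-context chat-completion API; the
# chunking rule and prompts are illustrative, not the paper's method.

def llm_call(prompt: str) -> str:
    """Placeholder for a single bounded-context model call."""
    raise NotImplementedError

def rlm_query(question: str, context: str, max_chars: int = 50_000) -> str:
    # Base case: the context already fits in one ordinary call.
    if len(context) <= max_chars:
        return llm_call(f"Context:\n{context}\n\nQuestion: {question}")

    # Recursive case: split the oversized context, answer over each piece,
    # then ask the model to synthesize the partial answers.
    chunks = [context[i:i + max_chars] for i in range(0, len(context), max_chars)]
    partials = [rlm_query(question, chunk, max_chars) for chunk in chunks]
    summary = "\n\n".join(f"[chunk {i}] {p}" for i, p in enumerate(partials))
    return llm_call(
        f"Partial answers from recursive sub-calls:\n{summary}\n\n"
        f"Combine them into one answer to: {question}"
    )
```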
@a1zhang
Alex L Zhang
3 days
as the other first year in Omar's lab, this implies that I'm fun to be around
@jacobli99
Jacob X. Li
6 days
three months into my PhD and I'm genuinely having the time of my life??? like I get to spend my days just diving deep into things that actually fascinate me. research is so fun right now
0
2
57
@a1zhang
Alex L Zhang
6 days
btw today at 3pm PST (in ~4 hours) Vicki Wang from NVIDIA is giving a @GPU_MODE talk on CuTe DSL, its features, and how to make the most of it. If you're currently competing in the NVFP4 Blackwell competition this will be very helpful, but it's open to anyone!
3
21
203
@a1zhang
Alex L Zhang
6 days
sometimes I wonder if 90% of ppl in the field would be able to tell if an objectively good paper is good
3
0
20
@jasshone_
Jasmine Shone @ Neurips
7 days
Excited to present my work on exploring alternative loss functions for representation learning at the NeurReps workshop at NeurIPS this Sunday (Dec 7)! Thanks to my amazing mentors @mhamilton723 @Sa_9810 and @zli11010 for all of their guidance :) Paper:
arxiv.org
The Information Contrastive (I-Con) framework revealed that over 23 representation learning methods implicitly minimize KL divergence between data and learned distributions that encode...
0
3
10
@a1zhang
Alex L Zhang
7 days
Y’all should go visit if you’re there (sadly I’m not this year), Zed is doing some really cool shit :)
@zli11010
Zed @NeurIPS2025
7 days
Excited to be presenting the EnCompass search framework at NeurIPS today! Stop by to learn how EnCompass makes LLM-calling programs more reliable. Poster #2410 4:30-7:30pm https://t.co/Q0TJknNWFG Paper: https://t.co/9UlAlThKR6 dms are open if you’d like to chat about LLM agent
0
0
9
@GPU_MODE
GPU MODE
10 days
▓▓▓░░░░░░░░░ 25% We just concluded the GEMV problem for the Blackwell NVFP4 competition. And we've started on a new GEMM problem. You can still sign up and be eligible for prizes per problem and the grand prize. glhf!
1
7
91
@a1zhang
Alex L Zhang
10 days
huge step towards solving low-bit (NVFP4 / FP4) training. FP8 training has been a thing for a while, but anything lower has traditionally been unstable. Not affiliated at all, but I've seen Jack in the weeds for months trying out different strategies and writing CUTLASS kernels
@jackcookjack
Jack Cook
10 days
Training LLMs with NVFP4 is hard because FP4 has so few values that I can fit them all in this post: ±{0, 0.5, 1, 1.5, 2, 3, 4, 6}. But what if I told you that reducing this range even further could actually unlock better training + quantization performance? Introducing Four
3
19
155
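(For concreteness, the magnitudes Jack lists are the non-negative values representable in the 4-bit E2M1 format; with a sign bit that gives the full ±{0, 0.5, 1, 1.5, 2, 3, 4, 6} set from the tweet. A tiny round-to-nearest sketch onto that grid, ignoring the block scaling that NVFP4 training kernels actually apply:)

```python
import numpy as np

# Non-negative values representable in FP4 (E2M1).
FP4_MAGNITUDES = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_fp4(x: np.ndarray) -> np.ndarray:
    """Round each element to the nearest representable FP4 value (no scaling)."""
    mags = np.abs(x)[..., None]                          # (..., 1)
    idx = np.argmin(np.abs(mags - FP4_MAGNITUDES), axis=-1)
    return np.sign(x) * FP4_MAGNITUDES[idx]

# Values beyond +/-6 clip to the largest magnitude; everything else snaps to the grid.
print(quantize_fp4(np.array([0.3, -1.2, 2.6, 7.5])))    # [ 0.5 -1.   3.   6. ]
```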
@m_sirovatka
Matej Sirovatka
12 days
After 3 weeks, we have concluded our first problem of the @GPU_MODE x @nvidia competition, NVFP4 GEMV. Thanks to everyone who has participated, we have collected over 40k submissions from >200 users. Congrats to the winners and good luck with the next problem, NVFP4 GEMM 🔥
7
15
181
@a1zhang
Alex L Zhang
15 days
AI@Princeton stays winning
@kevin_wang3290
Kevin Wang
15 days
Excited to share that our paper "1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities" has won the Best Paper Award at NeurIPS '25! Hope to see you all in San Diego :)
0
0
42
@irombie
Irem Ergün
22 days
This concept of Recursive Language Models (RLMs) proposes an exciting direction in context length scaling of LLMs, and we should talk about it! (1/n) https://t.co/gV44SjcFr4
alexzhang13.github.io
We propose Recursive Language Models (RLMs), an inference strategy where language models can decompose and recursively interact with input context of unbounded length through REPL environments.
3
27
199
@a1zhang
Alex L Zhang
25 days
very exciting news! congrats to @hardmaru and the rest of the super talented folks at Sakana on the raise! (if u have the option to intern there you should, i learned sm there pre-phd)
@SakanaAILabs
Sakana AI
25 days
Announcing our Series B 🐟 https://t.co/6BpYSq5uc4
5
5
126
@mobicham
mobicham
27 days
My Triton version for the NVFP4 gemv kernel competition @GPU_MODE 🧵 https://t.co/4u3hAFIlpS
gist.github.com
6
13
151
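(If you're following the competition without Blackwell hardware, a plain NumPy reference of what a block-scaled FP4 GEMV computes is a handy correctness check before touching Triton or CUTLASS. The block size and scale layout below are assumptions for illustration, not the exact competition spec; real kernels fuse the dequantize and the matvec rather than materializing the full matrix.)

```python
import numpy as np

def blockscaled_gemv_ref(a_fp4: np.ndarray, a_scales: np.ndarray,
                         x: np.ndarray, block: int = 16) -> np.ndarray:
    """Reference y = A @ x where A is stored as FP4 values plus one scale
    per `block` contiguous elements along each row.

    a_fp4:    (M, K) already-decoded FP4 values, e.g. from +/-{0..6}
    a_scales: (M, K // block) per-block scale factors (K assumed divisible by block)
    x:        (K,) dense activation vector
    """
    # Broadcast each block's scale across its columns, dequantize, then matvec.
    scales_full = np.repeat(a_scales, block, axis=1)     # (M, K)
    a_dequant = a_fp4 * scales_full
    return a_dequant @ x                                  # (M,)
```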
@rasmus1610
Marius Vach
28 days
Here is the solveit dialog implementing RLMs by @a1zhang using `lisette` and `toolslm` by @answerdotai (h/t @jeremyphoward)
@a1zhang
Alex L Zhang
2 months
What if scaling the context windows of frontier LLMs is much easier than it sounds? We’re excited to share our work on Recursive Language Models (RLMs), a new inference strategy where LLMs can decompose and recursively interact with input prompts of seemingly unbounded length…
1
11
28
@GPU_MODE
GPU MODE
28 days
Saturday, November 15, at 12:00 PM PST we have a special talk lined up. Paulius Micikevicius, arguably the person most responsible for the efficiency revolution in GPUs with low-bit dtypes and sparsity, will be speaking. Cohosting with @cHHillee
2
14
107
@JonSaadFalcon
Jon Saad-Falcon
30 days
Data centers dominate AI, but they're hitting physical limits. What if the future of AI isn't just bigger data centers, but local intelligence in our hands? The viability of local AI depends on intelligence efficiency. To measure this, we propose intelligence per watt (IPW):
48
140
437
@18jeffreyma
Jeff Ma
30 days
We’re launching SWE-fficiency to eval whether LMs can speed up real GitHub repos on real workloads! ⏱️ 498 optimization tasks across 9 data-science, ML, and HPC repos — each with a real workload to speed up. Existing agents struggle to match expert level optimizations!
12
23
200
@_williamhu
William Hu
1 month
AI is compute-hungry. While it has generally relied on a single hardware vendor in the past, AMD GPUs now offer competitive memory and compute throughput. Yet, the software stack is brittle. So we ask: can the same DSL principles that simplified NVIDIA kernel dev translate to
7
37
162
@a1zhang
Alex L Zhang
1 month
the NVFP4 kernels on Blackwell competition has started on @GPU_MODE!!! the first problem, NVFP4 GEMV, is now out and submissions can be made. good luck to all the participants!
0
5
80
@GPU_MODE
GPU MODE
1 month
1,000 registrations so far!
@NVIDIAAIDev
NVIDIA AI Developer
1 month
Ready, Set, Go! 🏎️ Create something amazing at our Blackwell NVFP4 Kernel Hackathon with @GPU_MODE. 🎊 🏆 Compete in a 4-part performance challenge to optimize low-level kernels on NVIDIA Blackwell hardware. 🥇 3 winners per challenge will receive top-tier NVIDIA hardware.
1
9
171
@rasmus1610
Marius Vach
1 month
Interesting. I'm currently implementing RLMs by @a1zhang and it's super model-dependent whether the `llm_query` tool function is used or not. GPT models never use it and rely on regex a lot. Kimi-K2 likes the tool and leverages it a lot. https://t.co/pyvDGRrXUv
@a1zhang
Alex L Zhang
2 months
What if scaling the context windows of frontier LLMs is much easier than it sounds? We’re excited to share our work on Recursive Language Models (RLMs), a new inference strategy where LLMs can decompose and recursively interact with input prompts of seemingly unbounded length…
0
4
26
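(The model-dependence Marius describes is about whether a model chooses to invoke the sub-query tool at all versus just grepping the context with regex inside the REPL. A hedged sketch of how an `llm_query` tool might be exposed in an OpenAI-style tool-calling setup; the name, description, and schema are illustrative and not taken from the RLM code or from `lisette`/`toolslm`:)

```python
# Illustrative OpenAI-style tool schema for a recursive sub-query tool.
# Whether a given model actually calls it is exactly the behavior
# difference noted in the tweet above.
LLM_QUERY_TOOL = {
    "type": "function",
    "function": {
        "name": "llm_query",
        "description": "Ask a fresh LLM a question over a slice of the stored long context.",
        "parameters": {
            "type": "object",
            "properties": {
                "question": {
                    "type": "string",
                    "description": "What to ask the sub-model.",
                },
                "context_slice": {
                    "type": "string",
                    "description": "The portion of the long prompt to pass down.",
                },
            },
            "required": ["question", "context_slice"],
        },
    },
}
```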