Andrew Zhao
@_AndrewZhao
Followers 5K · Following 4K · Media 115 · Statuses 1K
PhD @Tsinghua_Uni | Absolute Zero, ExpeL, DiveR-CT | Ex. intern @MSFTResearch, @BIGAI | Interested in RL, Reasoning/Safety for LLMs, Agents | On industry job market 2026
Joined September 2020
❄️ Introducing Absolute Zero Reasoner: our reasoner learns both to propose tasks that maximize learnability and to improve its reasoning by solving them, entirely through self-play—with no external data! Overall, it outperforms other "zero" models in the math & coding domains. 🧵 1/
60 · 342 · 2K
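A minimal sketch of the propose-and-solve self-play loop described above, with toy arithmetic tasks and a stand-in solver; the learnability reward shown (favoring tasks the solver sometimes, but not always, gets right) is a simplification, not the paper's exact formulation:

```python
# Toy sketch of an Absolute Zero-style propose/solve loop; the task format and
# reward shape are illustrative, not the paper's exact recipe.
import random

def propose_task(rng: random.Random) -> tuple[str, int]:
    """Stand-in 'proposer': emits an arithmetic question plus its verifiable answer."""
    a, b = rng.randint(0, 99), rng.randint(0, 99)
    return f"{a}+{b}=?", a + b

def solve(task: str, rng: random.Random) -> int:
    """Stand-in 'solver': sometimes right, sometimes off by one (a real LLM goes here)."""
    a, b = (int(x) for x in task[:-2].split("+"))
    return a + b if rng.random() < 0.6 else a + b + rng.choice([-1, 1])

def learnability_reward(solve_rate: float) -> float:
    """Reward proposals the solver can sometimes, but not always, solve."""
    return 0.0 if solve_rate in (0.0, 1.0) else 1.0 - solve_rate

rng = random.Random(0)
for step in range(3):
    task, answer = propose_task(rng)
    attempts = [solve(task, rng) for _ in range(8)]               # solver rollouts
    solve_rate = sum(a == answer for a in attempts) / len(attempts)
    r_propose = learnability_reward(solve_rate)                   # proposer's reward
    r_solve = [float(a == answer) for a in attempts]              # solver's verifiable rewards
    print(step, task, f"solve_rate={solve_rate:.2f}", f"r_propose={r_propose:.2f}")
    # In the real method, one LLM plays both roles and is updated with RL on these rewards.
```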
🚀 Job market post I’m a final-year PhD student at @kaistpr CS, looking for #postdoc/industry research position starting Fall 2026! I develop multilingual, multicultural language models and data, inspired by how humans use and learn language. 🔗 https://t.co/LkmT7inSbH
11 · 52 · 357
🚀 We propose Generative Adversarial Distillation (GAD) 🤖 Designed to perform on-policy distillation from proprietary black-box LLMs. ➡️ Requires neither access to teacher logits nor alignment of tokenizer vocabularies. (1/n)
5 · 10 · 20
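A rough sketch of how such black-box, on-policy distillation could be wired up, assuming only text access to the teacher; the function names below are hypothetical, not the paper's API:

```python
# Hypothetical wiring for GAD-style black-box distillation (illustrative, not the paper's code):
# the teacher is queried for text only, a discriminator scores "teacher-likeness",
# and the student is updated on-policy with that score as its reward.
from typing import Callable

def gad_step(prompts: list[str],
             teacher: Callable[[str], str],                   # black-box API: text in, text out
             student_generate: Callable[[str], str],          # on-policy student samples
             disc_score: Callable[[str, str], float],         # P(response is from the teacher)
             disc_update: Callable[[list[tuple[str, str, int]]], None],
             student_update: Callable[[list[tuple[str, str, float]]], None]) -> None:
    teacher_out = [(p, teacher(p), 1) for p in prompts]           # label 1 = teacher
    student_out = [(p, student_generate(p), 0) for p in prompts]  # label 0 = student
    disc_update(teacher_out + student_out)                        # discriminator step
    rewards = [(p, y, disc_score(p, y)) for p, y, _ in student_out]
    student_update(rewards)                                       # e.g. a policy-gradient step

# Smoke test with trivial stand-ins; note no teacher logits or shared tokenizer are needed.
gad_step(["hi"], teacher=lambda p: "hello!", student_generate=lambda p: "hi!",
         disc_score=lambda p, y: 0.3, disc_update=lambda batch: None,
         student_update=lambda batch: None)
```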
Excited to share our latest work on untangling language models by training them with extremely sparse weights! We can isolate tiny circuits inside the model that are responsible for various simple behaviors and understand them in unprecedented detail.
openai.com
We trained models to think in simpler, more traceable steps—so we can better understand how they work.
15 · 47 · 407
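One minimal way to keep weights extremely sparse during training is a top-k magnitude projection after each update; this is an illustrative stand-in, and the exact method in the linked work may differ:

```python
# Illustrative stand-in: enforce extreme weight sparsity with a top-k magnitude
# projection after each update; the linked work's exact training method may differ.
import numpy as np

def project_topk(w: np.ndarray, k: int) -> np.ndarray:
    """Zero out all but the k largest-magnitude entries of a weight matrix."""
    flat = np.abs(w).ravel()
    if k >= flat.size:
        return w
    threshold = np.partition(flat, -k)[-k]
    return np.where(np.abs(w) >= threshold, w, 0.0)

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 16))
for _ in range(100):
    grad = rng.normal(size=W.shape) * 0.01    # stand-in gradient
    W = project_topk(W - 0.1 * grad, k=25)    # keep ~10% of the weights nonzero
print("nonzero weights:", int((W != 0).sum()), "/", W.size)
# With so few nonzero connections, the subgraph behind a given behavior is small
# enough to read off and study directly, which is the point of the work above.
```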
As an author, reviewer, and AC, I propose adding a resource tag (e.g., industry/academic, low/high compute) to ICLR/ICML/NeurIPS submissions. In the pursuit of AGI, research resources vary drastically. Some works are minimal academic prototypes; others come from well-equipped…
6 · 22 · 252
My student @Yang_Liuu is on the job market looking for an industry lab position, and she's fantastic! She has done rigorous experimental work on in-context learning, identified early on the value of looping for meta-learning algorithms, and demystified aspects of task vectors…
"Looped Transformers are Better at Learning Learning Algorithms" in ICLR @Yang_Liuu offers a simple and clean message in this paper. When it comes to emulating learning algorithms, using a looped transformer (i.e., one where the iterative structure is hardcoded) helps a lot.
3 · 10 · 67
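A toy sketch of the looped-transformer idea: a single weight-tied block applied repeatedly with the input re-injected, so the iterative structure of a learning algorithm is hardcoded into the architecture (the tiny residual MLP below is a stand-in for the real block):

```python
# Toy illustration of a looped (weight-tied) forward pass; the residual MLP is a
# stand-in for a full transformer block.
import numpy as np

def block(h: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Shared block applied at every loop iteration."""
    return h + np.tanh(h @ W)

def looped_forward(x: np.ndarray, W: np.ndarray, n_loops: int) -> np.ndarray:
    h = np.zeros_like(x)
    for _ in range(n_loops):       # the iteration count is architectural, not learned
        h = block(h + x, W)        # re-inject the input at every loop, like an iterative solver
    return h

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))        # 4 tokens, hidden size 8
W = 0.1 * rng.normal(size=(8, 8))  # one set of weights, reused every iteration
print(looped_forward(x, W, n_loops=12).shape)   # (4, 8)
```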
My student @nayoung_nylee is on the job market and she's exceptional. Her work uses arithmetic as a lens to understand LLMs at a fundamental level, predicting the value of CoT, data format, self-improvement, and RL (pre-GRPO!). These gave us a ton of insights before they emerged…
1/ Our paper is out! Teaching Arithmetic to Small Transformers. We investigate several factors that control the emergence of basic arithmetic in small transformers (e.g., nanoGPT). Paper: https://t.co/ECbeypLF4q Work led by @nayoung_nylee & @KartikSreeni. Thread below.
1 · 16 · 139
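As an illustration of the "data format" knob in this line of work, one choice studied for arithmetic is writing the answer digits in reverse (least-significant first), matching the order a carry-based algorithm produces them; the snippet below is a toy data generator, not the paper's code:

```python
# Toy data generator showing the "reverse the answer digits" format choice
# (least-significant digit first, matching how carries are computed).
def make_sample(a: int, b: int, reverse_answer: bool) -> str:
    ans = str(a + b)
    if reverse_answer:
        ans = ans[::-1]
    return f"{a}+{b}={ans}"

print(make_sample(57, 68, reverse_answer=False))  # 57+68=125
print(make_sample(57, 68, reverse_answer=True))   # 57+68=521
```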
I will be co-hosting this student researcher, so if you're interested in working with me and Minsuk on MARL for LLMs, please apply!
I'm hiring a student researcher for next summer at the intersection of MARL x LLM. If you have a strong background and research experience in MARL algorithms, please apply and drop me an email (so that I know you've applied!) https://t.co/sikY6cgzju
6 · 11 · 173
RL is bounded by finite data 😣? Introducing RLVE: RL with Adaptive Verifiable Environments. We scale RL with data procedurally generated from 400 environments that dynamically adapt to the trained model. 💡 Find supervision signals right at the LM capability frontier + scale them. 🔗 in 🧵
12 · 112 · 462
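A hypothetical sketch of an adaptive verifiable environment in the spirit of the tweet: tasks are procedurally generated with an exact answer, and a difficulty parameter moves so the solve rate stays near a target at the model's frontier (class and parameter names are made up):

```python
# Hypothetical adaptive verifiable environment: procedurally generated tasks with
# exact answers, and a difficulty knob that tracks the model's solve rate.
import random

class AdaptiveEnv:
    def __init__(self, difficulty: int = 1, target_rate: float = 0.5):
        self.difficulty, self.target_rate = difficulty, target_rate

    def sample_task(self, rng: random.Random) -> tuple[str, int]:
        hi = 10 ** self.difficulty
        a, b = rng.randint(0, hi), rng.randint(0, hi)
        return f"{a}*{b}=?", a * b              # verifiable: the exact answer is known

    def update(self, solve_rate: float) -> None:
        # Harder when the model is cruising, easier when it is stuck.
        if solve_rate > self.target_rate:
            self.difficulty += 1
        elif solve_rate < self.target_rate and self.difficulty > 1:
            self.difficulty -= 1

rng = random.Random(0)
env = AdaptiveEnv()
for step in range(5):
    batch = [env.sample_task(rng) for _ in range(16)]     # rollouts + the RL update would go here
    solve_rate = max(0.0, 0.9 - 0.2 * env.difficulty)     # stand-in for the policy's accuracy
    env.update(solve_rate)
    print(f"step {step}: difficulty={env.difficulty}, solve_rate={solve_rate:.2f}")
```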
Check out the new results from our limits-of-RLVR paper; hope we can break through that barrier together as a community 🫡
In the camera-ready, we include preliminary scaling experiments using Magistral, a near-frontier pure-RLVR model, and the conclusion remains consistent. I’m also curious: if we scaled RLVR compute to 10–1000× of Magistral, would it actually produce new knowledge beyond pretraining?
3 · 5 · 62
Check out our new work AdaptiveNN — an active visual reasoning framework. It learns where to look via self-rewarding RL (no external rewards!) and integrates evidence across sequential fixations. Up to 28× lower inference cost and more human-like vision. https://t.co/h9acMWPlpE
2 · 23 · 82
AdaptiveNN demonstrates a promising avenue towards efficient, flexible and interpretable computer vision. It also emerges as a valuable tool for investigating visual cognition. https://t.co/p9uUng9Aev
1 · 7 · 31
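A toy sketch of the sequential-fixation inference described in the AdaptiveNN thread above: crop a glimpse, update an internal state, choose the next fixation, and stop early once confident; the encoder and policy here are random stand-ins, not AdaptiveNN itself:

```python
# Toy fixation loop: glimpse, update state, decide where to look next, stop early.
# The recurrent update and the "where to look" policy are random stand-ins.
import numpy as np

def glimpse(image: np.ndarray, cy: int, cx: int, size: int = 8) -> np.ndarray:
    y0, x0 = max(0, cy - size // 2), max(0, cx - size // 2)
    return image[y0:y0 + size, x0:x0 + size]

rng = np.random.default_rng(0)
image = rng.random((64, 64))
state = np.zeros(16)
cy, cx = 32, 32                                     # first fixation: the center
for t in range(6):                                  # at most 6 fixations
    g = glimpse(image, cy, cx)
    state = np.tanh(state + g.mean() * rng.normal(size=16))   # stand-in recurrent update
    confidence = float(np.abs(state).mean())
    if confidence > 0.8:                            # early exit is where the inference savings come from
        break
    cy, cx = (int(v) for v in rng.integers(8, 56, size=2))    # stand-in next-fixation policy
print(f"stopped after {t + 1} fixations, confidence={confidence:.2f}")
```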
Thrilled that our paper received the only perfect score at NeurIPS this year. Huge thanks to my collaborators and the reviewers. See you in San Diego! https://t.co/HHTjelGU1Z
https://t.co/kitS2uUX6B credit to @papercopilot
20 · 56 · 724
Someone actually did it https://t.co/qcqcsDGe1r
1 · 2 · 51
Today, we’re announcing Kosmos, our newest AI Scientist, available to use now. Users estimate Kosmos does 6 months of work in a single day. One run can read 1,500 papers and write 42,000 lines of code. At least 79% of its findings are reproducible. Kosmos has made 7 discoveries…
197 · 652 · 4K
New eval! Code duels for LMs ⚔️ Current evals test LMs on *tasks*: "fix this bug," "write a test." But we code to achieve *goals*: maximize revenue, cut costs, win users. Meet CodeClash: LMs compete via their codebases across multi-round tournaments to achieve high-level goals.
29 · 92 · 368
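A rough sketch of a multi-round code tournament in the spirit of CodeClash, with stand-in agents and a toy goal metric; all names are hypothetical:

```python
# Hypothetical multi-round tournament loop: each agent edits its own codebase,
# the codebases are scored against a high-level goal, and feedback flows back.
from typing import Callable, Dict

def run_tournament(agents: Dict[str, Callable[[str, str], str]],  # name -> edit(codebase, feedback)
                   score: Callable[[str], float],                 # goal metric (revenue, users, ...)
                   n_rounds: int = 3) -> Dict[str, float]:
    codebases = {name: "" for name in agents}
    feedback = {name: "" for name in agents}
    scores: Dict[str, float] = {}
    for rnd in range(n_rounds):
        for name, edit in agents.items():
            codebases[name] = edit(codebases[name], feedback[name])   # the LM rewrites its code
        scores = {name: score(cb) for name, cb in codebases.items()}  # compete on the goal
        for name in agents:
            feedback[name] = f"round {rnd}: you scored {scores[name]:.1f}"
    return scores

# Smoke test with trivial stand-ins for the LM agents and the arena.
print(run_tournament(agents={"a": lambda cb, fb: cb + "x", "b": lambda cb, fb: cb + "xx"},
                     score=lambda cb: float(len(cb))))
```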
Introducing our work Test-Time Steering for Lossless Text Compression via Weighted Product of Experts — a simple way to combine LLMs with traditional compressors so the ensemble is never worse than the best expert, and often better. I will be at Hall C this afternoon! (0/N)
2 · 7 · 11
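The combination rule behind a weighted product of experts is p(x) ∝ ∏_i p_i(x)^{w_i}: with a one-hot weight on the best expert it reduces to that expert, which is why the ensemble need never be worse than the best single one. A small illustrative sketch with toy numbers (not the paper's implementation), assuming the combined distribution is then fed to an arithmetic coder:

```python
# Weighted product of experts over next-symbol probabilities: p(x) ∝ prod_i p_i(x)^{w_i}.
# Symbols are then entropy-coded under the mixture; the numbers below are toy values.
import numpy as np

def weighted_poe(dists: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """dists: (n_experts, vocab) next-symbol distributions; weights sum to 1."""
    log_p = weights @ np.log(dists)        # weighted sum of log-probabilities
    p = np.exp(log_p - log_p.max())        # renormalize stably
    return p / p.sum()

def code_length_bits(dist: np.ndarray, symbol: int) -> float:
    """Ideal arithmetic-coding cost of one symbol under dist."""
    return float(-np.log2(dist[symbol]))

llm  = np.array([0.70, 0.20, 0.10])        # stand-in LLM next-symbol distribution
trad = np.array([0.40, 0.40, 0.20])        # stand-in traditional-compressor model
mix = weighted_poe(np.stack([llm, trad]), np.array([0.8, 0.2]))
print("mixture:", np.round(mix, 3))
print("bits for symbol 0:", round(code_length_bits(mix, 0), 3))
# With weights (1, 0) the mixture equals the LLM alone; steering the weights at test
# time is what keeps the ensemble from doing worse than its best single expert.
```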
New @Microsoft paper teaches LLMs to organize reasoning into concurrent subtasks for faster, more accurate answers. It shows 28% lower wait time than typical parallel thinking while also boosting math accuracy. The big deal is simple: it turns coordination into a skill the…
19 · 65 · 363