Andrew Zhao
@_AndrewZhao
Followers 5K · Following 4K · Media 115 · Statuses 1K
PhD @Tsinghua_Uni | Absolute Zero, ExpeL, DiveR-CT | Ex. intern @MSFTResearch, @BIGAI | Interested in RL, Reasoning/Safety for LLMs, Agents | On industry job market 2026
Joined September 2020
❄️ Introducing Absolute Zero Reasoner: our reasoner learns both to propose tasks that maximize learnability and to improve its reasoning by solving them, entirely through self-play—with no external data! Overall, it outperforms other "zero" models in the math & coding domains. 🧵 1/
60 · 342 · 2K
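A minimal sketch of the propose-and-solve self-play loop described above, with toy arithmetic tasks and a stand-in solver; the learnability reward shown (favoring tasks the solver sometimes, but not always, gets right) is a simplification, not the paper's exact formulation:

```python
# Toy sketch of an Absolute Zero-style propose/solve loop; the task format and
# reward shape are illustrative, not the paper's exact recipe.
import random

def propose_task(rng: random.Random) -> tuple[str, int]:
    """Stand-in 'proposer': emits an arithmetic question plus its verifiable answer."""
    a, b = rng.randint(0, 99), rng.randint(0, 99)
    return f"{a}+{b}=?", a + b

def solve(task: str, rng: random.Random) -> int:
    """Stand-in 'solver': sometimes right, sometimes off by one (a real LLM goes here)."""
    a, b = (int(x) for x in task[:-2].split("+"))
    return a + b if rng.random() < 0.6 else a + b + rng.choice([-1, 1])

def learnability_reward(solve_rate: float) -> float:
    """Reward proposals the solver can sometimes, but not always, solve."""
    return 0.0 if solve_rate in (0.0, 1.0) else 1.0 - solve_rate

rng = random.Random(0)
for step in range(3):
    task, answer = propose_task(rng)
    attempts = [solve(task, rng) for _ in range(8)]               # solver rollouts
    solve_rate = sum(a == answer for a in attempts) / len(attempts)
    r_propose = learnability_reward(solve_rate)                   # proposer's reward
    r_solve = [float(a == answer) for a in attempts]              # solver's verifiable rewards
    print(step, task, f"solve_rate={solve_rate:.2f}", f"r_propose={r_propose:.2f}")
    # In the real method, one LLM plays both roles and is updated with RL on these rewards.
```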
🚀 Job market post I’m a final-year PhD student at @kaistpr CS, looking for #postdoc/industry research position starting Fall 2026! I develop multilingual, multicultural language models and data, inspired by how humans use and learn language. 🔗 https://t.co/LkmT7inSbH
11 · 52 · 357
🚀 We propose Generative Adversarial Distillation (GAD) 🤖 Designed to perform on-policy distillation from proprietary black-box LLMs. ➡️ Requires neither access to teacher logits nor alignment of tokenizer vocabularies. (1/n)
5 · 10 · 20
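A rough sketch of how such black-box, on-policy distillation could be wired up, assuming only text access to the teacher; the function names below are hypothetical, not the paper's API:

```python
# Hypothetical wiring for GAD-style black-box distillation (illustrative, not the paper's code):
# the teacher is queried for text only, a discriminator scores "teacher-likeness",
# and the student is updated on-policy with that score as its reward.
from typing import Callable

def gad_step(prompts: list[str],
             teacher: Callable[[str], str],                   # black-box API: text in, text out
             student_generate: Callable[[str], str],          # on-policy student samples
             disc_score: Callable[[str, str], float],         # P(response is from the teacher)
             disc_update: Callable[[list[tuple[str, str, int]]], None],
             student_update: Callable[[list[tuple[str, str, float]]], None]) -> None:
    teacher_out = [(p, teacher(p), 1) for p in prompts]           # label 1 = teacher
    student_out = [(p, student_generate(p), 0) for p in prompts]  # label 0 = student
    disc_update(teacher_out + student_out)                        # discriminator step
    rewards = [(p, y, disc_score(p, y)) for p, y, _ in student_out]
    student_update(rewards)                                       # e.g. a policy-gradient step

# Smoke test with trivial stand-ins; note no teacher logits or shared tokenizer are needed.
gad_step(["hi"], teacher=lambda p: "hello!", student_generate=lambda p: "hi!",
         disc_score=lambda p, y: 0.3, disc_update=lambda batch: None,
         student_update=lambda batch: None)
```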
Excited to share our latest work on untangling language models by training them with extremely sparse weights! We can isolate tiny circuits inside the model that are responsible for various simple behaviors and understand them in unprecedented detail.
openai.com
We trained models to think in simpler, more traceable steps—so we can better understand how they work.
15 · 47 · 407
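One minimal way to keep weights extremely sparse during training is a top-k magnitude projection after each update; this is an illustrative stand-in, and the exact method in the linked work may differ:

```python
# Illustrative stand-in: enforce extreme weight sparsity with a top-k magnitude
# projection after each update; the linked work's exact training method may differ.
import numpy as np

def project_topk(w: np.ndarray, k: int) -> np.ndarray:
    """Zero out all but the k largest-magnitude entries of a weight matrix."""
    flat = np.abs(w).ravel()
    if k >= flat.size:
        return w
    threshold = np.partition(flat, -k)[-k]
    return np.where(np.abs(w) >= threshold, w, 0.0)

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 16))
for _ in range(100):
    grad = rng.normal(size=W.shape) * 0.01    # stand-in gradient
    W = project_topk(W - 0.1 * grad, k=25)    # keep ~10% of the weights nonzero
print("nonzero weights:", int((W != 0).sum()), "/", W.size)
# With so few nonzero connections, the subgraph behind a given behavior is small
# enough to read off and study directly, which is the point of the work above.
```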
As an author, reviewer, and AC, I propose adding a resource tag (e.g., industry/academic, low/high compute) to ICLR/ICML/NeurIPS submissions. In the pursuit of AGI, research resources vary drastically. Some works are minimal academic prototypes; others come from well-equipped…
6 · 22 · 252
My student @Yang_Liuu is on the job market looking for an industry lab position, and she's fantastic! She has done rigorous experimental work on in-context learning, identified early on the value of looping for meta-learning algorithms, and demystified aspects of task vectors…
"Looped Transformers are Better at Learning Learning Algorithms" in ICLR @Yang_Liuu offers a simple and clean message in this paper. When it comes to emulating learning algorithms, using a looped transformer (i.e., one where the iterative structure is hardcoded) helps a lot.
3 · 10 · 67
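A toy sketch of the looped-transformer idea: a single weight-tied block applied repeatedly with the input re-injected, so the iterative structure of a learning algorithm is hardcoded into the architecture (the tiny residual MLP below is a stand-in for the real block):

```python
# Toy illustration of a looped (weight-tied) forward pass; the residual MLP is a
# stand-in for a full transformer block.
import numpy as np

def block(h: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Shared block applied at every loop iteration."""
    return h + np.tanh(h @ W)

def looped_forward(x: np.ndarray, W: np.ndarray, n_loops: int) -> np.ndarray:
    h = np.zeros_like(x)
    for _ in range(n_loops):       # the iteration count is architectural, not learned
        h = block(h + x, W)        # re-inject the input at every loop, like an iterative solver
    return h

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))        # 4 tokens, hidden size 8
W = 0.1 * rng.normal(size=(8, 8))  # one set of weights, reused every iteration
print(looped_forward(x, W, n_loops=12).shape)   # (4, 8)
```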
My student @nayoung_nylee is on the job market and she's exceptional. Her work uses arithmetic as a lens to understand LLMs at a fundamental level, predicting the value of CoT, data format, self-improvement, and RL (pre-GRPO!). These gave us a ton of insights before they emerged…
1/ Our paper is out! Teaching Arithmetic to Small Transformers. We investigate several factors that control the emergence of basic arithmetic in small transformers (e.g., nanoGPT). Paper: https://t.co/ECbeypLF4q Work led by @nayoung_nylee & @KartikSreeni. Thread below.
1 · 16 · 139
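As an illustration of the "data format" knob in this line of work, one choice studied for arithmetic is writing the answer digits in reverse (least-significant first), matching the order a carry-based algorithm produces them; the snippet below is a toy data generator, not the paper's code:

```python
# Toy data generator showing the "reverse the answer digits" format choice
# (least-significant digit first, matching how carries are computed).
def make_sample(a: int, b: int, reverse_answer: bool) -> str:
    ans = str(a + b)
    if reverse_answer:
        ans = ans[::-1]
    return f"{a}+{b}={ans}"

print(make_sample(57, 68, reverse_answer=False))  # 57+68=125
print(make_sample(57, 68, reverse_answer=True))   # 57+68=521
```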
I will be co-hosting this student researcher, so if you're interested in working with me and Minsuk on MARL for LLMs, please apply!
I'm hiring a student researcher for next summer at the intersection of MARL x LLM. If you have a strong background and research experience in MARL algorithms, please apply and drop me an email (so that I know you've applied!) https://t.co/sikY6cgzju
6 · 11 · 173
RL is bounded by finite data 😣? Introducing RLVE: RL with Adaptive Verifiable Environments. We scale RL with data procedurally generated from 400 environments that dynamically adapt to the trained model. 💡 Find supervision signals right at the LM capability frontier + scale them. 🔗 in 🧵
12 · 112 · 462
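A hypothetical sketch of an adaptive verifiable environment in the spirit of the tweet: tasks are procedurally generated with an exact answer, and a difficulty parameter moves so the solve rate stays near a target at the model's frontier (class and parameter names are made up):

```python
# Hypothetical adaptive verifiable environment: procedurally generated tasks with
# exact answers, and a difficulty knob that tracks the model's solve rate.
import random

class AdaptiveEnv:
    def __init__(self, difficulty: int = 1, target_rate: float = 0.5):
        self.difficulty, self.target_rate = difficulty, target_rate

    def sample_task(self, rng: random.Random) -> tuple[str, int]:
        hi = 10 ** self.difficulty
        a, b = rng.randint(0, hi), rng.randint(0, hi)
        return f"{a}*{b}=?", a * b              # verifiable: the exact answer is known

    def update(self, solve_rate: float) -> None:
        # Harder when the model is cruising, easier when it is stuck.
        if solve_rate > self.target_rate:
            self.difficulty += 1
        elif solve_rate < self.target_rate and self.difficulty > 1:
            self.difficulty -= 1

rng = random.Random(0)
env = AdaptiveEnv()
for step in range(5):
    batch = [env.sample_task(rng) for _ in range(16)]     # rollouts + the RL update would go here
    solve_rate = max(0.0, 0.9 - 0.2 * env.difficulty)     # stand-in for the policy's accuracy
    env.update(solve_rate)
    print(f"step {step}: difficulty={env.difficulty}, solve_rate={solve_rate:.2f}")
```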
Check out the new results from our limits-of-RLVR paper; hope we can break through that barrier together as a community 🫡
In the camera-ready, we include preliminary scaling experiments using Magistral, a near-frontier pure-RLVR model, and the conclusion remains consistent. I’m also curious: if we scaled RLVR compute to 10–1000× of Magistral, would it actually produce new knowledge beyond pretraining?
3 · 5 · 62
Check out our new work AdaptiveNN — an active visual reasoning framework. It learns where to look via self-rewarding RL (no external rewards!) and integrates evidence across sequential fixations. Up to 28× lower inference cost and more human-like vision. https://t.co/h9acMWPlpE
2 · 23 · 82
AdaptiveNN demonstrates a promising avenue towards efficient, flexible and interpretable computer vision. It also emerges as a valuable tool for investigating visual cognition. https://t.co/p9uUng9Aev
1 · 7 · 31
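A toy sketch of the sequential-fixation inference described in the AdaptiveNN thread above: crop a glimpse, update an internal state, choose the next fixation, and stop early once confident; the encoder and policy here are random stand-ins, not AdaptiveNN itself:

```python
# Toy fixation loop: glimpse, update state, decide where to look next, stop early.
# The recurrent update and the "where to look" policy are random stand-ins.
import numpy as np

def glimpse(image: np.ndarray, cy: int, cx: int, size: int = 8) -> np.ndarray:
    y0, x0 = max(0, cy - size // 2), max(0, cx - size // 2)
    return image[y0:y0 + size, x0:x0 + size]

rng = np.random.default_rng(0)
image = rng.random((64, 64))
state = np.zeros(16)
cy, cx = 32, 32                                     # first fixation: the center
for t in range(6):                                  # at most 6 fixations
    g = glimpse(image, cy, cx)
    state = np.tanh(state + g.mean() * rng.normal(size=16))   # stand-in recurrent update
    confidence = float(np.abs(state).mean())
    if confidence > 0.8:                            # early exit is where the inference savings come from
        break
    cy, cx = (int(v) for v in rng.integers(8, 56, size=2))    # stand-in next-fixation policy
print(f"stopped after {t + 1} fixations, confidence={confidence:.2f}")
```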
Thrilled that our paper received the only perfect score at NeurIPS this year. Huge thanks to my collaborators and the reviewers. See you in San Diego! https://t.co/HHTjelGU1Z
https://t.co/kitS2uUX6B credit to @papercopilot
20 · 56 · 724
Someone actually did it https://t.co/qcqcsDGe1r
1 · 2 · 51
Today, we’re announcing Kosmos, our newest AI Scientist, available to use now. Users estimate Kosmos does 6 months of work in a single day. One run can read 1,500 papers and write 42,000 lines of code. At least 79% of its findings are reproducible. Kosmos has made 7 discoveries…
197 · 652 · 4K
New eval! Code duels for LMs ⚔️ Current evals test LMs on *tasks*: "fix this bug," "write a test." But we code to achieve *goals*: maximize revenue, cut costs, win users. Meet CodeClash: LMs compete via their codebases across multi-round tournaments to achieve high-level goals.
29 · 92 · 368
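A rough sketch of a multi-round code tournament in the spirit of CodeClash, with stand-in agents and a toy goal metric; all names are hypothetical:

```python
# Hypothetical multi-round tournament loop: each agent edits its own codebase,
# the codebases are scored against a high-level goal, and feedback flows back.
from typing import Callable, Dict

def run_tournament(agents: Dict[str, Callable[[str, str], str]],  # name -> edit(codebase, feedback)
                   score: Callable[[str], float],                 # goal metric (revenue, users, ...)
                   n_rounds: int = 3) -> Dict[str, float]:
    codebases = {name: "" for name in agents}
    feedback = {name: "" for name in agents}
    scores: Dict[str, float] = {}
    for rnd in range(n_rounds):
        for name, edit in agents.items():
            codebases[name] = edit(codebases[name], feedback[name])   # the LM rewrites its code
        scores = {name: score(cb) for name, cb in codebases.items()}  # compete on the goal
        for name in agents:
            feedback[name] = f"round {rnd}: you scored {scores[name]:.1f}"
    return scores

# Smoke test with trivial stand-ins for the LM agents and the arena.
print(run_tournament(agents={"a": lambda cb, fb: cb + "x", "b": lambda cb, fb: cb + "xx"},
                     score=lambda cb: float(len(cb))))
```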
Introducing our work Test-Time Steering for Lossless Text Compression via Weighted Product of Experts — a simple way to combine LLMs with traditional compressors so the ensemble is never worse than the best expert, and often better. I will be at Hall C this afternoon! (0/N)
2 · 7 · 11
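The combination rule behind a weighted product of experts is p(x) ∝ ∏_i p_i(x)^{w_i}: with a one-hot weight on the best expert it reduces to that expert, which is why the ensemble need never be worse than the best single one. A small illustrative sketch with toy numbers (not the paper's implementation), assuming the combined distribution is then fed to an arithmetic coder:

```python
# Weighted product of experts over next-symbol probabilities: p(x) ∝ prod_i p_i(x)^{w_i}.
# Symbols are then entropy-coded under the mixture; the numbers below are toy values.
import numpy as np

def weighted_poe(dists: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """dists: (n_experts, vocab) next-symbol distributions; weights sum to 1."""
    log_p = weights @ np.log(dists)        # weighted sum of log-probabilities
    p = np.exp(log_p - log_p.max())        # renormalize stably
    return p / p.sum()

def code_length_bits(dist: np.ndarray, symbol: int) -> float:
    """Ideal arithmetic-coding cost of one symbol under dist."""
    return float(-np.log2(dist[symbol]))

llm  = np.array([0.70, 0.20, 0.10])        # stand-in LLM next-symbol distribution
trad = np.array([0.40, 0.40, 0.20])        # stand-in traditional-compressor model
mix = weighted_poe(np.stack([llm, trad]), np.array([0.8, 0.2]))
print("mixture:", np.round(mix, 3))
print("bits for symbol 0:", round(code_length_bits(mix, 0), 3))
# With weights (1, 0) the mixture equals the LLM alone; steering the weights at test
# time is what keeps the ensemble from doing worse than its best single expert.
```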
New @Microsoft paper teaches LLMs to organize reasoning into concurrent subtasks for faster, more accurate answers. It shows 28% lower wait time than typical parallel thinking while also boosting math accuracy. The big deal is simple: it turns coordination into a skill the…
19 · 65 · 363