❄️Andrew Zhao❄️ Profile
❄️Andrew Zhao❄️

@_AndrewZhao

Followers
4K
Following
3K
Media
108
Statuses
1K

PhD @Tsinghua_Uni. Absolute Zero, ExpeL, Diver-CT Research Intern @MSFTResearch, Ex. @ BIGAI. Interested in RL, Reasoning/Safety 4 LLMs, Agents. On job market '26

Joined September 2020
@_AndrewZhao
❄️Andrew Zhao❄️
4 months
❄️Introducing Absolute Zero Reasoner: Our reasoner learns to both propose tasks that maximize learnability and improve reasoning by solving them, entirely through self-play—with no external data! It overall outperforms other "zero" models in math & coding domains. 🧵 1/
59
343
2K
@_AndrewZhao
❄️Andrew Zhao❄️
21 hours
RT @BanghuaZ: Beyond prompt / context engineers, we’re seeing the rise of environment engineers, experts who build high-quality RL environm…
0
8
0
@_AndrewZhao
❄️Andrew Zhao❄️
2 days
RT @zzlccc: With just a few lines of code, Feng’s (@fengyao1909) suggested fix—applying importance sampling on the behavior policy—resolved…
0
53
0
@_AndrewZhao
❄️Andrew Zhao❄️
4 days
RT @Gradient_HQ: Reinforcement Learning is the future tense of intelligence. Echo is how it scales. Echo is Gradient’s distributed RL fram…
0
475
0
@_AndrewZhao
❄️Andrew Zhao❄️
5 days
RT @willccbb: the easiest way to get hired at @primeintellect for research is to just make it very clear that you're already doing excellen…
0
17
0
@_AndrewZhao
❄️Andrew Zhao❄️
6 days
RT @kuchaev: We are excited to release Nvidia-Nemotron-Nano-V2 model! This is a 9B hybrid SSM model with open base model and training data.…
0
56
0
@_AndrewZhao
❄️Andrew Zhao❄️
6 days
RT @gm8xx8: NVIDIA Nemotron-Nano v2. Models: 12B Base, 9B Reasoning, 9B Base. Arch: Hybrid Mamba2–Transformer (128K ctx, 4 attn layers). …
0
36
0
@_AndrewZhao
❄️Andrew Zhao❄️
7 days
RT @chatgpt21: It looks like Andrew Garfield will play Sam Altman in the Open AI movie coming to Amazon MGM. I think @apples_jimmy deserves…
0
9
0
@_AndrewZhao
❄️Andrew Zhao❄️
8 days
RT @gensynai: Coming soon
0
81
0
@_AndrewZhao
❄️Andrew Zhao❄️
9 days
LLMs as internet/knowledge base, no need for external tools. Reminiscent of older work from AI2/UW, Rainier and CRYSTAL
7
54
318
@_AndrewZhao
❄️Andrew Zhao❄️
10 days
RT @hwchung27: After a great time at OpenAI, we (@EdwardSun0909, @_jasonwei) recently joined @Meta Superintelligence Labs. The first month…
0
72
0
@_AndrewZhao
❄️Andrew Zhao❄️
11 days
RT @DimitrisPapail: Thinking Less at test-time requires Sampling More at training-time! GFPO is a new, cool, and simple Policy Opt algorit…
0
41
0
@_AndrewZhao
❄️Andrew Zhao❄️
12 days
RT @arcprize: "I've updated my AGI timeline." One year later, @dwarkesh_sp and @fchollet meet on camera again. Both of them have shifted…
0
49
0
@_AndrewZhao
❄️Andrew Zhao❄️
12 days
Nice empirical paper investigating all your bag of tricks in reasoning LLMs.
4
94
616
@_AndrewZhao
❄️Andrew Zhao❄️
13 days
RT @GXiming: 🚀 How far can RL scaling take LLMs? Drop ProRLv2! 🔥 We keep expanding LLM’s reasoning boundaries through 3,000+ RL steps over 5…
huggingface.co
0
25
0
@_AndrewZhao
❄️Andrew Zhao❄️
13 days
RT @canondetortugas: Announcing the first workshop on Foundations of Language Model Reasoning (FoRLM) at NeurIPS 2025! 📝 Soliciting abstrac…
0
28
0
@_AndrewZhao
❄️Andrew Zhao❄️
15 days
0
48
0
@_AndrewZhao
❄️Andrew Zhao❄️
17 days
RT @interconnectsai: GPT 5 Launch Party w/ Will Brown & Swyx
0
17
0
@_AndrewZhao
❄️Andrew Zhao❄️
19 days
RT @fengyao1909: Failing on 𝐥𝐚𝐫𝐠𝐞-𝐬𝐜𝐚𝐥𝐞 𝐑𝐋 with VeRL? ⚠️ Mixing inference backend (𝐯𝐋𝐋𝐌/𝐒𝐆𝐋𝐚𝐧𝐠) with training backends (𝐅𝐒𝐃𝐏/𝐌𝐞𝐠𝐚𝐭𝐫𝐨𝐧) 𝐬𝐞𝐜…
0
115
0
@_AndrewZhao
❄️Andrew Zhao❄️
19 days
RT @zzlccc: Excited to see GDM proposing Game Arena to measure the model capabilities. Let’s also scale the environments for agent RL with…
github.com
A Gym for Generalist LLMs. Contribute to axon-rl/gem development by creating an account on GitHub.
0
4
0