
❄️Andrew Zhao❄️
@_AndrewZhao
Followers: 4K
Following: 3K
Media: 108
Statuses: 1K
PhD @Tsinghua_Uni. Absolute Zero, ExpeL, Diver-CT. Research Intern @MSFTResearch, Ex. @ BIGAI. Interested in RL, Reasoning/Safety 4 LLMs, Agents. On job market '26
Joined September 2020
❄️Introducing Absolute Zero Reasoner: Our reasoner learns to both propose tasks that maximize learnability and improve reasoning by solving them, entirely through self-play—with no external data! It overall outperforms other "zero" models in math & coding domains. 🧵 1/
59
343
2K
RT @BanghuaZ: Beyond prompt / context engineers, we’re seeing the rise of environment engineers, experts who build high-quality RL environm…
0
8
0
RT @zzlccc: With just a few lines of code, Feng’s (@fengyao1909) suggested fix—applying importance sampling on the behavior policy—resolved…
0
53
0
RT @Gradient_HQ: Reinforcement Learning is the future tense of intelligence. Echo is how it scales. Echo is Gradient’s distributed RL fram…
0
475
0
RT @willccbb: the easiest way to get hired at @primeintellect for research is to just make it very clear that you're already doing excellen…
0
17
0
RT @kuchaev: We are excited to release Nvidia-Nemotron-Nano-V2 model! This is a 9B hybrid SSM model with open base model and training data.…
0
56
0
RT @gm8xx8: NVIDIA Nemotron-Nano v2. Models: 12B Base, 9B Reasoning, 9B Base. Arch: Hybrid Mamba2–Transformer (128K ctx, 4 attn layers). …
0
36
0
RT @chatgpt21: It looks like Andrew Garfield will play Sam Altman in the Open AI movie coming to Amazon MGM. I think @apples_jimmy deserves…
0
9
0
RT @hwchung27: After a great time at OpenAI, we (@EdwardSun0909, @_jasonwei) recently joined @Meta Superintelligence Labs. The first month…
0
72
0
RT @DimitrisPapail: Thinking Less at test-time requires Sampling More at training-time! GFPO is a new, cool, and simple Policy Opt algorit…
0
41
0
RT @basvanopheusden: A few weeks ago, I started a new job at @OpenAI. I wrote a document about my interview process and recommendations for…
docs.google.com
AI research interviews, by Bas van Opheusden. A few weeks ago, I started a new job at OpenAI. This document describes my interview process, lessons learned and advice for you. If you’re reading this, I...
0
354
0
RT @arcprize: "I've updated my AGI timeline." One year later, @dwarkesh_sp and @fchollet meet on camera again. Both of them have shifted…
0
49
0
RT @GXiming: 🚀 How far can RL scaling take LLMs? Drop ProRLv2! 🔥 We keep expanding LLM’s reasoning boundaries through 3,000+ RL steps over 5…
huggingface.co
0
25
0
RT @canondetortugas: Announcing the first workshop on Foundations of Language Model Reasoning (FoRLM) at NeurIPS 2025! 📝 Soliciting abstrac…
0
28
0
RT @fengyao1909: Failing on 𝐥𝐚𝐫𝐠𝐞-𝐬𝐜𝐚𝐥𝐞 𝐑𝐋 with VeRL? ⚠️ Mixing inference backend (𝐯𝐋𝐋𝐌/𝐒𝐆𝐋𝐚𝐧𝐠) with training backends (𝐅𝐒𝐃𝐏/𝐌𝐞𝐠𝐚𝐭𝐫𝐨𝐧) 𝐬𝐞𝐜…
0
115
0
RT @zzlccc: Excited to see GDM proposing Game Arena to measure the model capabilities. Let’s also scale the environments for agent RL with…
github.com
A Gym for Generalist LLMs (axon-rl/gem).
0
4
0