Zitong Yang

@ZitongYang0

Followers 1K · Following 577 · Media 24 · Statuses 360

Continually self-improving AI

Mountain View, CA
Joined November 2018
@ZitongYang0
Zitong Yang
3 months
📜 Paper on new pretraining paradigm: Synthetic Bootstrapped Pretraining SBP goes beyond next-token supervision in a single document by leveraging inter-document correlations to synthesize new data for training — no teacher needed. Validation: 1T data + 3B model from scratch.🧵
9
49
248
@druv_pai
Druv Pai
9 days
Will be presenting this paper at NeurIPS 2025! 📅 Thursday, December 4, 11AM-2PM 📍 Exhibit Hall C, D, E #3703 DM me or come by in person if you want to chat about this work, or in general about representation learning, reasoning, generalization, and science of deep learning!
@druv_pai
Druv Pai
2 months
Why and how do diffusion models memorize vs generalize? Can we have scaling laws for memorization? This is increasingly relevant scientifically and pragmatically (e.g. Sora 2). 🚨 Our new preprint "On the Edge of Memorization in Diffusion Models" addresses this timely question!
4
12
87
@weijie444
Weijie Su
9 days
A new tokenizer is introduced for LLMs: https://t.co/Zuerv1jsZ4 Idea: Instead of merging tokens by frequency (BPE), optimize the tokenizer directly for maximizing average token length, yielding longer, more efficient tokens. Results: 14–18% fewer tokens, faster training &
15
70
453
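The objective described in the tweet — score a vocabulary by the average token length it achieves, rather than building it from frequency-based merges as BPE does — can be illustrated with a minimal sketch. This is a toy corpus and two toy vocabularies for comparison, not the linked paper's actual method:

```python
def tokenize_greedy(text, vocab):
    """Split text into the longest vocabulary matches, left to right.
    Characters not covered by the vocabulary become single-char tokens."""
    max_len = max(map(len, vocab))
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible match first.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])
            i += 1
    return tokens

def avg_token_length(corpus, vocab):
    """The objective from the tweet: characters covered per token.
    Higher average length = fewer tokens for the same text."""
    tokens = [t for text in corpus for t in tokenize_greedy(text, vocab)]
    return sum(len(t) for t in tokens) / len(tokens)

corpus = ["the tokenizer tokenizes the text"]
small_vocab = {"th", "e", "to", "ke", "n", "iz"}        # BPE-like short merges
long_vocab = {"the", "token", "izer", "izes", "text"}   # longer units
print(f"short merges: {avg_token_length(corpus, small_vocab):.2f} chars/token")
print(f"longer units: {avg_token_length(corpus, long_vocab):.2f} chars/token")
```

A vocabulary optimized for this metric compresses the same text into fewer tokens, which is where the tweet's "14–18% fewer tokens" figure comes from.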
@TianzheC
Tianzhe Chu
9 days
🚀 New blog! Is @deepseek_ai's open-source model the best math verifier? 🔴 DeepSeek-Math V2: highest accuracy, and aligns most closely with human graders when the submitted answer shows no meaningful progress. 🔵 Gemini-3-Pro: best when the solution contains partial but
2
4
6
@HaozhiQ
Haozhi Qi
14 days
I will join UChicago CS @UChicagoCS as an Assistant Professor in late 2026, and I’m recruiting PhD students in this cycle (2025 - 2026). My research focuses on AI & Robotics - including dexterous manipulation, humanoids, tactile sensing, learning from human videos, robot
26
100
639
@jyangballin
John Yang
20 days
I am advised by 🐐's
@stanfordnlp
Stanford NLP Group
22 days
How Stanford researchers design human-focused AI systems: “AI products enter the real world very quickly, often without a rigorous understanding of their impact or the consequences of their use. We need to move forward with responsibility.” —@Diyi_Yang https://t.co/wO0c8LbPsK
5
3
94
@SebastienBubeck
Sebastien Bubeck
21 days
3 years ago we could showcase AI's frontier w. a unicorn drawing. Today we do so w. AI outputs touching the scientific frontier: https://t.co/ALJvCFsaie Use the doc to judge for yourself the status of AI-aided science acceleration, and hopefully be inspired by a couple examples!
75
211
1K
@cen_sarah
Sarah Cen
1 month
In the AI ecosystem, who supplies the data? the compute? the models? We just released a new tool on the AI Supply Chain. Our dataset reveals how AI models, data, compute, capital, and even talent change hands. Here’s why you should care 👇
15
39
151
@judyhshen
Judy Shen
1 month
I DEFENDED MY PHD THIS WEEK! 🎉 So grateful for the guidance of my advisor and committee! Special thanks to my friends and family who supported me through every up and down 🥺🥰
26
25
663
@_kevinlu
Kevin Lu
1 month
in our new post, we walk through great prior work from @agarwl_ & the @Alibaba_Qwen team exploring on-policy distillation using an open source recipe: you can run our experiments on Tinker today! https://t.co/7pVk87qTDH i'm especially excited by the use of on-policy
@thinkymachines
Thinking Machines
1 month
Our latest post explores on-policy distillation, a training approach that unites the error-correcting relevance of RL with the reward density of SFT. When training it for math reasoning and as an internal chat assistant, we find that on-policy distillation can outperform other
13
24
323
@thinkymachines
Thinking Machines
1 month
Our latest post explores on-policy distillation, a training approach that unites the error-correcting relevance of RL with the reward density of SFT. When training it for math reasoning and as an internal chat assistant, we find that on-policy distillation can outperform other
63
401
3K
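The idea in the post — grade the student's own distribution against a frozen teacher, so every token position carries a dense learning signal (SFT-like density) on data the student itself produces (RL-like relevance) — can be sketched with toy linear next-token heads. This is an illustrative reverse-KL distillation step, not the actual Tinker/Qwen recipe:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden = 32, 16

# Toy stand-ins for student/teacher next-token heads (context -> logits).
W_student = rng.normal(size=(hidden, vocab_size)) * 0.1
W_teacher = rng.normal(size=(hidden, vocab_size)) * 0.1

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_step(contexts, lr=0.5):
    """One on-policy distillation update: minimize reverse KL(student || teacher)
    on states the student visits, yielding a dense per-position signal."""
    global W_student
    p = softmax(contexts @ W_student)           # student distribution
    q = softmax(contexts @ W_teacher)           # teacher distribution (frozen)
    r = np.log(p) - np.log(q)
    kl = (p * r).sum(axis=-1, keepdims=True)    # per-example KL(student || teacher)
    grad_logits = p * (r - kl)                  # analytic d KL / d student logits
    W_student -= lr * (contexts.T @ grad_logits) / len(contexts)
    return float(kl.mean())

# Stands in for hidden states from student-sampled rollouts.
contexts = rng.normal(size=(64, hidden))
losses = [distill_step(contexts) for _ in range(300)]
```

The reverse-KL direction is what makes this mode-seeking: the student is penalized for putting mass where the teacher assigns low probability, rather than being forced to cover every teacher mode.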
@Diyi_Yang
Diyi Yang
2 months
Stanford NLP 25th Anniversary🤩🤩🤩
@stanfordnlp
Stanford NLP Group
2 months
Today, we’re overjoyed to have a 25th Anniversary Reunion of @stanfordnlp. So happy to see so many of our former students back at @Stanford. And thanks to @StanfordHAI for the venue!
9
39
601
@stanfordnlp
Stanford NLP Group
2 months
More Stanford NLP Group 25th Anniversary Reunion lightning talks: …, @ZitongYang0, @EchoShao8899, @WilliamBarrHeld, @ma_tay_ (Taylor Sorensen), …
0
15
70
@simonguozirui
Simon Guo
2 months
Wrote a 1-year retrospective with @a1zhang on KernelBench and the journey toward automated GPU/CUDA kernel generations! Since my labmates (@anneouyang, @simran_s_arora, @_williamhu) and I first started working towards this vision around last year’s @GPU_mode hackathon, we have
11
66
289
@ZitongYang0
Zitong Yang
2 months
Neural Architecture Search is so visionary
@j_foerst
Jakob Foerster
2 months
Google brain around 2016 also was a very special place. People were pursuing a ton of diverse, exploratory and ambitious directions to push the field forward. Here's a section of @JeffDean's Google Brain "2017 Look-back", see if you can spot the transformer :) The full document
0
0
5
@Diyi_Yang
Diyi Yang
2 months
Thanks @thinkymachines for supporting Tinker access for our CS329x students on Homework 2 😉
@argyros_selini
🦋/acc 🌲🎅
2 months
It's not even been a month since @thinkymachines released Tinker & Stanford already has an assignment on it
8
38
587
@johnschulman2
John Schulman
2 months
Fine-tuning APIs are becoming more powerful and widespread, but they're harder to safeguard against misuse than fixed-weight sampling APIs. Excited to share a new paper: Detecting Adversarial Fine-tuning with Auditing Agents ( https://t.co/NqMeGSCQIF). Auditing agents search
arxiv.org
Large Language Model (LLM) providers expose fine-tuning APIs that let end users fine-tune their frontier LLMs. Unfortunately, it has been shown that an adversary with fine-tuning access to an LLM...
10
52
464
@ZitongYang0
Zitong Yang
2 months
The passing of the physicist Chen-Ning Yang ( https://t.co/LOY46RpBhz) saddens me. He has been a long-time hero and role model for me. Below is a short essay I wrote yesterday about Yang that I shared with many of my friends. I translated it into English using Gemini: ``` The
10
65
415
@druv_pai
Druv Pai
2 months
Why and how do diffusion models memorize vs generalize? Can we have scaling laws for memorization? This is increasingly relevant scientifically and pragmatically (e.g. Sora 2). 🚨 Our new preprint "On the Edge of Memorization in Diffusion Models" addresses this timely question!
5
64
371
@ChengleiSi
CLS
2 months
I’ll be at #COLM2025 this week! I’ll give a lightning talk at the Visions Workshop at 11am on Friday and hang around our @lm4sci workshop! DM me if you wanna chat. We have some exciting ongoing projects on automating post-/pre-training research.
1
5
35
@druv_pai
Druv Pai
2 months
🚨 We wrote a new AI textbook "Learning Deep Representations of Data Distributions"! TL;DR: We develop principles for representation learning in large scale deep neural networks, show that they underpin existing methods, and build new principled methods.
6
66
298