Zitong Yang

@ZitongYang0

867 Followers · 542 Following · 23 Media · 342 Statuses

Continually self-improving AI

Stanford, CA
Joined November 2018
@ZitongYang0
Zitong Yang
19 days
📜 Paper on a new pretraining paradigm: Synthetic Bootstrapped Pretraining (SBP). SBP goes beyond next-token supervision within a single document by leveraging inter-document correlations to synthesize new training data, with no teacher needed. Validation: 1T tokens of data + a 3B model trained from scratch. 🧵
9
50
246
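A minimal sketch of the idea described in the tweet above (my own illustration, not code from the paper): pair related documents by embedding similarity, fit a synthesizer that writes a related document given a seed document, and mix the synthesized documents back into the pretraining corpus. The function names, thresholds, and the embed / train_synthesizer / sample callables are hypothetical placeholders.

```python
# Hypothetical illustration of the SBP idea (not the paper's implementation).
import numpy as np
from dataclasses import dataclass

@dataclass
class DocPair:
    seed: str      # source document
    related: str   # document correlated with the seed

def pair_related_docs(docs, embed, top_k=1, min_sim=0.7):
    """Pair each document with its nearest neighbors in embedding space."""
    embs = np.stack([embed(d) for d in docs])
    embs /= np.linalg.norm(embs, axis=1, keepdims=True)
    sims = embs @ embs.T
    np.fill_diagonal(sims, -1.0)                 # exclude self-pairs
    pairs = []
    for i, row in enumerate(sims):
        for j in row.argsort()[::-1][:top_k]:
            if row[j] >= min_sim:
                pairs.append(DocPair(seed=docs[i], related=docs[j]))
    return pairs

def synthetic_corpus(pairs, train_synthesizer, sample, seed_docs):
    """Fit a seed->related model on the pairs, then sample new documents
    to add to the pretraining mix alongside the real corpus."""
    synthesizer = train_synthesizer(pairs)       # e.g. finetune the base LM on (seed, related)
    return [sample(synthesizer, doc) for doc in seed_docs]
```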
@druv_pai
Druv Pai
4 days
Why and how do diffusion models memorize vs generalize? Can we have scaling laws for memorization? This is increasingly relevant scientifically and pragmatically (e.g. Sora 2). 🚨 Our new preprint "On the Edge of Memorization in Diffusion Models" addresses this timely question!
4
58
335
@ChengleiSi
CLS @COLM2025
4 days
I’ll be at #COLM2025 this week! I’ll give a lightning talk at the Visions Workshop at 11am on Friday and hang around our @lm4sci workshop! DM me if you wanna chat. We have some exciting ongoing projects on automating post-/pre-training research.
1
5
34
@druv_pai
Druv Pai
5 days
🚨 We wrote a new AI textbook "Learning Deep Representations of Data Distributions"! TL;DR: We develop principles for representation learning in large scale deep neural networks, show that they underpin existing methods, and build new principled methods.
4
29
124
@ChengleiSi
CLS @COLM2025
9 days
the feeling when you spent two months building the training infra and finally got the first experiment running 🥹
1
1
41
@_sdbuchanan
Sam Buchanan
9 days
We wrote a book about representation learning! It’s fully open source, available and readable online, and covers everything from theoretical foundations to practical algorithms. 👷‍♂️ We’re hard at work updating the content for v2.0, and would love your feedback and contributions
13
203
1K
@thinkymachines
Thinking Machines
10 days
Introducing Tinker: a flexible API for fine-tuning language models. Write training loops in Python on your laptop; we'll run them on distributed GPUs. Private beta starts today. We can't wait to see what researchers and developers build with cutting-edge open models!
221
766
6K
@ZhongRuiqi
Ruiqi Zhong
10 days
Very excited about this release!! As a former grad student I struggled to finetune LLMs. Even when the GPUs were enough, it was painful to set up the infra correctly. Tinker allows more researchers to understand and work with language models, beyond a few well-funded labs.
@thinkymachines
Thinking Machines
10 days
Introducing Tinker: a flexible API for fine-tuning language models. Write training loops in Python on your laptop; we'll run them on distributed GPUs. Private beta starts today. We can't wait to see what researchers and developers build with cutting-edge open models!
2
10
199
@BerkeleyPhysics
Berkeley Physics
11 days
Nobel laureate George Smoot, UC Berkeley physicist whose work with satellite experiments confirmed the Big Bang theory, has died at 80. https://t.co/Jx2Hks3PMJ
4
8
15
@thinkymachines
Thinking Machines
12 days
LoRA makes fine-tuning more accessible, but it's unclear how it compares to full fine-tuning. We find that the performance often matches closely---more often than you might expect. In our latest Connectionism post, we share our experimental results and recommendations for LoRA.
81
555
3K
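For context on what the comparison above is measuring, here is a minimal, generic LoRA adapter layer in PyTorch (my own sketch, not from the Thinking Machines post): the pretrained weight stays frozen and only two small low-rank matrices are trained, which is why LoRA is far cheaper than full fine-tuning.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen nn.Linear with a trainable low-rank update W + (alpha/r) * B A."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                     # pretrained weights stay frozen
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at step 0
        self.scaling = alpha / r

    def forward(self, x):
        # frozen base output plus the scaled low-rank update
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

# usage sketch: only the adapter parameters (a tiny fraction of the model) receive gradients
layer = LoRALinear(nn.Linear(4096, 4096), r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
```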
@amspector100
Asher Spector
13 days
check out what @bfspector worked on this summer! (he has not seen the sky for months but now he's free)
@bfspector
Benjamin F Spector
13 days
(1/8) We’re releasing an 8-GPU Llama-70B inference engine megakernel! Our megakernel supports arbitrary batch sizes, mixed prefill+decode, a paged KV cache, instruction pipelining, dynamic scheduling, interleaved communication, and more! On ShareGPT it’s 22% faster than SGLang.
0
2
28
@TransluceAI
Transluce
16 days
We’re open-sourcing Docent under an Apache 2.0 license. Check out our public codebase to self-host Docent, peek under the hood, or open issues & pull requests! The hosted version remains the easiest way to get started with one click and use Docent with zero maintenance overhead.
@TransluceAI
Transluce
2 months
Docent, our tool for analyzing complex AI behaviors, is now in public alpha! It helps scalably answer questions about agent behavior, like “is my model reward hacking” or “where does it violate instructions.” Today, anyone can get started with just a few lines of code!
1
13
70
@zjasper666
Jasper
16 days
Can AI really do math? 🤔 We analyzed math ability across 12 core skills like creativity, abstraction, reasoning & more. This is the way to measure progress toward Math AGI.
@YiMaTweets
Yi Ma
17 days
Our Gauss report is now on arXiv: https://t.co/3iFk2yaeUf Do current LLMs solve math problems with memorisation or with understanding? Can they truly grasp abstract concepts, or do they simply exploit correlations through compression? That is THE next trillion-dollar question.
4
2
17
@YiMaTweets
Yi Ma
17 days
Our Gauss report is now on arXiv: https://t.co/3iFk2yaeUf Do current LLMs solve math problems with memorisation or with understanding? Can they truly grasp abstract concepts, or do they simply exploit correlations through compression? That is THE next trillion-dollar question.
arxiv.org
We introduce GAUSS (General Assessment of Underlying Structured Skills in Mathematics), a benchmark that evaluates LLMs' mathematical...
4
13
55
@ericzelikman
Eric Zelikman
18 days
some folks and i are making something new
if you're hopeful about AI empowering everyone
if you've worked on multiturn, memory, model behavior, multiagent RL, user sim, AI interfaces/products, kernels, or dist systems
if you want frontier-scale compute & top infra
let's chat!
50
26
570
@ChengleiSi
CLS @COLM2025
17 days
while we are on this, rmb we also had:
- Neural Architecture Search with Reinforcement Learning https://t.co/qvGwkX41VE
- Symbolic Discovery of Optimization Algorithms https://t.co/lJzotdjyOM
- Using Large Language Models for Hyperparameter Optimization https://t.co/WDQQoX7cc7
arxiv.org
This paper explores the use of foundational large language models (LLMs) in hyperparameter optimization (HPO). Hyperparameters are critical in determining the effectiveness of machine learning...
@_fracapuano
Francesco Capuano
19 days
stop designing your RL algorithms
4
11
43
@lyttonhao
lyttonhao
18 days
Excited to share Manzano from the AFM team: a simple, scalable unified multimodal model for understanding and generation. Manzano shows minimal task conflict, promising scaling behavior, and state-of-the-art results among unified models. Paper link: https://t.co/HpziryrvSc
1
8
16
@jiqizhixin
机器之心 JIQIZHIXIN
19 days
Huge potential! Apple and Stanford have just released Synthetic Bootstrapped Pretraining (SBP). Standard LM pretraining = token correlations in one doc. SBP = learns inter-document relations → synthesizes a huge new corpus for joint training. ✨ Pretrained 3B model on 1T
4
25
137
@ZitongYang0
Zitong Yang
19 days
Enjoyed learning from world-class embedding expert @HongLiu9903. I think document embeddings offer a new avenue of under-exploited self-supervision because they arrange related documents together, much like how the internet arranges related tokens together.
@HongLiu9903
Hong Liu
19 days
🚀 Unveiling the first synthetic pretraining method that doesn’t rely on teacher distillation. Big shoutout to @ZitongYang0 @Aonan12 and the team!
0
0
6
@rudzinskimaciej
Rudzinski Maciej
19 days
I'm guessing that Qwen Max was trained this way, as only that could explain some of its capabilities (and size and high-quality data and long pretraining ;) ). This is the only sensible approach due to the "data density" problem in modeling, for the ones that read my post.
@ZitongYang0
Zitong Yang
19 days
📜 Paper on a new pretraining paradigm: Synthetic Bootstrapped Pretraining (SBP). SBP goes beyond next-token supervision within a single document by leveraging inter-document correlations to synthesize new training data, with no teacher needed. Validation: 1T tokens of data + a 3B model trained from scratch. 🧵
1
1
1
@ZitongYang0
Zitong Yang
19 days
Feeling inspired by @ChengleiSi at every AGI hackathon
@ChengleiSi
CLS @COLM2025
19 days
always feeling inspired by @ZitongYang0
0
0
3