
Zitong Yang
@ZitongYang0
Followers
867
Following
542
Media
23
Statuses
342
Continually self-improving AI
Stanford, CA
Joined November 2018
📜 Paper on a new pretraining paradigm: Synthetic Bootstrapped Pretraining (SBP). SBP goes beyond next-token supervision within a single document by leveraging inter-document correlations to synthesize new data for training — no teacher needed. Validation: a 3B model trained from scratch on 1T tokens of data. 🧵
9
50
246
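The thread above describes SBP only at a high level. As a rough illustrative sketch (not the paper's actual pipeline, models, or hyperparameters), the snippet below pairs related documents by embedding similarity and turns each pair into a source → target example for a synthesizer LM; the embedding model, similarity threshold, and prompt format are all assumptions.

```python
# Hypothetical sketch of the SBP idea: pair related documents via embedding
# similarity, then turn each pair into a (source -> target) training example
# for a "synthesizer" LM. Model choice, threshold, and formatting are
# illustrative assumptions, not the paper's actual recipe.
from sentence_transformers import SentenceTransformer, util

corpus = [
    "Gradient descent updates parameters in the direction of steepest descent.",
    "Stochastic gradient descent uses mini-batches to estimate the gradient.",
    "The Krebs cycle is a series of chemical reactions in cellular respiration.",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
embeddings = encoder.encode(corpus, convert_to_tensor=True, normalize_embeddings=True)
similarity = util.cos_sim(embeddings, embeddings)  # pairwise cosine similarities

pairs = []
for i in range(len(corpus)):
    for j in range(len(corpus)):
        if i != j and similarity[i, j] > 0.5:  # assumed threshold
            pairs.append((corpus[i], corpus[j]))

# Each pair becomes a conditional-generation example: given document A,
# synthesize a related document B. Fine-tuning the base LM on such examples
# and sampling from it conditioned on real documents would yield a synthetic
# corpus to mix back into pretraining.
for source, target in pairs:
    print({"prompt": source, "completion": target})
```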
Why and how do diffusion models memorize vs generalize? Can we have scaling laws for memorization? This is increasingly relevant scientifically and pragmatically (e.g. Sora 2). 🚨 Our new preprint "On the Edge of Memorization in Diffusion Models" addresses this timely question!
4
58
335
🚨 We wrote a new AI textbook, "Learning Deep Representations of Data Distributions"! TL;DR: We develop principles for representation learning in large-scale deep neural networks, show that they underpin existing methods, and build new principled methods.
4
29
124
the feeling when you spent two months building the training infra and finally got the first experiment running 🥹
1
1
41
We wrote a book about representation learning! It’s fully open source, available and readable online, and covers everything from theoretical foundations to practical algorithms. 👷‍♂️ We’re hard at work updating the content for v2.0, and would love your feedback and contributions
13
203
1K
Introducing Tinker: a flexible API for fine-tuning language models. Write training loops in Python on your laptop; we'll run them on distributed GPUs. Private beta starts today. We can't wait to see what researchers and developers build with cutting-edge open models!
221
766
6K
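The announcement doesn't show Tinker's actual interface. Purely as a hypothetical illustration of the "write the loop locally, run it on remote GPUs" pattern it describes, the sketch below uses made-up names (`RemoteTrainer`, `forward_backward`, `optim_step`); none of this should be read as Tinker's real API.

```python
# Purely hypothetical sketch of the "write the loop locally, run on remote
# GPUs" pattern the announcement describes. None of these names are Tinker's
# real API; they are placeholders for illustration only.
class RemoteTrainer:
    """Stand-in for a client that ships each step to a managed GPU cluster."""

    def __init__(self, base_model: str):
        self.base_model = base_model  # e.g. an open-weights checkpoint name

    def forward_backward(self, batch) -> float:
        # A real service would serialize the batch, run the forward and
        # backward passes remotely, and return the loss.
        return 0.0

    def optim_step(self) -> None:
        # Apply the accumulated gradients on the remote workers.
        pass


def train(trainer: RemoteTrainer, batches) -> None:
    # The researcher keeps full control of the loop on their laptop;
    # only the heavy compute happens elsewhere.
    for step, batch in enumerate(batches):
        loss = trainer.forward_backward(batch)
        trainer.optim_step()
        print(f"step {step}: loss {loss:.4f}")


train(RemoteTrainer("example-open-model"), batches=[["hello world"]])
```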
Very excited about this release!! As a former grad student, I struggled to fine-tune LLMs. Even when the GPUs were enough, it was painful to set up the infra correctly. Tinker allows more researchers to understand language models, beyond a few well-funded labs.
Introducing Tinker: a flexible API for fine-tuning language models. Write training loops in Python on your laptop; we'll run them on distributed GPUs. Private beta starts today. We can't wait to see what researchers and developers build with cutting-edge open models!
2
10
199
Nobel laureate George Smoot, UC Berkeley physicist whose work with satellite experiments confirmed the Big Bang theory, has died at 80. https://t.co/Jx2Hks3PMJ
4
8
15
LoRA makes fine-tuning more accessible, but it's unclear how it compares to full fine-tuning. We find that the performance often matches closely, more often than you might expect. In our latest Connectionism post, we share our experimental results and recommendations for LoRA.
81
555
3K
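As a minimal sketch of the kind of setup the post compares against full fine-tuning, the snippet below wraps a small Hugging Face model with a LoRA adapter via the `peft` library; the model name, rank, and target modules are illustrative choices, not the post's experimental configuration.

```python
# Minimal LoRA setup with Hugging Face PEFT. The model, rank, and target
# modules are illustrative defaults, not the configuration from the post.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("gpt2")  # small model for illustration

lora_config = LoraConfig(
    r=8,                        # low-rank dimension of the adapter
    lora_alpha=16,              # scaling factor for the adapter update
    target_modules=["c_attn"],  # GPT-2's fused attention projection
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
# From here, training proceeds exactly like full fine-tuning (same Trainer or
# custom loop), but only a small fraction of parameters receives gradients.
```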
check out what @bfspector worked on this summer! (he has not seen the sky for months but now he's free)
(1/8) We’re releasing an 8-GPU Llama-70B inference engine megakernel! Our megakernel supports arbitrary batch sizes, mixed prefill+decode, a paged KV cache, instruction pipelining, dynamic scheduling, interleaved communication, and more! On ShareGPT it’s 22% faster than SGLang.
0
2
28
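The megakernel itself is CUDA, but one of the features listed above, a paged KV cache, can be illustrated with a small sketch of the block-table indirection it relies on; block size, pool size, and shapes here are arbitrary assumptions, not the engine's actual implementation.

```python
# Toy illustration of paged KV-cache addressing (logical position -> block
# table -> physical block), not the megakernel's actual CUDA implementation.
import numpy as np

BLOCK_SIZE = 16      # tokens per block (assumed)
NUM_BLOCKS = 8       # size of the shared physical pool (assumed)
HEAD_DIM = 4         # tiny head dimension for readability

# One shared physical pool of KV blocks, plus a per-sequence block table that
# maps logical block index -> physical block id.
kv_pool = np.zeros((NUM_BLOCKS, BLOCK_SIZE, HEAD_DIM), dtype=np.float32)
block_table = {"seq0": [3, 5]}  # sequence 0 owns physical blocks 3 and 5


def write_kv(seq_id: str, token_pos: int, kv_vec: np.ndarray) -> None:
    """Store one token's KV vector by translating its logical position."""
    logical_block, offset = divmod(token_pos, BLOCK_SIZE)
    physical_block = block_table[seq_id][logical_block]
    kv_pool[physical_block, offset] = kv_vec


def read_kv(seq_id: str, token_pos: int) -> np.ndarray:
    logical_block, offset = divmod(token_pos, BLOCK_SIZE)
    return kv_pool[block_table[seq_id][logical_block], offset]


write_kv("seq0", 17, np.ones(HEAD_DIM, dtype=np.float32))  # lands in block 5, slot 1
print(read_kv("seq0", 17))
```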
We’re open-sourcing Docent under an Apache 2.0 license. Check out our public codebase to self-host Docent, peek under the hood, or open issues & pull requests! The hosted version remains the easiest way to get started with one click and use Docent with zero maintenance overhead.
Docent, our tool for analyzing complex AI behaviors, is now in public alpha! It helps scalably answer questions about agent behavior, like “is my model reward hacking” or “where does it violate instructions.” Today, anyone can get started with just a few lines of code!
1
13
70
Can AI really do math? 🤔 We analyzed math ability across 12 core skills like creativity, abstraction, reasoning & more. This is the way to measure progress toward Math AGI.
Our Gauss report is now on the arXiv: https://t.co/3iFk2yaeUf Do current LLMs solve math problems through memorisation or understanding? Can they truly grasp abstract concepts, or do they simply exploit correlations through compression? That is THE next trillion-dollar question.
4
2
17
Our Gauss report is now on the arXiv: https://t.co/3iFk2yaeUf Do current LLMs solve math problems through memorisation or understanding? Can they truly grasp abstract concepts, or do they simply exploit correlations through compression? That is THE next trillion-dollar question.
arxiv.org
We introduce GAUSS (General Assessment of Underlying Structured Skills in Mathematics), a benchmark that evaluates LLMs' mathematical...
4
13
55
some folks and i are making something new
if you're hopeful about AI empowering everyone
if you've worked on multiturn, memory, model behavior, multiagent RL, user sim, AI interfaces/products, kernels, or dist systems
if you want frontier-scale compute & top infra
let's chat!
50
26
570
while we are on this, rmb we also had:
- Neural Architecture Search with Reinforcement Learning https://t.co/qvGwkX41VE
- Symbolic Discovery of Optimization Algorithms https://t.co/lJzotdjyOM
- Using Large Language Models for Hyperparameter Optimization https://t.co/WDQQoX7cc7
-
arxiv.org
This paper explores the use of foundational large language models (LLMs) in hyperparameter optimization (HPO). Hyperparameters are critical in determining the effectiveness of machine learning...
4
11
43
Excited to share Manzano from the AFM team—a simple, scalable unified multimodal model for understanding and generation. Manzano shows minimal task conflict, promising scaling behavior, and state-of-the-art results among unified models. Paper link: https://t.co/HpziryrvSc
1
8
16
Huge potential! Apple and Stanford have just released Synthetic Bootstrapped Pretraining (SBP). Standard LM pretraining = token correlations in one doc. SBP = learns inter-document relations → synthesizes a huge new corpus for joint training. ✨ Pretrained 3B model on 1T
4
25
137
Enjoyed learning from world-class embedding expert @HongLiu9903. I think document embeddings offer a new avenue of under-exploited self-supervision because they arrange related documents together, much like how the internet arranges related tokens together.
🚀 Unveiling the first synthetic pretraining method that doesn’t rely on teacher distillation. Big shoutout to @ZitongYang0 @Aonan12 and the team!
0
0
6
I'm guessing that Qwen Max was trained this way, as only that could explain some of its capabilities (and size, and high-quality data, and long pretraining ;) ). This is the only sensible approach due to the "data density" problem in modeling, for those who read my post.
📜 Paper on a new pretraining paradigm: Synthetic Bootstrapped Pretraining (SBP). SBP goes beyond next-token supervision within a single document by leveraging inter-document correlations to synthesize new data for training — no teacher needed. Validation: a 3B model trained from scratch on 1T tokens of data. 🧵
1
1
1
Feeling inspired by @ChengleiSi at every AGI hackathon
always feeling inspired by @ZitongYang0
0
0
3