Runtian Zhai

@RuntianZhai

Followers: 526
Following: 209
Media: 8
Statuses: 198

PhD @SCSatCMU. I study representation learning, and machine learning theory and algorithms.

Pittsburgh, PA, USA
Joined April 2017
@RuntianZhai
Runtian Zhai
6 months
Why can foundation models transfer to so many downstream tasks? Will the scaling law end? Will pretraining end like Ilya Sutskever predicted? My PhD thesis builds the contexture theory to answer the above.
Blog: https://t.co/MCIJifkU1Z
Paper: https://t.co/RXVF7n7mHR
🧵 1/12
arxiv.org
This dissertation establishes the contexture theory to mathematically characterize the mechanism of representation learning, or pretraining. Despite the remarkable empirical success of foundation...
2
31
163
@A_v_i__S
Avi Schwarzschild
5 months
Big news! 🎉 I’m joining UNC-Chapel Hill as an Assistant Professor in Computer Science starting next year! Before that, I’ll be spending time @OpenAI working on LLM privacy. @unccs @uncnlp
46
35
575
@ZhengyangGeng
Zhengyang Geng
5 months
Excited to share our work with my amazing collaborators, @Goodeat258, @SimulatedAnneal, @zicokolter, and Kaiming. In short, we show an “identity learning” approach for generative modeling, by relating the instantaneous/average velocity in an identity. The resulting model,
5
39
153
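The tweet is truncated before naming the model, but the relation it alludes to can be sketched. A hedged sketch below, derived only from the standard definition of the average velocity of a flow over an interval [r, t]; the paper's exact formulation may differ:

```latex
% Average velocity of a flow over [r, t], and the identity obtained by
% differentiating (t - r) u(z_t, r, t) in t along the trajectory
% (plain calculus; not necessarily the paper's exact statement).
u(z_t, r, t) = \frac{1}{t - r} \int_r^t v(z_\tau, \tau)\, d\tau
\quad\Longrightarrow\quad
v(z_t, t) = u(z_t, r, t) + (t - r)\,\frac{\mathrm{d}}{\mathrm{d}t}\, u(z_t, r, t)
```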
@yidingjiang
Yiding Jiang
5 months
Data selection and curriculum learning can be formally viewed as a compression protocol via prequential coding. New blog (with @AllanZhou17) about this neat idea that motivated ADO but didn’t make it into the paper. https://t.co/kkLyZN2CF7
yidingjiang.github.io
We describe a unified framework for data selection and curriculum learning via compression.
2
18
106
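A minimal sketch of the prequential-coding view mentioned in the tweet (my own toy example with a Laplace-smoothed Bernoulli learner, not code from the blog post): the code length of a data stream is the summed surprisal of each example under the model fit to the prefix before it.

```python
# Prequential (online) code length of a binary stream: each symbol is coded with
# the model fit to the prefix, and only then is the model updated on that symbol.
import numpy as np

def prequential_code_length(stream, alpha=1.0):
    ones = zeros = alpha                              # Laplace pseudo-counts
    total_bits = 0.0
    for b in stream:
        p1 = ones / (ones + zeros)                    # prediction from the prefix
        total_bits += -np.log2(p1 if b == 1 else 1.0 - p1)
        ones, zeros = ones + b, zeros + (1 - b)       # update after coding
    return total_bits

stream = [0] * 50 + [1] * 50
print(f"{prequential_code_length(stream):.1f} bits")
# Data selection / curriculum learning then correspond to choosing which examples
# to present, and in what order, so that this total description length stays small.
```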
@MOSS_workshop
MOSS
6 months
Announcing the 1st Workshop on Methods and Opportunities at Small Scale (MOSS) at @icmlconf 2025!
🔗 Website: https://t.co/lZdKPrw4Pt
📝 We welcome submissions!
📅 Paper & Jupyter notebook deadline: May 22, 2025
Topics:
– Inductive biases & generalization
– Training
0
15
44
@RuntianZhai
Runtian Zhai
6 months
A shorter version of the first three chapters of my thesis is accepted by ICML 2025. It provides a quick start for those interested in learning about the contexture theory. Check it out:
arxiv.org
Despite the empirical success of foundation models, we do not have a systematic characterization of the representations that these models learn. In this paper, we establish the contexture theory....
1
2
37
@rpukdeee
Rattana Pukdee
6 months
In our #AISTATS2025 paper, we ask: when is it possible to recover a consistent joint distribution from its conditionals? We propose path consistency and autoregressive path consistency, which are necessary and easily verifiable conditions. See you at Poster session 3, Monday 5th May.
1
7
15
@electronickale
Yutong (Kelly) He
6 months
✨ Love 4o-style image generation but prefer to use Midjourney? Tired of manual prompt crafting from inspo images? PRISM to the rescue! 🖼️→📝→🖼️ We automate black-box prompt engineering—no training, no embeddings, just accurate, readable prompts from your inspo images! 1/🧵
3
31
84
@RuntianZhai
Runtian Zhai
6 months
In sum, representations are learned from association between X & A, by extracting the top-d eigenspace of operator T. Better contexts are crucial, and a good one has moderate association. Building this theory was an exciting journey and I’m grateful to all my collaborators. 12/12
0
0
1
@RuntianZhai
Runtian Zhai
6 months
My theory shows that representation learning can do all types of system-1 (associative) thinking, once we define X and A and have enough data. It cannot do system-2 (analytic) thinking, which is why reasoning is so hard. I plan to develop theory for system-2 in the future. 11/12
1
0
3
@RuntianZhai
Runtian Zhai
6 months
Convex combination optimizes a linear combination of multiple objectives; it balances weak & strong associations. Concatenation combines the embeddings of multiple models, like feature engineering where we concatenate multiple features, and it strengthens the association. 10/12
1
0
1
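A hedged sketch of the two operations as read from this tweet (my own illustration, not the thesis' exact definitions): concatenation stacks the embeddings learned from different contexts, while convex combination mixes their training objectives with a weight alpha.

```python
# Concatenation of embeddings from two contexts, and a convex combination of
# their pretraining objectives (alpha trades off a weak and a strong association).
import numpy as np

def concatenate_embeddings(phi1, phi2):
    """phi1: (n, d1) embedding from context 1, phi2: (n, d2) from context 2."""
    return np.concatenate([phi1, phi2], axis=1)       # shape (n, d1 + d2)

def convex_combination(loss1, loss2, alpha=0.5):
    """A single objective that linearly mixes the two contexts' losses."""
    return lambda params: alpha * loss1(params) + (1.0 - alpha) * loss2(params)
```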
@RuntianZhai
Runtian Zhai
6 months
If we have contexts that are too strong/weak, what can we do? We can mix multiple contexts to balance their associations! I define 3 base operations: Convolution, convex combination, concatenation. Convolution is like composing data augmentations; it weakens the association. 9/12
1
0
1
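A hedged sketch of the convolution operation described above (my own finite-space illustration): chaining two context channels, i.e. sampling A1 from p1(·|x) and then A from p2(·|a1), is a product of the conditional-distribution matrices, just like composing two data augmentations; the composed context can only be more weakly associated with X than either factor.

```python
# "Convolution" of two contexts on finite spaces: the composed channel is the
# product of the two conditional (row-stochastic) matrices.
import numpy as np

def convolve_contexts(cond1, cond2):
    """cond1[x, a1] = p1(a1 | x), cond2[a1, a] = p2(a | a1); rows sum to 1.
    Returns p(a | x) = sum_a1 p2(a | a1) * p1(a1 | x)."""
    return cond1 @ cond2

rng = np.random.default_rng(0)
random_channel = lambda n: rng.dirichlet(np.ones(n), size=n)  # toy stochastic channel
composed = convolve_contexts(random_channel(4), random_channel(4))
print(composed.round(3))
print(composed.sum(axis=1))   # still a valid channel: every row sums to 1
```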
@RuntianZhai
Runtian Zhai
6 months
This association level controls the decay rate of the singular values: Weaker association leads to faster decay. If too fast, few tasks are compatible and the model won’t be transferable; if too slow, a larger embedding dimension is needed, causing higher sample complexity. 8/12
1
0
1
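A small numerical sketch of the decay claim above (my own toy context, not the thesis' example): take A = X plus Gaussian noise on a discrete grid; a larger noise scale means a weaker association, and the singular values of the normalized operator fall off faster.

```python
# Weaker association -> faster singular value decay, on a toy "noisy copy" context.
import numpy as np

def singular_values(sigma, n=200):
    grid = np.arange(n)
    cond = np.exp(-(grid[:, None] - grid[None, :]) ** 2 / (2.0 * sigma ** 2))
    cond /= cond.sum(axis=1, keepdims=True)      # p(a | x): Gaussian blur of x
    joint = cond / n                             # joint p(x, a) with uniform p(x)
    px, pa = joint.sum(axis=1), joint.sum(axis=0)
    M = joint / np.sqrt(np.outer(px, pa))        # normalized operator
    return np.linalg.svd(M, compute_uv=False)

for sigma in (2.0, 8.0, 32.0):                   # larger sigma = weaker association
    s = singular_values(sigma)
    print(f"sigma={sigma:5.1f}  singular values 2-6:", np.round(s[1:6], 3))
```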
@RuntianZhai
Runtian Zhai
6 months
Hence, we need better contexts. I believe we can get them, so progress in pretraining won’t end. First, we must understand which contexts are good. I show that a good context has a moderate association between X & A. For example, BERT is best when the mask ratio is moderate. 7/12
1
0
1
@RuntianZhai
Runtian Zhai
6 months
The important implication is that for a fixed pretrain context, *scaling up the model size inevitably produces diminishing returns*, since the representation will converge to the contexture (the span of top-d singular functions). Upon convergence, further scaling has no use. 6/12
1
0
2
@RuntianZhai
Runtian Zhai
6 months
Transferability results from compatibility between pretrain context & downstream task, meaning that the context is useful for learning the task. I mathematically formulate compatibility, and show that contexture minimizes the worst error on compatible tasks, so it’s optimal. 5/12
1
0
3
@RuntianZhai
Runtian Zhai
6 months
This perspective works for many paradigms: supervised/(non-)contrastive learning, denoising autoencoders, generative models, graph node embedding, etc. This d-dim space can be obtained by training big models to optimize certain objectives, instead of non-scalable kernel PCA. 4/12
1
0
2
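A hedged toy check of the statement above (my own construction): on a small finite problem the d-dimensional space can be read off directly with kernel PCA on the kernel K = M Mᵀ induced by the context, and it coincides with the span of the top-d singular functions; at scale, one trains a large model on a pretraining objective instead of eigendecomposing K.

```python
# Kernel-PCA route vs. SVD route on a toy joint distribution: both recover the
# same top-d subspace on the X side.
import numpy as np

rng = np.random.default_rng(0)
P = rng.random((30, 20)); P /= P.sum()        # toy joint p(x, a)
px, pa = P.sum(axis=1), P.sum(axis=0)
M = P / np.sqrt(np.outer(px, pa))             # normalized operator

d = 4
U = np.linalg.svd(M)[0][:, :d]                # top-d left singular vectors
K = M @ M.T                                   # kernel on X induced by the context
E = np.linalg.eigh(K)[1][:, ::-1][:, :d]      # kernel PCA: top-d eigenvectors of K

# The two d-dim subspaces coincide: their projection matrices match.
print("subspace gap:", np.linalg.norm(U @ U.T - E @ E.T).round(8))
```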
@RuntianZhai
Runtian Zhai
6 months
The association is given by their joint distribution; its marginals give two L2 function spaces. The expectation operator T maps g(a) to E[g(A)|x]. I show that representation learning extracts the linear span of the top-d singular functions of T. I call this learning the contexture. 3/12
1
0
1
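A minimal numerical sketch of the construction in this tweet (my own toy finite case): build the normalized joint matrix, take its SVD, and the rescaled top-d left singular vectors are the top-d singular functions on the X side, i.e. the contexture.

```python
# The expectation operator T on a finite toy problem: (T g)(x) = E[g(A) | X = x].
# Its singular functions come from the SVD of M[x, a] = p(x, a) / sqrt(p(x) p(a)).
import numpy as np

rng = np.random.default_rng(0)
P = rng.random((50, 40)); P /= P.sum()        # toy joint distribution p(x, a)
px, pa = P.sum(axis=1), P.sum(axis=0)         # marginals p(x), p(a)

M = P / np.sqrt(np.outer(px, pa))             # normalized operator
U, S, Vt = np.linalg.svd(M, full_matrices=False)

d = 5
top_d_functions = U[:, :d] / np.sqrt(px)[:, None]   # singular functions of x
print("top singular values:", np.round(S[:d], 3))   # S[0] = 1 (constant function)
```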
@RuntianZhai
Runtian Zhai
6 months
It isn’t theoretically clear what representation is learned by a foundation model. My theory shows that a representation is learned from the association between the input X and a context variable A. Example: A can be label of X, the first k tokens of X, or a crop of image X. 2/12
1
0
2
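A tiny sketch of the three example context variables named in the tweet (my own illustration, with hypothetical helper names): each context is just a rule that pairs an input X with a context A.

```python
# Three ways to pair an input X with a context variable A, matching the examples
# in the tweet: the label, a prefix of tokens, and an image crop.

def label_context(x, y):
    """Supervised learning: A is the label of X."""
    return x, y

def prefix_context(tokens, k=16):
    """Language modeling: A is the first k tokens of X."""
    return tokens, tokens[:k]

def crop_context(image, top=4, left=4, size=24):
    """Vision pretraining: A is a crop of the image X (an H x W x C array)."""
    return image, image[top:top + size, left:left + size]
```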
@shaohua0116
Shao-Hua Sun
6 months
I guess our lab does not even have an academic budget then … #ICLR2025 keynote talk by Danqi Chen
1
9
130