Yuanhe Zhang
@yuanhezhang6
Followers
28
Following
1K
Media
10
Statuses
520
Pragmatic Learning Theory, using tools from probability and statistics | PhD in Stats @warwickstats 🇬🇧 | MMathStat @warwickstats 🇬🇧
Joined September 2022
(1/n) Thrilled to share our LoRA-One work ( https://t.co/3MrW0e0vii) as an #ICML25 oral presentation, w. Fanghui @Fanghui_SgrA (Warwick) and Yudong (Madison). Oral @ West Ballroom B, 4pm on July 17. Poster @ West Exhibition Hall B2-B3 #W 905, 4:30pm on July 15.
1
0
6
We have released a paper on in-context learning, one of the representative capabilities of LLMs. In this work, we show that a Transformer (1) can acquire an optimal meta-learning method through pretraining, and (2) as a result, can adapt instantly to tasks from short prompts via in-context learning and output optimal answers. https://t.co/dA6f5qbFFI
0
64
398
Towards Formalizing Reinforcement Learning Theory.
arxiv.org
In this paper, we formalize the almost sure convergence of $Q$-learning and linear temporal difference (TD) learning with Markovian samples using the Lean 4 theorem prover based on the Mathlib...
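For context, the object being formalized can be stated compactly. A minimal sketch in standard tabular Q-learning notation; the exact hypotheses formalized in Lean 4 / Mathlib may be stated differently in the paper:

```latex
% Q-learning update and almost-sure convergence under Robbins-Monro step sizes
% (a standard textbook statement, sketched here; not copied from the paper).
\[
Q_{t+1}(s_t,a_t) = Q_t(s_t,a_t)
  + \alpha_t \Bigl( r_t + \gamma \max_{a'} Q_t(s_{t+1},a') - Q_t(s_t,a_t) \Bigr),
\qquad
\sum_t \alpha_t = \infty,\ \ \sum_t \alpha_t^2 < \infty
\;\Longrightarrow\;
Q_t \xrightarrow{\text{a.s.}} Q^*.
\]
```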
0
4
31
- Test-time Adaptation of Tiny Recursive Models - New Paper, and the Trelis Submission Approach for the 2025 @arcprize Competition! In brief: - @jm_alexia's excellent TRM approach does not quite fit in the compute constraints of the ARC Prize competition - BUT, if you take a
7
26
202
New paper we're excited to get online! Taming Imperfect Process Verifiers: A Sampling Perspective on Backtracking. A totally new framework based on ~backtracking~ for using process verifiers to guide inference, w/ connections to approximate counting/sampling in theoretical CS.
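The thread itself has no code, but the core loop is easy to picture. A minimal sketch of verifier-guided generation with backtracking; `generate_step` and `verifier_accepts` are hypothetical placeholders for the model and the process verifier, not the paper's API:

```python
import random

def generate_with_backtracking(generate_step, verifier_accepts, max_steps=20, max_retries=5):
    """Grow a reasoning trace step by step; when the (imperfect) process
    verifier rejects the current prefix, backtrack one step and resample."""
    trace = []
    retries = 0
    while len(trace) < max_steps and retries < max_retries:
        candidate = trace + [generate_step(trace)]
        if verifier_accepts(candidate):
            trace = candidate          # keep the accepted prefix
            retries = 0
        else:
            if trace:
                trace.pop()            # backtrack: drop the last accepted step
            retries += 1               # and retry from the shorter prefix
    return trace

# Toy usage: "steps" are random digits, the verifier wants a non-decreasing prefix.
steps = lambda prefix: random.randint(0, 9)
ok = lambda prefix: all(a <= b for a, b in zip(prefix, prefix[1:]))
print(generate_with_backtracking(steps, ok))
```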
9
42
247
This seems like a major breakthrough for AI advancement. Tencent and Tsinghua introduced CALM (Continuous Autoregressive Language Models), a new approach that replaces next-token prediction with continuous vector prediction, allowing the model to think in ideas instead of words.
Holy shit... this might be the next big paradigm shift in AI. 🤯 Tencent + Tsinghua just dropped a paper called Continuous Autoregressive Language Models (CALM) and it basically kills the "next-token" paradigm every LLM is built on. Instead of predicting one token at a time,
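A rough way to picture the difference between the two output heads. This is purely illustrative, not CALM's architecture; in the actual method the continuous vector is the latent of an autoencoder over a chunk of tokens, trained with a generative objective rather than cross-entropy:

```python
import torch
import torch.nn as nn

d_model, vocab, d_latent = 512, 50_000, 128

# Standard next-token head: hidden state -> distribution over the vocabulary.
next_token_head = nn.Linear(d_model, vocab)

# Continuous autoregressive head: hidden state -> one continuous vector that
# stands in for the next chunk of tokens (dimensions here are made up).
next_vector_head = nn.Linear(d_model, d_latent)

h = torch.randn(1, d_model)            # last hidden state
token_logits = next_token_head(h)      # shape (1, 50_000): commit to one token
chunk_vector = next_vector_head(h)     # shape (1, 128): predict a whole chunk at once
print(token_logits.shape, chunk_vector.shape)
```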
63
239
2K
Unbelievably detailed new paper from DeepMind on the benchmarks and autograders they used on the IMO Gold journey. For me the main takeaways are: - autograding can achieve ~90% accuracy even on long and difficult reasoning - DeepThink is quite far behind the IMO Gold model on very difficult problems
11
66
612
New @Microsoft paper teaches LLMs to organize reasoning into concurrent subtasks for faster, more accurate answers. It shows 28% lower wait time than typical parallel thinking while also boosting math accuracy. The big deal is simple: it turns coordination into a skill the
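To make the wait-time point concrete, here is a generic concurrency sketch, not Microsoft's method: independent subtasks run in parallel, so the wall-clock cost is the slowest subtask rather than the sum. The `solve_subtask` coroutine is a hypothetical stand-in for an LLM call:

```python
import asyncio

async def solve_subtask(name: str, seconds: float) -> str:
    # Hypothetical stand-in for an LLM call working on one subtask.
    await asyncio.sleep(seconds)
    return f"{name}: done"

async def solve_in_parallel():
    # Decompose the problem into independent subtasks and run them concurrently;
    # total wait is max(0.3, 0.5, 0.2) = 0.5s instead of 1.0s sequentially.
    subtasks = [("simplify-lhs", 0.3), ("simplify-rhs", 0.5), ("check-edge-cases", 0.2)]
    results = await asyncio.gather(*(solve_subtask(n, s) for n, s in subtasks))
    return "\n".join(results)

print(asyncio.run(solve_in_parallel()))
```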
19
65
363
What if next-token prediction wasn't a single forward pass, but a tiny optimization problem? Introducing nanoEBM: a tiny transformer that learns to think harder by doing gradient descent on its own predictions. You can start training on your Mac now - it comes in at under 400 lines.
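Not nanoEBM's actual code, but a minimal PyTorch sketch of the general energy-based idea: instead of a single forward pass, refine the prediction at inference time by a few gradient steps on an energy. The toy quadratic `energy` stands in for whatever a trained model would compute from its hidden states:

```python
import torch

def refine_prediction(energy, y_init, steps=8, lr=0.5):
    """Treat prediction as an optimization problem: start from an initial guess
    and take a few gradient steps on the energy instead of one forward pass."""
    y = y_init.clone().requires_grad_(True)
    for _ in range(steps):
        e = energy(y)
        (grad,) = torch.autograd.grad(e, y)
        y = (y - lr * grad).detach().requires_grad_(True)
    return y.detach()

# Toy energy with a known minimum at [1, 2, 3]; a learned energy replaces this.
target = torch.tensor([1.0, 2.0, 3.0])
energy = lambda y: ((y - target) ** 2).sum()
print(refine_prediction(energy, torch.zeros(3)))   # approaches [1., 2., 3.]
```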
22
74
717
My paper "Understanding Softmax Attention Layers: Exact Mean-Field Analysis [...]" accepted at #NeurIPS2025 @NeurIPSConf. This was a fun way to apply some stat physics techniques I learned recently, on a non-trivial problem. Infinity stones are almost complete :) Preprint soon!
5
9
177
We just built and released the largest dataset for supervised fine-tuning of agentic LMs, 1.27M trajectories (~36B tokens)! Until now, large-scale SFT for agents has been rare - not for lack of data, but because of fragmentation across heterogeneous formats, tools, and interfaces.
arxiv.org
Public research results on large-scale supervised finetuning of AI agents remain relatively rare, since the collection of agent training data presents unique challenges. In this work, we argue...
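Purely as an illustration of why format unification matters (the field names below are hypothetical, not the dataset's actual schema), a normalized record for one agent trajectory might look like this:

```python
# Hypothetical, simplified example of one normalized SFT record for an agentic
# trajectory; illustrative only, not the released dataset's schema.
trajectory = {
    "task": "Find the version of requests pinned in pyproject.toml",
    "tools": ["shell", "file_read"],
    "steps": [
        {"role": "assistant", "action": "shell", "input": "grep requests pyproject.toml"},
        {"role": "tool", "name": "shell", "output": 'requests = "^2.31"'},
        {"role": "assistant", "action": "final_answer", "input": "requests ^2.31"},
    ],
}

# A single shared schema like this is what lets heterogeneous agent logs be
# concatenated into one large SFT corpus.
print(len(trajectory["steps"]), "steps")
```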
27
172
1K
This paper provides a geometric analysis of Principal Component Analysis (PCA) to determine what data properties define its excess risk. The authors establish a central limit theorem for the principal subspace and derive the asymptotic distribution of its excess risk,
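For readers who want the object being analyzed, here is the excess risk of PCA in its standard form (my notation, for a centered vector X with covariance Σ; the paper's exact setup may differ):

```latex
% Reconstruction risk of projecting onto a k-dimensional subspace with
% orthonormal basis U, and the excess risk of the empirical top-k subspace
% \hat{U} relative to the population optimum U^* (standard definitions).
\[
R(U) = \mathbb{E}\,\|X - U U^\top X\|^2
     = \operatorname{tr}(\Sigma) - \operatorname{tr}(U^\top \Sigma\, U),
\qquad
\mathcal{E}(\hat{U}) = R(\hat{U}) - R(U^*)
     = \operatorname{tr}\bigl((U^*)^\top \Sigma\, U^*\bigr)
       - \operatorname{tr}\bigl(\hat{U}^\top \Sigma\, \hat{U}\bigr).
\]
```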
1
43
273
@jm_alexia @ritteradam Indeed, I also find that simply increasing the number of inference steps, even when the model is trained with only 16, can substantially improve performance. (config: TRM-MLP-EMA on Sudoku1k; though the 16-step one only reached 84% instead of 87%)
4
11
109
🤯 Merging many finetuned LLMs into one model, effectively? Introducing Functional Dual Anchor (FDA), a new framework for model merging. Current merging works poorly due to the underlying parameter conflicts. FDA shifts knowledge integration to the input-representation space
10
96
613
To establish power-law behavior we need statistical tests. This paper is a nice overview of statistical methods for testing power laws: "Power-Law Distributions in Empirical Data" by A. Clauset, C. R. Shalizi, & M. E. J. Newman, SIAM Review, 51(4), 661–703. https://t.co/jlYNEXrekT 4/5
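The core of the Clauset–Shalizi–Newman recipe is simple to sketch: fit the exponent above a cutoff by maximum likelihood and measure goodness of fit with a Kolmogorov–Smirnov distance. A minimal numpy version for continuous data (`fit_power_law` is my own helper, not code from the paper or the `powerlaw` package):

```python
import numpy as np

def fit_power_law(x, xmin):
    """Continuous MLE for the power-law exponent alpha above xmin, plus the
    KS distance between the empirical and fitted tail CDFs (Clauset et al.)."""
    tail = np.sort(x[x >= xmin])
    n = tail.size
    alpha = 1.0 + n / np.sum(np.log(tail / xmin))        # MLE for the exponent
    empirical_cdf = np.arange(1, n + 1) / n
    model_cdf = 1.0 - (tail / xmin) ** (1.0 - alpha)     # fitted power-law CDF
    ks = np.max(np.abs(empirical_cdf - model_cdf))       # goodness-of-fit statistic
    return alpha, ks

# Toy check on synthetic Pareto data with true exponent alpha = 2.5.
rng = np.random.default_rng(0)
x = rng.pareto(1.5, size=5000) + 1.0                     # classical Pareto above xmin = 1
print(fit_power_law(x, xmin=1.0))
```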
2
6
82
Tiny Reasoning Language Model (trlm-135) - Technical Blogpost ⚡ Three weeks ago, I shared a weekend experiment: trlm-135, a tiny language model taught to think step-by-step. The response was incredible and now, the full technical report is live:
shekswess.github.io
Exploring the capabilities of Tiny Language Models to reason and understand complex tasks.
11
77
537
@jasondeanlee @CL_Theory @Fanghui_SgrA (5/n) We also construct a benchmark consisting of 2,894 gold-standard DAG-MATH formatted CoTs (problems from Omni-MATH) as demonstration examples for few-shot prompting, which also serve as high-quality data for SFT.
0
0
0
@jasondeanlee @CL_Theory @Fanghui_SgrA (4/n) We propose the DAG-MATH format to reveal the DAG structure of CoT. With this, we evaluate three LLM families on 3 popular math benchmarks via our new metric. We find that the tested models have similar perfect-reasoning ability, although there are large gaps in PASS@1.
1
0
0
@jasondeanlee @CL_Theory @Fanghui_SgrA (3/n) In detail, under the DAG framework, we can extract a sampled DAG from each CoT trajectory. We call the CoT logically closed if every intermediate sampled node has at least one child. We term a logically closed CoT with a correct final answer perfect reasoning.
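As a sanity check on the definition, here is a tiny sketch of how logical closure could be tested on an extracted DAG. A plain adjacency dict stands in for whatever representation the paper actually uses; `is_logically_closed` is an illustrative helper, not the paper's code:

```python
def is_logically_closed(dag: dict[str, list[str]], final: str) -> bool:
    """A CoT's sampled DAG is logically closed if every intermediate node
    (every node other than the final answer) has at least one child."""
    return all(node == final or len(children) > 0 for node, children in dag.items())

# Toy example: s3 is the final answer; in the second DAG, s2 leads nowhere.
closed = {"s1": ["s2"], "s2": ["s3"], "s3": []}
dangling = {"s1": ["s3"], "s2": [], "s3": []}
print(is_logically_closed(closed, "s3"), is_logically_closed(dangling, "s3"))  # True False
```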
1
0
0
@jasondeanlee @CL_Theory @Fanghui_SgrA (2/n) Building on this framework, we introduce logical closeness, a metric that quantifies how well a model's CoT trajectory (i.e., the LLM's output solution) adheres to the DAG structure, providing evaluation beyond classical PASS@k metrics.
1
0
0