Yuanhe Zhang

@yuanhezhang6

Followers
28
Following
1K
Media
10
Statuses
520

Pragmatic Learning Theory, using tools from probability and statistics | PhD in Stats @warwickstats 🇬🇧 | MMathStat @warwickstats 🇬🇧

Joined September 2022
@yuanhezhang6
Yuanhe Zhang
4 months
(1/n) 🚀 Thrilled to share our LoRA-One work ( https://t.co/3MrW0e0vii) as an #ICML25 oral presentation, w. Fanghui @Fanghui_SgrA (Warwick) and Yudong (Madison). Oral @ West Ballroom B, 4:00 PM on July 17. Poster @ West Exhibition Hall B2-B3, #W 905, 4:30 PM on July 15.
1
0
6
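For context on the tweet above: LoRA-One builds on the standard LoRA reparameterization, in which a frozen pretrained weight W0 is adapted through a trainable low-rank update (alpha/r)·B·A. A minimal NumPy sketch of plain LoRA (not the LoRA-One initialization itself; all dimensions and names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r = 8, 6, 2      # layer dimensions and adapter rank
alpha = 4.0                   # LoRA scaling hyperparameter

W0 = rng.normal(size=(d_out, d_in))     # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                # trainable up-projection, zero-initialized

def lora_forward(x):
    # Effective weight is W0 + (alpha / r) * B @ A; W0 is never updated.
    return W0 @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
# With B initialized to zero, the adapted layer matches the frozen layer exactly.
assert np.allclose(lora_forward(x), W0 @ x)
```

Only A and B (r·(d_in + d_out) parameters) are trained, which is why LoRA variants are cheap to fine-tune; how A and B are initialized is exactly where LoRA-One differs.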
@10m8mkW
Tom Wakayama
19 days
We have released a paper on in-context learning, one of the signature capabilities of LLMs. In this work, we show that a Transformer (i) can acquire an optimal meta-learning procedure during pretraining, and (ii) as a result, can adapt to a task instantly from a short prompt via in-context learning and output the optimal answer. https://t.co/dA6f5qbFFI
0
64
398
@TrelisResearch
Trelis Research
10 days
- Test-time Adaptation of Tiny Recursive Models - New Paper, and the Trelis Submission Approach for the 2025 @arcprize Competition! In brief: - @jm_alexia's excellent TRM approach does not quite fit in the compute constraints of the ARC Prize competition - BUT, if you take a
7
26
202
@canondetortugas
Dylan Foster 🐢
1 month
New paper we're excited to get online! Taming Imperfect Process Verifiers: A Sampling Perspective on Backtracking. A totally new framework based on ~backtracking~ for using process verifiers to guide inference, w/ connections to approximate counting/sampling in theoretical CS.
9
42
247
@Dr_Singularity
Dr Singularity
9 days
This seems like a major breakthrough for AI. Tencent and Tsinghua introduced CALM (Continuous Autoregressive Language Models), a new approach that replaces next-token prediction with continuous vector prediction, allowing the model to think in ideas instead of words.
@rryssf_
Robert Youssef
10 days
Holy shit... this might be the next big paradigm shift in AI. 🤯 Tencent + Tsinghua just dropped a paper called Continuous Autoregressive Language Models (CALM) and it basically kills the "next-token" paradigm every LLM is built on. Instead of predicting one token at a time,
63
239
2K
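The CALM idea, as described in the two tweets above, swaps the softmax-over-vocabulary head for direct prediction of a continuous vector representing a chunk of text. A toy NumPy sketch of the interface difference (the squared-error loss is a stand-in; the actual paper uses its own continuous objective, and all names here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d = 50, 16

# Standard next-token head: hidden state -> vocab logits -> softmax cross-entropy.
def next_token_loss(h, target_id, W_vocab):
    logits = W_vocab @ h
    logits -= logits.max()                          # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[target_id]

# CALM-style head (sketch): predict the next *chunk vector* directly and score it
# with a continuous loss -- no softmax over a discrete vocabulary at all.
def next_vector_loss(h, target_vec, W_vec):
    pred = W_vec @ h
    return float(np.sum((pred - target_vec) ** 2))

h = rng.normal(size=d)
tok_loss = next_token_loss(h, 3, rng.normal(size=(vocab, d)))
vec_loss = next_vector_loss(h, rng.normal(size=d), rng.normal(size=(d, d)))
```

The practical payoff claimed in the tweets is fewer autoregressive steps: one continuous vector can stand in for several tokens, so generation length shrinks accordingly.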
@DmitryRybin1
Dmitry Rybin
10 days
Unbelievably detailed new paper from DeepMind on the benchmarks and autograders they used on the IMO Gold journey. For me the main takeaways are: - autograding can achieve ~90% accuracy even on long and difficult reasoning - DeepThink is quite behind the IMO Gold model on very difficult problems
11
66
612
@rohanpaul_ai
Rohan Paul
12 days
New @Microsoft paper teaches LLMs to organize reasoning into concurrent subtasks for faster, more accurate answers. It shows 28% lower wait time than typical parallel thinking while also boosting math accuracy. The big deal is simple, it turns coordination into a skill the
19
65
363
@sdand
Surya Dantuluri
14 days
What if next-token prediction wasn't a single forward pass, but a tiny optimization problem? Introducing nanoEBM: a tiny transformer that learns to think harder by doing gradient descent on its own predictions. You can start training on your Mac now; it comes in under 400 lines
22
74
717
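The energy-based idea behind the tweet above can be sketched abstractly: treat the prediction y as a variable and minimize an energy E(x, y) by gradient descent at inference time, so "thinking harder" means running more descent steps. A toy sketch with a quadratic energy (a real model like nanoEBM computes the energy with a transformer; everything here is illustrative):

```python
import numpy as np

def energy(y, target):
    # Toy quadratic energy with its minimum at `target`; a real EBM would
    # compute E from the model's scores over (context, prediction).
    return 0.5 * np.sum((y - target) ** 2)

def grad_energy(y, target):
    return y - target

def refine(y0, target, steps, lr=0.5):
    # Inference = iterative refinement: descend on the energy surface.
    y = y0.copy()
    for _ in range(steps):
        y -= lr * grad_energy(y, target)
    return y

target = np.array([1.0, -2.0, 0.5])
y = refine(np.zeros(3), target, steps=20)
# More steps -> lower energy: the prediction improves with extra "thinking".
assert energy(y, target) < energy(np.zeros(3), target)
```

The design appeal is a tunable compute knob: the same trained model can spend more or fewer refinement steps per prediction at test time.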
@dohmatobelvis
Elvis Dohmatob
2 months
My paper "Understanding Softmax Attention Layers: Exact Mean-Field Analysis [...]" accepted at #NeurIPS2025 @NeurIPSConf. This was a fun way to apply some stat physics techniques I learned recently, on a non-trivial problem. Infinity stones are almost complete :) Preprint soon!
5
9
177
@yueqi_song
Yueqi Song @ EMNLP2025
16 days
We just built and released the largest dataset for supervised fine-tuning of agentic LMs: 1.27M trajectories (~36B tokens)! Up until now, large-scale SFT for agents has been rare - not for lack of data, but because of fragmentation across heterogeneous formats, tools, and interfaces.
arxiv.org
Public research results on large-scale supervised finetuning of AI agents remain relatively rare, since the collection of agent training data presents unique challenges. In this work, we argue...
27
172
1K
@probnstat
Probability and Statistics
16 days
This paper provides a geometric analysis of Principal Component Analysis (PCA) to determine which data properties define its excess risk. 🔬 The authors establish a central limit theorem for the principal subspace and derive the asymptotic distribution of its excess risk,
1
43
273
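The notion of PCA excess risk in the tweet above can be illustrated numerically: it is the extra expected reconstruction error incurred by the empirical top-k subspace relative to the population one. A NumPy sketch (the dimensions and covariance are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 5, 2, 2000

# Population covariance with a clear eigengap after the top-k eigenvalues.
eigvals = np.array([5.0, 3.0, 1.0, 0.5, 0.2])
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
Sigma = Q @ np.diag(eigvals) @ Q.T

X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)

def top_k_proj(S, k):
    vals, vecs = np.linalg.eigh(S)        # eigh returns ascending eigenvalues
    U = vecs[:, ::-1][:, :k]              # eigenvectors of the k largest
    return U @ U.T

P_hat = top_k_proj(X.T @ X / n, k)        # empirical principal subspace
P_star = top_k_proj(Sigma, k)             # population principal subspace

# Population risk of a projector P is trace(Sigma (I - P)); it is minimized
# by the true top-k eigenspace, so the excess risk below is nonnegative.
excess = (np.trace(Sigma @ (np.eye(d) - P_hat))
          - np.trace(Sigma @ (np.eye(d) - P_star)))
assert excess >= -1e-12
```

The paper's asymptotic distribution describes how this nonnegative `excess` concentrates and fluctuates as n grows.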
@huskydogewoof
Benhao Huang
18 days
@jm_alexia @ritteradam Indeed, I also find that simply increasing the number of inference steps, even when the model is trained with only 16, can substantially improve performance. (Config: TRM-MLP-EMA on Sudoku1k; though the 16-step run only reached 84% instead of 87%.)
4
11
109
@Besteuler
Weiyang Liu
18 days
🤯 Merging many finetuned LLMs into one model, effectively? Introducing Functional Dual Anchor (FDA), a new framework for model merging. 🚀 Current merging works poorly due to underlying parameter conflicts. FDA shifts knowledge integration to the input-representation space
10
96
613
@docmilanfar
Peyman Milanfar
27 days
To establish power-law behavior we need statistical tests. This paper is a nice overview of statistical methods for testing power laws: "Power-Law Distributions in Empirical Data" by A. Clauset, C. R. Shalizi, & M. E. J. Newman, SIAM Review, 51(4), 661–703. https://t.co/jlYNEXrekT 4/5
2
6
82
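The Clauset–Shalizi–Newman paper referenced above gives, for the continuous case, a closed-form maximum-likelihood estimator for the tail exponent: alpha_hat = 1 + n / Σ ln(x_i / x_min), computed over the tail x_i ≥ x_min. A quick NumPy check on synthetic Pareto data:

```python
import numpy as np

rng = np.random.default_rng(0)

def powerlaw_mle_alpha(x, x_min):
    # Continuous-case MLE from Clauset, Shalizi & Newman (2009):
    # alpha_hat = 1 + n / sum(ln(x_i / x_min)) over the tail x_i >= x_min.
    tail = x[x >= x_min]
    return 1.0 + tail.size / np.log(tail / x_min).sum()

# Sample from a Pareto tail with true alpha = 2.5 and x_min = 1 via
# inverse-CDF sampling: x = x_min * u^(-1 / (alpha - 1)), u ~ Uniform(0, 1).
alpha_true, x_min, n = 2.5, 1.0, 200_000
u = rng.uniform(size=n)
x = x_min * u ** (-1.0 / (alpha_true - 1.0))

alpha_hat = powerlaw_mle_alpha(x, x_min)
assert abs(alpha_hat - alpha_true) < 0.05   # recovers the true exponent
```

The paper's point, though, is that fitting alpha is the easy part; testing whether a power law is actually plausible (via goodness-of-fit and comparison against alternatives like the lognormal) is where most empirical claims fall down.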
@Shekswess
Shekswess
21 days
Tiny Reasoning Language Model (trlm-135) - Technical Blogpost ⚡ Three weeks ago, I shared a weekend experiment: trlm-135, a tiny language model taught to think step-by-step. The response was incredible, and now the full technical report is live:
shekswess.github.io
Exploring the capabilities of Tiny Language Models to reason and understand complex tasks.
11
77
537
@yuanhezhang6
Yuanhe Zhang
20 days
@jasondeanlee @CL_Theory @Fanghui_SgrA (5/n) We also construct a benchmark of 2,894 gold-standard DAG-MATH-formatted CoTs (problems from Omni-MATH) as demonstration examples for few-shot prompting, which also serve as high-quality data for SFT.
0
0
0
@yuanhezhang6
Yuanhe Zhang
20 days
@jasondeanlee @CL_Theory @Fanghui_SgrA (4/n) We propose the DAG-MATH format to reveal the DAG structure of CoT. With this, we evaluate three LLM families on 3 popular math benchmarks via our new metric. We find that the tested models have similar perfect-reasoning ability, although there are large gaps in PASS@1.
1
0
0
@yuanhezhang6
Yuanhe Zhang
20 days
@jasondeanlee @CL_Theory @Fanghui_SgrA (3/n) In detail, under the DAG framework we can extract a sampled DAG from each CoT trajectory. We call a CoT logically closed if every intermediate sampled node has at least one child. We term a logically closed CoT with a correct final answer Perfect Reasoning.
1
0
0
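The logical-closure check described in the (3/n) tweet above can be sketched directly: extract the sampled DAG from a CoT trajectory and verify that every intermediate node (neither a premise nor the final answer) has at least one child. A minimal Python sketch (the adjacency representation and all node names are assumptions for illustration):

```python
def is_logically_closed(children, roots, final):
    """children: dict mapping node -> list of child nodes (the sampled DAG).
    roots: premise nodes; final: the final-answer node."""
    nodes = set(children)
    for kids in children.values():
        nodes.update(kids)
    # Intermediate nodes are everything except premises and the final answer.
    intermediate = nodes - set(roots) - {final}
    # Closed iff every intermediate derivation step feeds some later step.
    return all(len(children.get(node, [])) > 0 for node in intermediate)

# A closed trajectory: each intermediate step s1, s2 is used downstream.
closed = {"p1": ["s1"], "p2": ["s1"], "s1": ["s2"], "s2": ["ans"]}
assert is_logically_closed(closed, roots=["p1", "p2"], final="ans")

# An open trajectory: step "s1" is derived but never used afterwards.
open_ = {"p1": ["s1", "s2"], "s2": ["ans"]}
assert not is_logically_closed(open_, roots=["p1"], final="ans")
```

Combining this predicate with answer correctness gives the thread's Perfect Reasoning criterion: logically closed and the final answer is right.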
@yuanhezhang6
Yuanhe Zhang
20 days
@jasondeanlee @CL_Theory @Fanghui_SgrA (2/n) Building on this framework, we introduce logical closeness, a metric that quantifies how well a model's CoT trajectory (i.e., the LLM's output solution) adheres to the DAG structure, providing evaluation beyond classical PASS@k metrics.
1
0
0