Yuanhe Zhang
@yuanhezhang6
Followers
28
Following
1K
Media
10
Statuses
520
Pragmatic Learning Theory, using tools from probability and statistics | PhD in Stats @warwickstats 🇬🇧 | MMathStat @warwickstats 🇬🇧
Joined September 2022
(1/n) Thrilled to share our LoRA-One work ( https://t.co/3MrW0e0vii) as an #ICML25 oral presentation, w. Fanghui @Fanghui_SgrA (Warwick) and Yudong (Madison). Oral @ West Ballroom B, 4pm on July 17. Poster @ West Exhibition Hall B2-B3 #W 905, 4:30pm on July 15.
1
0
6
We have released a paper on in-context learning, one of the representative capabilities of LLMs. In this work, we show that a Transformer (1) can acquire an optimal meta-learning method through pretraining, and (2) as a result, can adapt instantly to tasks from short prompts via in-context learning and output optimal answers. https://t.co/dA6f5qbFFI
0
64
398
Towards Formalizing Reinforcement Learning Theory.
arxiv.org
In this paper, we formalize the almost sure convergence of $Q$-learning and linear temporal difference (TD) learning with Markovian samples using the Lean 4 theorem prover based on the Mathlib...
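For context, the object being formalized can be stated compactly. A minimal sketch in standard tabular Q-learning notation; the exact hypotheses formalized in Lean 4 / Mathlib may be stated differently in the paper:

```latex
% Q-learning update and almost-sure convergence under Robbins-Monro step sizes
% (a standard textbook statement, sketched here; not copied from the paper).
\[
Q_{t+1}(s_t,a_t) = Q_t(s_t,a_t)
  + \alpha_t \Bigl( r_t + \gamma \max_{a'} Q_t(s_{t+1},a') - Q_t(s_t,a_t) \Bigr),
\qquad
\sum_t \alpha_t = \infty,\ \ \sum_t \alpha_t^2 < \infty
\;\Longrightarrow\;
Q_t \xrightarrow{\text{a.s.}} Q^*.
\]
```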
0
4
31
- Test-time Adaptation of Tiny Recursive Models - New Paper, and the Trelis Submission Approach for the 2025 @arcprize Competition! In brief: - @jm_alexia's excellent TRM approach does not quite fit in the compute constraints of the ARC Prize competition - BUT, if you take a
7
26
202
New paper we're excited to get online! Taming Imperfect Process Verifiers: A Sampling Perspective on Backtracking. A totally new framework based on ~backtracking~ for using process verifiers to guide inference, w/ connections to approximate counting/sampling in theoretical CS.
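The thread itself has no code, but the core loop is easy to picture. A minimal sketch of verifier-guided generation with backtracking; `generate_step` and `verifier_accepts` are hypothetical placeholders for the model and the process verifier, not the paper's API:

```python
import random

def generate_with_backtracking(generate_step, verifier_accepts, max_steps=20, max_retries=5):
    """Grow a reasoning trace step by step; when the (imperfect) process
    verifier rejects the current prefix, backtrack one step and resample."""
    trace = []
    retries = 0
    while len(trace) < max_steps and retries < max_retries:
        candidate = trace + [generate_step(trace)]
        if verifier_accepts(candidate):
            trace = candidate          # keep the accepted prefix
            retries = 0
        else:
            if trace:
                trace.pop()            # backtrack: drop the last accepted step
            retries += 1               # and retry from the shorter prefix
    return trace

# Toy usage: "steps" are random digits, the verifier wants a non-decreasing prefix.
steps = lambda prefix: random.randint(0, 9)
ok = lambda prefix: all(a <= b for a, b in zip(prefix, prefix[1:]))
print(generate_with_backtracking(steps, ok))
```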
9
42
247
This seems like a major breakthrough for AI advancement. Tencent and Tsinghua introduced CALM (Continuous Autoregressive Language Models), a new approach that replaces next-token prediction with continuous vector prediction, allowing the model to think in ideas instead of words.
Holy shit... this might be the next big paradigm shift in AI. 🤯 Tencent + Tsinghua just dropped a paper called Continuous Autoregressive Language Models (CALM) and it basically kills the "next-token" paradigm every LLM is built on. Instead of predicting one token at a time,
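A rough way to picture the difference between the two output heads. This is purely illustrative, not CALM's architecture; in the actual method the continuous vector is the latent of an autoencoder over a chunk of tokens, trained with a generative objective rather than cross-entropy:

```python
import torch
import torch.nn as nn

d_model, vocab, d_latent = 512, 50_000, 128

# Standard next-token head: hidden state -> distribution over the vocabulary.
next_token_head = nn.Linear(d_model, vocab)

# Continuous autoregressive head: hidden state -> one continuous vector that
# stands in for the next chunk of tokens (dimensions here are made up).
next_vector_head = nn.Linear(d_model, d_latent)

h = torch.randn(1, d_model)            # last hidden state
token_logits = next_token_head(h)      # shape (1, 50_000): commit to one token
chunk_vector = next_vector_head(h)     # shape (1, 128): predict a whole chunk at once
print(token_logits.shape, chunk_vector.shape)
```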
63
239
2K
Unbelievably detailed new paper from DeepMind on the benchmarks and autograders they used on the IMO Gold journey. For me the main takeaways are: - autograding can achieve ~90% accuracy even on long and difficult reasoning - DeepThink is quite far behind the IMO Gold model on very difficult problems
11
66
612
New @Microsoft paper teaches LLMs to organize reasoning into concurrent subtasks for faster, more accurate answers. It shows 28% lower wait time than typical parallel thinking while also boosting math accuracy. The big deal is simple: it turns coordination into a skill the
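To make the wait-time point concrete, here is a generic concurrency sketch, not Microsoft's method: independent subtasks run in parallel, so the wall-clock cost is the slowest subtask rather than the sum. The `solve_subtask` coroutine is a hypothetical stand-in for an LLM call:

```python
import asyncio

async def solve_subtask(name: str, seconds: float) -> str:
    # Hypothetical stand-in for an LLM call working on one subtask.
    await asyncio.sleep(seconds)
    return f"{name}: done"

async def solve_in_parallel():
    # Decompose the problem into independent subtasks and run them concurrently;
    # total wait is max(0.3, 0.5, 0.2) = 0.5s instead of 1.0s sequentially.
    subtasks = [("simplify-lhs", 0.3), ("simplify-rhs", 0.5), ("check-edge-cases", 0.2)]
    results = await asyncio.gather(*(solve_subtask(n, s) for n, s in subtasks))
    return "\n".join(results)

print(asyncio.run(solve_in_parallel()))
```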
19
65
363
What if next-token prediction wasn't a single forward pass, but a tiny optimization problem? Introducing nanoEBM: a tiny transformer that learns to think harder by doing gradient descent on its own predictions. You can start training on your Mac now - it comes in at under 400 lines.
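Not nanoEBM's actual code, but a minimal PyTorch sketch of the general energy-based idea: instead of a single forward pass, refine the prediction at inference time by a few gradient steps on an energy. The toy quadratic `energy` stands in for whatever a trained model would compute from its hidden states:

```python
import torch

def refine_prediction(energy, y_init, steps=8, lr=0.5):
    """Treat prediction as an optimization problem: start from an initial guess
    and take a few gradient steps on the energy instead of one forward pass."""
    y = y_init.clone().requires_grad_(True)
    for _ in range(steps):
        e = energy(y)
        (grad,) = torch.autograd.grad(e, y)
        y = (y - lr * grad).detach().requires_grad_(True)
    return y.detach()

# Toy energy with a known minimum at [1, 2, 3]; a learned energy replaces this.
target = torch.tensor([1.0, 2.0, 3.0])
energy = lambda y: ((y - target) ** 2).sum()
print(refine_prediction(energy, torch.zeros(3)))   # approaches [1., 2., 3.]
```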
22
74
717
My paper "Understanding Softmax Attention Layers: Exact Mean-Field Analysis [...]" accepted at #NeurIPS2025 @NeurIPSConf. This was a fun way to apply some stat physics techniques I learned recently, on a non-trivial problem. Infinity stones are almost complete :) Preprint soon!
5
9
177
We just built and released the largest dataset for supervised fine-tuning of agentic LMs, 1.27M trajectories (~36B tokens)! Until now, large-scale SFT for agents has been rare - not for lack of data, but because of fragmentation across heterogeneous formats, tools, and interfaces.
arxiv.org
Public research results on large-scale supervised finetuning of AI agents remain relatively rare, since the collection of agent training data presents unique challenges. In this work, we argue...
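Purely as an illustration of why format unification matters (the field names below are hypothetical, not the dataset's actual schema), a normalized record for one agent trajectory might look like this:

```python
# Hypothetical, simplified example of one normalized SFT record for an agentic
# trajectory; illustrative only, not the released dataset's schema.
trajectory = {
    "task": "Find the version of requests pinned in pyproject.toml",
    "tools": ["shell", "file_read"],
    "steps": [
        {"role": "assistant", "action": "shell", "input": "grep requests pyproject.toml"},
        {"role": "tool", "name": "shell", "output": 'requests = "^2.31"'},
        {"role": "assistant", "action": "final_answer", "input": "requests ^2.31"},
    ],
}

# A single shared schema like this is what lets heterogeneous agent logs be
# concatenated into one large SFT corpus.
print(len(trajectory["steps"]), "steps")
```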
27
172
1K
This paper provides a geometric analysis of Principal Component Analysis (PCA) to determine what data properties define its excess risk. The authors establish a central limit theorem for the principal subspace and derive the asymptotic distribution of its excess risk,
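For readers who want the object being analyzed, here is the excess risk of PCA in its standard form (my notation, for a centered vector X with covariance Σ; the paper's exact setup may differ):

```latex
% Reconstruction risk of projecting onto a k-dimensional subspace with
% orthonormal basis U, and the excess risk of the empirical top-k subspace
% \hat{U} relative to the population optimum U^* (standard definitions).
\[
R(U) = \mathbb{E}\,\|X - U U^\top X\|^2
     = \operatorname{tr}(\Sigma) - \operatorname{tr}(U^\top \Sigma\, U),
\qquad
\mathcal{E}(\hat{U}) = R(\hat{U}) - R(U^*)
     = \operatorname{tr}\bigl((U^*)^\top \Sigma\, U^*\bigr)
       - \operatorname{tr}\bigl(\hat{U}^\top \Sigma\, \hat{U}\bigr).
\]
```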
1
43
273
@jm_alexia @ritteradam Indeed, I also find that simply increasing the number of inference steps, even when the model is trained with only 16, can substantially improve performance. (config: TRM-MLP-EMA on Sudoku1k; though the 16-step one only reached 84% instead of 87%)
4
11
109
🤯 Merging many finetuned LLMs into one model, effectively? Introducing Functional Dual Anchor (FDA), a new framework for model merging. Current merging works poorly due to the underlying parameter conflicts. FDA shifts knowledge integration to the input-representation space
10
96
613
To establish power-law behavior we need statistical tests. This paper is a nice overview of statistical methods for testing power laws: "Power-Law Distributions in Empirical Data" by A. Clauset, C. R. Shalizi, & M. E. J. Newman, SIAM Review, 51(4), 661–703. https://t.co/jlYNEXrekT 4/5
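The core of the Clauset–Shalizi–Newman recipe is simple to sketch: fit the exponent above a cutoff by maximum likelihood and measure goodness of fit with a Kolmogorov–Smirnov distance. A minimal numpy version for continuous data (`fit_power_law` is my own helper, not code from the paper or the `powerlaw` package):

```python
import numpy as np

def fit_power_law(x, xmin):
    """Continuous MLE for the power-law exponent alpha above xmin, plus the
    KS distance between the empirical and fitted tail CDFs (Clauset et al.)."""
    tail = np.sort(x[x >= xmin])
    n = tail.size
    alpha = 1.0 + n / np.sum(np.log(tail / xmin))        # MLE for the exponent
    empirical_cdf = np.arange(1, n + 1) / n
    model_cdf = 1.0 - (tail / xmin) ** (1.0 - alpha)     # fitted power-law CDF
    ks = np.max(np.abs(empirical_cdf - model_cdf))       # goodness-of-fit statistic
    return alpha, ks

# Toy check on synthetic Pareto data with true exponent alpha = 2.5.
rng = np.random.default_rng(0)
x = rng.pareto(1.5, size=5000) + 1.0                     # classical Pareto above xmin = 1
print(fit_power_law(x, xmin=1.0))
```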
2
6
82
Tiny Reasoning Language Model (trlm-135) - Technical Blogpost ⚡ Three weeks ago, I shared a weekend experiment: trlm-135, a tiny language model taught to think step-by-step. The response was incredible and now, the full technical report is live:
shekswess.github.io
Exploring the capabilities of Tiny Language Models to reason and understand complex tasks.
11
77
537
@jasondeanlee @CL_Theory @Fanghui_SgrA (5/n) We also construct a benchmark consisting of 2,894 gold-standard DAG-MATH formatted CoTs (problems from Omni-MATH) as demonstration examples for few-shot prompting, which also serve as high-quality data for SFT.
0
0
0
@jasondeanlee @CL_Theory @Fanghui_SgrA (4/n) We propose the DAG-MATH format to reveal the DAG structure of CoT. With this, we evaluate three LLM families on 3 popular math benchmarks via our new metric. We find that the tested models have similar perfect-reasoning ability, although there are large gaps in PASS@1.
1
0
0
@jasondeanlee @CL_Theory @Fanghui_SgrA (3/n) In detail, under the DAG framework, we can extract a sampled DAG from each CoT trajectory. We call the CoT logically closed if every intermediate sampled node has at least one child. We term a logically closed CoT with a correct final answer perfect reasoning.
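As a sanity check on the definition, here is a tiny sketch of how logical closure could be tested on an extracted DAG. A plain adjacency dict stands in for whatever representation the paper actually uses; `is_logically_closed` is an illustrative helper, not the paper's code:

```python
def is_logically_closed(dag: dict[str, list[str]], final: str) -> bool:
    """A CoT's sampled DAG is logically closed if every intermediate node
    (every node other than the final answer) has at least one child."""
    return all(node == final or len(children) > 0 for node, children in dag.items())

# Toy example: s3 is the final answer; in the second DAG, s2 leads nowhere.
closed = {"s1": ["s2"], "s2": ["s3"], "s3": []}
dangling = {"s1": ["s3"], "s2": [], "s3": []}
print(is_logically_closed(closed, "s3"), is_logically_closed(dangling, "s3"))  # True False
```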
1
0
0
@jasondeanlee @CL_Theory @Fanghui_SgrA (2/n) Building on this framework, we introduce logical closeness, a metric that quantifies how well a model's CoT trajectory (i.e., the LLM's output solution) adheres to the DAG structure, providing evaluation beyond classical PASS@k metrics.
1
0
0