
Albert Tseng
@tsengalb99
Followers: 697 · Following: 88 · Media: 18 · Statuses: 83
Excited to announce our #AISTATS 📜 on training LLMs with MXFP4! We use stochastic rounding and random Hadamard transforms (all fast on HW) to get low-variance, unbiased gradient estimates with MXFP4 GEMMs. We get a ~30% speedup over FP8 with almost no PPL gap! (Toy sketch of the two tricks below.)
1 · 8 · 22
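The core trick from the post above, in a minimal NumPy sketch. Everything here is an illustrative assumption rather than code from the paper: the function names, the uniform quantization grid standing in for the real MXFP4 format (4-bit float elements with shared per-block scales), and the explicit Sylvester Hadamard matrix (practical kernels use a fast in-place transform).

```python
import numpy as np

def hadamard(n):
    # Sylvester construction of an n x n Hadamard matrix; n must be a power of two.
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def random_hadamard_transform(x, signs):
    # Rotate by H @ diag(signs) / sqrt(n): random sign flips plus a Hadamard
    # rotation spread outliers across coordinates, so a coarse grid loses less.
    n = x.shape[-1]
    return (x * signs) @ hadamard(n).T / np.sqrt(n)

def stochastic_round(x, step, rng):
    # Round each entry up or down on a uniform grid with probability proportional
    # to its distance, so E[stochastic_round(x)] == x: unbiased, unlike nearest
    # rounding, which is what keeps the quantized-GEMM gradient estimates unbiased.
    scaled = x / step
    low = np.floor(scaled)
    round_up = rng.random(x.shape) < (scaled - low)
    return (low + round_up) * step

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64))
signs = rng.choice([-1.0, 1.0], size=64)
xr = random_hadamard_transform(x, signs)
# Averaging many independent stochastic roundings recovers xr (unbiasedness).
avg = np.mean([stochastic_round(xr, 0.25, rng) for _ in range(2000)], axis=0)
print(np.abs(avg - xr).max())  # small, and shrinks as the sample count grows
```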
RT @yingheng_wang: ❓ Are LLMs actually problem solvers or just good at regurgitating facts? 🚨 New Benchmark Alert! We built HeuriGym to ben…
0 · 25 · 0
RT @ellisk_kellis: New paper: World models + Program synthesis by @topwasu. 1. World modeling on-the-fly by synthesizing programs w/ 4000+ l…
0 · 105 · 0
RT @justachetan: I will be at #CVPR2025 presenting our work on differential operators for hybrid neural fields! Catch me at our poster: 🗓️…
0 · 4 · 0
RT @simran_s_arora: Check out CARTRIDGES, scaling cache-time compute! An alternative to ICL for settings where many different user messages…
0 · 4 · 0
RT @EyubogluSabri: When we put lots of text (e.g. a code repo) into LLM context, cost soars b/c of the KV cache’s size. What if we trained a…
0 · 70 · 0
RT @tri_dao: Albert and co continue to do excellent work on quantization. This time the trick is to minimize KL wrt the original model, wit…
0 · 19 · 0
Apparently I chose the worst day to release a paper, so ICYMI, we made a post-training quantization algorithm that outperforms even @Google's quantization-aware training recipe. We beat the prior SOTA by >30%, meaning faster and smaller models. More details in the original 🧵👇
📣 Introducing our latest work: Yet Another Quantization Algorithm! YAQA directly minimizes the KL divergence to the original model during rounding, cutting it by >30% over prior PTQ methods and giving an even closer model than Google’s QAT on Gemma! 🤯 (Toy sketch of the core idea below.)
2 · 1 · 20
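A toy, hedged illustration of the rounding objective described above: choose rounded weights that keep the quantized model's outputs close in KL to the original model's outputs on calibration data. The single softmax layer, the uniform `step` grid, the brute-force coordinate sweep, and the function names are all assumptions for illustration; the actual YAQA algorithm in the paper uses curvature information and scales to full LLMs.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl(p, q, eps=1e-12):
    # Mean KL(p || q) over a batch of categorical distributions.
    return (p * (np.log(p + eps) - np.log(q + eps))).sum(-1).mean()

def round_to_minimize_kl(W, X, step):
    # Start from nearest rounding, then sweep the weights once, moving each one
    # to whichever of its two neighboring grid points gives the lower KL between
    # the original outputs and the quantized outputs on the calibration batch X.
    P = softmax(X @ W)                       # "teacher": the unquantized layer
    Wq = step * np.round(W / step)
    for idx in np.ndindex(*W.shape):
        lo, hi = step * np.floor(W[idx] / step), step * np.ceil(W[idx] / step)
        best, best_kl = Wq[idx], kl(P, softmax(X @ Wq))
        for cand in (lo, hi):
            Wq[idx] = cand
            d = kl(P, softmax(X @ Wq))
            if d < best_kl:
                best, best_kl = cand, d
        Wq[idx] = best
    return Wq

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 8))             # tiny linear "model"
X = rng.standard_normal((256, 16))           # calibration inputs
W_near = 0.5 * np.round(W / 0.5)             # round-to-nearest baseline
W_kl = round_to_minimize_kl(W, X, 0.5)
P = softmax(X @ W)
print(kl(P, softmax(X @ W_near)), kl(P, softmax(X @ W_kl)))  # second value <= first
```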
RT @togethercompute: 5/ Quantized models don't need to lose fidelity. Check out our paper and blog for details: 📝 Paper:
0 · 2 · 0
RT @austinsilveria: chipmunk is up on arxiv! across HunyuanVideo and Flux.1-dev, 5-25% of the intermediate activation values in attention…
0 · 7 · 0
RT @togethercompute: 🚀 New research: YAQA, Yet Another Quantization Algorithm (yes, pronounced like yaca/jackfruit). Led by @tsengalb99,…
0 · 5 · 0
@chrismdesa (6/6) We also have a blog post ( ) with @togethercompute, who graciously provided compute resources for this project!
1 · 1 · 6
(5/6) All this results in a lower KL and SOTA downstream performance across a wide range of models and quantizers. For more information, check out our (w/ Zhaofeng Sun & @chrismdesa) paper ( ) and code ( ).
1 · 1 · 9