Albert Tseng Profile
Albert Tseng

@tsengalb99

Followers: 697
Following: 88
Media: 18
Statuses: 83

CS PhD Student @ Cornell

Joined June 2022
@tsengalb99
Albert Tseng
5 months
Excited to announce our #AISTATS 📜 on training LLMs with MXFP4! We use stochastic rounding and random Hadamard transforms (all fast on HW) to get low-variance, unbiased gradient estimates with MXFP4 GEMMs. We get a ~30% speedup over FP8 with almost no PPL gap!
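The two ingredients named above are easy to sketch. Below is a minimal PyTorch illustration, not the paper's kernels or the full MXFP4 block format (the shared per-block scale of MX formats is simplified here to a single max-abs scale): a random sign flip plus Hadamard rotation spreads outliers, and stochastic rounding onto the FP4 (E2M1) grid makes the quantized operand unbiased in expectation.

```python
import torch

def hadamard(n: int) -> torch.Tensor:
    """Sylvester construction of an n x n Hadamard matrix (n must be a power of 2)."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of 2"
    H = torch.ones(1, 1)
    while H.shape[0] < n:
        H = torch.cat([torch.cat([H, H], dim=1),
                       torch.cat([H, -H], dim=1)], dim=0)
    return H

def random_hadamard_transform(x: torch.Tensor) -> torch.Tensor:
    """Random sign flips followed by an orthonormal Hadamard rotation, which spreads
    outliers so every coordinate looks roughly Gaussian before quantization."""
    n = x.shape[-1]
    signs = torch.randint(0, 2, (n,)).to(x.dtype) * 2 - 1
    return (x * signs) @ (hadamard(n) / n ** 0.5)

# FP4 (E2M1) magnitudes; a single max-abs scale stands in for the MX block scale.
FP4_GRID = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
FP4_GRID = torch.cat([-FP4_GRID.flip(0)[:-1], FP4_GRID])  # symmetric, sorted

def stochastic_round(x: torch.Tensor, grid: torch.Tensor) -> torch.Tensor:
    """Round each entry to one of its two neighboring grid points with probability
    proportional to proximity, so E[stochastic_round(x)] == x (unbiased)."""
    scale = x.abs().max() / grid.max()
    v = (x / scale).clamp(float(grid[0]), float(grid[-1]))
    hi = torch.searchsorted(grid, v).clamp(max=len(grid) - 1)
    lo = (hi - 1).clamp(min=0)
    lo_v, hi_v = grid[lo], grid[hi]
    p_up = torch.where(hi_v > lo_v, (v - lo_v) / (hi_v - lo_v), torch.zeros_like(v))
    return torch.where(torch.rand_like(v) < p_up, hi_v, lo_v) * scale

x = torch.randn(8, 64)
x_q = stochastic_round(random_hadamard_transform(x), FP4_GRID)
```

Because the rounding is unbiased and a GEMM is linear in the quantized operand, the expected result matches the full-precision product, which is the property unbiased gradient estimates rely on; in practice the rotation is arranged to cancel across the two GEMM operands, bookkeeping omitted here.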
@tsengalb99
Albert Tseng
25 days
RT @yingheng_wang: ❓ Are LLMs actually problem solvers or just good at regurgitating facts? 🚨 New Benchmark Alert! We built HeuriGym to ben…
@tsengalb99
Albert Tseng
1 month
RT @ellisk_kellis: New paper: World models + Program synthesis by @topwasu. 1. World modeling on-the-fly by synthesizing programs w/ 4000+ l…
@tsengalb99
Albert Tseng
1 month
RT @justachetan: I will be at #CVPR2025 presenting our work on differential operators for hybrid neural fields! Catch me at our poster: 🗓️…
@tsengalb99
Albert Tseng
1 month
RT @simran_s_arora: Check out CARTRIDGES, scaling cache-time compute! An alternative to ICL for settings where many different user messages…
@tsengalb99
Albert Tseng
1 month
RT @EyubogluSabri: When we put lots of text (e.g. a code repo) into LLM context, cost soars b/c of the KV cache’s size. What if we trained a…
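The cost problem this thread points at is easy to quantify with a back-of-envelope calculation; the shapes below (a Llama-style model with 32 layers, 8 KV heads of dimension 128, fp16 cache) are assumed for illustration, not taken from the tweet.

```python
def kv_cache_bytes(seq_len: int, n_layers: int = 32, n_kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_elem: int = 2) -> int:
    """Memory to cache keys and values for one sequence:
    2 tensors (K and V) * layers * KV heads * head dim * tokens * bytes/element."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# A 128k-token context (roughly a small code repo) held in an fp16 cache:
print(f"{kv_cache_bytes(128_000) / 2**30:.1f} GiB per sequence")  # ~15.6 GiB
```

The cache grows linearly with context length and has to stay resident for every concurrent request, which is why alternatives to plain ICL are attractive here.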
@tsengalb99
Albert Tseng
1 month
RT @tri_dao: Albert and co continue to do excellent work on quantization. This time the trick is to minimize KL wrt the original model, wit…
@tsengalb99
Albert Tseng
1 month
Apparently I chose the worst day to release a paper, so ICYMI, we made a post-training quantization algorithm that outperforms even @Google's quantization-aware training recipe. We beat the prior SOTA by >30%, meaning faster and smaller models. More details in the original 🧵👇
@tsengalb99
Albert Tseng
1 month
📣 Introducing our latest work: Yet Another Quantization Algorithm! YAQA directly minimizes the KL divergence to the original model during rounding, cutting it by >30% over prior PTQ methods and giving an even closer model than Google’s QAT on Gemma! 🤯
@tsengalb99
Albert Tseng
1 month
RT @JenJSun: VideoPrism is now available at: :)
@tsengalb99
Albert Tseng
1 month
RT @togethercompute: 5/ Quantized models don't need to lose fidelity. Check out our paper and blog for details: 📝 Paper:
@tsengalb99
Albert Tseng
1 month
RT @austinsilveria: chipmunk is up on arXiv! Across HunyuanVideo and Flux.1-dev, 5-25% of the intermediate activation values in attention…
@tsengalb99
Albert Tseng
1 month
RT @togethercompute: 🚀 New research: YAQA — Yet Another Quantization Algorithm (yes, pronounced like yaca/jackfruit 🥭). Led by @tsengalb99,…
@tsengalb99
Albert Tseng
1 month
@chrismdesa (6/6) We also have a blog post with @togethercompute, who graciously provided compute resources for this project!
@tsengalb99
Albert Tseng
1 month
(5/6) All this results in a lower KL and SOTA downstream performance across a wide range of models and quantizers. For more information, check out our paper and code (w/ Zhaofeng Sun & @chrismdesa).
@tsengalb99
Albert Tseng
1 month
(4/6) YAQA’s rounding algorithm comes with nice theoretical guarantees that allow us to reason about YAQA’s behavior vs. LDLQ. In fact, we show that LDLQ is a special case of YAQA that uses a provably worse Hessian than YAQA’s.
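Schematically, the comparison in this tweet can be read as two instances of the same Hessian-weighted rounding template that differ only in which Hessian they plug in; the notation below is a sketch of that reading, not the paper's formal statement.

```latex
% One linear layer with weights W and quantization grid \mathcal{Q}:
\[
  \hat{W} \;=\; \arg\min_{\hat{W} \in \mathcal{Q}}\;
  \operatorname{tr}\!\left[(W - \hat{W})\, H\, (W - \hat{W})^{\top}\right]
\]
% LDLQ / GPTQ: H \propto \mathbb{E}[x x^{\top}] (the input second moment, i.e. the
% immediate activation error of that layer).
% YAQA, as described in this thread: H is a Kronecker-factored approximation of the
% curvature of the end-to-end KL to the original model, so the same adaptive
% rounding machinery is pointed at the KL itself.
```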
@tsengalb99
Albert Tseng
1 month
(3/6) YAQA solves this by quantizing to directly minimize the KL to the original model. YAQA first computes near-optimal Kronecker-factored Hessian approximations for the KL in a fully-distributed way, and then uses these Hessians in a new adaptive rounding algorithm.
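For readers unfamiliar with Kronecker-factored curvature, here is a generic K-FAC-style sketch for a single linear layer. It only shows what the factored form buys you; it is not YAQA's actual (near-optimal, distributed) estimator, and the toy model, shapes, and label sampling are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d_in, d_out, batch = 64, 32, 256
W = torch.randn(d_out, d_in, requires_grad=True)

# Toy "model": one linear layer whose outputs are treated as logits over d_out classes.
x = torch.randn(batch, d_in)
logits = x @ W.T

# Sample labels from the model itself so the expected outer product of per-example
# gradients approximates the Fisher information (a common Hessian surrogate).
with torch.no_grad():
    y = torch.distributions.Categorical(logits=logits).sample()

loss = F.cross_entropy(logits, y)
g = torch.autograd.grad(loss, logits)[0] * batch  # per-example output gradients

# K-FAC-style Kronecker factors: A holds input statistics, B output-gradient
# statistics; the curvature over vec(W) is then approximated by B (kron) A.
A = x.T @ x / batch    # (d_in,  d_in)
B = g.T @ g / batch    # (d_out, d_out)

# Storing A and B takes d_in^2 + d_out^2 numbers instead of (d_in * d_out)^2,
# which is what makes per-layer curvature estimates feasible at LLM scale.
print(A.shape, B.shape)
```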
@tsengalb99
Albert Tseng
1 month
(2/6) Existing quantization methods like GPTQ and LDLQ (QuIP, QuIP#, QTIP) typically round to minimize the immediate activation error. However, reducing this metric does not necessarily reduce the end-to-end KL or produce a closer model!
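To make the distinction concrete, here is the gap between the two objectives written out; the symbols are schematic rather than the paper's exact notation.

```latex
% Layerwise proxy minimized by GPTQ / LDLQ (immediate activation error of layer l):
\[
  \min_{\hat{W}_\ell}\; \mathbb{E}_{x}\bigl\|\,(W_\ell - \hat{W}_\ell)\,x\,\bigr\|_2^2
\]
% End-to-end quantity YAQA targets (closeness of the whole quantized model):
\[
  \min_{\hat{\theta}}\; \mathbb{E}_{s}\,
  D_{\mathrm{KL}}\!\bigl(p_{\theta}(\cdot \mid s)\,\big\|\,p_{\hat{\theta}}(\cdot \mid s)\bigr)
\]
% Driving the first quantity down layer by layer need not drive the second down,
% because per-layer errors interact as they propagate through the network.
```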
@tsengalb99
Albert Tseng
1 month
📣 Introducing our latest work: Yet Another Quantization Algorithm! YAQA directly minimizes the KL divergence to the original model during rounding, cutting it by >30% over prior PTQ methods and giving an even closer model than Google’s QAT on Gemma! 🤯
@tsengalb99
Albert Tseng
1 month
RT @turboderp_: I made a thing.
@tsengalb99
Albert Tseng
1 month
Apparently ExLlama3 is based on QTIP and I just found out today?!? 🤯