Weiyang Liu

@Besteuler

Followers: 2K · Following: 3K · Media: 49 · Statuses: 678

AI researcher @CUHKofficial. Postdoc @MPI_IS. PhD @Cambridge_Uni & @GeorgiaTech. I follow my curiosity wherever it takes me. All opinions are my own.

Joined May 2009
@Besteuler
Weiyang Liu
52 minutes
RT @abakcus: The beauty of math book covers from @doverpubs. 😍
@Besteuler
Weiyang Liu
2 days
RT @FrnkNlsn: Books with good synergies!
@Besteuler
Weiyang Liu
8 days
While working on the improved version of Orthogonal Finetuning (OFT), we also found that OFT represents a more general class of finetuning methods -- sequential adaptation. This poses an interesting comparison to LoRA, which represents parallel adaptation.
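To make the sequential-vs-parallel contrast concrete, here is a minimal PyTorch sketch (illustrative only, not the official OFT or LoRA implementations; the dimensions, rank, and Cayley construction are illustrative assumptions):

```python
import torch
import torch.nn as nn

d = 64
W = torch.randn(d, d)                      # frozen pretrained weight

# Parallel adaptation (LoRA-style): a learned low-rank update is *added* to W.
r = 4
A = nn.Parameter(0.01 * torch.randn(r, d))
B = nn.Parameter(torch.zeros(d, r))
W_parallel = W + B @ A                     # W' = W + BA

# Sequential adaptation (OFT-style): W is *multiplied* by a learned orthogonal R.
S = nn.Parameter(torch.zeros(d, d))
Q = S - S.T                                # skew-symmetric
I = torch.eye(d)
R = (I + Q) @ torch.linalg.inv(I - Q)      # Cayley transform => R is orthogonal
W_sequential = R @ W                       # W' = RW
```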
@Besteuler
Weiyang Liu
8 days
One thing my team has discovered is the consistent effectiveness of Quantized OFT. QOFT works significantly better and is more stable than QLoRA, offering not only better adaptation performance and stability, but also shorter finetuning time and lower GPU memory usage.
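As a rough sketch of how a QOFT-style setup looks with Hugging Face PEFT and bitsandbytes (the model id is only an example, and the exact OFTConfig fields and defaults vary across PEFT versions; see the official Colab tutorial for the authoritative recipe):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import OFTConfig, get_peft_model

model_id = "Qwen/Qwen2.5-0.5B"   # example causal LM; the tutorial uses Qwen models

# 4-bit quantized base model.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Attach orthogonal adapters to the attention projections; the quantized base
# weights stay frozen and only the OFT factors are trained.
oft_config = OFTConfig(target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, oft_config)
model.print_trainable_parameters()
```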
@Besteuler
Weiyang Liu
10 days
📔What really makes OFTv2 shine is its great compatibility with quantized models. Here comes QOFT. Without bells and whistles, QOFT outperforms QLoRA significantly in adaptation performance, GPU memory usage, and runtime. QOFT is simply better. 🧵3/6
@Besteuler
Weiyang Liu
8 days
For a quick start with OFTv2 and QOFT, check out our Colab tutorial: We give examples of finetuning standard/quantized LLMs (Qwen) and Stable Diffusion 3.5. Kudos to @ZejuQiu36055 for preparing the notebook!
@Besteuler
Weiyang Liu
10 days
🚀 Meet OFTv2 — Orthogonal Finetuning made scalable, finally. ⚡️ 10× faster. 💾 3× less GPU memory. 🤖 Quantized OFT: plug-and-play on quantized LLMs, better than QLoRA. Try it now on Hugging Face PEFT: Website: #AI #LLM 🧵1/6
@Besteuler
Weiyang Liu
10 days
This is a joint work with my amazing collaborators @ZejuQiu36055, @adrian_weller, and @bschoelkopf. 📰Full paper: 💻Code: 📔Document: 👨‍🏫Step-by-step tutorials: 🧵6/6.
@Besteuler
Weiyang Liu
10 days
📝To enable efficient orthogonal parameterization, we adopt the Cayley-Neumann Parameterization (CNP) proposed in our previous work (POET). This turns out to work perfectly without any performance loss. 🧵5/6
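For intuition, a minimal NumPy sketch of the Cayley-Neumann idea: the matrix inverse in the Cayley transform is replaced by a truncated Neumann series, so no inversion is needed (the truncation length and scaling here are illustrative, not the paper's exact settings):

```python
import numpy as np

def cayley_exact(Q):
    """Exact Cayley transform: R = (I + Q)(I - Q)^{-1}, orthogonal for skew-symmetric Q."""
    I = np.eye(Q.shape[0])
    return (I + Q) @ np.linalg.inv(I - Q)

def cayley_neumann(Q, num_terms=5):
    """Replace the inverse with a truncated Neumann series (I - Q)^{-1} ~= I + Q + Q^2 + ..."""
    I = np.eye(Q.shape[0])
    series, power = I.copy(), I.copy()
    for _ in range(1, num_terms):
        power = power @ Q
        series += power
    return (I + Q) @ series

# Small skew-symmetric Q; the series converges when the spectral norm of Q is < 1.
rng = np.random.default_rng(0)
A = 0.05 * rng.standard_normal((8, 8))
Q = A - A.T

R_exact = cayley_exact(Q)
R_approx = cayley_neumann(Q)
print(np.abs(R_approx @ R_approx.T - np.eye(8)).max())   # small: approximately orthogonal
print(np.abs(R_exact - R_approx).max())                   # small approximation error
```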
@Besteuler
Weiyang Liu
10 days
💡The key contribution of OFTv2 and QOFT is the switch from a weight-centric implementation to an input-centric implementation. This idea is inspired by matrix-free methods in efficient numerical algorithms (e.g., the power method and the Lanczos algorithm). 🧵4/6
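A toy illustration of the difference (not the actual OFTv2 kernels; whether the orthogonal factor acts on the input or the output side in the real implementation is abstracted away here):

```python
import torch

torch.manual_seed(0)
d, batch = 256, 8
W = torch.randn(d, d)                       # frozen (possibly quantized) base weight
R, _ = torch.linalg.qr(torch.randn(d, d))   # orthogonal OFT factor
x = torch.randn(batch, d)

# Weight-centric: materialize the adapted weight R @ W before the forward pass.
# For large layers this costs extra time and memory, and it destroys the
# quantized storage format of W.
y_weight_centric = x @ (R @ W).T

# Matrix-free alternative: run the frozen layer as-is, then rotate its output.
# R @ W is never formed, in the same spirit as the power method, which only
# ever needs matrix-vector products.
y_matrix_free = (x @ W.T) @ R.T

print(torch.allclose(y_weight_centric, y_matrix_free, atol=1e-3))  # same result
```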
@Besteuler
Weiyang Liu
10 days
📔What really makes OFTv2 shine is its great compatibility with quantized models. Here comes QOFT. Without bells and whistles, QOFT outperforms QLoRA significantly in adaptation performance, GPU memory usage, and runtime. QOFT is simply better. 🧵3/6
@Besteuler
Weiyang Liu
10 days
🧐Why care? Orthogonal finetuning (OFT) has been shown to be effective for finetuning foundation models on downstream tasks without severe catastrophic forgetting of pretraining knowledge. However, OFT runs slower and gulps more GPU memory than LoRA. OFTv2 has fixed these problems. 🧵2/6
@Besteuler
Weiyang Liu
10 days
🚀 Meet OFTv2 — Orthogonal Finetuning made scalable, finally. ⚡️ 10× faster. 💾 3× less GPU memory. 🤖 Quantized OFT: plug-and-play on quantized LLMs, better than QLoRA. Try it now on Hugging Face PEFT: Website: #AI #LLM 🧵1/6
@Besteuler
Weiyang Liu
15 days
RT @rasbt: Upgraded from Llama 3 to Qwen3 as my go-to model for research experiments, so I implemented qwen3 from scratch: .
@Besteuler
Weiyang Liu
18 days
RT @itsalexvacca: BREAKING: MIT just completed the first brain scan study of ChatGPT users & the results are terrifying. Turns out, AI isn….
@Besteuler
Weiyang Liu
18 days
We have added some new experiments and analyses to the new version of our paper. Check it out here: We discovered that despite being generalized to spectrum-preserving training, POET can still preserve minimum hyperspherical energy. This property only
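For reference, here is one standard way to compute hyperspherical energy, plus a small check that rotating all neurons by the same orthogonal matrix leaves it unchanged (the paper's exact energy definition and the full POET analysis may differ):

```python
import torch

def hyperspherical_energy(W, eps=1e-8):
    """Normalize each neuron (row) onto the unit sphere and average the inverse
    pairwise distances; lower energy means more uniformly spread neurons."""
    V = W / (W.norm(dim=1, keepdim=True) + eps)
    dist = torch.cdist(V, V)
    n = V.shape[0]
    off_diag = ~torch.eye(n, dtype=torch.bool)
    return (1.0 / (dist[off_diag] + eps)).mean()

W0 = torch.randn(128, 64, dtype=torch.float64)
Q, _ = torch.linalg.qr(torch.randn(64, 64, dtype=torch.float64))  # orthogonal

# Rotating every neuron by the same orthogonal matrix preserves all pairwise
# angles, so the energy is unchanged.
print(hyperspherical_energy(W0))
print(hyperspherical_energy(W0 @ Q))
```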
@Besteuler
Weiyang Liu
25 days
📢Glad to introduce our paper: Reparameterized LLM Training via Orthogonal Equivalence Transformation (POET)!. POET is a new algorithm for efficiently pretraining / finetuning large language models. Its training consists of three geometric phases. 1/6
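A toy sketch of the underlying reparameterization, W = R · W0 · Q with R and Q orthogonal, which is why the spectrum of the initialization is preserved (the three training phases and the actual parameterization of R and Q are described in the paper; the dimensions below are arbitrary):

```python
import torch

torch.manual_seed(0)
d_out, d_in = 64, 32
W0 = torch.randn(d_out, d_in)                 # randomly initialized weight, kept fixed

# POET-style reparameterization: only R and Q are learned. Because this is an
# orthogonal equivalence transformation, the singular values of W0 are preserved.
R, _ = torch.linalg.qr(torch.randn(d_out, d_out))
Q, _ = torch.linalg.qr(torch.randn(d_in, d_in))
W = R @ W0 @ Q

print(torch.linalg.svdvals(W0)[:5])
print(torch.linalg.svdvals(W)[:5])            # same singular values (up to float error)
```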
@Besteuler
Weiyang Liu
20 days
Muon is gaining attention for its use of orthogonalization, making it a natural point of comparison with POET. We computed singular value entropy over training steps and found that POET always maintains high entropy. A recent study suggests that this is a
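One common way to compute such a quantity is the Shannon entropy of the normalized singular values (the exact normalization used in the plot may differ). A small sketch:

```python
import torch

def singular_value_entropy(W, eps=1e-12):
    """Shannon entropy of the normalized singular value distribution: higher values
    mean a more spread-out spectrum, lower values mean spectral collapse."""
    s = torch.linalg.svdvals(W)
    p = s / (s.sum() + eps)
    return -(p * torch.log(p + eps)).sum()

W = torch.randn(512, 512)
low_rank = torch.randn(512, 4) @ torch.randn(4, 512)    # nearly rank-4 matrix

print(singular_value_entropy(W))         # high entropy: spectrum is spread out
print(singular_value_entropy(low_rank))  # much lower entropy: spectrum is concentrated
```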
@Besteuler
Weiyang Liu
25 days
📢Glad to introduce our paper: Reparameterized LLM Training via Orthogonal Equivalence Transformation (POET)!. POET is a new algorithm for efficiently pretraining / finetuning large language models. Its training consists of three geometric phases. 1/6
@Besteuler
Weiyang Liu
23 days
Verbalized machine learning treats LLMs with prompts as function approximators. Building on this, @TimZXiao came up with the idea of studying whether LLMs can act as samplers. It turns out they’re often biased, even when they appear to understand the target distribution.
@TimZXiao
Tim Xiao
23 days
✨ New paper: Flipping Against All Odds. We found that large language models (LLMs) can describe probabilities—but fail to sample from them faithfully. Yes, even flipping a fair coin is hard. 🪙. 🧵 Here’s what we learned—and how we fixed it. 🔗 1/
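As a rough illustration of the kind of check involved, here is a sketch that estimates the empirical heads frequency of a coin-flipping sampler; the llm_flip function is a placeholder for a real LLM query (deliberately biased here), not the paper's actual protocol:

```python
import random
from collections import Counter

def llm_flip() -> str:
    """Stand-in for asking an LLM to 'flip a fair coin and answer heads or tails'.
    Replace with a real API call; simulated here with a deliberately biased sampler."""
    return "heads" if random.random() < 0.7 else "tails"

n = 1000
counts = Counter(llm_flip() for _ in range(n))
p_heads = counts["heads"] / n

# A fair coin gives P(heads) ~ 0.5 with a standard error of about 0.016 at n=1000,
# so an empirical frequency near 0.70 indicates a clearly biased sampler.
print(counts, f"empirical P(heads) = {p_heads:.3f}")
```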
@Besteuler
Weiyang Liu
24 days
Fun fact: We started POET right after Orthogonal Butterfly, but made little progress after 6 months. We switched to a different project then. We picked POET back up in Jan 2025, and it took 5 more months to finally get it right. Ideas matter—but.
@Besteuler
Weiyang Liu
25 days
📢Glad to introduce our paper: Reparameterized LLM Training via Orthogonal Equivalence Transformation (POET)!. POET is a new algorithm for efficiently pretraining / finetuning large language models. Its training consists of three geometric phases. 1/6
@Besteuler
Weiyang Liu
25 days
Check out our full paper here: This is a joint work with amazing collaborators: @QiuZeju, Simon Buchholz, @TimZXiao, @maximilian_dax, @bschoelkopf . 6/6.
@Besteuler
Weiyang Liu
25 days
POET also exhibits quite different training dynamics, compared to standard training. We show how a weight matrix changes in the animation. We hope our method can bring more intriguing insights to training LLMs. 5/6