Weiyang Liu

@Besteuler

Followers: 2K · Following: 3K · Media: 49 · Statuses: 678

AI researcher @CUHKofficial. Postdoc @MPI_IS. PhD @Cambridge_Uni & @GeorgiaTech. I follow my curiosity wherever it takes me. All opinions are my own.

Joined May 2009
@Besteuler
Weiyang Liu
52 minutes
RT @abakcus: The beauty of math book covers from @doverpubs. 😍
@Besteuler
Weiyang Liu
2 days
RT @FrnkNlsn: Books with good synergies!
@Besteuler
Weiyang Liu
8 days
While working on the improved version of Orthogonal Finetuning (OFT), we also found that OFT represents a more general class of finetuning methods -- sequential adaptation. This poses an interesting comparison to LoRA, which represents parallel adaptation.
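To make the sequential-vs-parallel contrast concrete, here is a minimal PyTorch sketch (illustrative only, not the official OFT or LoRA implementations; the dimensions, rank, and Cayley construction are illustrative assumptions):

```python
import torch
import torch.nn as nn

d = 64
W = torch.randn(d, d)                      # frozen pretrained weight

# Parallel adaptation (LoRA-style): a learned low-rank update is *added* to W.
r = 4
A = nn.Parameter(0.01 * torch.randn(r, d))
B = nn.Parameter(torch.zeros(d, r))
W_parallel = W + B @ A                     # W' = W + BA

# Sequential adaptation (OFT-style): W is *multiplied* by a learned orthogonal R.
S = nn.Parameter(torch.zeros(d, d))
Q = S - S.T                                # skew-symmetric
I = torch.eye(d)
R = (I + Q) @ torch.linalg.inv(I - Q)      # Cayley transform => R is orthogonal
W_sequential = R @ W                       # W' = RW
```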
@Besteuler
Weiyang Liu
8 days
One thing my team has discovered is the consistent effectiveness of Quantized OFT. QOFT works significantly better and is more stable than QLoRA, offering not only better adaptation performance and stability, but also shorter finetuning time and lower GPU memory usage.
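As a rough sketch of how a QOFT-style setup looks with Hugging Face PEFT and bitsandbytes (the model id is only an example, and the exact OFTConfig fields and defaults vary across PEFT versions; see the official Colab tutorial for the authoritative recipe):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import OFTConfig, get_peft_model

model_id = "Qwen/Qwen2.5-0.5B"   # example causal LM; the tutorial uses Qwen models

# 4-bit quantized base model.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Attach orthogonal adapters to the attention projections; the quantized base
# weights stay frozen and only the OFT factors are trained.
oft_config = OFTConfig(target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, oft_config)
model.print_trainable_parameters()
```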
@Besteuler
Weiyang Liu
10 days
📔What really makes OFTv2 shine is its great compatibility with quantized models. Here comes QOFT. Without bells and whistles, QOFT outperforms QLoRA significantly in adaptation performance, GPU memory usage, and runtime. QOFT is simply better. 🧵3/6
@Besteuler
Weiyang Liu
8 days
For a quick start with OFTv2 and QOFT, check out our Colab tutorial: We give examples of finetuning standard/quantized LLMs (Qwen) and Stable Diffusion 3.5. Kudos to @ZejuQiu36055 for preparing the notebook!
@Besteuler
Weiyang Liu
10 days
🚀 Meet OFTv2 — Orthogonal Finetuning made scalable, finally. ⚡️ 10× faster. 💾 3× less GPU memory. 🤖 Quantized OFT: plug-and-play on quantized LLMs, better than QLoRA. Try it now on Hugging Face PEFT: Website: #AI #LLM 🧵1/6
@Besteuler
Weiyang Liu
10 days
This is a joint work with my amazing collaborators @ZejuQiu36055, @adrian_weller, and @bschoelkopf. 📰Full paper: 💻Code: 📔Document: 👨‍🏫Step-by-step tutorials: 🧵6/6.
@Besteuler
Weiyang Liu
10 days
📝To enable efficient orthogonal parameterization, we adopt the Cayley-Neumann Parameterization (CNP) proposed in our previous work (POET). This turns out to work perfectly without any performance loss. 🧵5/6
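For intuition, a minimal NumPy sketch of the Cayley-Neumann idea: the matrix inverse in the Cayley transform is replaced by a truncated Neumann series, so no inversion is needed (the truncation length and scaling here are illustrative, not the paper's exact settings):

```python
import numpy as np

def cayley_exact(Q):
    """Exact Cayley transform: R = (I + Q)(I - Q)^{-1}, orthogonal for skew-symmetric Q."""
    I = np.eye(Q.shape[0])
    return (I + Q) @ np.linalg.inv(I - Q)

def cayley_neumann(Q, num_terms=5):
    """Replace the inverse with a truncated Neumann series (I - Q)^{-1} ~= I + Q + Q^2 + ..."""
    I = np.eye(Q.shape[0])
    series, power = I.copy(), I.copy()
    for _ in range(1, num_terms):
        power = power @ Q
        series += power
    return (I + Q) @ series

# Small skew-symmetric Q; the series converges when the spectral norm of Q is < 1.
rng = np.random.default_rng(0)
A = 0.05 * rng.standard_normal((8, 8))
Q = A - A.T

R_exact = cayley_exact(Q)
R_approx = cayley_neumann(Q)
print(np.abs(R_approx @ R_approx.T - np.eye(8)).max())   # small: approximately orthogonal
print(np.abs(R_exact - R_approx).max())                   # small approximation error
```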
@Besteuler
Weiyang Liu
10 days
💡The key contribution of OFTv2 and QOFT is the switch from a weight-centric implementation to an input-centric implementation. This idea is inspired by matrix-free methods in efficient numerical algorithms (e.g., the power method and the Lanczos algorithm). 🧵4/6
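A toy illustration of the difference (not the actual OFTv2 kernels; whether the orthogonal factor acts on the input or the output side in the real implementation is abstracted away here):

```python
import torch

torch.manual_seed(0)
d, batch = 256, 8
W = torch.randn(d, d)                       # frozen (possibly quantized) base weight
R, _ = torch.linalg.qr(torch.randn(d, d))   # orthogonal OFT factor
x = torch.randn(batch, d)

# Weight-centric: materialize the adapted weight R @ W before the forward pass.
# For large layers this costs extra time and memory, and it destroys the
# quantized storage format of W.
y_weight_centric = x @ (R @ W).T

# Matrix-free alternative: run the frozen layer as-is, then rotate its output.
# R @ W is never formed, in the same spirit as the power method, which only
# ever needs matrix-vector products.
y_matrix_free = (x @ W.T) @ R.T

print(torch.allclose(y_weight_centric, y_matrix_free, atol=1e-3))  # same result
```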
@Besteuler
Weiyang Liu
10 days
📔What really makes OFTv2 shine is its great compatibility with quantized models. Here comes QOFT. Without bells and whistles, QOFT outperforms QLoRA significantly in adaptation performance, GPU memory usage, and runtime. QOFT is simply better. 🧵3/6
@Besteuler
Weiyang Liu
10 days
🧐Why care? Orthogonal finetuning (OFT) has been shown to be effective for finetuning foundation models on downstream tasks without severe catastrophic forgetting of pretraining knowledge. However, OFT runs slower and gulps more GPU memory than LoRA. OFTv2 has fixed these problems. 🧵2/6
@Besteuler
Weiyang Liu
10 days
🚀 Meet OFTv2 — Orthogonal Finetuning made scalable, finally. ⚡️ 10× faster. 💾 3× less GPU memory. 🤖 Quantized OFT: plug-and-play on quantized LLMs, better than QLoRA. Try it now on Hugging Face PEFT: Website: #AI #LLM 🧵1/6
@Besteuler
Weiyang Liu
15 days
RT @rasbt: Upgraded from Llama 3 to Qwen3 as my go-to model for research experiments, so I implemented qwen3 from scratch: .
@Besteuler
Weiyang Liu
18 days
RT @itsalexvacca: BREAKING: MIT just completed the first brain scan study of ChatGPT users & the results are terrifying. Turns out, AI isn….
@Besteuler
Weiyang Liu
18 days
We have added some new experiments and analyses to the new version of our paper. Check it out here: We discovered that despite being generalized to spectrum-preserving training, POET can still preserve minimum hyperspherical energy. This property only
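For reference, here is one standard way to compute hyperspherical energy, plus a small check that rotating all neurons by the same orthogonal matrix leaves it unchanged (the paper's exact energy definition and the full POET analysis may differ):

```python
import torch

def hyperspherical_energy(W, eps=1e-8):
    """Normalize each neuron (row) onto the unit sphere and average the inverse
    pairwise distances; lower energy means more uniformly spread neurons."""
    V = W / (W.norm(dim=1, keepdim=True) + eps)
    dist = torch.cdist(V, V)
    n = V.shape[0]
    off_diag = ~torch.eye(n, dtype=torch.bool)
    return (1.0 / (dist[off_diag] + eps)).mean()

W0 = torch.randn(128, 64, dtype=torch.float64)
Q, _ = torch.linalg.qr(torch.randn(64, 64, dtype=torch.float64))  # orthogonal

# Rotating every neuron by the same orthogonal matrix preserves all pairwise
# angles, so the energy is unchanged.
print(hyperspherical_energy(W0))
print(hyperspherical_energy(W0 @ Q))
```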
@Besteuler
Weiyang Liu
25 days
📢Glad to introduce our paper: Reparameterized LLM Training via Orthogonal Equivalence Transformation (POET)!. POET is a new algorithm for efficiently pretraining / finetuning large language models. Its training consists of three geometric phases. 1/6
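A toy sketch of the underlying reparameterization, W = R · W0 · Q with R and Q orthogonal, which is why the spectrum of the initialization is preserved (the three training phases and the actual parameterization of R and Q are described in the paper; the dimensions below are arbitrary):

```python
import torch

torch.manual_seed(0)
d_out, d_in = 64, 32
W0 = torch.randn(d_out, d_in)                 # randomly initialized weight, kept fixed

# POET-style reparameterization: only R and Q are learned. Because this is an
# orthogonal equivalence transformation, the singular values of W0 are preserved.
R, _ = torch.linalg.qr(torch.randn(d_out, d_out))
Q, _ = torch.linalg.qr(torch.randn(d_in, d_in))
W = R @ W0 @ Q

print(torch.linalg.svdvals(W0)[:5])
print(torch.linalg.svdvals(W)[:5])            # same singular values (up to float error)
```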
@Besteuler
Weiyang Liu
20 days
Muon is gaining attention for its use of orthogonalization, making it a natural point of comparison with POET. We computed singular value entropy over training steps and found that POET always maintains high entropy. A recent study suggests that this is a
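One common way to compute such a quantity is the Shannon entropy of the normalized singular values (the exact normalization used in the plot may differ). A small sketch:

```python
import torch

def singular_value_entropy(W, eps=1e-12):
    """Shannon entropy of the normalized singular value distribution: higher values
    mean a more spread-out spectrum, lower values mean spectral collapse."""
    s = torch.linalg.svdvals(W)
    p = s / (s.sum() + eps)
    return -(p * torch.log(p + eps)).sum()

W = torch.randn(512, 512)
low_rank = torch.randn(512, 4) @ torch.randn(4, 512)    # nearly rank-4 matrix

print(singular_value_entropy(W))         # high entropy: spectrum is spread out
print(singular_value_entropy(low_rank))  # much lower entropy: spectrum is concentrated
```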
@Besteuler
Weiyang Liu
25 days
📢Glad to introduce our paper: Reparameterized LLM Training via Orthogonal Equivalence Transformation (POET)!. POET is a new algorithm for efficiently pretraining / finetuning large language models. Its training consists of three geometric phases. 1/6
@Besteuler
Weiyang Liu
23 days
Verbalized machine learning treats LLMs with prompts as function approximators. Building on this, @TimZXiao came up with the idea of studying whether LLMs can act as samplers. It turns out they’re often biased, even when they appear to understand the target distribution.
@TimZXiao
Tim Xiao
23 days
✨ New paper: Flipping Against All Odds. We found that large language models (LLMs) can describe probabilities—but fail to sample from them faithfully. Yes, even flipping a fair coin is hard. 🪙. 🧵 Here’s what we learned—and how we fixed it. 🔗 1/
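As a rough illustration of the kind of check involved, here is a sketch that estimates the empirical heads frequency of a coin-flipping sampler; the llm_flip function is a placeholder for a real LLM query (deliberately biased here), not the paper's actual protocol:

```python
import random
from collections import Counter

def llm_flip() -> str:
    """Stand-in for asking an LLM to 'flip a fair coin and answer heads or tails'.
    Replace with a real API call; simulated here with a deliberately biased sampler."""
    return "heads" if random.random() < 0.7 else "tails"

n = 1000
counts = Counter(llm_flip() for _ in range(n))
p_heads = counts["heads"] / n

# A fair coin gives P(heads) ~ 0.5 with a standard error of about 0.016 at n=1000,
# so an empirical frequency near 0.70 indicates a clearly biased sampler.
print(counts, f"empirical P(heads) = {p_heads:.3f}")
```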
@Besteuler
Weiyang Liu
24 days
Fun fact: We started POET right after Orthogonal Butterfly, but made little progress after 6 months. We switched to a different project then. We picked POET back up in Jan 2025, and it took 5 more months to finally get it right. Ideas matter—but.
@Besteuler
Weiyang Liu
25 days
📢Glad to introduce our paper: Reparameterized LLM Training via Orthogonal Equivalence Transformation (POET)!. POET is a new algorithm for efficiently pretraining / finetuning large language models. Its training consists of three geometric phases. 1/6
@Besteuler
Weiyang Liu
25 days
Check out our full paper here: This is a joint work with amazing collaborators: @QiuZeju, Simon Buchholz, @TimZXiao, @maximilian_dax, @bschoelkopf . 6/6.
@Besteuler
Weiyang Liu
25 days
POET also exhibits quite different training dynamics, compared to standard training. We show how a weight matrix changes in the animation. We hope our method can bring more intriguing insights to training LLMs. 5/6