
Weiyang Liu
@Besteuler
Followers
2K
Following
3K
Media
49
Statuses
678
AI researcher @CUHKofficial. Postdoc @MPI_IS. PhD @Cambridge_Uni & @GeorgiaTech. I follow my curiosity wherever it takes me. All opinions are my own.
Joined May 2009
One thing my team has discovered is the consistent effectiveness of Quantized OFT. QOFT works significantly better and more stably than QLoRA, not only in adaptation performance and stability but also in finetuning time and GPU memory.
📔What really makes OFTv2 shine is its great compatibility with quantized models. Here comes QOFT. Without bells and whistles, QOFT outperforms QLoRA significantly in adaptation performance, GPU memory usage, and runtime. QOFT is simply better. 🧵3/6
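For a rough idea of what this looks like in practice, here is a minimal sketch of attaching an OFT adapter to a 4-bit quantized model with transformers + PEFT. The checkpoint name, target modules, and OFTConfig values below are illustrative assumptions, not the tutorial's exact settings, and parameter names may differ across PEFT versions.

```python
# Minimal sketch (assumed API/settings): attach an OFT adapter to a 4-bit
# quantized LLM, QOFT-style. The base weights stay frozen and quantized;
# only the small orthogonal blocks are trained.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import OFTConfig, get_peft_model

model_id = "Qwen/Qwen2.5-0.5B"  # placeholder checkpoint

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

oft_config = OFTConfig(
    r=8,                                  # number of orthogonal blocks (illustrative)
    target_modules=["q_proj", "v_proj"],  # which linear layers to adapt (illustrative)
    module_dropout=0.0,
)

model = get_peft_model(model, oft_config)
model.print_trainable_parameters()  # only the orthogonal adapter is trainable
```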
For a quick start with OFTv2 and QOFT, check out our Colab tutorial: We give examples of finetuning standard and quantized LLMs (Qwen) and Stable Diffusion 3.5. Kudos to @ZejuQiu36055 for preparing the notebook!
🚀 Meet OFTv2 — Orthogonal Finetuning made scalable, finally. ⚡️ 10× faster. 💾 3× less GPU memory. 🤖 Quantized OFT: plug-and-play on quantized LLMs, better than QLoRA. Try it now on Hugging Face PEFT: Website: #AI #LLM 🧵1/6
This is joint work with my amazing collaborators @ZejuQiu36055, @adrian_weller, and @bschoelkopf. 📰Full paper: 💻Code: 📔Documentation: 👨‍🏫Step-by-step tutorials: 🧵6/6
RT @rasbt: Upgraded from Llama 3 to Qwen3 as my go-to model for research experiments, so I implemented qwen3 from scratch: .
RT @itsalexvacca: BREAKING: MIT just completed the first brain scan study of ChatGPT users & the results are terrifying. Turns out, AI isn….
We have added some new experiments and analyses to the latest version of our paper. Check it out here: We discovered that, despite being generalized to spectrum-preserving training, POET can still preserve minimum hyperspherical energy. This property only…
📢Glad to introduce our paper: Reparameterized LLM Training via Orthogonal Equivalence Transformation (POET)! POET is a new algorithm for efficiently pretraining/finetuning large language models. Its training consists of three geometric phases. 1/6
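For intuition about why this training is spectrum-preserving, here is a tiny numerical sketch (not the paper's implementation): wrapping a fixed weight matrix with orthogonal factors on both sides leaves its singular values, i.e. its spectrum, unchanged.

```python
# Numerical sketch of the orthogonal-equivalence idea: W = R @ W0 @ Q with
# orthogonal R and Q has the same singular values as the fixed matrix W0.
import torch

torch.manual_seed(0)
d_out, d_in = 64, 32
W0 = torch.randn(d_out, d_in)  # fixed weight matrix

def random_orthogonal(n: int) -> torch.Tensor:
    # QR decomposition of a Gaussian matrix yields a random orthogonal matrix.
    q, _ = torch.linalg.qr(torch.randn(n, n))
    return q

R = random_orthogonal(d_out)   # left orthogonal factor
Q = random_orthogonal(d_in)    # right orthogonal factor
W = R @ W0 @ Q                 # reparameterized weight

sv0 = torch.linalg.svdvals(W0)
sv = torch.linalg.svdvals(W)
print(torch.allclose(sv0, sv, atol=1e-4))  # True: the spectrum is preserved
```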
Muon is gaining attention for its use of orthogonalization, making it a natural point of comparison with POET. We computed singular value entropy over training steps and found that POET always maintains high entropy. A recent study ( suggests that this is a…
📢Glad to introduce our paper: Reparameterized LLM Training via Orthogonal Equivalence Transformation (POET)! POET is a new algorithm for efficiently pretraining/finetuning large language models. Its training consists of three geometric phases. 1/6
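For reference, a singular value entropy of this kind can be computed by normalizing the singular values into a probability distribution and taking its Shannon entropy. The exact normalization used in the paper may differ; treat this as one common definition.

```python
# One common way to compute a singular-value entropy: normalize the singular
# values of a weight matrix into a distribution and take its Shannon entropy.
# High entropy means the spectrum is spread out rather than dominated by a
# few directions.
import torch

def singular_value_entropy(W: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    s = torch.linalg.svdvals(W)
    p = s / (s.sum() + eps)              # normalize to a distribution
    return -(p * torch.log(p + eps)).sum()

W = torch.randn(256, 256)
h = singular_value_entropy(W)
print(float(h), torch.log(torch.tensor(256.0)).item())  # entropy vs. its maximum log(n)
```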
Verbalized machine learning treats LLMs with prompts as function approximators. Building on this, @TimZXiao came up with the idea of studying whether LLMs can act as samplers. It turns out they’re often biased, even when they appear to understand the target distribution.
✨ New paper: Flipping Against All Odds. We found that large language models (LLMs) can describe probabilities—but fail to sample from them faithfully. Yes, even flipping a fair coin is hard. 🪙 🧵 Here’s what we learned—and how we fixed it. 🔗 1/
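As a toy illustration of how such sampling bias can be quantified (not the paper's protocol), one can compare the empirical frequencies of collected model outputs against the target distribution, e.g. with total variation distance. The sample list below is made up.

```python
# Toy illustration of measuring sampling bias: compare empirical frequencies
# of model outputs against the target distribution via total variation
# distance. `samples` stands in for outputs collected from an LLM asked to
# flip a fair coin; the numbers here are fabricated for illustration only.
from collections import Counter

target = {"heads": 0.5, "tails": 0.5}
samples = ["heads"] * 72 + ["tails"] * 28   # made-up, biased "LLM" outputs

counts = Counter(samples)
n = len(samples)
empirical = {k: counts.get(k, 0) / n for k in target}

tv_distance = 0.5 * sum(abs(empirical[k] - target[k]) for k in target)
print(empirical, f"TV distance to a fair coin: {tv_distance:.2f}")
```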
Fun fact: We started POET right after Orthogonal Butterfly ( but made little progress after 6 months. We then switched to a different project. We picked POET back up in Jan 2025, and it took 5 more months to finally get it right. Ideas matter—but…
📢Glad to introduce our paper: Reparameterized LLM Training via Orthogonal Equivalence Transformation (POET)! POET is a new algorithm for efficiently pretraining/finetuning large language models. Its training consists of three geometric phases. 1/6
Check out our full paper here: This is joint work with amazing collaborators: @QiuZeju, Simon Buchholz, @TimZXiao, @maximilian_dax, @bschoelkopf. 6/6