Shangshang Wang
@UpupWang
Followers: 583 · Following: 41 · Media: 24 · Statuses: 55
PhD @CSatUSC | @ShanghaiTechUni | Post-train via RL & Pre-train for AI4Science.
Los Angeles
Joined December 2024
Our code is built on torchtune @PyTorch. We hope that our implementation can also contribute to their new repo for post-training! https://t.co/ooH4AJFKnn
https://t.co/AsWWG1kmKt
github.com · Tora: Torchtune-LoRA for RL (shangshang-wang/Tora)
(Q)DoRA-with-Cache-based GRPO. The standard DoRA layer recalculates the weight norm and magnitude scale on every forward pass; DoRA-with-Cache avoids this by caching these expensive computations.
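A minimal sketch of the caching idea in PyTorch (hypothetical layer and method names, not the actual Tora implementation): common DoRA implementations already detach the merged weight's norm from the autograd graph, so it can be cached across forward passes and invalidated only when the LoRA factors change after an optimizer step.

```python
import torch
import torch.nn as nn

class DoRALinearWithCache(nn.Module):
    """Sketch of a DoRA linear layer that caches the weight-norm
    computation (hypothetical; names do not match the Tora repo)."""

    def __init__(self, in_features, out_features, rank=16, alpha=32):
        super().__init__()
        # frozen base weight
        self.weight = nn.Parameter(
            torch.randn(out_features, in_features) * 0.02, requires_grad=False
        )
        # trainable low-rank factors
        self.lora_a = nn.Parameter(torch.randn(rank, in_features) * 0.02)
        self.lora_b = nn.Parameter(torch.zeros(out_features, rank))
        self.scaling = alpha / rank
        # DoRA magnitude vector, initialized to the base weight's row norms
        self.magnitude = nn.Parameter(self.weight.norm(p=2, dim=1))
        self._cached_norm = None  # cached norm of the merged weight

    def invalidate_cache(self):
        # call after every optimizer step, when lora_a / lora_b change
        self._cached_norm = None

    def forward(self, x):
        merged = self.weight + self.scaling * (self.lora_b @ self.lora_a)
        if self._cached_norm is None:
            # the expensive part; it is detached (as in common DoRA
            # implementations), so reusing it does not change gradients
            self._cached_norm = merged.norm(p=2, dim=1, keepdim=True).detach()
        directional = merged / self._cached_norm
        return x @ (self.magnitude.unsqueeze(1) * directional).t()
```

This matters for GRPO in particular: rollout generation runs many forward passes between optimizer steps, so the cached norm stays valid for long stretches; a training loop would simply call invalidate_cache() on every adapted layer right after optimizer.step().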
We provide detailed benchmarking of Qwen2.5 models across sizes (1.5B-32B), comparing LoRA-based and full-parameter training on only 2x A40 GPUs. See below for (Q)LoRA, (Q)DoRA, and (Q)DoRA-with-Cache, where we cache expensive computations to make DoRA more efficient.
Check out our Tina project (efficient RL for reasoning with LoRA) here: https://t.co/qntPWxzDPJ
😋 Want strong LLM reasoning without breaking the bank? We explored just how cost-effectively RL can enhance reasoning using LoRA! [1/9] Introducing Tina: A family of tiny reasoning models with strong performance at low cost, providing an accessible testbed for RL reasoning. 🧵
We now know that LoRA can match full-parameter RL training (from https://t.co/pGxoMLFIGv and our Tina paper https://t.co/dkXdxV3eNj), but what about DoRA, QLoRA, and more? We are releasing a clean LoRA-for-RL repo to explore them all. https://t.co/AsWWG1kmKt
LoRA makes fine-tuning more accessible, but it's unclear how it compares to full fine-tuning. We find that the performance often matches closely, more often than you might expect. In our latest Connectionism post, we share our experimental results and recommendations for LoRA.
LoRA is real for Reasoning. https://t.co/pGxoMLFIGv
This is another amazing collaboration with Julian @julian_asilis, Omer @oemerakgull, Enes, Oliver @olliezliu, and Deqing @DeqingFu in the course taught by Willie @willieneis (both our teacher and our advisor). Thanks, everyone!
Curious about the details behind these efficiency claims? We open-source everything for full reproducibility: Paper: https://t.co/dZ2VMLQWEd Blog: https://t.co/u2V8D0c3Y0 Code: https://t.co/1Kl5MRPwAB Model: https://t.co/GASQjSPJ0m Training Logs:
SAE-Tuning trains models that match the performance of their RL-trained counterparts while reducing cost by >2000x and time by >450x. The trained model is transparent, revealing where reasoning abilities hide, and it is generalizable and modular, enabling transfer across datasets and models.
Such efficiency stems from our novel SAE-Tuning method, which expands the use of SAEs beyond test-time steering. In SAE-Tuning, the SAE first “extracts” latent reasoning features and then guides a standard supervised fine-tuning process to “elicit” reasoning abilities.
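A minimal sketch of the two-stage idea, under my own assumptions (a plain ReLU SAE spliced in via a forward hook on a layer that returns a tensor; the actual SAE-Tuning recipe may differ):

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Plain ReLU sparse autoencoder over hidden activations (sketch)."""

    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, h):
        z = torch.relu(self.encoder(h))  # sparse latent features
        return self.decoder(z), z

# Stage 1 ("extract"): fit the SAE to reconstruct hidden states collected
# from a source reasoning model, with an L1 penalty to enforce sparsity.
def sae_loss(sae, h, l1_coef=1e-3):
    recon, z = sae(h)
    return (recon - h).pow(2).mean() + l1_coef * z.abs().mean()

# Stage 2 ("elicit"): freeze the SAE, splice it into the target model's
# forward pass, then run standard SFT so the student's activations are
# routed through the frozen "reasoning features" during training.
def splice_sae(layer: nn.Module, sae: SparseAutoencoder):
    sae.requires_grad_(False)

    def hook(module, inputs, output):
        recon, _ = sae(output)
        return recon  # returning a value replaces the layer's output

    return layer.register_forward_hook(hook)
```

The hook assumes the chosen layer emits a plain tensor (e.g., an MLP block); a transformer block that returns a tuple would need the hidden state unpacked and repacked around the SAE call.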
👧 Check out more about Tina down below! Paper: https://t.co/dkXdxV3eNj Notion Blog: https://t.co/vue286jaH0 Code: https://t.co/CcTLnx9VaH Model: https://t.co/TgeThEaSQL Training Logs: https://t.co/DWKXxXN4Zp Tina's avatar is generated by GPT-4o based on KYNE's girls.
We also want to express our gratitude to the broader open-source community. This research was made possible by leveraging numerous publicly available resources from the DeepScaleR @Agentica_, STILL, OpenThoughts @bespokelabsai, OpenR1 @huggingface, LIMR, and OpenRS projects.
This is an amazing collaboration with Julian @julian_asilis, Omer @oemerakgull, Enes, and Oliver @olliezliu in the course taught by Willie @willieneis (both our teacher and our advisor). Thanks, everyone!
[9/9] 🚀 We thus hypothesize that LoRA’s effectiveness and efficiency stem from rapidly adapting the reasoning format under RL while preserving base model knowledge, a likely more compute-efficient process than the deep knowledge integration of full-parameter training.
[8/9] 🔬 Observation 2) We consistently observe a training phase transition in format-related metrics (format reward, completion length) but NOT in accuracy-related metrics across most Tina models. The best-performing checkpoint is always found around this transition point.
[7/9] 🩺 Observation 1) In Tina models, performance decreases as training compute increases, in contrast to full-parameter models. This observation highlights a “less compute can yield more performance” phenomenon.