Rui Pan

@rui4research

Followers: 275
Following: 120
Media: 17
Statuses: 60

PhD student at UIUC, @OptimalScale maintainer. Research Intern at Meta GenAI

Joined December 2018
@rui4research
Rui Pan
4 months
RT @AIatMeta: Today is the start of a new era of natively multimodal AI innovation. Today, we’re introducing the first Llama 4 models: Lla…
0
2K
0
@rui4research
Rui Pan
5 months
RT @RickyRDWang: 🚀 Introducing MA-LoT Theorem Framework: an open-source multi-agent framework utilizing the Long Chain-of-Thought to boost…
0
9
0
@rui4research
Rui Pan
5 months
RT @wzihanw: 🚀 Introducing Chain-of-Experts (CoE), a free-lunch optimization method for DeepSeek-like MoE models! Within $200, we explore…
0
165
0
@rui4research
Rui Pan
6 months
😆 Excited to share our latest work on LLM pruning 🔥
🚀 Surpasses Llama-3.2-1B on MMLU at 1000x less cost
✅ Enables flexible model-size customization
⭐ Key techniques:
• Adaptive pruning that accounts for layer-wise importance
• Highly frequent interleaved training that …
[3 images attached]
1
8
23
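The tweet above is truncated and doesn't spell out the exact algorithm. Purely as an illustration of what layer-wise-importance pruning can look like in general (not the authors' implementation), one could score each transformer layer by how much skipping it hurts the loss on a small calibration batch, then keep only the highest-scoring layers; varying the keep ratio also gives the flexible model sizes mentioned above. Every name, the `forward(layers, batch, skip=i)` interface, and the scoring rule below are assumptions made for the sketch.

```python
# Illustrative sketch of layer-wise-importance pruning (NOT the authors' method).
# `forward` is a user-supplied callable that runs the layer stack, optionally
# skipping one layer; `loss_fn` maps (outputs, batch) to a scalar loss tensor.
import torch
import torch.nn as nn

def layer_importance(layers: nn.ModuleList, calib_batch, forward, loss_fn):
    """Score each layer by the loss increase observed when that layer is skipped."""
    with torch.no_grad():
        base_loss = loss_fn(forward(layers, calib_batch, skip=None), calib_batch)
        scores = []
        for i in range(len(layers)):
            skipped_loss = loss_fn(forward(layers, calib_batch, skip=i), calib_batch)
            scores.append((skipped_loss - base_loss).item())  # larger = more important
    return scores

def prune_least_important(layers: nn.ModuleList, scores, keep_ratio=0.5):
    """Keep the most important layers; `keep_ratio` controls the target model size."""
    k = max(1, int(len(layers) * keep_ratio))
    keep = sorted(range(len(layers)), key=lambda i: scores[i], reverse=True)[:k]
    return nn.ModuleList(layers[i] for i in sorted(keep))
```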
@rui4research
Rui Pan
6 months
RT @Alibaba_Qwen: The burst of DeepSeek V3 has attracted attention from the whole AI community to large-scale MoE models. Concurrently, we…
0
2K
0
@rui4research
Rui Pan
8 months
Thanks so much to all the amazing authors of the paper @Dominicliu12 @shizhediao @RenjiePi @mircale2003 @Glaciohound. LISA also achieves the fastest speed compared to other baselines, according to a third-party evaluation:
github.com
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, GLM4.5, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL3, Lla...
0
0
1
@rui4research
Rui Pan
8 months
Support for LISA in LMFlow is partial right now; support for LISA + FSDP is in progress.
github.com
An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All. - OptimalScale/LMFlow
1
0
1
@rui4research
Rui Pan
8 months
More experimental results
[image attached]
1
0
0
@rui4research
Rui Pan
8 months
Detailed memory consumption
[image attached]
1
0
0
@rui4research
Rui Pan
8 months
The convergence behavior is also similar to that of full-parameter training.
[image attached]
1
0
0
@rui4research
Rui Pan
8 months
Presenting our LISA paper at NeurIPS 2024 😆
- Dec. 13 at 4:30 pm (Friday afternoon)
- West Ballroom A-D #5708
Fine-tuning a 7B model on a single GPU❔ Randomly freezing ~90% of the self-attention layers every 5-20 iterations allows that! 🚀 It is
- 3x faster
- Memory-efficient
- Good at …
[3 images attached]
1
9
24
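For readers unfamiliar with LISA, here is a minimal sketch of the freezing schedule described in the tweet above: every `interval` steps, re-sample a small subset of layers to remain trainable and freeze the rest, so only ~10% of the layers carry gradients at any time. The `model.layers` attribute, the HuggingFace-style `model(**batch).loss` call, and the specific numbers are assumptions for illustration; the maintained implementation lives in LMFlow.

```python
# Minimal sketch of LISA-style layer freezing (illustrative assumptions only).
import random
import torch.nn as nn

def resample_trainable_layers(layers: nn.ModuleList, n_active: int = 2):
    """Freeze every layer, then unfreeze a random subset of `n_active` layers."""
    active = set(random.sample(range(len(layers)), n_active))
    for i, layer in enumerate(layers):
        for p in layer.parameters():
            p.requires_grad = i in active
    return active

def train_with_lisa(model, dataloader, optimizer, interval: int = 10):
    """Re-sample the active layers every `interval` steps (the tweet says 5-20)."""
    for step, batch in enumerate(dataloader):
        if step % interval == 0:
            resample_trainable_layers(model.layers)  # ~90% of layers stay frozen
        loss = model(**batch).loss                   # HF-style model assumed
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```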
@rui4research
Rui Pan
8 months
RT @shizhediao: Now the post-training of Hymba is officially supported by LMFlow! 🚀 Big thanks to LMFlow folks @Dominicliu12 @rui4researc…
0
3
0
@rui4research
Rui Pan
9 months
NVIDIA Hymba-1.5B Instruction Fine-tuning, powered by LMFlow🚀.
@PavloMolchanov
Pavlo Molchanov
9 months
🚀 Introducing Hymba-1.5B: a new hybrid architecture for efficient small language models!
✅ Outperforms Llama, Qwen, and SmolLM2 with 6-12x less training
✅ Massive reductions in KV cache size & a good throughput boost
✅ Combines Mamba & attention in a hybrid parallel …
[image attached]
0
3
12
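Hymba's actual block design is described in NVIDIA's paper; as a loose conceptual sketch of what combining Mamba & attention "in a hybrid parallel" fashion can mean, the module below runs an attention branch and a toy linear-recurrence branch on the same normalized input and averages their outputs into the residual stream. `SimpleScan` is a stand-in, not a real Mamba/SSM layer, and all dimensions are made up for the example.

```python
# Conceptual sketch of a hybrid parallel block (NOT Hymba's actual architecture).
import torch
import torch.nn as nn

class SimpleScan(nn.Module):
    """Toy linear recurrence standing in for a Mamba/SSM branch."""
    def __init__(self, d_model: int):
        super().__init__()
        self.decay = nn.Parameter(torch.full((d_model,), 0.9))
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, x):                          # x: (batch, seq, d_model)
        state = torch.zeros_like(x[:, 0])
        outs = []
        for t in range(x.size(1)):
            state = self.decay * state + x[:, t]   # exponentially decayed running sum
            outs.append(state)
        return self.proj(torch.stack(outs, dim=1))

class HybridParallelBlock(nn.Module):
    """Attention and the SSM-style branch run in parallel on the same input."""
    def __init__(self, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.scan = SimpleScan(d_model)

    def forward(self, x):
        h = self.norm(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        return x + 0.5 * (attn_out + self.scan(h))  # merge the two branches
```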
@rui4research
Rui Pan
9 months
RT @TingAstro: After a year of hard work and many failures along the way, AstroMLab is proud to present AstroSage-LLaMA-3.1-8b. Specific…
0
21
0