Cheng Luo
@ChengLuo_lc
Followers
79
Following
80
Media
5
Statuses
28
The whole @InfiniAILab is at #NeurIPS this week! Our group is currently working on diverse directions of GenAI, e.g., Scalable and Efficient RL, VideoGen, Modeling, Model Arch & Sys Co-Design (many new releases coming!!). Come and talk to us @RJ_Sadhukhan @IronSteveZhou
0
11
107
Great work! We are looking forward to seeing this work at our NeurIPS 2025 Efficient Reasoning workshop.
openreview.net
Reinforcement Learning with Verifiable Rewards (RLVR) reliably improves the reasoning performance of large language models, yet it appears to modify only a small fraction of parameters. We revisit...
🚨 New Work! 🤔 Is RL black-box weight tinkering? 😉 No. We provably show RLVR follows a 🧭 — always updating the same off-principal regions while preserving the model's core spectra. ⚠️ Different optimization regime than SFT — SFT-era PEFT tricks can misfire (like PiSSA, the
1
0
5
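The claim above is about where RLVR's weight updates land relative to the pre-trained model's singular-value structure. A minimal sketch of that kind of decomposition (illustrative only, not the paper's analysis; matrix sizes and the rank cutoff k are arbitrary choices): split an update ΔW into the component inside W's top-k singular subspace and the off-principal remainder.

```python
# Illustrative only: split a weight update dW into the part lying in the
# pre-trained matrix W's top-k singular subspace ("principal") and the
# remainder ("off-principal"). Sizes and k are arbitrary choices here.
import torch

def principal_offprincipal_split(W, dW, k=64):
    # SVD of the pre-trained weights; the top-k singular directions stand in
    # for the model's "core spectra".
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    Uk, Vk = U[:, :k], Vh[:k, :].T
    # Project dW onto the subspace spanned by the top-k left/right singular vectors.
    dW_principal = Uk @ (Uk.T @ dW @ Vk) @ Vk.T
    return dW_principal, dW - dW_principal

W = torch.randn(1024, 1024) / 32      # stand-in for a pre-trained weight matrix
dW = 1e-3 * torch.randn(1024, 1024)   # stand-in for an RL update
p, o = principal_offprincipal_split(W, dW)
print(f"principal share: {p.norm() / dW.norm():.3f}, off-principal share: {o.norm() / dW.norm():.3f}")
```

Under the tweet's claim, an RLVR-style ΔW would concentrate its norm in the off-principal component, and in the same regions across runs.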
Glad that we are taking the opposite approach: while OpenAI is adding compute-intensive offerings with extra fees, we're making video generation less compute-intensive so everyone can interact with it in real-time. Algorithmic breakthroughs > throwing more compute.
Over the next few weeks, we are launching some new compute-intensive offerings. Because of the associated costs, some features will initially only be available to Pro subscribers, and some new products will have additional fees. Our intention remains to drive the cost of
30
23
761
We need more reviewers for the 1st Workshop on Efficient Reasoning (ER) at @NeurIPSConf. If you are interested, please fill out the nomination form.
docs.google.com
We strive to expand our reviewing pool by welcoming newer members of the community. We encourage nominations from senior community members as well as self-nominations from individuals who have either...
🌟 Announcing the 1st Workshop on Efficient Reasoning (ER) at @NeurIPSConf 2025 — Dec 6 or 7, San Diego! 📣 We welcome submissions! Submit your work here: https://t.co/13TumRabVh 🗓️ Deadline: September 1, 2025 (AoE) 🔗 Website: https://t.co/tcTfZ6r6lS 💬 Topics
0
5
16
🌟 Reminder: Submission Deadline Approaching! 🌟 The 1st Workshop on Efficient Reasoning (ER) @ NeurIPS 2025 — happening Dec 6 or 7 in San Diego — is fast approaching, and we’d love to see your work there! 📌 Submission Deadline: September 1, 2025 (AoE) 🔗 Submit here:
openreview.net
Welcome to the OpenReview homepage for NeurIPS 2025 Workshop ER
🌟 Announcing the 1st Workshop on Efficient Reasoning (ER) at @NeurIPSConf 2025 — Dec 6 or 7, San Diego! 📣 We welcome submissions! Submit your work here: https://t.co/13TumRabVh 🗓️ Deadline: September 1, 2025 (AoE) 🔗 Website: https://t.co/tcTfZ6r6lS 💬 Topics
0
4
27
@NeurIPSConf 🌟 Reminder: Submission Deadline Approaching! 🌟 The 1st Workshop on Efficient Reasoning (ER) @ NeurIPS 2025 — happening Dec 6 or 7 in San Diego — is fast approaching, and we’d love to see your work there! 📌 Submission Deadline: September 1, 2025 (AoE) 🔗 Submit here:
openreview.net
Welcome to the OpenReview homepage for NeurIPS 2025 Workshop ER
0
1
1
🌟 Announcing the 1st Workshop on Efficient Reasoning (ER) at @NeurIPSConf 2025 — Dec 6 or 7, San Diego! 📣 We welcome submissions! Submit your work here: https://t.co/13TumRabVh 🗓️ Deadline: September 1, 2025 (AoE) 🔗 Website: https://t.co/tcTfZ6r6lS 💬 Topics
1
2
2
🔥 We introduce Multiverse, a new generative modeling framework for adaptive and lossless parallel generation. 🚀 Multiverse is the first open-source non-AR model to achieve AIME24 and AIME25 scores of 54% and 46% 🌐 Website: https://t.co/J9osByhWUf 🧵 1/n
6
85
220
🥳 Happy to share our new work – Kinetics: Rethinking Test-Time Scaling Laws 🤔 How to effectively build a powerful reasoning agent? Existing compute-optimal scaling laws suggest 64K thinking tokens + 1.7B model > 32B model. But it only shows half of the picture! 🚨 The O(N²)
7
72
248
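The truncated "O(N²)" point refers to the attention/KV-cache cost that grows with the length of the generated reasoning chain. A back-of-the-envelope sketch of why that quadratic term weighs more heavily on a small model generating 64K thinking tokens (the model configs below are assumed stand-ins; this is not the Kinetics cost model):

```python
# Rough decode-cost comparison: linear-layer FLOPs grow linearly in the number of
# generated tokens, while attention over the growing KV cache sums to ~N^2.
# Configs are assumed stand-ins; this is not the Kinetics cost model.

def decode_flops(n_tokens, params, n_layers, n_heads, head_dim):
    linear = 2 * params * n_tokens                     # ~2 FLOPs per weight per token
    # At step t the model attends over t cached tokens (QK^T plus AV, ~4 FLOPs
    # per head-dim element); summing t = 1..N gives ~N^2 / 2.
    attn = 4 * n_layers * n_heads * head_dim * n_tokens**2 / 2
    return linear, attn

for name, cfg in {
    "~1.7B model": dict(params=1.7e9, n_layers=28, n_heads=16, head_dim=128),
    "~32B model":  dict(params=32e9,  n_layers=64, n_heads=40, head_dim=128),
}.items():
    lin, attn = decode_flops(64_000, **cfg)
    print(f"{name}: linear {lin:.1e} FLOPs, attention-over-KV {attn:.1e} FLOPs "
          f"({attn / (lin + attn):.0%} of total)")
```

With these assumed configs, the attention term is a much larger share of the small model's total cost at 64K thinking tokens, which is the half of the picture the tweet says parameter-only scaling laws miss.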
HeadInfer: Unlocking Long-Context LLM Inference on Consumer GPUs (Million-level Tokens) * Long-context inputs require large GPU memory. * A standard LLM like Llama-3-8B requires 207 GB of GPU memory for 1 million tokens — far beyond the capabilities of consumer GPUs like the RTX
arxiv.org
Transformer-based large language models (LLMs) demonstrate impressive performance in long context generation. Extending the context length has disproportionately shifted the memory footprint of...
2
13
88
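The 207 GB figure is roughly reproducible from the KV cache alone. A back-of-the-envelope calculation, assuming Llama-3-8B's published config (32 layers, 8 KV heads, head dim 128) and an fp16 cache; the remaining gap to 207 GB would be activations and buffers:

```python
# KV-cache memory for long-context inference (rough sketch; assumes an fp16 cache
# and Llama-3-8B's config: 32 layers, 8 KV heads, head dim 128).
def kv_cache_gb(seq_len, n_layers=32, kv_heads=8, head_dim=128, bytes_per_elem=2):
    # factor of 2 for keys and values
    return 2 * n_layers * kv_heads * head_dim * seq_len * bytes_per_elem / 1e9

weights_gb = 8e9 * 2 / 1e9                 # ~16 GB of fp16 weights
cache_gb = kv_cache_gb(1_000_000)          # ~131 GB of KV cache at 1M tokens
print(f"KV cache: {cache_gb:.0f} GB, weights + cache: {weights_gb + cache_gb:.0f} GB")
```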
Dec. 10-15 at NeurIPS'24. Our poster will be presented Friday at 11 am. See you in Vancouver!
0
0
1
🤩🤩 We introduce MST, a memory-efficient transformer that reduces intermediate memory usage and enables longer-sequence training without compromising performance. 🚀🚀 How does it work? MST partitions the sequence into mini-sequences and applies activation recomputation for optimal
1
1
2
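A minimal sketch of the mini-sequence idea described above (not the released MST code; sizes and the chunk count are arbitrary): run the MLP over sequence chunks so the large intermediate activation never materializes for the whole sequence at once, and let torch.utils.checkpoint handle recomputation in the backward pass.

```python
# Minimal mini-sequence MLP sketch (illustrative, not the released MST code).
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.checkpoint import checkpoint

class MiniSeqMLP(nn.Module):
    def __init__(self, hidden=1024, inter=4096, n_chunks=8):
        super().__init__()
        self.up = nn.Linear(hidden, inter)
        self.down = nn.Linear(inter, hidden)
        self.n_chunks = n_chunks

    def _block(self, x):
        return self.down(F.silu(self.up(x)))

    def forward(self, x):                                # x: [batch, seq, hidden]
        outs = []
        for chunk in x.chunk(self.n_chunks, dim=1):      # partition along the sequence
            # checkpoint drops this chunk's intermediate activations in the forward
            # pass and recomputes them during backward, so peak intermediate memory
            # scales with the mini-sequence length instead of the full sequence.
            outs.append(checkpoint(self._block, chunk, use_reentrant=False))
        return torch.cat(outs, dim=1)

# Hypothetical usage: an 8k-token sequence processed 1k tokens at a time.
mlp = MiniSeqMLP()
y = mlp(torch.randn(2, 8192, 1024, requires_grad=True))
y.sum().backward()
```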
Really 👀 new paper: MINI-SEQUENCE TRANSFORMER claims to extend the maximum context length of Qwen, Mistral, and Gemma-2 by 12-24x. MST enables efficient long-sequence training by reducing intermediate memory overhead. It achieves a 2.7x improvement in perplexity with 30k context
2
4
10
Training Ultra Long Context Language Model with Fully Pipelined Distributed Transformer. Saw a 16x increase in trainable sequence length with 55% MFU. Agnostic to ZeRO and FSDP memory optimization techniques. Links below
1
7
58
Curious about boosting context length in Llama 3.1 by 16x? 🦙 Our Mini-sequence Transformer (MST) offers insights! 🚀 MST extends context length with no performance drop. 📈 Our paper: https://t.co/rOOl4VVYEw and GitHub: https://t.co/hen76AwNVV.
#Llama #NLP #AI #llama31
github.com
Contribute to wdlctc/mini-s development by creating an account on GitHub.
0
1
1
Introducing a long-context transformer using mini-sequences. It is a simple and effective method for highly efficient and accurate LLM training with extremely long sequences. Our research demonstrates that the Llama3-8B model can be trained with context lengths up to 60k tokens on
1
8
47