Syeda Nahida Akter
@__SyedaAkter
597 Followers · 893 Following · 38 Media · 238 Statuses
PhD student at @LTIatCMU @SCSatCMU and research intern @NVIDIA. Working on improving Reasoning of Generative Models! (@reasyaay.bsky.social)
Pittsburgh, PA
Joined May 2020
Most LLMs learn to think only after pretraining—via SFT or RL. But what if they could learn to think during it? 🤔 Introducing RLP: Reinforcement Learning Pre-training—a verifier-free objective that teaches models to “think before predicting.” 🔥 Result: Massive reasoning
8 replies · 43 reposts · 257 likes
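A minimal sketch of the "think before predicting" idea this tweet describes, assuming (since the tweet itself is truncated) that the verifier-free reward is simply how much a sampled thought improves the log-likelihood of the observed next token; this is not the released RLP code, and the function and variable names are hypothetical placeholders:

```python
import torch

def rlp_style_loss(logp_next_with_thought, logp_next_baseline, logp_thought):
    """Toy RLP-style objective for a single pretraining position.

    logp_next_with_thought: log p(next token | context, sampled thought)
    logp_next_baseline:     log p(next token | context) from a no-thought baseline
    logp_thought:           summed log-prob of the sampled thought tokens
    """
    # Verifier-free reward: the gain in next-token log-likelihood from thinking first.
    reward = (logp_next_with_thought - logp_next_baseline).detach()
    # REINFORCE-style term on the thought, plus the usual LM term on the observed token.
    return -(reward * logp_thought) - logp_next_with_thought

# Dummy scalars stand in for real model outputs so the sketch runs as-is.
loss = rlp_style_loss(
    torch.tensor(-1.2, requires_grad=True),   # with a sampled thought
    torch.tensor(-2.3),                       # no-thought baseline
    torch.tensor(-5.0, requires_grad=True),   # log-prob of the thought itself
)
loss.backward()
```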
📢 Thrilled to see Nemotron-3 Super out in the world. 🚀 A hybrid MoE model with long-context support and strong reasoning capabilities — designed for scalable agentic AI systems and efficient inference. 🎉 Proud to be part of the team pushing forward open, efficient, and
2 replies · 2 reposts · 36 likes
Intelligent and Super Efficient!
Announcing NVIDIA Nemotron 3 Super!
💚 120B-12A Hybrid SSM Latent MoE, designed for Blackwell
💚 36 on AAIndex v4
💚 up to 2.2X faster than GPT-OSS-120B in FP4
💚 Open data, open recipe, open weights
Models, tech report, etc. here: https://t.co/CAYpP1iK3i And yes, Ultra is coming!
0 replies · 0 reposts · 4 likes
Thrilled to share that all three of our papers were accepted to @iclr_conf 🎉
1⃣ RLP: Reinforcement as a Pretraining Objective
2⃣ Front-Loading Reasoning: The Synergy between Pretraining & Post-Training Data
3⃣ Nemotron-CC-Math: A 133B-token High-Quality Math Pretraining Dataset
3 replies · 10 reposts · 78 likes
So grateful to all the collaborators who made these works possible 🙏
1️⃣ RLP: @ahatamiz1, @shrimai_, @jankautz, @MostofaPatwary, @MohammadShoeybi, @ctnzr, @YejinChoinka
2️⃣ Front-Loading Reasoning: @shrimai_, Eric Nyberg, @MostofaPatwary, @MohammadShoeybi, @YejinChoinka, @ctnzr
0 replies · 0 reposts · 3 likes
2/2 at ICLR 2026! 🎊 So happy to share that both of our papers have made it to @iclr_conf!
1️⃣ RLP: Reinforcement as a Pretraining Objective (https://t.co/GcTJSvmuB7)
2️⃣ Front-Loading Reasoning: The Synergy between Pretraining and Post-Training Data (https://t.co/Jx7cTePHhm)
arxiv.org
The dominant paradigm for training large reasoning models starts with pre-training using next-token prediction loss on vast amounts of data. Reinforcement learning, while powerful in scaling...
2 replies · 3 reposts · 42 likes
RLP is ready for ICLR 2026! 🥳 We have added extensive ablations and new experimental results (thanks to the reviewers). Excited to share the final story soon!
I am happy to announce that RLP has been accepted to ICLR 2026! 🎉 RLP re-imagines the foundations of LLM training by bringing reinforcement learning directly into the pretraining stage. This was a true team effort, and it would not have been possible without the invaluable
0 replies · 7 reposts · 38 likes
Today, @NVIDIA is launching the open Nemotron 3 model family, starting with Nano (30B-3A), which pushes the frontier of accuracy and inference efficiency with a novel hybrid SSM Mixture of Experts architecture. Super and Ultra are coming in the next few months.
41 replies · 224 reposts · 1K likes
Lots of insights in @YejinChoinka's talk on RL training. RIP next-token prediction (NTP) training, and welcome to Reinforcement Learning Pretraining (RLP). #COLM2025 There was no room to even stand.
7 replies · 23 reposts · 292 likes
By teaching models to reason during foundational training, RLP aims to reduce logical errors and boost reliability for complex reasoning workflows. https://t.co/bQkQoNoZAv
venturebeat.com
By teaching models to reason during foundational training, the verifier-free method aims to reduce logical errors and boost reliability for complex enterprise workflows.
0 replies · 4 reposts · 8 likes
Thank you @rohanpaul_ai for highlighting our work! 💫 Front-Loading Reasoning shows that the inclusion of reasoning data in pretraining is beneficial, does not lead to overfitting after SFT, and has a latent effect unlocked by SFT! Paper: https://t.co/gGRnmqFr9N Blog:
arxiv.org
The prevailing paradigm for enhancing the reasoning abilities of LLMs revolves around post-training on high-quality, reasoning-intensive data. While emerging literature suggests that reasoning...
New @nvidia paper shows that teaching reasoning early during pretraining builds abilities that later fine-tuning cannot recover. Doing this early gives a 19% average boost on tough tasks after all post-training. Pretraining is the long first stage where the model learns to
0 replies · 2 reposts · 9 likes
Thank you @rohanpaul_ai for sharing our work! In "Front-Loading Reasoning", we show that injecting reasoning data into pretraining builds models that reach the frontier. On average, +22% (pretraining) → +91% (SFT) → +49% (RL) relative gains. 🚀 🔗Paper:
0 replies · 0 reposts · 14 likes
When should LLMs learn to reason—early in pretraining or late in fine-tuning? 🤔 Front-Loading Reasoning shows that injecting reasoning data early creates durable, compounding gains that post-training alone cannot recover. Paper: https://t.co/Bj8MUtDQnm Blog: https://t.co/c7u6sfFMC2
4 replies · 12 reposts · 46 likes
Huge thanks to the amazing team and collaborators: @shrimai_, Eric Nyberg, @MostofaPatwary, @MohammadShoeybi, @YejinChoinka, and @ctnzr! We'd love to hear your thoughts!
0 replies · 0 reposts · 3 likes
Our work provides a principled guide for training reasoning-centric LLMs:
➣ Don't wait: Inject reasoning data into pretraining.
➣ Be strategic: Use DIVERSE data for pretraining, emphasize HIGH-QUALITY data for SFT.
➣ Be careful: Avoid polluting your SFT with low-quality data.
1 reply · 0 reposts · 2 likes
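To make the guide above concrete, here is a hypothetical phase-dependent data-mix sketch; the dataset names and sampling weights are illustrative placeholders, not numbers from the paper:

```python
# Hypothetical sketch, not the paper's actual recipe: diverse reasoning data
# carries the pretraining mix, while SFT uses only a small high-quality set.
PRETRAIN_MIX = {
    "general_web_text": 0.70,
    "diverse_reasoning_traces": 0.25,   # breadth of reasoning patterns
    "high_quality_reasoning": 0.05,     # small share; its value surfaces later, after SFT
}

SFT_MIX = {
    "high_quality_reasoning": 1.0,      # quality over quantity; keep low-quality data out
}

# Sanity check that each phase's sampling weights form a distribution.
for mix in (PRETRAIN_MIX, SFT_MIX):
    assert abs(sum(mix.values()) - 1.0) < 1e-9
```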
Is more data always better in SFT? No. Our ablations show that blindly scaling SFT with mixed-quality data is actively HARMFUL. ❌ Doubling the SFT data dropped math reasoning scores by 5%. ✅ Scaling with a small, high-quality dataset provides consistent gains. SFT is for
1 reply · 0 reposts · 3 likes
Can a model with no reasoning in its pretraining "catch up" by getting more SFT data? No. We doubled the SFT data for our baseline model. While it improved, it still couldn't match the performance of even the weakest reasoning-pretrained model. A strong start is irreplaceable.
1 reply · 0 reposts · 3 likes
High-quality data has a surprising latent effect. 💡Adding a small, high-quality dataset to a diverse pretraining mix showed minimal immediate gains. But after SFT, its value was "unlocked", providing an additional +4% boost. A deep synergy exists: pretraining can instill
1 reply · 0 reposts · 3 likes
The optimal data strategy is phase-dependent:
🧠 Pretraining thrives on DIVERSITY & SCALE. A broad mix of reasoning patterns builds a robust foundation, giving an +11% boost over using only narrow, high-quality data at this stage.
🎯 SFT demands QUALITY. Fine-tuning on a small,
1 reply · 0 reposts · 2 likes