Prakash Kagitha @ NeurIPS 2025
@prakashkagitha
Followers 507 · Following 10K · Media 94 · Statuses 546
Research @DrexelUniv. Planning x Programs x LLMs. Previously, Lead Data Scientist @ https://t.co/yohjwHWaMY
Philadelphia, PA
Joined November 2014
There are 70+ "reasoning" papers accepted at COLM 2025 (Oct 7-10, Montreal). Most either elicit long reasoning for different tasks or probe the reasoning abilities and limitations of LLMs. I wrote a blog post covering ~30 of those papers 👇
Finally, he calls for efficient exploration and for leveraging off-policy data.
For LLMs, policy gradient (PPO, GRPO, DAPO, and ProRL) works and is prominent. He describes a couple of issues with traditional RL approaches built on Q-values and value functions.
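To make the contrast concrete, here is a minimal sketch of the group-relative, critic-free policy-gradient update that GRPO-style methods use for LLMs: advantages come from normalizing rewards within a group of sampled completions, so no learned Q-function or value network is needed. This is an illustrative simplification (it omits PPO-style clipping and KL regularization), not any particular implementation.

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages: normalize each completion's reward
    against the mean/std of its group (one group per prompt).
    rewards: [num_prompts, group_size]."""
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

def policy_gradient_loss(logprobs: torch.Tensor, advantages: torch.Tensor) -> torch.Tensor:
    """REINFORCE-style surrogate: -E[A * log pi(completion)].
    logprobs: summed token log-probs per completion, [num_prompts, group_size]."""
    return -(advantages.detach() * logprobs).mean()

# Toy usage: 2 prompts, 4 sampled completions each, binary verifiable rewards.
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.0, 0.0, 1.0, 0.0]])
logprobs = torch.randn(2, 4, requires_grad=True)
loss = policy_gradient_loss(logprobs, grpo_advantages(rewards))
loss.backward()
```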
The above characterisation gives a clear argument for the necessity of CoT to elicit (algorithmic) reasoning: linear time ≠ one-shot.
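A toy illustration of that point (mine, not the keynote's): a fixed-depth forward pass does a bounded amount of computation, while a CoT trace lets the model run an iterative algorithm one step per emitted token, so the number of serial steps can grow with the input.

```python
def add_with_cot(a: str, b: str) -> tuple[str, list[str]]:
    """Schoolbook addition, one 'thought' per digit: each emitted step
    carries the running state (carry + partial result), the way a CoT
    trace externalizes intermediate computation."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    carry, digits, trace = 0, [], []
    for da, db in zip(reversed(a), reversed(b)):
        s = int(da) + int(db) + carry
        carry, d = divmod(s, 10)
        digits.append(str(d))
        trace.append(f"{da}+{db}+carry -> digit {d}, carry {carry}")
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits)), trace

result, steps = add_with_cot("958", "467")
# len(steps) grows with the input length: the answer needs more serial
# steps than one fixed-depth pass, which is the "linear time ≠ one-shot" point.
```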
Find the keynote on YouTube here: https://t.co/PoKLKe77NU
Find some of the other reflections in this blog: https://t.co/nQE6sWH7iB
The keynote by Dale Schuurmans at the Reinforcement Learning Conference changed the way I think about Chain of Thought (CoT)! An LLM is a computer. Chain of thought is computation. RL learns the correct policy/algorithm. Can we correctly induce programs from finite data? -> Out of
Great offering by @CollinearAI making end-to-end distillation seamless, taking care of dataset generation and integrating with @thinkymachines's Tinker and multiple inference providers.
We have SDKs for training (Tinker) and inference (Together, OpenRouter) that can run from a CPU machine, but we don't have one for generating datasets, launching distillation runs, and spinning up ablations, which takes 90% of a researcher's time. At @CollinearAI, we are proud
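As a rough sketch of the dataset-generation leg such a pipeline automates: sample teacher traces from an OpenAI-compatible inference provider and write them to JSONL for a later fine-tuning run. The base URL, model name, and prompts below are placeholders, and this is not the Collinear or Tinker SDK, just the generic pattern.

```python
import json
from openai import OpenAI

# Illustrative provider endpoint and teacher model; swap in your own.
client = OpenAI(base_url="https://api.together.xyz/v1", api_key="...")

def generate_teacher_traces(prompts: list[str], model: str, out_path: str) -> None:
    """Query the teacher once per prompt and store (prompt, completion) pairs."""
    with open(out_path, "w") as f:
        for prompt in prompts:
            resp = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
                temperature=0.7,
            )
            record = {"prompt": prompt, "completion": resp.choices[0].message.content}
            f.write(json.dumps(record) + "\n")

generate_teacher_traces(
    prompts=["Prove that the sum of two even numbers is even."],
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",  # placeholder teacher
    out_path="distill_train.jsonl",
)
```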
Here is the code for getting NeurIPS 2025 papers from OpenReview and linking them to arXiv and Semantic Scholar resources: https://t.co/HJSONU8aI0
github.com
Code for getting NeurIPS 2025 papers from OpenReview, linking them to Arxiv and Semantic Scholar - prakashkagitha/neurips2025_papers
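A minimal sketch of the linking step (not the repo's actual code): look a title up on the Semantic Scholar Graph API and pull its arXiv id and citation count. The endpoint and field names follow the public API; rate limiting and fuzzy-title mismatches are ignored here.

```python
import requests

S2_SEARCH = "https://api.semanticscholar.org/graph/v1/paper/search"

def link_title(title: str) -> dict | None:
    """Return the arXiv id and citation count for the best title match, if any."""
    resp = requests.get(
        S2_SEARCH,
        params={"query": title, "fields": "title,externalIds,citationCount", "limit": 1},
        timeout=30,
    )
    resp.raise_for_status()
    hits = resp.json().get("data", [])
    if not hits:
        return None
    paper = hits[0]
    return {
        "s2_title": paper["title"],
        "arxiv_id": (paper.get("externalIds") or {}).get("ArXiv"),
        "citations": paper.get("citationCount"),
    }

print(link_title("Titans: Learning to Memorize at Test Time"))
```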
[78 citations] Absolute Zero: Reinforced Self-play Reasoning with Zero Data
Self-play + verifiable rewards enable learning reasoning without curated training data.
ArXiv: https://t.co/ooUd0HtdeN
Thread by the paper author:
arxiv.org
Reinforcement learning with verifiable rewards (RLVR) has shown promise in enhancing the reasoning capabilities of large language models by learning directly from outcome-based rewards. Recent...
❄️Introducing Absolute Zero Reasoner: Our reasoner learns to both propose tasks that maximize learnability and improve reasoning by solving them, entirely through self-play—with no external data! It overall outperforms other "zero" models in math & coding domains. 🧵 1/
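A schematic of the propose-and-solve self-play pattern described above, with stand-in functions in place of the LLM; the paper's actual learnability reward, code executor, and RL updates are far more involved.

```python
import random

def propose_task(policy_state: dict) -> str:
    # Stand-in proposer: in the real system the same LLM writes a coding task.
    return f"compute sum of 1..{random.randint(2, 9)}"

def solve_task(task: str) -> int:
    # Stand-in solver rollout.
    n = int(task.split("..")[-1])
    return n * (n + 1) // 2

def verify(task: str, answer: int) -> bool:
    # Verifiable reward with no external labels.
    n = int(task.split("..")[-1])
    return answer == sum(range(1, n + 1))

policy_state: dict = {}
for step in range(5):
    task = propose_task(policy_state)
    reward = 1.0 if verify(task, solve_task(task)) else 0.0
    # In Absolute Zero both proposer and solver are updated with RL at this point;
    # the proposer is also rewarded for producing tasks of learnable difficulty.
    policy_state[task] = reward
```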
[81 citations] Flow-GRPO: Training Flow Matching Models via Online RL
Online RL recipe that sharpens flow-matching generative models for better sample quality.
ArXiv:
arxiv.org
We propose Flow-GRPO, the first method to integrate online policy gradient reinforcement learning (RL) into flow matching models. Our approach uses two key strategies: (1) an ODE-to-SDE conversion...
[83 citations] MMaDA: Multimodal Large Diffusion Language Models
Unifies diffusion with LMs across modalities for more robust multimodal generation.
ArXiv:
[88 citations] Titans: Learning to Memorize at Test Time
Hybrid attention + meta in-context memory for strong needle-in-a-haystack retrieval.
ArXiv: https://t.co/3LqHG8tmOP
Thread by the paper author:
arxiv.org
Over more than a decade there has been an extensive research effort on how to effectively utilize recurrent models and attention. While recurrent models aim to compress the data into a fixed-size...
Attention has been the key component for most advances in LLMs, but it can’t scale to long context. Does this mean we need to find an alternative? Presenting Titans: a new architecture with attention and a meta in-context memory that learns how to memorize at test time. Titans
[88 citations] Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
Collective-MCTS brings o1-style reflection and improved reasoning to multimodal models.
ArXiv:
arxiv.org
In this work, we aim to develop an MLLM that understands and solves questions by learning to create each intermediate step of the reasoning involved till the final answer. To this end, we propose...
[89 citations] TTRL: Test-Time Reinforcement Learning
Online RL at inference using unlabeled data + priors; no offline labeling needed.
ArXiv: https://t.co/a7pKYEo13Z
Thread by the paper author on a follow-up paper:
arxiv.org
This paper investigates Reinforcement Learning (RL) on data without explicit labels for reasoning tasks in Large Language Models (LLMs). The core challenge of the problem is reward estimation...
TTRL showed LLMs can provide intrinsic rewards for RL. Now SSRL: LLMs simulate world-knowledge states for Agentic RL, enabling sim-to-real generalization. So much knowledge in LLMs still awaits elicitation with RL—maybe even a world model? Paper: https://t.co/AJ7ZfwjT3Q Code:
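As I understand it, the core reward is a majority-vote pseudo-label: with no ground truth, the most common answer across a group of rollouts serves as the label and each rollout is rewarded by agreement with it. The snippet below shows only that reward signal; TTRL wraps it in GRPO-style policy updates.

```python
from collections import Counter

def majority_vote_rewards(sampled_answers: list[str]) -> list[float]:
    """Reward each rollout by agreement with the group's majority answer,
    so no ground-truth label is needed."""
    majority, _ = Counter(sampled_answers).most_common(1)[0]
    return [1.0 if ans == majority else 0.0 for ans in sampled_answers]

# e.g. 8 rollouts for one unlabeled math problem
print(majority_vote_rewards(["42", "42", "41", "42", "42", "7", "42", "41"]))
# -> the majority answer "42" acts as the pseudo-label; agreeing rollouts get reward 1
```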
[98 citations] SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
Trains RL on real GitHub code evolution, yielding strong SWE-bench Verified results.
ArXiv:
arxiv.org
The recent DeepSeek-R1 release has demonstrated the immense potential of reinforcement learning (RL) in enhancing the general reasoning capabilities of large language models (LLMs). While...
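A sketch of the kind of continuous reward the paper derives from code changes, as I understand it: similarity between the generated patch and the ground-truth patch. The exact reward shaping and format checks are omitted here.

```python
import difflib

def patch_reward(predicted_patch: str, oracle_patch: str) -> float:
    """Similarity in [0, 1] between the generated and reference diffs."""
    return difflib.SequenceMatcher(None, predicted_patch, oracle_patch).ratio()

oracle = "-    return a - b\n+    return a + b\n"
pred   = "-    return a - b\n+    return b + a\n"
print(round(patch_reward(pred, oracle), 3))  # partial credit for a near-miss fix
```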
[100 citations] ToolRL: Reward is All Tool Learning Needs
Reward-driven tool learning without SFT, using principled reward design via GRPO.
ArXiv: https://t.co/NWs9MrglO4
Thread by the paper author:
arxiv.org
Current Large Language Models (LLMs) often undergo supervised fine-tuning (SFT) to acquire tool use capabilities. However, SFT struggles to generalize to unfamiliar or complex tool use scenarios....
🚀 ToolRL unlocks LLMs' true tool mastery! The secret? Smart rewards > more data. 📖 Introducing newest paper: ToolRL: Reward is all Tool Learning Needs Paper Link: https://t.co/wXGfs9o6Zx Github Link: https://t.co/Q8qdpqmbr3
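An illustrative reward decomposition in the spirit of that reward design: score a predicted tool call for well-formedness, correct tool name, and correct arguments. The weights and terms below are placeholders, not the paper's exact reward.

```python
import json

def tool_call_reward(prediction: str, gold: dict) -> float:
    """Format reward (parseable call) plus name and per-argument correctness."""
    try:
        call = json.loads(prediction)
    except json.JSONDecodeError:
        return -1.0                               # malformed call
    reward = 1.0                                  # well-formed
    reward += 1.0 if call.get("name") == gold["name"] else 0.0
    gold_args = gold["arguments"]
    overlap = sum(call.get("arguments", {}).get(k) == v for k, v in gold_args.items())
    reward += overlap / max(len(gold_args), 1)    # partial credit per argument
    return reward

gold = {"name": "get_weather", "arguments": {"city": "Montreal", "unit": "C"}}
print(tool_call_reward('{"name": "get_weather", "arguments": {"city": "Montreal"}}', gold))
```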
[105 citations] A-Mem: Agentic Memory for LLM Agents
Zettelkasten-inspired, self-organizing memory enabling stronger long-horizon agents.
ArXiv: https://t.co/WAoe4lMOZs
Thread by the paper author:
arxiv.org
While large language model (LLM) agents can effectively use external tools for complex real-world tasks, they require memory systems to leverage historical experiences. Current memory systems...
What if an LLM agent's memory could think for itself? Excited to share our work on "Agentic Memory," accepted at NeurIPS 2025! We propose a new memory system (A-Mem) inspired by Zettelkasten, where memories actively link, organize, and evolve. #NeurIPS2025 #LLM #agent
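A toy sketch of the Zettelkasten idea: each memory is a note that links to related notes when it is added. The real system uses LLM-generated keywords and embeddings and lets memories evolve; keyword overlap below is just a stand-in.

```python
from dataclasses import dataclass, field

@dataclass
class Note:
    note_id: int
    content: str
    links: set[int] = field(default_factory=set)

class AgenticMemory:
    def __init__(self) -> None:
        self.notes: dict[int, Note] = {}

    def add(self, content: str) -> Note:
        """Insert a note and link it (both ways) to notes sharing keywords."""
        new = Note(note_id=len(self.notes), content=content)
        for other in self.notes.values():
            if set(content.lower().split()) & set(other.content.lower().split()):
                new.links.add(other.note_id)
                other.links.add(new.note_id)
        self.notes[new.note_id] = new
        return new

mem = AgenticMemory()
mem.add("user prefers concise answers")
mem.add("answers about RL should cite GRPO")
print(mem.add("user asked about RL rewards").links)  # links to overlapping notes
```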
[110 citations] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
Unified live vision-audio interaction, a step toward seamless real-time multimodal agents.
ArXiv: https://t.co/0I2E5PMs2v
Official repo:
github.com
✨✨[NeurIPS 2025] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction - VITA-MLLM/VITA
[111 citations] VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
RL + forced rethinking boosts multimodal self-reflection and SoTA VLM reasoning.
ArXiv: https://t.co/MnYUYsppWh
Thread by the paper author:
arxiv.org
Recently, slow-thinking systems like GPT-o1 and DeepSeek-R1 have demonstrated great potential in solving challenging problems through explicit reflection. They significantly outperform the best...
Our reasoning-based model VL-Rethinker also has a huggingface demo at https://t.co/bTa2AJ129d. The paper is in https://t.co/OYant6MCJF.
[112 citations] Thoughts Are All Over the Place: On the Underthinking of Long Reasoning Models
Reveals “underthinking”: models switch reasoning paths too soon; proposes decoding fixes.
ArXiv: https://t.co/Rj1jKG4UuC
Thread by the paper author:
arxiv.org
Large language models (LLMs) such as OpenAI's o1 have demonstrated remarkable abilities in complex reasoning tasks by scaling test-time compute and exhibiting human-like deep thinking. However, we...
Thank you so much for your thoughtful insights, Ethan! You're absolutely spot-on -- our research highlights how large models often "underthink" by prematurely shifting away from promising reasoning paths. By identifying and addressing this tendency, we hope to enhance their
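One way such a decoding fix can look (an illustrative sketch, not necessarily the paper's exact method): down-weight tokens that trigger a switch to a new line of thought so the model finishes the current path. The token ids and penalty value below are placeholders.

```python
import torch

SWITCH_TOKEN_IDS = [1234, 5678]   # placeholder ids for tokens like "Alternatively", "Wait"
PENALTY = 3.0

def penalize_thought_switches(logits: torch.Tensor) -> torch.Tensor:
    """Subtract a fixed penalty from thought-switch tokens before sampling."""
    logits = logits.clone()
    logits[..., SWITCH_TOKEN_IDS] -= PENALTY
    return logits

# Usage inside a generation loop:
# next_token_logits = penalize_thought_switches(next_token_logits)
```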
[115 citations] Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
“Latent reasoning” scales test-time compute without longer CoT by using recurrent depth.
ArXiv: https://t.co/r5frnXmQel
Thread by the paper author: (author thread not found)
arxiv.org
We study a novel language model architecture that is capable of scaling test-time computation by implicitly reasoning in latent space. Our model works by iterating a recurrent block, thereby...
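A toy sketch of the recurrent-depth idea from the abstract: the same block is iterated in latent space, so test-time compute scales with the iteration count rather than with a longer chain of thought. The block and dimensions below are placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn

class RecurrentDepthBlock(nn.Module):
    def __init__(self, d_model: int = 64):
        super().__init__()
        # One shared block; depth comes from iterating it, not from more layers.
        self.core = nn.Sequential(nn.Linear(2 * d_model, d_model), nn.GELU(),
                                  nn.Linear(d_model, d_model))

    def forward(self, h: torch.Tensor, x: torch.Tensor, r: int) -> torch.Tensor:
        # h: latent state, x: input embedding; iterate the same weights r times.
        for _ in range(r):
            h = self.core(torch.cat([h, x], dim=-1))
        return h

block = RecurrentDepthBlock()
x = torch.randn(1, 64)
shallow = block(torch.zeros(1, 64), x, r=4)    # cheap inference
deep = block(torch.zeros(1, 64), x, r=32)      # more test-time compute, same weights
```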