Prakash Kagitha @ NeurIPS 2025

@prakashkagitha

Followers: 507 · Following: 10K · Media: 94 · Statuses: 546

Research @DrexelUniv. Planning x Programs x LLMs. Previously, Lead Data Scientist @ https://t.co/yohjwHWaMY

Philadelphia, PA
Joined November 2014
@prakashkagitha
Prakash Kagitha @ NeurIPS 2025
2 months
There are 70+ "reasoning" papers accepted at COLM 2025 (Oct 7-10, Montreal). Most papers either elicit long reasoning for different tasks or probe the reasoning abilities and limitations of LLMs. I wrote a blog post covering ~30 of those papers 👇
@prakashkagitha
Prakash Kagitha @ NeurIPS 2025
5 days
Finally, he calls for efficient exploration and for leveraging off-policy data.
@prakashkagitha
Prakash Kagitha @ NeurIPS 2025
5 days
For LLMs, policy gradient (PPO, GRPO, DAPO, & ProRL) works and is prominent. He describes a couple of issues with traditional RL approaches built on Q-values and value functions.
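To make the contrast concrete, here is a minimal sketch (my own illustration, not from the keynote) of the group-relative advantage at the heart of GRPO: rewards are normalized within a group of completions sampled for the same prompt, so no learned critic or Q-function is needed.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: normalize each completion's reward against the
    mean/std of its own sampled group, so no learned value function is required."""
    rewards = np.asarray(rewards, dtype=float)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Hypothetical 0/1 rewards for four completions sampled for one prompt.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # approx [ 1. -1. -1.  1.]
```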
@prakashkagitha
Prakash Kagitha @ NeurIPS 2025
5 days
The above characterisation gives a clear argument for the necessity of CoT to elicit (algorithmic) reasoning: linear-time ≠ one-shot.
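A toy illustration of the point (my own example, not Schuurmans'): a fixed-depth network does a constant amount of sequential work per emitted token, so emitting t CoT tokens buys t sequential steps, enough to run, say, a finite automaton over the input.

```python
# Toy example: each CoT token corresponds to one bounded computation step,
# so chain length, not a single forward pass, sets the sequential budget.
def step(state, symbol):
    # one fixed-size update, standing in for one generated "thought" token
    return state ^ symbol  # parity automaton

def answer_with_cot(bits):
    state = 0
    for b in bits:  # t tokens of thought -> t sequential steps
        state = step(state, b)
    return state

print(answer_with_cot([1, 0, 1, 1]))  # parity = 1
```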
@prakashkagitha
Prakash Kagitha @ NeurIPS 2025
5 days
Find the keynote on YouTube here: https://t.co/PoKLKe77NU
Find some of the other reflections in this blog: https://t.co/nQE6sWH7iB
@prakashkagitha
Prakash Kagitha @ NeurIPS 2025
5 days
The keynote by Dale Schuurmans at the Reinforcement Learning Conference changed the way I think about Chain of Thought (CoT)! The LLM is a computer. Chain of thought is computation. RL learns the correct policy/algorithm. Can we correctly induce programs from finite data? -> Out of
@prakashkagitha
Prakash Kagitha @ NeurIPS 2025
5 days
Great offering by @CollinearAI, making end-to-end distillation seamless: it takes care of dataset generation and integrates with @thinkymachines’s Tinker and multiple inference providers.
@HeMuyu0327
Muyu He
5 days
We have SDKs for training (Tinker) and inference (together, openrouter) that can run from a CPU machine, but we don't have one for generating datasets, launching distillation runs, and spinning up ablations, which takes 90% of a researcher's time. At @CollinearAI, we are proud
@prakashkagitha
Prakash Kagitha @ NeurIPS 2025
7 days
Here is the code for getting NeurIPS 2025 papers from OpenReview and linking them to arXiv and Semantic Scholar resources: https://t.co/HJSONU8aI0
github.com
Code for getting NeurIPS 2025 papers from OpenReview, linking them to Arxiv and Semantic Scholar - prakashkagitha/neurips2025_papers
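For readers who want the gist without opening the repo, here is a minimal sketch of the same pipeline (my reconstruction, not the repo's exact code; the venue id is an assumption): pull accepted notes via OpenReview's API v2 client, then look each title up on the Semantic Scholar Graph API to recover arXiv IDs.

```python
import requests
import openreview  # pip install openreview-py

# API v2 client; the venue id below is an assumption for NeurIPS 2025.
client = openreview.api.OpenReviewClient(baseurl="https://api2.openreview.net")
notes = client.get_all_notes(content={"venueid": "NeurIPS.cc/2025/Conference"})

for note in notes[:5]:
    title = note.content["title"]["value"]
    # Look the title up on Semantic Scholar to recover an arXiv id, if any.
    resp = requests.get(
        "https://api.semanticscholar.org/graph/v1/paper/search",
        params={"query": title, "fields": "title,externalIds", "limit": 1},
        timeout=30,
    ).json()
    hits = resp.get("data") or []
    arxiv_id = (hits[0].get("externalIds") or {}).get("ArXiv") if hits else None
    print(f"{title} -> arXiv:{arxiv_id}")
```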
@prakashkagitha
Prakash Kagitha @ NeurIPS 2025
7 days
[78 citations] Absolute Zero: Reinforced Self-play Reasoning with Zero Data
Self-play + verifiable rewards enable learning reasoning without curated training data.
ArXiv: https://t.co/ooUd0HtdeN
Thread by the paper author:
arxiv.org
Reinforcement learning with verifiable rewards (RLVR) has shown promise in enhancing the reasoning capabilities of large language models by learning directly from outcome-based rewards. Recent...
@_AndrewZhao
Andrew Zhao
6 months
❄️Introducing Absolute Zero Reasoner: Our reasoner learns to both propose tasks that maximize learnability and improve reasoning by solving them, entirely through self-play—with no external data! It overall outperforms other "zero" models in math & coding domains. 🧵 1/
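A schematic of the self-play loop as the abstract and thread describe it; `model` and `executor` are hypothetical stubs standing in for the LLM and a code runner that supplies the verifiable reward, not the paper's API.

```python
# Schematic of the Absolute Zero self-play loop; no external data enters it.
def self_play_round(model, executor):
    task = model.propose_task()                # proposer role: invent a task
    learnability = executor.score_task(task)   # reward tasks that are solvable but hard
    solution = model.solve(task)               # solver role: attempt the task
    correct = executor.verify(task, solution)  # verifiable outcome reward
    model.update(proposer_reward=learnability,
                 solver_reward=float(correct))
```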
@prakashkagitha
Prakash Kagitha @ NeurIPS 2025
7 days
[81 citations] Flow-GRPO: Training Flow Matching Models via Online RL
Online RL recipe that sharpens flow-matching generative models for better sample quality.
ArXiv:
arxiv.org
We propose Flow-GRPO, the first method to integrate online policy gradient reinforcement learning (RL) into flow matching models. Our approach uses two key strategies: (1) an ODE-to-SDE conversion...
@prakashkagitha
Prakash Kagitha @ NeurIPS 2025
7 days
[83 citations] MMaDA: Multimodal Large Diffusion Language Models
Unifies diffusion with LMs across modalities for more robust multimodal generation.
ArXiv:
@prakashkagitha
Prakash Kagitha @ NeurIPS 2025
7 days
[88 citations] Titans: Learning to Memorize at Test Time
Hybrid attention + meta in-context memory for strong needle-in-a-haystack retrieval.
ArXiv: https://t.co/3LqHG8tmOP
Thread by the paper author:
arxiv.org
Over more than a decade there has been an extensive research effort on how to effectively utilize recurrent models and attention. While recurrent models aim to compress the data into a fixed-size...
@behrouz_ali
Ali Behrouz
10 months
Attention has been the key component for most advances in LLMs, but it can’t scale to long context. Does this mean we need to find an alternative? Presenting Titans: a new architecture with attention and a meta in-context memory that learns how to memorize at test time. Titans
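A rough sketch of "learning to memorize at test time" as I read the idea (the loss form and plain gradient update are my simplification, not Titans' exact rule, which also uses momentum and forgetting):

```python
import torch

# Memory as a parameter updated by gradient steps at inference time;
# "surprise" is the associative prediction error on the incoming token.
memory = torch.zeros(64, 64, requires_grad=True)

def memorize(key, value, lr=0.1):
    loss = ((key @ memory - value) ** 2).mean()  # surprise = prediction error
    (grad,) = torch.autograd.grad(loss, memory)
    with torch.no_grad():
        memory.sub_(lr * grad)                   # test-time gradient update

memorize(torch.randn(64), torch.randn(64))       # one token's worth of memorization
```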
@prakashkagitha
Prakash Kagitha @ NeurIPS 2025
7 days
[88 citations] Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
Collective-MCTS brings o1-style reflection and improved reasoning to multimodal models.
ArXiv:
arxiv.org
In this work, we aim to develop an MLLM that understands and solves questions by learning to create each intermediate step of the reasoning involved till the final answer. To this end, we propose...
@prakashkagitha
Prakash Kagitha @ NeurIPS 2025
7 days
[89 citations] TTRL: Test-Time Reinforcement Learning
Online RL at inference using unlabeled data + priors, with no offline labeling needed.
ArXiv: https://t.co/a7pKYEo13Z
Thread by the paper author on a follow-up paper:
arxiv.org
This paper investigates Reinforcement Learning (RL) on data without explicit labels for reasoning tasks in Large Language Models (LLMs). The core challenge of the problem is reward estimation...
@OkhayIea
Kaiyan Zhang
3 months
TTRL showed LLMs can provide intrinsic rewards for RL. Now SSRL: LLMs simulate world-knowledge states for Agentic RL, enabling sim-to-real generalization. So much knowledge in LLMs still awaits elicitation with RL—maybe even a world model? Paper: https://t.co/AJ7ZfwjT3Q Code:
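The reward-estimation trick, as I understand it, is majority voting over sampled rollouts; a minimal sketch (function name and example answers are illustrative):

```python
from collections import Counter

def majority_vote_rewards(answers):
    """TTRL-style pseudo-rewards on unlabeled data (sketch): treat the majority
    answer among sampled rollouts as a proxy label and reward agreement with it."""
    majority, _ = Counter(answers).most_common(1)[0]
    return [1.0 if a == majority else 0.0 for a in answers]

# Hypothetical sampled final answers for one unlabeled question.
print(majority_vote_rewards(["42", "42", "41", "42"]))  # [1.0, 1.0, 0.0, 1.0]
```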
@prakashkagitha
Prakash Kagitha @ NeurIPS 2025
7 days
[98 citations] SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
Trains RL on real GitHub code evolution, yielding strong SWE-bench Verified results.
ArXiv:
arxiv.org
The recent DeepSeek-R1 release has demonstrated the immense potential of reinforcement learning (RL) in enhancing the general reasoning capabilities of large language models (LLMs). While...
@prakashkagitha
Prakash Kagitha @ NeurIPS 2025
7 days
[100 citations] ToolRL: Reward is All Tool Learning Needs
Reward-driven tool learning without SFT, using principled reward design via GRPO.
ArXiv: https://t.co/NWs9MrglO4
Thread by the paper author:
arxiv.org
Current Large Language Models (LLMs) often undergo supervised fine-tuning (SFT) to acquire tool use capabilities. However, SFT struggles to generalize to unfamiliar or complex tool use scenarios....
@qiancheng1231
Cheng Qian @ EMNLP2025
7 months
🚀 ToolRL unlocks LLMs' true tool mastery! The secret? Smart rewards > more data. 📖 Introducing our newest paper: ToolRL: Reward is all Tool Learning Needs Paper Link: https://t.co/wXGfs9o6Zx Github Link: https://t.co/Q8qdpqmbr3
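A hedged sketch of the kind of structured tool-use reward the paper argues for (the exact terms and weights here are my assumption, not ToolRL's final recipe): separate credit for a well-formed call, the correct tool name, and correct arguments, summed into the scalar reward GRPO optimizes.

```python
# Illustrative tool-use reward: format + name match + argument match.
def tool_reward(call, gold):
    fmt = 1.0 if call.get("name") and isinstance(call.get("args"), dict) else 0.0
    name_match = 1.0 if call.get("name") == gold["name"] else 0.0
    arg_match = 1.0 if call.get("args") == gold["args"] else 0.0
    return fmt + name_match + arg_match  # fed to GRPO as the scalar reward

print(tool_reward({"name": "search", "args": {"q": "COLM"}},
                  {"name": "search", "args": {"q": "COLM"}}))  # 3.0
```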
@prakashkagitha
Prakash Kagitha @ NeurIPS 2025
7 days
[105 citations] A-Mem: Agentic Memory for LLM Agents
Zettelkasten-inspired, self-organizing memory enabling stronger long-horizon agents.
ArXiv: https://t.co/WAoe4lMOZs
Thread by the paper author:
arxiv.org
While large language model (LLM) agents can effectively use external tools for complex real-world tasks, they require memory systems to leverage historical experiences. Current memory systems...
@wujiang_ai
Wujiang Xu
2 months
What if an LLM agent's memory could think for itself? Excited to share our work on "Agentic Memory," accepted at NeurIPS 2025! We propose a new memory system (A-Mem) inspired by Zettelkasten, where memories actively link, organize, and evolve. #NeurIPS2025 #LLM #agent
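A sketch of what a Zettelkasten-style note might look like in this spirit (field names and the keyword-overlap linking rule are illustrative, not the paper's schema): atomic notes that carry their own context and link to related notes, so the memory graph can grow and reorganize.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryNote:
    content: str                    # the experience being stored
    keywords: list[str]             # handles for retrieval
    links: list[int] = field(default_factory=list)  # ids of related notes

notes: list[MemoryNote] = []

def add_note(content, keywords):
    note = MemoryNote(content, keywords)
    # link to existing notes sharing a keyword; evolution/re-linking omitted
    note.links = [i for i, n in enumerate(notes) if set(n.keywords) & set(keywords)]
    notes.append(note)
    return note
```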
@prakashkagitha
Prakash Kagitha @ NeurIPS 2025
7 days
[110 citations] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
Unified live vision-audio interaction, a step toward seamless real-time multimodal agents.
ArXiv: https://t.co/0I2E5PMs2v
Official repo:
github.com
✨✨[NeurIPS 2025] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction - VITA-MLLM/VITA
@prakashkagitha
Prakash Kagitha @ NeurIPS 2025
7 days
[111 citations] VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
RL + forced rethinking boosts multimodal self-reflection and SoTA VLM reasoning.
ArXiv: https://t.co/MnYUYsppWh
Thread by the paper author:
arxiv.org
Recently, slow-thinking systems like GPT-o1 and DeepSeek-R1 have demonstrated great potential in solving challenging problems through explicit reflection. They significantly outperform the best...
@WenhuChen
Wenhu Chen
6 months
Our reasoning-based model VL-Rethinker also has a huggingface demo at https://t.co/bTa2AJ129d. The paper is in https://t.co/OYant6MCJF.
@prakashkagitha
Prakash Kagitha @ NeurIPS 2025
7 days
[112 citations] Thoughts Are All Over the Place: On the Underthinking of Long Reasoning Models
Reveals “underthinking”: models switch reasoning paths too soon; proposes decoding fixes.
ArXiv: https://t.co/Rj1jKG4UuC
Thread by the paper author:
arxiv.org
Large language models (LLMs) such as OpenAI's o1 have demonstrated remarkable abilities in complex reasoning tasks by scaling test-time compute and exhibiting human-like deep thinking. However, we...
@tuzhaopeng
Zhaopeng Tu
9 months
Thank you so much for your thoughtful insights, Ethan! You're absolutely spot-on -- our research highlights how large models often "underthink" by prematurely shifting away from promising reasoning paths. By identifying and addressing this tendency, we hope to enhance their
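One plausible shape for such a decoding fix (the trigger list and penalty value are my illustration, not necessarily the paper's method): down-weight tokens that typically open a new line of thought, so the model finishes exploring the current path before switching.

```python
# Down-weight "thought switch" tokens at decode time; triggers and penalty
# size are illustrative assumptions.
SWITCH_PHRASES = {"Alternatively", "Wait", "However"}
PENALTY = 3.0

def penalize_switches(logits, vocab):
    """logits: mutable sequence of per-token scores; vocab: token -> index."""
    for token, idx in vocab.items():
        if token in SWITCH_PHRASES:
            logits[idx] -= PENALTY  # make premature path switches less likely
    return logits
```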
@prakashkagitha
Prakash Kagitha @ NeurIPS 2025
7 days
[115 citations] Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
“Latent reasoning” scales test-time compute without longer CoT by using recurrent depth.
ArXiv: https://t.co/r5frnXmQel
arxiv.org
We study a novel language model architecture that is capable of scaling test-time computation by implicitly reasoning in latent space. Our model works by iterating a recurrent block, thereby...
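A minimal sketch of the recurrent-depth idea from the abstract (the GRU cell is a stand-in for the model's recurrent block, not its actual architecture): iterate a block in latent space r times, with r chosen at test time.

```python
import torch
import torch.nn as nn

hidden, block = 64, nn.GRUCell(64, 64)   # stand-in recurrent block

def latent_reasoning(x, r: int):
    h = torch.zeros(x.shape[0], hidden)  # latent state
    for _ in range(r):                   # more iterations = more test-time compute
        h = block(x, h)
    return h

out = latent_reasoning(torch.randn(2, 64), r=8)  # deeper "thinking", same parameters
```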