Prakash Kagitha @ NeurIPS 2025
@prakashkagitha
Followers 507 · Following 10K · Media 94 · Statuses 546
Research @DrexelUniv. Planning x Programs x LLMs. Previously, Lead Data Scientist @ https://t.co/yohjwHWaMY
Philadelphia, PA
Joined November 2014
There are 70+ "reasoning" papers accepted at COLM 2025 (Oct 7-10, Montreal). Most either elicit long reasoning for different tasks or probe the reasoning abilities and limitations of LLMs. I wrote a blog post covering ~30 of those papers 👇
Finally, he calls for efficient exploration and for leveraging off-policy data.
For LLMs, policy gradient (PPO, GRPO, DAPO, and ProRL) works and is prominent. He describes a couple of issues with traditional RL approaches built on Q-values and value functions.
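To make the contrast concrete, here is a minimal sketch of the group-relative, critic-free policy-gradient update that GRPO-style methods use for LLMs: advantages come from normalizing rewards within a group of sampled completions, so no learned Q-function or value network is needed. This is an illustrative simplification (it omits PPO-style clipping and KL regularization), not any particular implementation.

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages: normalize each completion's reward
    against the mean/std of its group (one group per prompt).
    rewards: [num_prompts, group_size]."""
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

def policy_gradient_loss(logprobs: torch.Tensor, advantages: torch.Tensor) -> torch.Tensor:
    """REINFORCE-style surrogate: -E[A * log pi(completion)].
    logprobs: summed token log-probs per completion, [num_prompts, group_size]."""
    return -(advantages.detach() * logprobs).mean()

# Toy usage: 2 prompts, 4 sampled completions each, binary verifiable rewards.
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.0, 0.0, 1.0, 0.0]])
logprobs = torch.randn(2, 4, requires_grad=True)
loss = policy_gradient_loss(logprobs, grpo_advantages(rewards))
loss.backward()
```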
The above characterisation gives a clear argument for the necessity of CoT to elicit (algorithmic) reasoning: linear time ≠ one-shot.
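A toy illustration of that point (mine, not the keynote's): a fixed-depth forward pass does a bounded amount of computation, while a CoT trace lets the model run an iterative algorithm one step per emitted token, so the number of serial steps can grow with the input.

```python
def add_with_cot(a: str, b: str) -> tuple[str, list[str]]:
    """Schoolbook addition, one 'thought' per digit: each emitted step
    carries the running state (carry + partial result), the way a CoT
    trace externalizes intermediate computation."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    carry, digits, trace = 0, [], []
    for da, db in zip(reversed(a), reversed(b)):
        s = int(da) + int(db) + carry
        carry, d = divmod(s, 10)
        digits.append(str(d))
        trace.append(f"{da}+{db}+carry -> digit {d}, carry {carry}")
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits)), trace

result, steps = add_with_cot("958", "467")
# len(steps) grows with the input length: the answer needs more serial
# steps than one fixed-depth pass, which is the "linear time ≠ one-shot" point.
```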
Find the keynote on YouTube here: https://t.co/PoKLKe77NU
Find some of the other reflections in this blog: https://t.co/nQE6sWH7iB
The keynote by Dale Schuurmans at the Reinforcement Learning Conference changed the way I think about Chain of Thought (CoT)! An LLM is a computer. Chain of thought is computation. RL learns the correct policy/algorithm. Can we correctly induce programs from finite data? -> Out of
Great offering by @CollinearAI making end-to-end distillation seamless, taking care of dataset generation and integrating with @thinkymachines's Tinker and multiple inference providers.
We have SDKs for training (Tinker) and inference (Together, OpenRouter) that can run from a CPU machine, but we don't have one for generating datasets, launching distillation runs, and spinning up ablations, which takes 90% of a researcher's time. At @CollinearAI, we are proud
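As a rough sketch of the dataset-generation leg such a pipeline automates: sample teacher traces from an OpenAI-compatible inference provider and write them to JSONL for a later fine-tuning run. The base URL, model name, and prompts below are placeholders, and this is not the Collinear or Tinker SDK, just the generic pattern.

```python
import json
from openai import OpenAI

# Illustrative provider endpoint and teacher model; swap in your own.
client = OpenAI(base_url="https://api.together.xyz/v1", api_key="...")

def generate_teacher_traces(prompts: list[str], model: str, out_path: str) -> None:
    """Query the teacher once per prompt and store (prompt, completion) pairs."""
    with open(out_path, "w") as f:
        for prompt in prompts:
            resp = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
                temperature=0.7,
            )
            record = {"prompt": prompt, "completion": resp.choices[0].message.content}
            f.write(json.dumps(record) + "\n")

generate_teacher_traces(
    prompts=["Prove that the sum of two even numbers is even."],
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",  # placeholder teacher
    out_path="distill_train.jsonl",
)
```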
Here is the code for getting NeurIPS 2025 papers from OpenReview and linking them to arXiv and Semantic Scholar resources: https://t.co/HJSONU8aI0
github.com
Code for getting NeurIPS 2025 papers from OpenReview, linking them to Arxiv and Semantic Scholar - prakashkagitha/neurips2025_papers
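A minimal sketch of the linking step (not the repo's actual code): look a title up on the Semantic Scholar Graph API and pull its arXiv id and citation count. The endpoint and field names follow the public API; rate limiting and fuzzy-title mismatches are ignored here.

```python
import requests

S2_SEARCH = "https://api.semanticscholar.org/graph/v1/paper/search"

def link_title(title: str) -> dict | None:
    """Return the arXiv id and citation count for the best title match, if any."""
    resp = requests.get(
        S2_SEARCH,
        params={"query": title, "fields": "title,externalIds,citationCount", "limit": 1},
        timeout=30,
    )
    resp.raise_for_status()
    hits = resp.json().get("data", [])
    if not hits:
        return None
    paper = hits[0]
    return {
        "s2_title": paper["title"],
        "arxiv_id": (paper.get("externalIds") or {}).get("ArXiv"),
        "citations": paper.get("citationCount"),
    }

print(link_title("Titans: Learning to Memorize at Test Time"))
```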
[78 citations] Absolute Zero: Reinforced Self-play Reasoning with Zero Data
Self-play + verifiable rewards enable learning reasoning without curated training data.
ArXiv: https://t.co/ooUd0HtdeN
Thread by the paper author:
arxiv.org
Reinforcement learning with verifiable rewards (RLVR) has shown promise in enhancing the reasoning capabilities of large language models by learning directly from outcome-based rewards. Recent...
❄️Introducing Absolute Zero Reasoner: Our reasoner learns to both propose tasks that maximize learnability and improve reasoning by solving them, entirely through self-play—with no external data! It overall outperforms other "zero" models in math & coding domains. 🧵 1/
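A schematic of the propose-and-solve self-play pattern described above, with stand-in functions in place of the LLM; the paper's actual learnability reward, code executor, and RL updates are far more involved.

```python
import random

def propose_task(policy_state: dict) -> str:
    # Stand-in proposer: in the real system the same LLM writes a coding task.
    return f"compute sum of 1..{random.randint(2, 9)}"

def solve_task(task: str) -> int:
    # Stand-in solver rollout.
    n = int(task.split("..")[-1])
    return n * (n + 1) // 2

def verify(task: str, answer: int) -> bool:
    # Verifiable reward with no external labels.
    n = int(task.split("..")[-1])
    return answer == sum(range(1, n + 1))

policy_state: dict = {}
for step in range(5):
    task = propose_task(policy_state)
    reward = 1.0 if verify(task, solve_task(task)) else 0.0
    # In Absolute Zero both proposer and solver are updated with RL at this point;
    # the proposer is also rewarded for producing tasks of learnable difficulty.
    policy_state[task] = reward
```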
[81 citations] Flow-GRPO: Training Flow Matching Models via Online RL
Online RL recipe that sharpens flow-matching generative models for better sample quality.
ArXiv:
arxiv.org
We propose Flow-GRPO, the first method to integrate online policy gradient reinforcement learning (RL) into flow matching models. Our approach uses two key strategies: (1) an ODE-to-SDE conversion...
[83 citations] MMaDA: Multimodal Large Diffusion Language Models
Unifies diffusion with LMs across modalities for more robust multimodal generation.
ArXiv:
[88 citations] Titans: Learning to Memorize at Test Time
Hybrid attention + meta in-context memory for strong needle-in-a-haystack retrieval.
ArXiv: https://t.co/3LqHG8tmOP
Thread by the paper author:
arxiv.org
Over more than a decade there has been an extensive research effort on how to effectively utilize recurrent models and attention. While recurrent models aim to compress the data into a fixed-size...
Attention has been the key component for most advances in LLMs, but it can’t scale to long context. Does this mean we need to find an alternative? Presenting Titans: a new architecture with attention and a meta in-context memory that learns how to memorize at test time. Titans
[88 citations] Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
Collective-MCTS brings o1-style reflection and improved reasoning to multimodal models.
ArXiv:
arxiv.org
In this work, we aim to develop an MLLM that understands and solves questions by learning to create each intermediate step of the reasoning involved till the final answer. To this end, we propose...
[89 citations] TTRL: Test-Time Reinforcement Learning
Online RL at inference using unlabeled data + priors; no offline labeling needed.
ArXiv: https://t.co/a7pKYEo13Z
Thread by the paper author on a follow-up paper:
arxiv.org
This paper investigates Reinforcement Learning (RL) on data without explicit labels for reasoning tasks in Large Language Models (LLMs). The core challenge of the problem is reward estimation...
TTRL showed LLMs can provide intrinsic rewards for RL. Now SSRL: LLMs simulate world-knowledge states for Agentic RL, enabling sim-to-real generalization. So much knowledge in LLMs still awaits elicitation with RL—maybe even a world model? Paper: https://t.co/AJ7ZfwjT3Q Code:
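As I understand it, the core reward is a majority-vote pseudo-label: with no ground truth, the most common answer across a group of rollouts serves as the label and each rollout is rewarded by agreement with it. The snippet below shows only that reward signal; TTRL wraps it in GRPO-style policy updates.

```python
from collections import Counter

def majority_vote_rewards(sampled_answers: list[str]) -> list[float]:
    """Reward each rollout by agreement with the group's majority answer,
    so no ground-truth label is needed."""
    majority, _ = Counter(sampled_answers).most_common(1)[0]
    return [1.0 if ans == majority else 0.0 for ans in sampled_answers]

# e.g. 8 rollouts for one unlabeled math problem
print(majority_vote_rewards(["42", "42", "41", "42", "42", "7", "42", "41"]))
# -> the majority answer "42" acts as the pseudo-label; agreeing rollouts get reward 1
```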
[98 citations] SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
Trains RL on real GitHub code evolution, yielding strong SWE-bench Verified results.
ArXiv:
arxiv.org
The recent DeepSeek-R1 release has demonstrated the immense potential of reinforcement learning (RL) in enhancing the general reasoning capabilities of large language models (LLMs). While...
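A sketch of the kind of continuous reward the paper derives from code changes, as I understand it: similarity between the generated patch and the ground-truth patch. The exact reward shaping and format checks are omitted here.

```python
import difflib

def patch_reward(predicted_patch: str, oracle_patch: str) -> float:
    """Similarity in [0, 1] between the generated and reference diffs."""
    return difflib.SequenceMatcher(None, predicted_patch, oracle_patch).ratio()

oracle = "-    return a - b\n+    return a + b\n"
pred   = "-    return a - b\n+    return b + a\n"
print(round(patch_reward(pred, oracle), 3))  # partial credit for a near-miss fix
```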
[100 citations] ToolRL: Reward is All Tool Learning Needs
Reward-driven tool learning without SFT, using principled reward design via GRPO.
ArXiv: https://t.co/NWs9MrglO4
Thread by the paper author:
arxiv.org
Current Large Language Models (LLMs) often undergo supervised fine-tuning (SFT) to acquire tool use capabilities. However, SFT struggles to generalize to unfamiliar or complex tool use scenarios....
🚀 ToolRL unlocks LLMs' true tool mastery! The secret? Smart rewards > more data. 📖 Introducing newest paper: ToolRL: Reward is all Tool Learning Needs Paper Link: https://t.co/wXGfs9o6Zx Github Link: https://t.co/Q8qdpqmbr3
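An illustrative reward decomposition in the spirit of that reward design: score a predicted tool call for well-formedness, correct tool name, and correct arguments. The weights and terms below are placeholders, not the paper's exact reward.

```python
import json

def tool_call_reward(prediction: str, gold: dict) -> float:
    """Format reward (parseable call) plus name and per-argument correctness."""
    try:
        call = json.loads(prediction)
    except json.JSONDecodeError:
        return -1.0                               # malformed call
    reward = 1.0                                  # well-formed
    reward += 1.0 if call.get("name") == gold["name"] else 0.0
    gold_args = gold["arguments"]
    overlap = sum(call.get("arguments", {}).get(k) == v for k, v in gold_args.items())
    reward += overlap / max(len(gold_args), 1)    # partial credit per argument
    return reward

gold = {"name": "get_weather", "arguments": {"city": "Montreal", "unit": "C"}}
print(tool_call_reward('{"name": "get_weather", "arguments": {"city": "Montreal"}}', gold))
```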
[105 citations] A-Mem: Agentic Memory for LLM Agents
Zettelkasten-inspired, self-organizing memory enabling stronger long-horizon agents.
ArXiv: https://t.co/WAoe4lMOZs
Thread by the paper author:
arxiv.org
While large language model (LLM) agents can effectively use external tools for complex real-world tasks, they require memory systems to leverage historical experiences. Current memory systems...
What if an LLM agent's memory could think for itself? Excited to share our work on "Agentic Memory," accepted at NeurIPS 2025! We propose a new memory system (A-Mem) inspired by Zettelkasten, where memories actively link, organize, and evolve. #NeurIPS2025 #LLM #agent
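A toy sketch of the Zettelkasten idea: each memory is a note that links to related notes when it is added. The real system uses LLM-generated keywords and embeddings and lets memories evolve; keyword overlap below is just a stand-in.

```python
from dataclasses import dataclass, field

@dataclass
class Note:
    note_id: int
    content: str
    links: set[int] = field(default_factory=set)

class AgenticMemory:
    def __init__(self) -> None:
        self.notes: dict[int, Note] = {}

    def add(self, content: str) -> Note:
        """Insert a note and link it (both ways) to notes sharing keywords."""
        new = Note(note_id=len(self.notes), content=content)
        for other in self.notes.values():
            if set(content.lower().split()) & set(other.content.lower().split()):
                new.links.add(other.note_id)
                other.links.add(new.note_id)
        self.notes[new.note_id] = new
        return new

mem = AgenticMemory()
mem.add("user prefers concise answers")
mem.add("answers about RL should cite GRPO")
print(mem.add("user asked about RL rewards").links)  # links to overlapping notes
```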
[110 citations] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
Unified live vision-audio interaction, a step toward seamless real-time multimodal agents.
ArXiv: https://t.co/0I2E5PMs2v
Official repo:
github.com
✨✨[NeurIPS 2025] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction - VITA-MLLM/VITA
[111 citations] VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
RL + forced rethinking boosts multimodal self-reflection and SoTA VLM reasoning.
ArXiv: https://t.co/MnYUYsppWh
Thread by the paper author:
arxiv.org
Recently, slow-thinking systems like GPT-o1 and DeepSeek-R1 have demonstrated great potential in solving challenging problems through explicit reflection. They significantly outperform the best...
Our reasoning-based model VL-Rethinker also has a huggingface demo at https://t.co/bTa2AJ129d. The paper is in https://t.co/OYant6MCJF.
[112 citations] Thoughts Are All Over the Place: On the Underthinking of Long Reasoning Models
Reveals “underthinking”: models switch reasoning paths too soon; proposes decoding fixes.
ArXiv: https://t.co/Rj1jKG4UuC
Thread by the paper author:
arxiv.org
Large language models (LLMs) such as OpenAI's o1 have demonstrated remarkable abilities in complex reasoning tasks by scaling test-time compute and exhibiting human-like deep thinking. However, we...
Thank you so much for your thoughtful insights, Ethan! You're absolutely spot-on -- our research highlights how large models often "underthink" by prematurely shifting away from promising reasoning paths. By identifying and addressing this tendency, we hope to enhance their
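One way such a decoding fix can look (an illustrative sketch, not necessarily the paper's exact method): down-weight tokens that trigger a switch to a new line of thought so the model finishes the current path. The token ids and penalty value below are placeholders.

```python
import torch

SWITCH_TOKEN_IDS = [1234, 5678]   # placeholder ids for tokens like "Alternatively", "Wait"
PENALTY = 3.0

def penalize_thought_switches(logits: torch.Tensor) -> torch.Tensor:
    """Subtract a fixed penalty from thought-switch tokens before sampling."""
    logits = logits.clone()
    logits[..., SWITCH_TOKEN_IDS] -= PENALTY
    return logits

# Usage inside a generation loop:
# next_token_logits = penalize_thought_switches(next_token_logits)
```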
[115 citations] Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
“Latent reasoning” scales test-time compute without longer CoT by using recurrent depth.
ArXiv: https://t.co/r5frnXmQel
Thread by the paper author: (author thread not found)
arxiv.org
We study a novel language model architecture that is capable of scaling test-time computation by implicitly reasoning in latent space. Our model works by iterating a recurrent block, thereby...
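A toy sketch of the recurrent-depth idea from the abstract: the same block is iterated in latent space, so test-time compute scales with the iteration count rather than with a longer chain of thought. The block and dimensions below are placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn

class RecurrentDepthBlock(nn.Module):
    def __init__(self, d_model: int = 64):
        super().__init__()
        # One shared block; depth comes from iterating it, not from more layers.
        self.core = nn.Sequential(nn.Linear(2 * d_model, d_model), nn.GELU(),
                                  nn.Linear(d_model, d_model))

    def forward(self, h: torch.Tensor, x: torch.Tensor, r: int) -> torch.Tensor:
        # h: latent state, x: input embedding; iterate the same weights r times.
        for _ in range(r):
            h = self.core(torch.cat([h, x], dim=-1))
        return h

block = RecurrentDepthBlock()
x = torch.randn(1, 64)
shallow = block(torch.zeros(1, 64), x, r=4)    # cheap inference
deep = block(torch.zeros(1, 64), x, r=32)      # more test-time compute, same weights
```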