Pan Lu
@lupantech
Followers 6K · Following 3K · Media 252 · Statuses 1K
Postdoc @Stanford | PhD @CS_UCLA @uclanlp | Amazon/Bloomberg/Qualcomm Fellows | Ex @Tsinghua_Uni @Microsoft @allen_ai | ML/NLP: AI4Math, AI4Science, LLM, Agents
Palo Alto · Joined April 2016
            
🔥Introducing #AgentFlow, a new trainable agentic system where a team of agents learns to plan and use tools in the flow of a task. 🌐 https://t.co/Smp4uMNGI3 📄 https://t.co/e4pb6lnGqe AgentFlow unlocks the full potential of LLMs w/ tool use. (And yes, our 3B/7B models beat GPT-4o)👇
Replies 30 · Reposts 239 · Likes 1K
Beautiful technical debugging detective longread that starts with a suspicious loss curve and ends all the way down in the Objective-C++ depths of PyTorch's MPS backend, where addcmul_ silently fails on non-contiguous output tensors. I wonder how long before an LLM can do all of this.
Quoting: New blog post: The bug that taught me more about PyTorch than years of using it. It started with a simple training loss plateau... and ended with me digging through optimizer states, memory layouts, and kernel dispatch, and finally understanding how PyTorch works!
Replies 194 · Reposts 350 · Likes 4K
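For readers who want to poke at this, here is a minimal, hedged sketch of the setup the post describes (an in-place addcmul_ whose output tensor is a non-contiguous view); it does not reproduce the silent MPS miswrite itself, which per the post lives in the Metal kernel dispatch.

```python
import torch

# Context from the post: Adam's second-moment update calls addcmul_
# in place, roughly
#     exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
# The reported bug: on the MPS (Apple Metal) backend, addcmul_ could
# silently drop the write when the output tensor is non-contiguous.

out = torch.zeros(4, 4).t()     # transpose -> non-contiguous view
print(out.is_contiguous())      # False: the layout that triggered the bug

a = torch.randn(4, 4)
b = torch.randn(4, 4)
out.addcmul_(a, b, value=0.5)   # correct on CPU/CUDA; on affected MPS
                                # builds this reportedly left `out` unchanged

# Defensive workaround: make the buffer contiguous before in-place kernels.
out_fixed = out.contiguous()
out_fixed.addcmul_(a, b, value=0.5)
```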
"The Case for TERM LIMITS." Sen. Mitch McConnell can no longer walk, speak, or think. But he votes in the U.S. Senate.
Replies 5 · Reposts 8 · Likes 32
It’s here! #Agents4Science recording is now on YouTube! 🏆 3 Best Paper talks ⚡️ 11 Spotlights 🧠 Panel on the future of AI agent-driven science 📚 Lessons + surprises from this first-of-its-kind conf. Full analysis of submissions + reviews coming soon! https://t.co/k4ksIWRaZy
Replies 3 · Reposts 31 · Likes 132
.@kaiwei_chang is getting a full house for his talk on “mathematical reasoning in visual context” at the Towards Comprehensive Reasoning in Vision-Language Models tutorial at #ICCV2025. Still time to come and engage in room 318A!
Replies 0 · Reposts 9 · Likes 42
Holy shit. MIT just built an AI that can rewrite its own code to get smarter 🤯 It’s called SEAL (Self-Adapting Language Models). Instead of humans fine-tuning it, SEAL reads new info, rewrites it in its own words, and runs gradient updates on itself, literally performing…
Replies 648 · Reposts 2K · Likes 12K
Thank you very much for covering our work @_akhaliq! 🤗
Replies 0 · Reposts 1 · Likes 19
AgentFlow: In-the-Flow Optimization for LLM Agents. A new trainable, modular agentic system that optimizes its planner live within the multi-turn loop. Achieves +14.9% on search, +14.0% on agentic reasoning, and +14.5% on math, outperforming models like GPT-4o with a 7B backbone.
Replies 2 · Reposts 7 · Likes 23
Stanford Researchers Released AgentFlow: In-the-Flow Reinforcement Learning (RL) for Modular, Tool-Using AI Agents. AgentFlow is a trainable, modular agent framework—Planner, Executor, Verifier, Generator with explicit memory—that optimizes only the Planner in-loop using Flow-GRPO…
Replies 7 · Reposts 10 · Likes 20
Stanford unveils AgentFlow: In-the-flow Agentic AI. A new trainable modular system that learns live to plan & use tools, outperforming even GPT-4o on reasoning tasks with a 7B model. Huge gains: +14.9% search, +14.5% math.
Replies 2 · Reposts 6 · Likes 20
Dive into AgentFlow's Flow-GRPO algorithm. Explore the code, try the demo, and see how to train your own modular agents on Hugging Face! Paper: https://t.co/2LT02uk0g6 Demo: https://t.co/F94dNJrBSH Model:…
Replies 0 · Reposts 3 · Likes 6
This was a huge team effort. A massive shoutout to the brilliant minds behind the project: 🌟 @zhuofengli96475, @GhxIsaac, @SeungjuHan3, @ShengLiu_, @jianwen_xie, @yuz9yuz, @YejinChoinka, @james_y_zou ❤️ And a huge thank you to our supporters @LambdaAPI, @RenPhil21, @StanfordHAI…
Replies 2 · Reposts 0 · Likes 8
Ready to see the magic for yourself? ✨ Dive into our interactive visualizations and watch AgentFlow's thought process, step by step. See how it plans, executes, and self-corrects in real time: 📊 https://t.co/GOsN2U11Zt Or, put it to the test! Try our live demo on Hugging Face…
Replies 0 · Reposts 0 · Likes 10
More thinking time = better answers? For AgentFlow, the answer is a clear YES. ✅ We gave our agent a bigger "turn budget" at inference time. The result? Performance climbed steadily across all benchmarks. 📈 It uses the extra steps wisely for deeper research, trying new…
Replies 0 · Reposts 0 · Likes 6
Does this only work for one specific model size? Nope. We tested AgentFlow with both 3B and 7B backbones. The result: our Flow-GRPO training delivered consistent, significant performance boosts for both. 📈 This shows our "in-the-flow" optimization is a robust approach that…
Replies 0 · Reposts 0 · Likes 5
Is the training efficient? You bet. ⚡️ As AgentFlow trains with Flow-GRPO, it gets: ✅ Smarter: Rewards (success rate) steadily increase. ✅ Faster: It learns to solve problems in fewer steps, making its solutions more concise. Compared to traditional tool-use RL, our agentic…
Replies 0 · Reposts 1 · Likes 7
But does it really learn to plan better? Let's look at an example. Before training: The agent gets stuck in a loop. It tries a tool, fails, repeats the exact same mistake, and gives up. 🔁 After Flow-GRPO training: It hits the same error. But instead of giving up, it changes…
Replies 0 · Reposts 0 · Likes 7
So does this training actually work? Absolutely. The Planner becomes a tool-use expert. 🧠 It learns to pick the right tool for the right job: ➡️ For broad questions, it learns to use Google Search more. ➡️ For specialized medical questions, it smartly switches to Wikipedia &…
Replies 0 · Reposts 0 · Likes 5
             How do you train an agent for complex, multi-step tasks? 🤔 The reward (success!) only comes at the end. How does the Planner know which early decisions were the right ones? Our solution: a new RL algorithm called Flow-GRPO. 💡 The Core Idea: We broadcast the final outcome 
          
                
                0
              
              
                
                0
              
              
                
                10
              
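The post is truncated at its core idea, but the stated mechanism is to broadcast the final, trajectory-level outcome to every turn. A minimal sketch of that broadcasting step, with GRPO-style group normalization, is below; the function name and inputs are illustrative assumptions, not AgentFlow's actual API.

```python
import torch

def flow_grpo_advantages(group_rewards, turns_per_rollout, eps=1e-6):
    """Hedged sketch: turn one outcome reward per rollout into
    per-turn advantages by group-normalizing, then broadcasting."""
    r = torch.tensor(group_rewards, dtype=torch.float32)
    adv = (r - r.mean()) / (r.std() + eps)   # normalize within the group
    # Broadcast each rollout's single advantage to all of its turns,
    # so every planner decision shares the trajectory's final outcome.
    return [a.expand(n).clone() for a, n in zip(adv, turns_per_rollout)]

# Example: 4 rollouts of one task; two succeed (reward 1), two fail (0),
# with a different number of planner turns in each rollout.
advs = flow_grpo_advantages([1.0, 0.0, 1.0, 0.0], [3, 5, 2, 4])
print([t.tolist() for t in advs])
```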
So how does the Planner make its smart decisions? It has a powerful set of tools at its disposal. 🧰 For any given task, the Planner can choose the best tools for the job: 🐍 Python Coder: To solve math, run logic, or analyze data. 🔍 Google Search: For the latest info from the…
Replies 0 · Reposts 0 · Likes 5
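The tweet cuts off mid-list, but the pattern it describes (a planner selecting among named tools, which an executor then invokes) is commonly implemented as a simple registry. A hypothetical sketch, with stubbed tools named after the ones in the thread:

```python
# Every function below is a stub for illustration; the names follow the
# thread, but the real tools' signatures are not specified here.
def python_coder(code: str) -> str:
    return f"[ran code: {code!r}]"            # would execute Python

def google_search(query: str) -> str:
    return f"[web results for {query!r}]"     # would call a search API

def wikipedia_search(query: str) -> str:
    return f"[wiki results for {query!r}]"    # specialized lookups

TOOLS = {
    "python_coder": python_coder,
    "google_search": google_search,
    "wikipedia_search": wikipedia_search,
}

def execute(tool_name: str, **kwargs) -> str:
    """Dispatch a planner-chosen tool by name."""
    return TOOLS[tool_name](**kwargs)

print(execute("google_search", query="AgentFlow Flow-GRPO"))
```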
So, what's the secret behind these results? 🤫 AgentFlow isn't one giant model. It's a coordinated team of four specialized agents, each with a clear job: 🧭 Planner: The strategist. Decides the next step and which tool to use. 🛠️ Executor: The doer. Invokes the tool and gets…
Replies 0 · Reposts 1 · Likes 9
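As an illustration of that division of labor, here is a hedged sketch of how a four-module loop with explicit memory could be wired together; the class and method names are assumptions for illustration, not AgentFlow's real interfaces.

```python
from dataclasses import dataclass, field

@dataclass
class Memory:
    steps: list = field(default_factory=list)   # explicit, shared memory

def run_agentflow(task, planner, executor, verifier, generator, max_turns=10):
    """Sketch of one task episode through the four modules."""
    mem = Memory()
    for _ in range(max_turns):
        action = planner.plan(task, mem)        # strategist: next step + tool
        result = executor.run(action)           # doer: invoke the chosen tool
        mem.steps.append((action, result))      # record the turn in memory
        if verifier.is_solved(task, mem):       # checker: decide if done
            break
    return generator.answer(task, mem)          # writer: compose the answer
```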
A sign you're becoming successful: People start calling you lucky. They don't see the 4am wake-ups. The missed parties. The failed attempts. The years of nothing working. They see the result and call it luck because that's easier than admitting they didn't do the work. Let them…
Replies 0 · Reposts 5 · Likes 9