Pengfei Liu (@stefan_fee)
4K Followers · 2K Following · 120 Media · 481 Statuses
Associate Prof. at SJTU, leading GAIR Lab (https://t.co/Nfd8KmZx3B). Co-founder of Inspired Cognition. Postdoc at @LTIatCMU. Previously FNLP, @MILAMontreal.
Pittsburgh · Joined September 2014
The Alpaca moment of Large Multimodal Models! Can we build native LMMs just like Llama for simple multimodal generation? Introducing Anole: the first open-source, autoregressive native LMM for multimodal generation. Building on Chameleon by @AIatMeta: https://t.co/8Kxm9Hanf4
9 · 103 · 528
Thank you so much for this thoughtful reflection! We're thrilled that our paper sparked this kind of deep thinking—that's exactly why we wrote it. By introducing more context, we hope to help people see the more fundamental reasons behind AI's evolution. We're doing entropy
I’ve been deep in AI for years, but every once in a while, something makes me stop and rethink everything I thought I understood. That’s what happened when I read GAIR’s new paper, Context Engineering 2.0. It completely flips the idea of “prompt engineering.” For the last two
2 · 1 · 11
Prompt Engineering is dead. The GAIR team just dropped Context Engineering 2.0 and it completely reframes how we think about human–AI interaction. Forget prompts. Forget few-shot. Context is the real interface. Their core idea: “A person is the sum of their contexts.”
32 · 53 · 205
Context Engineering 2.0. This report discusses the context of context engineering and examines key design considerations for its practice. An explosion of intelligence will lead to greater context-processing capabilities, so it's important to build for the future too. This aligns
30 · 146 · 744
First principle of Context Engineering: Human-Machine Intelligence Gap — Humans naturally "fill in the blanks," machines don't. Context Engineering is fundamentally about entropy reduction, translating high-entropy human intent into machine-understandable signals. Every
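The entropy-reduction framing can be made concrete with a toy calculation (my own illustration, not from the paper): Shannon entropy over a model's possible readings of a request drops as context is added. All probability numbers below are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (bits) of a distribution over interpretations."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical toy numbers: a vague request ("fix it") leaves a model
# nearly uniform over several possible intents ...
vague = [0.25, 0.25, 0.25, 0.25]          # four equally likely readings
# ... while added context (file, error trace, desired behavior) concentrates it.
contextualized = [0.9, 0.05, 0.03, 0.02]

print(entropy(vague))           # 2.0 bits
print(entropy(contextualized))  # ≈ 0.62 bits
```

In this framing, "context engineering" is whatever moves the model from the first distribution toward the second before generation starts.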
🚨 RIP “Prompt Engineering.” The GAIR team just dropped Context Engineering 2.0 — and it completely reframes how we think about human–AI interaction. Forget prompts. Forget “few-shot.” Context is the real interface. Here’s the core idea: “A person is the sum of their
1 · 5 · 19
Sharing our recent work: Context Engineering 2.0: The Context of Context Engineering. link: https://t.co/ocRIAGsRgo Our key insights: 1. Humans are the sum of all contexts — When an employee leaves but their work context (decision patterns, emails, workflows) persists in AI,
1/6 🫡 We’ve been talking about Context Engineering all wrong. A new paper, Context Engineering 2.0: The Context of Context Engineering, reveals the missing blueprint for context engineering. It’s not just a recent innovation of the agent era: in fact, Context Engineering can be
1 · 3 · 9
come see us at poster 15 🥰
Now @steffichern and I are presenting about how to do better factuality detection with tools! https://t.co/vvH6EBbLBU
0 · 2 · 64
At this morning's #colm2025 poster session, come see @YiqingXieNLP present a method for scalable construction of coding environments from GitHub repos. Poster 76 Paper: https://t.co/k8UPBg5zh6 Work with Alex Xie, @Divyanshu_Sheth, @stefan_fee, and @carolynprose
arxiv.org: We present RepoST, a scalable method to construct environments that provide execution feedback for repository-level code generation, for both training and evaluation. Unlike existing works that aim...
1 · 7 · 41
BIG claim. Giving an LLM just 78 carefully chosen, full workflow examples makes it perform better at real agent tasks than training it with 10,000 synthetic samples. "Dramatically outperforms SOTA models: Kimi-K2-Instruct, DeepSeek-V3.1, Qwen3-235B-A22B-Instruct and GLM-4.5. "
35 · 194 · 1K
I like this comment: "Drop the coding benchmarks. That's everything useful nowadays. SWE Bench Pro for instance." That's why we also created a longer task bench with real-world value in this work: https://t.co/uBDhN2Zrrw
LIMI: Less Is More for Agency • Argues agentic AI doesn’t need more data, just better data • 78 curated demos → 73.5% on AgencyBench (beats models trained on 10k samples) • Outperforms SOTA (Kimi-K2: 24.1%, DeepSeek: 11.9%, Qwen3: 27.5%, GLM-4.5: 45.1%) • Establishes Agency
0 · 0 · 8
🚀 LLMs are entering the era of "agency" - massive opportunities ahead across data, models & products. But what truly drives agentic intelligence? Our LIMI research: +14.1% over GPT-5 with just 78 samples 🔥 The insight? Long-horizon tasks won't be hard to solve - if we can
1/9 🔥 NEW PAPER: "LIMI: Less is More for Agency" The Age of AI Agency demands systems that don't just think, but work: vibe coding and automated research. We used just 78 samples to beat GPT-5 by 14.1% and discovered the Agency Efficiency Principle. See details below! 📊
0 · 0 · 8
We can finally share UI-TARS-2🥳🥳 — a native GUI agent trained with multi-turn agent RL ⚡️⚡️Key highlights (all-in-one model!): 💻Computer Use: 47.5 OSWorld · 50.6 WindowsAgentArena 📱Phone Use: 73.3 AndroidWorld 🛜Browser Use: 88.2% Online-Mind2Web 🎮Gameplay: ~60% human
12 · 53 · 300
RepoST was accepted to @COLM_conf !!! See you in Montreal 🚀 #COLM2025
How can we construct repo-level coding environments in a scalable way? Check out RepoST: an automated framework to construct repo-level environments using Sandbox Testing (https://t.co/jlLXPacQE9). Models trained with RepoST data generalize well to other datasets (e.g., RepoEval).
0 · 3 · 17
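As a rough illustration of what "execution feedback from a sandboxed environment" means, here is a minimal, hypothetical sketch (not the RepoST implementation): a target function and a generated test are written to a temp directory and run in a fresh interpreter, and pass/fail becomes the feedback signal for training or evaluation.

```python
import pathlib
import subprocess
import sys
import tempfile

# Hypothetical stand-ins for a repo function and an auto-generated test.
FUNCTION = "def add(a, b):\n    return a + b\n"
GENERATED_TEST = "assert add(2, 3) == 5\nassert add(-1, 1) == 0\n"

def run_in_sandbox(function_src, test_src):
    """Execute function + test in a separate process; return (passed, stderr)."""
    with tempfile.TemporaryDirectory() as tmp:
        script = pathlib.Path(tmp) / "env.py"
        script.write_text(function_src + "\n" + test_src)
        proc = subprocess.run([sys.executable, str(script)],
                              capture_output=True, text=True, timeout=10)
        return proc.returncode == 0, proc.stderr

passed, err = run_in_sandbox(FUNCTION, GENERATED_TEST)
print("execution feedback:", "pass" if passed else "fail")
```

The real framework builds such environments automatically from GitHub repositories; the point here is only the shape of the loop: isolate, execute, observe.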
FacTool has been accepted to COLM 2025 - two years after its arXiv debut! While the landscape of LLMs has changed a lot since then, tool-augmented LLMs and RAG are still among the most effective and practical approaches for detecting / mitigating hallucinations (ref:
In the era of 🤖#GenerativeAI, text of all forms can be generated by LLMs. How can we identify and rectify *factual errors* in the generated output? We introduce FacTool, a framework for factuality detection in Generative AI. Website: https://t.co/CE73PDhSP4 (1/n)
2 · 5 · 12
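For readers curious what tool-augmented factuality detection looks like mechanically, here is a heavily simplified toy sketch in the spirit of the FacTool pipeline (claim extraction → tool query → verification). The `EVIDENCE` dict, the sentence-splitting "extractor", and the string-match "verifier" are all stand-ins I invented; the real framework uses LLM calls and live search tools.

```python
# Toy stand-in knowledge base for the evidence-retrieval tool.
EVIDENCE = {
    "Paris": "Paris is the capital of France.",
    "Berlin": "Berlin is the capital of Germany.",
}

def extract_claims(text):
    # Real systems use an LLM to extract atomic claims; here, one per sentence.
    return [s.strip() for s in text.split(".") if s.strip()]

def search(claim):
    # Hypothetical tool call: return evidence mentioning a key entity.
    return [v for k, v in EVIDENCE.items() if k in claim]

def verify(claim):
    # Real systems ask an LLM to judge claim vs. evidence; here, string match.
    return any(claim in ev for ev in search(claim))

text = "Paris is the capital of France. Berlin is the capital of France"
for claim in extract_claims(text):
    print(claim, "->", "supported" if verify(claim) else "unverified")
# prints: Paris is the capital of France -> supported
#         Berlin is the capital of France -> unverified
```

Even this toy version shows why tools help: the second claim is fluent and plausible, and only checking it against external evidence exposes the error.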
blog - https://t.co/tfZ96pDqqq read 'octothinker' last week and it's so cool. great work by @SinclairWang1 @FaZhou_998 @stefan_fee
0 · 5 · 9
Tech history: Every time humanity hits a tech wall, we just wait for someone named Ilya to show up and save the world :) - Neural nets stuck? - Language models plateau? - ... (skip tons of stuff) - ... - Superintelligence coming?
We don’t have AI self-improvement yet, and when we do, it will be a game-changer. With more wisdom now compared to the GPT-4 days, it's obvious that it will not be a “fast takeoff” but rather extremely gradual across many years, probably a decade. The first thing to know is that
1 · 0 · 6
What foundation models do we REALLY need for the RL era? And what pre-training data? Excited to share our work: OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling https://t.co/OJVj15x63W ✨ Key breakthroughs: - First RL-focused mid-training approach - Llama
What Makes a Base Language Model Suitable for RL? Rumors in the community say RL (i.e., RLVR) on LLMs is full of “mysteries”: (1) Is the magic only happening on Qwen + Math? (2) Does the "aha moment" only spark during math reasoning? (3) Is evaluation hiding some tricky traps?
0 · 10 · 77
The real breakthrough isn't better AI—it's breaking free from nature's constraints We're witnessing a paradigm shift from "passive adaptation" to "active construction" in AI training. 🌊 The old way: AI learns from whatever data naturally exists • Constrained by existing
📑Interesting paper by GAIR community Thinking with Generated Images🔥 enables a single large multimodal model to generate and reason with visual thoughts, greatly improving its ability to tackle complex vision and multimodal tasks. https://t.co/WNt0NdOm8d
0 · 1 · 5
🦄 What’s next? We’re entering an era where AI doesn’t just perform better on image generation—it imagines, critiques, and evolves its own visual ideas, a fundamental advance in how AI reasons across modalities. Try the code and see what you can build!
1 · 0 · 3