Pengfei Liu Profile
Pengfei Liu

@stefan_fee

Followers
4K
Following
2K
Media
120
Statuses
481

Associate Prof. at SJTU, leading GAIR Lab (https://t.co/Nfd8KmZx3B). Co-founder of Inspired Cognition. Postdoc at @LTIatCMU. Previously FNLP, @MILAMontreal.

Pittsburgh
Joined September 2014
@stefan_fee
Pengfei Liu
1 year
The Alpaca moment of Large Multimodal Models! Can we build native LMMs just like Llama for simple multimodal generation? Introducing Anole: the first open-source, autoregressive native LMM for multimodal generation. Building on Chameleon by @AIatMeta: https://t.co/8Kxm9Hanf4
9
103
528
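For readers unfamiliar with the term, a "native LMM" here means a single autoregressive model whose vocabulary covers both text tokens and discrete image codes, so one decoding loop can emit either modality. The toy sketch below illustrates only that idea; the token ids, span length, and sampling stub are invented for illustration and are not Anole's or Chameleon's actual configuration.

```python
import random

# Toy sketch of "native" multimodal autoregressive generation: a single
# token stream mixes text tokens and discrete image codes, delimited by
# begin/end-of-image sentinels. All ids, sizes, and the sampling rule are
# illustrative placeholders, not Anole's or Chameleon's real configuration.

TEXT_VOCAB = range(0, 1000)        # stand-ins for text token ids
IMAGE_CODES = range(1000, 9192)    # stand-ins for image VQ-code ids
BOI, EOI = 9192, 9193              # begin / end of image sentinels
IMAGE_SPAN_LEN = 16                # toy "image" length in codes

def sample_next(in_image: bool, codes_emitted: int) -> int:
    """Stand-in for the model's next-token sampler."""
    if in_image:
        # While inside an image span, emit VQ codes, then close the span.
        return random.choice(IMAGE_CODES) if codes_emitted < IMAGE_SPAN_LEN else EOI
    # Otherwise emit text, occasionally opening an image span.
    return BOI if random.random() < 0.1 else random.choice(TEXT_VOCAB)

def generate(prompt: list[int], max_new_tokens: int = 64) -> list[int]:
    """One autoregressive loop handles both modalities."""
    seq, in_image, codes = list(prompt), False, 0
    for _ in range(max_new_tokens):
        tok = sample_next(in_image, codes)
        seq.append(tok)
        if tok == BOI:
            in_image, codes = True, 0
        elif tok == EOI:
            in_image = False
        elif in_image:
            codes += 1
    return seq

if __name__ == "__main__":
    out = generate(prompt=[1, 2, 3])
    n_image = sum(t in IMAGE_CODES for t in out)
    print(f"generated {len(out) - 3} tokens, {n_image} of them image codes")
```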
@stefan_fee
Pengfei Liu
7 days
Thank you so much for this thoughtful reflection! We're thrilled that our paper sparked this kind of deep thinking—that's exactly why we wrote it. By introducing more context, we hope to help people see the more fundamental reasons behind AI's evolution. We're doing entropy
@leeknowsAI
James R. Lee
8 days
I’ve been deep in AI for years, but every once in a while, something makes me stop and rethink everything I thought I understood. That’s what happened when I read GAIR’s new paper, Context Engineering 2.0. It completely flips the idea of “prompt engineering.” For the last two
2
1
11
@aiwithmayank
Mayank Vora
8 days
Prompt Engineering is dead. The GAIR team just dropped Context Engineering 2.0 and it completely reframes how we think about human–AI interaction. Forget prompts. Forget few-shot. Context is the real interface. Their core idea: “A person is the sum of their contexts.”
32
53
205
@omarsar0
elvis
8 days
Context Engineering 2.0. This report discusses the context of context engineering and examines key design considerations for its practice. An explosion of intelligence will lead to greater context-processing capabilities, so it's important to build for the future too. This aligns
30
146
744
@stefan_fee
Pengfei Liu
8 days
First principle of Context Engineering: Human-Machine Intelligence Gap — Humans naturally "fill in the blanks," machines don't. Context Engineering is fundamentally about entropy reduction, translating high-entropy human intent into machine-understandable signals. Every
@rryssf_
Robert Youssef
9 days
🚨 RIP “Prompt Engineering.” The GAIR team just dropped Context Engineering 2.0 — and it completely reframes how we think about human–AI interaction. Forget prompts. Forget “few-shot.” Context is the real interface. Here’s the core idea: “A person is the sum of their
1
5
19
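The post above frames context engineering as entropy reduction: turning a vague, high-entropy request that a human would interpret effortlessly into an explicit, low-ambiguity input for a machine. The minimal sketch below makes that move concrete; the fields, tool names, and example values are hypothetical assumptions, not an API from the paper.

```python
from dataclasses import dataclass, field, asdict
import json

# Toy sketch of "context engineering as entropy reduction": a vague,
# high-entropy request is turned into an explicit, low-ambiguity context
# object before it reaches the model. The fields and example values are
# illustrative, not taken from the Context Engineering 2.0 paper.

@dataclass
class EngineeredContext:
    raw_request: str                                   # what the human actually typed
    goal: str                                          # disambiguated objective
    constraints: list[str] = field(default_factory=list)
    relevant_history: list[str] = field(default_factory=list)
    available_tools: list[str] = field(default_factory=list)

    def to_prompt(self) -> str:
        """Serialize the low-entropy context into model input."""
        return json.dumps(asdict(self), indent=2, ensure_ascii=False)

# A human would "fill in the blanks" of the raw request automatically;
# here the blanks are filled explicitly so the machine does not have to.
ctx = EngineeredContext(
    raw_request="fix the report",
    goal="Correct the Q3 revenue figures in report.xlsx and regenerate the PDF",
    constraints=["do not change any other sheet", "finish before Friday"],
    relevant_history=["Q3 numbers were restated on 2024-10-02"],
    available_tools=["spreadsheet_editor", "pdf_exporter"],
)
print(ctx.to_prompt())
```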
@stefan_fee
Pengfei Liu
8 days
Sharing our recent work: Context Engineering 2.0: The Context of Context Engineering. link: https://t.co/ocRIAGsRgo Our key insights: 1. Humans are the sum of all contexts — When an employee leaves but their work context (decision patterns, emails, workflows) persists in AI,
@ylmnshn1
LyulyuYe
9 days
1/6 🫡 We’ve been talking about Context Engineering all wrong. A new paper, Context Engineering 2.0: The Context of Context Engineering, reveals the missing blueprint for context engineering. It’s not just a recent innovation of the agent era: in fact, Context Engineering can be
1
3
9
@steffichern
Steffi Chern
1 month
come see us at poster 15 🥰
@gneubig
Graham Neubig
1 month
Now @steffichern and I are presenting about how to do better factuality detection with tools! https://t.co/vvH6EBbLBU
0
2
64
@rohanpaul_ai
Rohan Paul
2 months
BIG claim. Giving an LLM just 78 carefully chosen, full-workflow examples makes it perform better at real agent tasks than training it with 10,000 synthetic samples. "Dramatically outperforms SOTA models: Kimi-K2-Instruct, DeepSeek-V3.1, Qwen3-235B-A22B-Instruct and GLM-4.5."
35
194
1K
@stefan_fee
Pengfei Liu
2 months
I like this comment: "Drop the coding benchmarks. That's everything useful nowadays. SWE Bench Pro for instance." That's why we also created a longer task benchmark with real-world value in this work: https://t.co/uBDhN2Zrrw
@arankomatsuzaki
Aran Komatsuzaki
2 months
LIMI: Less Is More for Agency • Argues agentic AI doesn’t need more data, just better data • 78 curated demos → 73.5% on AgencyBench (beats models trained on 10k samples) • Outperforms SOTA (Kimi-K2: 24.1%, DeepSeek: 11.9%, Qwen3: 27.5%, GLM-4.5: 45.1%) • Establishes Agency
0
0
8
@stefan_fee
Pengfei Liu
2 months
🚀 LLMs are entering the era of "agency" - massive opportunities ahead across data, models & products. But what truly drives agentic intelligence? Our LIMI research: +14.1% over GPT-5 with just 78 samples 🔥 The insight? Long-horizon tasks won't be hard to solve - if we can
@Yang_Xiao_nlp
Yang Xiao
2 months
1/9 🔥 NEW PAPER: "LIMI: Less is More for Agency" The Age of AI Agency demands systems that don't just think, but work: vibe coding and automated research. We used just 78 samples to beat GPT-5 by 14.1% and discovered the Agency Efficiency Principle. See details below! 📊
0
0
8
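The LIMI posts above all hinge on one step: curating a small set of complete, high-quality agent workflows instead of training on thousands of synthetic ones. The toy sketch below shows what such a curation filter might look like; the Trajectory fields, scoring rule, and budget parameter are illustrative assumptions, not LIMI's actual pipeline.

```python
from dataclasses import dataclass

# Toy sketch of a "less is more" curation step: instead of fine-tuning on
# every synthetic trajectory, keep only a small set of complete, highly
# rated, non-trivial workflows. Fields, thresholds, and the scoring rule
# are hypothetical, not LIMI's real curation pipeline.

@dataclass
class Trajectory:
    task: str
    steps: list[str]        # full tool-use / reasoning workflow
    completed: bool         # did the workflow actually finish the task?
    human_rating: float     # quality judgment in [0.0, 1.0]

def curate(pool: list[Trajectory], budget: int = 78) -> list[Trajectory]:
    """Keep only complete, highly rated, non-trivial workflows, up to `budget`."""
    keep = [
        t for t in pool
        if t.completed and t.human_rating >= 0.9 and len(t.steps) >= 5
    ]
    # Prefer the best-rated, most informative workflows when over budget.
    keep.sort(key=lambda t: (t.human_rating, len(t.steps)), reverse=True)
    return keep[:budget]

if __name__ == "__main__":
    pool = [
        Trajectory("refactor module", [f"step {i}" for i in range(12)], True, 0.95),
        Trajectory("scrape site", ["open page"], False, 0.4),
        Trajectory("write tests", [f"step {i}" for i in range(8)], True, 0.92),
    ]
    curated = curate(pool)
    print(f"kept {len(curated)} of {len(pool)} trajectories for fine-tuning")
```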
@TsingYoga
Yujia Qin
2 months
We can finally share UI-TARS-2🥳🥳 — a native GUI agent trained with multi-turn agent RL ⚡️⚡️Key highlights (all-in-one model!): 💻Computer Use: 47.5 OSWorld · 50.6 WindowsAgentArena 📱Phone Use: 73.3 AndroidWorld 🛜Browser Use: 88.2% Online-Mind2Web 🎮Gameplay: ~60% human
12
53
300
@YiqingXieNLP
Yiqing Xie
4 months
RepoST was accepted to @COLM_conf !!! See you in Montreal 🚀 #COLM2025
@YiqingXieNLP
Yiqing Xie
8 months
How to construct repo-level coding environments in a scalable way? Check out RepoST: an automated framework to construct repo-level environments using Sandbox Testing (https://t.co/jlLXPacQE9). Models trained with RepoST data can generalize well to other datasets (e.g., RepoEval).
0
3
17
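The core mechanism named above, sandbox testing, is simple to sketch: drop a candidate implementation and its tests into an isolated directory, run the tests in a subprocess with a timeout, and use the exit code as the signal. The example below is a generic illustration of that idea, not RepoST's actual framework.

```python
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

# Generic sketch of sandbox testing for generated code: write the candidate
# and its tests into an isolated temp directory, run the tests in a
# subprocess with a timeout, and treat the exit code as pass/fail. This is
# an illustration of the idea only, not RepoST's implementation.

CANDIDATE = textwrap.dedent("""
    def add(a, b):
        return a + b
""")

TESTS = textwrap.dedent("""
    from candidate import add
    assert add(2, 3) == 5
    assert add(-1, 1) == 0
    print("all tests passed")
""")

def run_in_sandbox(candidate: str, tests: str, timeout: float = 10.0) -> bool:
    """Return True iff the tests pass inside an isolated temp directory."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "candidate.py").write_text(candidate)
        Path(tmp, "test_candidate.py").write_text(tests)
        proc = subprocess.run(
            [sys.executable, "test_candidate.py"],
            cwd=tmp, capture_output=True, text=True, timeout=timeout,
        )
        return proc.returncode == 0

if __name__ == "__main__":
    print("sandbox verdict:", run_in_sandbox(CANDIDATE, TESTS))
```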
@ethanchern
Ethan Chern
4 months
FacTool has been accepted to COLM 2025 - two years after its arXiv debut! While the landscape of LLMs has changed a lot since then, tool-augmented LLMs and RAG are still among the most effective and practical approaches for detecting / mitigating hallucinations (ref:
@ethanchern
Ethan Chern
2 years
In the era of 🤖#GenerativeAI, text of all forms can be generated by LLMs. How can we identify and rectify *factual errors* in the generated output? We introduce FacTool, a framework for factuality detection in Generative AI. Website: https://t.co/CE73PDhSP4 (1/n)
2
5
12
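The tool-augmented factuality detection that FacTool describes follows a standard loop: split generated text into atomic claims, gather evidence for each claim with an external tool, then judge each claim against its evidence. The sketch below shows that loop with stubbed components; the claim splitter, search stub, and verifier are placeholders, not FacTool's real components or API.

```python
# Minimal sketch of a tool-augmented factuality-detection loop: extract
# claims, retrieve evidence with a tool, and verify each claim. All three
# components are stubs; a real system (like FacTool) would use an LLM for
# claim extraction and judging, and a real search/RAG tool for evidence.

def extract_claims(text: str) -> list[str]:
    """Placeholder claim extractor (a real pipeline would use an LLM here)."""
    return [s.strip() for s in text.split(".") if s.strip()]

def search_tool(query: str) -> list[str]:
    """Placeholder evidence retriever (a real system would call search/RAG)."""
    knowledge_base = {
        "Paris is the capital of France": ["Paris is the capital of France."],
        "The Great Wall is visible from space": [
            "Astronauts report the wall is not visible to the naked eye from orbit."
        ],
    }
    return knowledge_base.get(query, [])

def verify(claim: str, evidence: list[str]) -> str:
    """Placeholder verdict (a real system would prompt an LLM as the judge)."""
    if not evidence:
        return "no-evidence"
    return "supported" if claim in evidence[0] else "refuted"

def factuality_report(generated_text: str) -> dict[str, str]:
    return {c: verify(c, search_tool(c)) for c in extract_claims(generated_text)}

if __name__ == "__main__":
    text = "Paris is the capital of France. The Great Wall is visible from space."
    for claim, verdict in factuality_report(text).items():
        print(f"[{verdict}] {claim}")
```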
@dirctd_by_beens
Beens
4 months
blog - https://t.co/tfZ96pDqqq read 'octothinker' last week and it's so cool. great work by @SinclairWang1 @FaZhou_998 @stefan_fee
0
5
9
@stefan_fee
Pengfei Liu
4 months
Tech history: Every time humanity hits a tech wall, we just wait for someone named Ilya to show up and save the world :) - Neural nets stuck? - Language models plateau? - ... (skip tons of stuff) - ... - Superintelligence coming?
@_jasonwei
Jason Wei
4 months
We don’t have AI that self-improves yet, and when we do it will be a game-changer. With more wisdom now compared to the GPT-4 days, it's obvious that it will not be a “fast takeoff”, but rather extremely gradual across many years, probably a decade. The first thing to know is that
1
0
6
@stefan_fee
Pengfei Liu
5 months
What foundation models do we REALLY need for the RL era? And what pre-training data? Excited to share our work: OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling https://t.co/OJVj15x63W ✨ Key breakthroughs: - First RL-focused mid-training approach - Llama
@SinclairWang1
Zengzhi Wang
5 months
What Makes a Base Language Model Suitable for RL? Rumors in the community say RL (i.e., RLVR) on LLMs is full of “mysteries”: (1) Is the magic only happening on Qwen + Math? (2) Does the "aha moment" only spark during math reasoning? (3) Is evaluation hiding some tricky traps?
0
10
77
@stefan_fee
Pengfei Liu
5 months
nice discussion
@FaZhou_998
Fan Zhou
5 months
🧵Interesting paper—great to see the emphasis on large token counts, which is always appreciated. 😅But some of the results are... puzzling. For example, Table 3 essentially suggests that MegaMath is a non-math corpus. This is weird, especially given the care we've taken during
0
0
5
@stefan_fee
Pengfei Liu
6 months
The real breakthrough isn't better AI—it's breaking free from nature's constraints. We're witnessing a paradigm shift from "passive adaptation" to "active construction" in AI training. 🌊 The old way: AI learns from whatever data naturally exists • Constrained by existing
@AdinaYakup
Adina Yakup
6 months
📑 Interesting paper by the GAIR community: Thinking with Generated Images 🔥 enables a single large multimodal model to generate and reason with visual thoughts, greatly improving its ability to tackle complex vision and multimodal tasks. https://t.co/WNt0NdOm8d
0
1
5
@stefan_fee
Pengfei Liu
6 months
🦄 What’s next? We’re entering an era where AI doesn’t just perform better on image generation—it imagines, critiques, and evolves its own visual ideas, a fundamental advance in how AI reasons across modalities. Try the code and see what you can build!
1
0
3
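The "imagines, critiques, and evolves its own visual ideas" description above is essentially an iterative loop: generate an intermediate visual thought, critique it, and regenerate until it is good enough to ground the final answer. The toy sketch below shows only that control flow; the three stub functions stand in for calls to a multimodal model and are not the paper's implementation.

```python
# Toy sketch of an imagine-critique-refine loop for visual thoughts. The
# stubs below are illustrative stand-ins for multimodal model calls; none
# of this is the Thinking with Generated Images implementation.

def imagine(task: str, feedback: str | None = None) -> str:
    """Stub: generate (or regenerate) an intermediate visual thought."""
    return f"sketch of '{task}'" + (f" revised per: {feedback}" if feedback else "")

def critique(visual_thought: str) -> tuple[bool, str]:
    """Stub: judge the visual thought; a real system would use the LMM itself."""
    ok = "revised" in visual_thought          # toy acceptance rule
    return ok, "add the missing spatial layout details"

def solve_with_visual_thoughts(task: str, max_rounds: int = 3) -> str:
    thought = imagine(task)
    for _ in range(max_rounds):
        ok, feedback = critique(thought)
        if ok:
            break
        thought = imagine(task, feedback)     # evolve the visual idea
    return f"answer to '{task}' grounded in: {thought}"

if __name__ == "__main__":
    print(solve_with_visual_thoughts("arrange furniture in a 3m x 4m room"))
```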