Pengfei Liu (@stefan_fee)
4K Followers · 2K Following · 120 Media · 481 Statuses
Associate Prof. at SJTU, leading GAIR Lab (https://t.co/Nfd8KmZx3B). Co-founder of Inspired Cognition. Postdoc at @LTIatCMU. Previously FNLP, @MILAMontreal.
Pittsburgh · Joined September 2014
The Alpaca moment of Large Multimodal Models! Can we build native LMMs just like Llama for simple multimodal generation? Introducing Anole: the first open-source, autoregressive native LMM for multimodal generation. Building on Chameleon by @AIatMeta: https://t.co/8Kxm9Hanf4
9 · 103 · 528
Thank you so much for this thoughtful reflection! We're thrilled that our paper sparked this kind of deep thinking—that's exactly why we wrote it. By introducing more context, we hope to help people see the more fundamental reasons behind AI's evolution. We're doing entropy
I’ve been deep in AI for years, but every once in a while, something makes me stop and rethink everything I thought I understood. That’s what happened when I read GAIR’s new paper, Context Engineering 2.0. It completely flips the idea of “prompt engineering.” For the last two
2 · 1 · 11
Prompt Engineering is dead. The GAIR team just dropped Context Engineering 2.0 and it completely reframes how we think about human–AI interaction. Forget prompts. Forget few-shot. Context is the real interface. Their core idea: “A person is the sum of their contexts.”
32 · 53 · 205
Context Engineering 2.0. This report discusses the context of context engineering and examines key design considerations for its practice. An explosion of intelligence will lead to greater context-processing capabilities, so it's important to build for the future too. This aligns
30 · 146 · 744
First principle of Context Engineering: Human-Machine Intelligence Gap — Humans naturally "fill in the blanks," machines don't. Context Engineering is fundamentally about entropy reduction, translating high-entropy human intent into machine-understandable signals. Every
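The entropy-reduction framing can be made concrete with a toy calculation (my own illustration, not from the paper): Shannon entropy over a model's possible readings of a request drops as context is added. All probability numbers below are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (bits) of a distribution over interpretations."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical toy numbers: a vague request ("fix it") leaves a model
# nearly uniform over several possible intents ...
vague = [0.25, 0.25, 0.25, 0.25]          # four equally likely readings
# ... while added context (file, error trace, desired behavior) concentrates it.
contextualized = [0.9, 0.05, 0.03, 0.02]

print(entropy(vague))           # 2.0 bits
print(entropy(contextualized))  # ≈ 0.62 bits
```

In this framing, "context engineering" is whatever moves the model from the first distribution toward the second before generation starts.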
🚨 RIP “Prompt Engineering.” The GAIR team just dropped Context Engineering 2.0 — and it completely reframes how we think about human–AI interaction. Forget prompts. Forget “few-shot.” Context is the real interface. Here’s the core idea: “A person is the sum of their
1 · 5 · 19
Sharing our recent work: Context Engineering 2.0: The Context of Context Engineering. link: https://t.co/ocRIAGsRgo Our key insights: 1. Humans are the sum of all contexts — When an employee leaves but their work context (decision patterns, emails, workflows) persists in AI,
1/6 🫡 We’ve been talking about Context Engineering all wrong. A new paper, Context Engineering 2.0: The Context of Context Engineering, reveals the missing blueprint for context engineering. It’s not just a recent innovation of the agent era: in fact, Context Engineering can be
1 · 3 · 9
come see us at poster 15 🥰
Now @steffichern and I are presenting about how to do better factuality detection with tools! https://t.co/vvH6EBbLBU
0 · 2 · 64
At this morning's #colm2025 poster session, come see @YiqingXieNLP present a method for scalable construction of coding environments from GitHub repos. Poster 76 Paper: https://t.co/k8UPBg5zh6 Work with Alex Xie, @Divyanshu_Sheth, @stefan_fee, and @carolynprose
arxiv.org: We present RepoST, a scalable method to construct environments that provide execution feedback for repository-level code generation, for both training and evaluation. Unlike existing works that aim...
1 · 7 · 41
BIG claim. Giving an LLM just 78 carefully chosen, full workflow examples makes it perform better at real agent tasks than training it with 10,000 synthetic samples. "Dramatically outperforms SOTA models: Kimi-K2-Instruct, DeepSeek-V3.1, Qwen3-235B-A22B-Instruct and GLM-4.5. "
35 · 194 · 1K
I like this comment: "Drop the coding benchmarks. That's everything useful nowadays. SWE Bench Pro for instance." That's why we also created a longer task bench with real-world value in this work: https://t.co/uBDhN2Zrrw
LIMI: Less Is More for Agency • Argues agentic AI doesn’t need more data, just better data • 78 curated demos → 73.5% on AgencyBench (beats models trained on 10k samples) • Outperforms SOTA (Kimi-K2: 24.1%, DeepSeek: 11.9%, Qwen3: 27.5%, GLM-4.5: 45.1%) • Establishes Agency
0 · 0 · 8
🚀 LLMs are entering the era of "agency" - massive opportunities ahead across data, models & products. But what truly drives agentic intelligence? Our LIMI research: +14.1% over GPT-5 with just 78 samples 🔥 The insight? Long-horizon tasks won't be hard to solve - if we can
1/9 🔥 NEW PAPER: "LIMI: Less is More for Agency" The Age of AI Agency demands systems that don't just think, but work: vibe coding and automated research. We used just 78 samples to beat GPT-5 by 14.1% and discovered the Agency Efficiency Principle. See details below! 📊
0 · 0 · 8
We can finally share UI-TARS-2🥳🥳 — a native GUI agent trained with multi-turn agent RL ⚡️⚡️Key highlights (all-in-one model!): 💻Computer Use: 47.5 OSWorld · 50.6 WindowsAgentArena 📱Phone Use: 73.3 AndroidWorld 🛜Browser Use: 88.2% Online-Mind2Web 🎮Gameplay: ~60% human
12 · 53 · 300
RepoST was accepted to @COLM_conf !!! See you in Montreal 🚀 #COLM2025
How can we construct repo-level coding environments in a scalable way? Check out RepoST: an automated framework to construct repo-level environments using Sandbox Testing (https://t.co/jlLXPacQE9). Models trained with RepoST data generalize well to other datasets (e.g., RepoEval).
0 · 3 · 17
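As a rough illustration of what "execution feedback from a sandboxed environment" means, here is a minimal, hypothetical sketch (not the RepoST implementation): a target function and a generated test are written to a temp directory and run in a fresh interpreter, and pass/fail becomes the feedback signal for training or evaluation.

```python
import pathlib
import subprocess
import sys
import tempfile

# Hypothetical stand-ins for a repo function and an auto-generated test.
FUNCTION = "def add(a, b):\n    return a + b\n"
GENERATED_TEST = "assert add(2, 3) == 5\nassert add(-1, 1) == 0\n"

def run_in_sandbox(function_src, test_src):
    """Execute function + test in a separate process; return (passed, stderr)."""
    with tempfile.TemporaryDirectory() as tmp:
        script = pathlib.Path(tmp) / "env.py"
        script.write_text(function_src + "\n" + test_src)
        proc = subprocess.run([sys.executable, str(script)],
                              capture_output=True, text=True, timeout=10)
        return proc.returncode == 0, proc.stderr

passed, err = run_in_sandbox(FUNCTION, GENERATED_TEST)
print("execution feedback:", "pass" if passed else "fail")
```

The real framework builds such environments automatically from GitHub repositories; the point here is only the shape of the loop: isolate, execute, observe.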
FacTool has been accepted to COLM 2025 - two years after its arXiv debut! While the landscape of LLMs has changed a lot since then, tool-augmented LLMs and RAG are still among the most effective and practical approaches for detecting / mitigating hallucinations (ref:
In the era of 🤖#GenerativeAI, text of all forms can be generated by LLMs. How can we identify and rectify *factual errors* in the generated output? We introduce FacTool, a framework for factuality detection in Generative AI. Website: https://t.co/CE73PDhSP4 (1/n)
2 · 5 · 12
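For readers curious what tool-augmented factuality detection looks like mechanically, here is a heavily simplified toy sketch in the spirit of the FacTool pipeline (claim extraction → tool query → verification). The `EVIDENCE` dict, the sentence-splitting "extractor", and the string-match "verifier" are all stand-ins I invented; the real framework uses LLM calls and live search tools.

```python
# Toy stand-in knowledge base for the evidence-retrieval tool.
EVIDENCE = {
    "Paris": "Paris is the capital of France.",
    "Berlin": "Berlin is the capital of Germany.",
}

def extract_claims(text):
    # Real systems use an LLM to extract atomic claims; here, one per sentence.
    return [s.strip() for s in text.split(".") if s.strip()]

def search(claim):
    # Hypothetical tool call: return evidence mentioning a key entity.
    return [v for k, v in EVIDENCE.items() if k in claim]

def verify(claim):
    # Real systems ask an LLM to judge claim vs. evidence; here, string match.
    return any(claim in ev for ev in search(claim))

text = "Paris is the capital of France. Berlin is the capital of France"
for claim in extract_claims(text):
    print(claim, "->", "supported" if verify(claim) else "unverified")
# prints: Paris is the capital of France -> supported
#         Berlin is the capital of France -> unverified
```

Even this toy version shows why tools help: the second claim is fluent and plausible, and only checking it against external evidence exposes the error.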
blog - https://t.co/tfZ96pDqqq read 'octothinker' last week and it's so cool. great work by @SinclairWang1 @FaZhou_998 @stefan_fee
0 · 5 · 9
Tech history: Every time humanity hits a tech wall, we just wait for someone named Ilya to show up and save the world :) - Neural nets stuck? - Language models plateau? - ... (skip tons of stuff) - ... - Superintelligence coming?
We don’t have AI self-improvement yet, and when we do, it will be a game-changer. With more wisdom now compared to the GPT-4 days, it's obvious that it will not be a “fast takeoff” but rather extremely gradual across many years, probably a decade. The first thing to know is that
1 · 0 · 6
What foundation models do we REALLY need for the RL era? And what pre-training data? Excited to share our work: OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling https://t.co/OJVj15x63W ✨ Key breakthroughs: - First RL-focused mid-training approach - Llama
What Makes a Base Language Model Suitable for RL? Rumors in the community say RL (i.e., RLVR) on LLMs is full of “mysteries”: (1) Is the magic only happening on Qwen + Math? (2) Does the "aha moment" only spark during math reasoning? (3) Is evaluation hiding some tricky traps?
0 · 10 · 77
The real breakthrough isn't better AI—it's breaking free from nature's constraints We're witnessing a paradigm shift from "passive adaptation" to "active construction" in AI training. 🌊 The old way: AI learns from whatever data naturally exists • Constrained by existing
📑Interesting paper by GAIR community Thinking with Generated Images🔥 enables a single large multimodal model to generate and reason with visual thoughts, greatly improving its ability to tackle complex vision and multimodal tasks. https://t.co/WNt0NdOm8d
0 · 1 · 5
🦄 What’s next? We’re entering an era where AI doesn’t just perform better on image generation—it imagines, critiques, and evolves its own visual ideas, a fundamental advance in how AI reasons across modalities. Try the code and see what you can build!
1 · 0 · 3