Yu Su
@ysu_nlp
Followers
12K
Following
4K
Media
137
Statuses
2K
something new | prof. @osunlp | sloan fellow | intelligence and agents | author of Mind2Web, SeeAct, MMMU, HippoRAG, BioCLIP, UGround.
Columbus, OH
Joined March 2013
Computer Use: Modern Moravec's Paradox A new blog post arguing why computer-use agents may be the biggest opportunity and challenge for AGI. https://t.co/6fZfTdx710 Table of Contents > Moravec’s Paradox > Moravec's Paradox in 2025 > Computer use may be the biggest opportunity
9
65
207
I don't necessarily agree with everything Microsoft is doing in the AI space, but man, Satya is such a fantastic CEO: sharp, grounded, long-termist, and still super easy to be around. I should have held my MSFT for longer.
.@satyanadella gave me and @dylan522p an exclusive tour of Fairwater 2, the most powerful AI datacenter in the world. We then chatted through Satya's vision for Microsoft in a world with AGI. 0:00:00 - Fairwater 2 0:04:15 - Business models for AGI 0:13:42 - Copilot 0:20:56 -
1
2
17
Excited for this week's AI Agent Frontier Seminar! We're thrilled to host Dr. Huan Sun (@hhsun1) from @OhioState. She'll discuss a critical topic: "Advancing the Capability and Safety of Computer-Use Agents Together." 🗓️ Friday, Nov 14 ⏰ 9 AM PT / 12 PM ET All are welcome!
0
2
13
In my ICLR SAC batch (225 papers), only 18 papers (8%) have an avg initial rating >= 6 (borderline accept). Maybe my batch is not the most representative, or maybe it's reviewer fatigue from the exploding # of submissions. Silver lining: don't give up on the rebuttal.
11
12
238
🚀 Worried about faculty openings? Ohio State @OhioState is set to hire 100 new faculty with AI expertise over the next five years! 🤖🎓 The new hires will join one of three AI Faculty Cohorts: 🧠 Foundational AI — Elevating the theoretical, mathematical, and algorithmic
8
42
211
Super fun to serve on LEAP's academic advisory board and observe the forecasting. If the development of AI can live up to the median forecasts, it will already be a big deal.
Today, we are launching the most rigorous ongoing source of expert forecasts on the future of AI: the Longitudinal Expert AI Panel (LEAP). We’ve assembled a panel of 339 top experts across computer science, AI industry, economics, and AI policy. Roughly every month—for the next
0
1
15
⛵Marin 32B Base (mantis) is done training! It is the best open-source base model (beating OLMo 2 32B Base) and it’s even close to the best comparably-sized open-weight base models, Gemma 3 27B PT and Qwen 2.5 32B Base. Ranking across 19 benchmarks:
19
83
558
We just built and released the largest dataset for supervised fine-tuning of agentic LMs, 1.27M trajectories (~36B tokens)! Up until now, large-scale SFT for agents has been rare - not for lack of data, but because of fragmentation across heterogeneous formats, tools, and interfaces.
arxiv.org
Public research results on large-scale supervised finetuning of AI agents remain relatively rare, since the collection of agent training data presents unique challenges. In this work, we argue...
27
173
1K
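The fragmentation problem this tweet describes - heterogeneous trajectory formats blocking large-scale SFT - can be illustrated with a minimal normalization sketch. All field names, source labels, and the schema here are my own illustrative assumptions, not the released dataset's actual format.

```python
# Hypothetical sketch: agent trajectories arrive in source-specific formats,
# so large-scale SFT first requires mapping them into one shared schema.
# Every field name below is an assumption for illustration only.

def normalize(raw, source):
    """Map a source-specific trajectory record into a shared step schema."""
    if source == "tool_log":
        # e.g. a log of tool calls -> assistant turns rendering each call
        steps = [{"role": "assistant", "content": f"{s['tool']}({s['args']})"}
                 for s in raw["calls"]]
    elif source == "chat":
        # e.g. a chat transcript -> keep speaker roles and text as-is
        steps = [{"role": m["speaker"], "content": m["text"]}
                 for m in raw["messages"]]
    else:
        raise ValueError(f"unknown source: {source}")
    return {"source": source, "steps": steps}

a = normalize({"calls": [{"tool": "search", "args": "weather"}]}, "tool_log")
b = normalize({"messages": [{"speaker": "user", "text": "hi"}]}, "chat")
print(a["steps"][0]["content"])  # -> search(weather)
```

Once every source round-trips through one schema like this, a single SFT pipeline can consume all of them, which is the point the tweet is making about why unification, not data scarcity, was the bottleneck.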
📢 As AI becomes increasingly explored for research idea generation, how can we rigorously evaluate the ideas it generates before committing time and resources to them? We introduce ScholarEval, a literature-grounded framework for research idea evaluation across disciplines 👇!
4
42
141
The GOAT
2
4
117
Genuine question: how is @OpenAI Atlas different from @perplexity_ai Comet? At first glance, the set of AI features looks quite similar
1
0
10
I am on the faculty job market this year! I am seeking tenure-track faculty positions to drive my research agenda on rigorous AI evaluation for science and policy. I am applying broadly across disciplines, and would be grateful to hear of relevant positions. Materials: 🧵
10
71
420
I've given 30+ talks on agents in the past 2 years, and I always end my talk with this slide. We are just at the dawn of a long journey on agents. General agents need the same broad set of cognitive competencies as humans and more. It doesn't necessarily have to be constructed
> @karpathy is right on the limitations of current agents > 2025 is year 0 of the decade of agents: agents started to bring significant marginal value in narrow domains (eg, coding, customer service), but still far from general human-level competency > agents need a wholesale
6
13
131
> @karpathy is right on the limitations of current agents > 2025 is year 0 of the decade of agents: agents started to bring significant marginal value in narrow domains (eg, coding, customer service), but still far from general human-level competency > agents need a wholesale
BREAKING: Andrej Karpathy calls out Sam Altman Altman: >"We are now confident we know how to build AGI" >"2025 is the year of AI agents" Karpathy: >"I was triggered by that over-prediction" >"More accurately, it's the decade of agents" >"There's SO much work to be done" Hmm
3
5
69
Excited to introduce a new agent learning paradigm called Early Experience, as a reward-free mid-training stage for large-scale agent training. A fantastic collaboration between Meta Superintelligence Lab and @osunlp led by the amazing @KaiZhang_CS. Built on insights from our
arxiv.org
Language agents based on large language models (LLMs) have demonstrated great promise in automating web-based tasks. Recent work has shown that incorporating advanced planning algorithms, e.g.,...
🌀Agent Learning via Early Experience🌀 📝: https://t.co/VsqQHTTrBN - SFT for agents is sparse; RL on long-horizons is hard We provide new mid-training signals that work: 1) Implicit next state world modeling task 2) Self-reflection on alternate states - Strong improvements over
5
29
175
Introducing early experience: using future states resulting from the agent's own actions as scalable supervision to train itself - without reward🧠! 1️⃣Reward-free: can train directly in real-world environments. 2️⃣Better RL warm-start: when continued with RL, leads to higher final
🌀Agent Learning via Early Experience🌀 📝: https://t.co/VsqQHTTrBN - SFT for agents is sparse; RL on long-horizons is hard We provide new mid-training signals that work: 1) Implicit next state world modeling task 2) Self-reflection on alternate states - Strong improvements over
2
28
109
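The quoted thread names two concrete mid-training signals: implicit next-state world modeling and self-reflection on alternate states. A minimal sketch of how such reward-free training examples might be constructed from an agent's own rollouts follows; every function name, prompt format, and data structure here is an illustrative assumption, not the paper's actual method or API.

```python
# Hypothetical sketch of the two "early experience" signals described above.
# Signal 1: predict the next state from (state, action) - implicit world modeling.
# Signal 2: contrast the chosen action's outcome with alternate actions' outcomes.
# No reward is used anywhere; supervision comes from observed future states.

def world_modeling_examples(trajectory):
    """Turn (state, action, next_state) steps into next-state prediction pairs."""
    examples = []
    for state, action, next_state in trajectory:
        prompt = f"State: {state}\nAction: {action}\nPredict the next state:"
        examples.append({"input": prompt, "target": next_state})
    return examples

def self_reflection_examples(trajectory, alternate_rollouts):
    """For each step, show outcomes of alternate actions and prompt a reflection
    on why the chosen action was (or was not) the better choice."""
    examples = []
    for (state, action, next_state), alternates in zip(trajectory, alternate_rollouts):
        alt_text = "\n".join(f"- {a}: {s}" for a, s in alternates)
        prompt = (f"State: {state}\nChosen action: {action} -> {next_state}\n"
                  f"Alternate actions and outcomes:\n{alt_text}\n"
                  "Reflect on the choice:")
        examples.append({"input": prompt})
    return examples

traj = [("page A", "click login", "login form"),
        ("login form", "type username", "password field")]
alts = [[("click help", "help page")], [("press back", "page A")]]
wm = world_modeling_examples(traj)
refl = self_reflection_examples(traj, alts)
print(len(wm), wm[0]["target"])  # -> 2 login form
```

The key property both signals share is that the supervision target (the observed future state) comes for free from the agent acting in the environment, which is what makes this usable as a mid-training stage before any RL.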
Really excited to announce AgentX–AgentBeats Competition 🚀 💰 $1 Million+ in prizes, cloud credits, and API resources, a global challenge hosted by @BerkeleyRDI , building on the Agentic AI MOOC community of 32K+ learners, bringing together builders, researchers, engineers, and
11
28
84
The most comprehensive agent evaluation to date
📣New paper: Rigorous AI agent evaluation is much harder than it seems. For the last year, we have been working on infrastructure for fair agent evaluations on challenging benchmarks. Today, we release a paper that condenses our insights from 20,000+ agent rollouts on 9
1
13
131
New @GoogleResearch paper shows agents learn software skills by watching tutorials, converting them into action steps, and boosting task performance. This converts free videos into reliable supervision at scale. A vision model, inverse dynamics, predicts the action between 2
14
82
441
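The pipeline this tweet describes - an inverse-dynamics model inferring the action between consecutive tutorial frames to turn unlabeled video into (observation, action) supervision - can be sketched minimally. The real system uses a vision model on pixels; here a toy diff over dict-valued "frames" stands in for it, and all names are illustrative assumptions.

```python
# Hypothetical sketch of video-to-supervision via inverse dynamics.
# Real systems run a vision model over pixel frames; this toy version
# diffs dict "frames" to stand in for inferring the UI action in between.

def inverse_dynamics(frame_a, frame_b):
    """Stand-in for a model that infers the action taken between two frames."""
    changed = {k: frame_b[k] for k in frame_b if frame_a.get(k) != frame_b[k]}
    return f"set {', '.join(sorted(changed))}" if changed else "no-op"

def video_to_steps(frames):
    """Convert a frame sequence into (observation, predicted_action) pairs,
    i.e. free video turned into action-step supervision."""
    return [(frames[i], inverse_dynamics(frames[i], frames[i + 1]))
            for i in range(len(frames) - 1)]

frames = [{"menu": "closed"},
          {"menu": "open"},
          {"menu": "open", "item": "selected"}]
steps = video_to_steps(frames)
print(steps[0][1])  # action inferred between frames 0 and 1 -> set menu
```

The design point is that the action labels never need human annotation: once the inverse-dynamics model is trained, every tutorial video becomes a stream of labeled action steps for agent training.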