Yu Su (hiring postdoc)

@ysu_nlp

Followers
11K
Following
4K
Media
132
Statuses
2K

cooking something new. prof. @osunlp. sloan fellow. intelligence and agents. author of Mind2Web, SeeAct, MMMU, HippoRAG, BioCLIP, UGround.

Columbus, OH
Joined March 2013
@ysu_nlp
Yu Su (hiring postdoc)
9 months
Sharing the slides of my talk at Princeton yesterday--"A holistic and critical look at language agents". LLM-based language agents are exciting, but it's also undeniably a quite chaotic space: are agents the next big thing, or are they just thin wrappers
16
123
516
@ysu_nlp
Yu Su (hiring postdoc)
6 days
A new high of gaming google scholar.
@MishaTeplitskiy
Science of Science (mostly on LinkedIn now)
6 days
You: should I cite my paper? Maybe it's not related enough? Ok, screw it, I'll self-cite but only once. Other scientists: With AI, I can cite myself 1.3M times 😎
1
0
7
@ysu_nlp
Yu Su (hiring postdoc)
8 days
Excited to receive the NSF CAREER Award! Grateful for all the support and encouragement I've received in the 6 years of faculty life so far, especially for my extremely supportive family and for the amazing students @osunlp I have had the privilege to work with!!
23
11
254
@ysu_nlp
Yu Su (hiring postdoc)
10 days
RT @_zifan_wang: 🧵 (1/9) New @scale_AI research paper: "Search-Time Data Contamination" (STC), which occurs in evaluating search-based LLM….
0
19
0
@ysu_nlp
Yu Su (hiring postdoc)
10 days
RT @_zifan_wang: 🧵 (6/9) Our findings suggest that traditional capability benchmarks may not adequately assess search-based LLM agents. We r….
arxiv.org
Agentic search such as Deep Research systems, where agents autonomously browse the web, synthesize information, and return comprehensive citation-backed answers, represents a major shift in how...
0
5
0
@ysu_nlp
Yu Su (hiring postdoc)
12 days
Even though benchmarks are becoming less relevant, I must say this is a very impressive set of results and the cooler way to flex about benchmark numbers.
@Zai_org
Z.ai
12 days
Introducing GLM-4.5V: a breakthrough in open-source visual reasoning. GLM-4.5V delivers state-of-the-art performance among open-source models in its size class, dominating across 41 benchmarks. Built on the GLM-4.5-Air base model, GLM-4.5V inherits proven techniques from
3
2
14
@ysu_nlp
Yu Su (hiring postdoc)
15 days
Excited to partner with the Princeton team on Holistic Agent Leaderboard! Claude continues to be the best choice for agent tasks, but overall we still have a long way to go as a field.
@sayashk
Sayash Kapoor
15 days
How does GPT-5 compare against Claude Opus 4.1 on agentic tasks? Since their release, we have been evaluating these models on challenging science, web, service, and code tasks. Headline result: While cost-effective, so far GPT-5 never tops agentic leaderboards. More evals 🧵
1
4
20
@ysu_nlp
Yu Su (hiring postdoc)
15 days
RT @xiangyue96: 20 months after our multimodal reasoning benchmark MMMU release, both frontier and open models ar….
0
20
0
@ysu_nlp
Yu Su (hiring postdoc)
16 days
It seems that the GPT-5 upgrade just broke ChatGPT Agent:
- Agent mode is not supported by any 5 model
- There seems to be no way to select an earlier model.
5
1
16
@ysu_nlp
Yu Su (hiring postdoc)
16 days
OpenAI: Feel the AGI 🚀
Everyone else: biggest chart crime in history 🙅‍♂️
Vagueposting does backfire sometimes.
0
0
9
@ysu_nlp
Yu Su (hiring postdoc)
16 days
Congrats @OpenAI on the MMMU improvement! Also, @AnthropicAI should take notes here for the art of making bar charts.
7
1
23
@ysu_nlp
Yu Su (hiring postdoc)
17 days
Hmm, looks kind of familiar: LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error - ACL’24.
arxiv.org
Tools are essential for large language models (LLMs) to acquire up-to-date information and take consequential actions in external environments. Existing work on tool-augmented LLMs primarily...
@corbtt
Kyle Corbitt
17 days
Announcing MCP•RL: teach your model how to use any MCP server automatically using reinforcement learning! Just connect any MCP server, and your model will start playing with it and (using RL) "learn from experience" how to use its tools most effectively!
4
3
24
@ysu_nlp
Yu Su (hiring postdoc)
18 days
🤔 I thought @windsurf's Claude access was revoked?
@AnthropicAI
Anthropic
18 days
Today we're releasing Claude Opus 4.1, an upgrade to Claude Opus 4 on agentic tasks, real-world coding, and reasoning.
0
0
3
@ysu_nlp
Yu Su (hiring postdoc)
18 days
šŸ˜: gpt-oss.🤨: no multimodal. wondering why. multimodality has become the default and would enable so many more applications.
@OpenAI
OpenAI
18 days
Our open models are here. Both of them.
7
2
35
@ysu_nlp
Yu Su (hiring postdoc)
18 days
RT @BangL93: 🤖Check The Hitchhiker’s Guide to Agents HERE🤖. Our Foundation Agents Survey V2 level up to 396 pages – every chapter is a full….
0
39
0
@ysu_nlp
Yu Su (hiring postdoc)
24 days
RT @nouhadziri: Come join us tomorrow at the 1st LLM agents workshop in ACL (REALM); amazing talks and oral presentations are ahead, with a….
0
18
0
@ysu_nlp
Yu Su (hiring postdoc)
26 days
RT @boyuan__zheng: Remember “Son of Anton” from the Silicon Valley show (@SiliconHBO)? The experimental AI that “efficiently” orders 4,000 l….
0
29
0
@ysu_nlp
Yu Su (hiring postdoc)
26 days
RT @vardaanpahuja: 🚀 Excited to share our #ACL2025 Findings paper: Explorer — a scalable pipeline that generates diverse web trajectories v….
0
13
0
@ysu_nlp
Yu Su (hiring postdoc)
26 days
Safety is one of the biggest blockers for computer use agents: how can I trust an agent won’t accidentally do something consequential without my permission? We collect and release the first large-scale dataset for detecting consequential actions on the web, and train the best
@scale_AI
Scale AI
26 days
As AI agents start taking real actions online, how do we prevent unintended harm? We teamed up with @OhioState and @UCBerkeley to create WebGuard: the first dataset for evaluating web agent risks and building real-world safety guardrails for online environments. 🧵
0
20
100
@ysu_nlp
Yu Su (hiring postdoc)
26 days
RT @Zai_org: Introducing GLM-4.5 and GLM-4.5 Air: new flagship models designed to unify frontier reasoning, coding, and agentic capabilitie….
0
645
0
@ysu_nlp
Yu Su (hiring postdoc)
1 month
RT @vimar_gu: Announcing the @NeurIPSConf 2025 workshop on Imageomics: Discovering Biological Knowledge from Images Using AI! The workshop….
0
15
0