Huan Sun (OSU) Profile

@hhsun1

Followers: 5K
Following: 2K
Media: 49
Statuses: 720

Associate Professor (with Tenure) in CSE, endowed CoE Innovation Scholar, CoP Co-Director @OSUbigdata, The Ohio State University (NLP and Data Mining)

The Ohio State University
Joined March 2012
@hhsun1
Huan Sun (OSU)
1 month
🚨 Postdoc Hiring: I am looking for a postdoc to work on rigorously evaluating and advancing the capabilities and safety of computer-use agents (CUAs), co-advised with @ysu_nlp @osunlp. We welcome strong applicants with experience in CUAs, long-horizon reasoning/planning, …
1
30
66
@hhsun1
Huan Sun (OSU)
22 hours
RT @jxwuyi: 🔍 We introduce ASearcher, a search agent trained by end2end RL. Large-scale (up to 128 turns) RL with AReaL unlocks Long-Horizon…
0
57
0
@hhsun1
Huan Sun (OSU)
2 days
More details about Mind2Web 2:
Explore examples here:
@ysu_nlp
Yu Su
2 months
🔎 Agentic search like Deep Research is fundamentally changing web search, but it also brings an evaluation crisis ⚠️. Introducing Mind2Web 2: Evaluating Agentic Search with Agents-as-a-Judge.
- 130 tasks (each requiring avg. 100+ webpages) from 1,000+ hours of expert labor
- …
0
0
3
@hhsun1
Huan Sun (OSU)
2 days
RT @_zifan_wang: 🧵 (6/9) Our findings suggest that traditional capability benchmarks may not adequately assess search-based LLM agents. We r…
arxiv.org
Agentic search such as Deep Research systems (where agents autonomously browse the web, synthesize information, and return comprehensive citation-backed answers) represents a major shift in how...
0
4
0
@hhsun1
Huan Sun (OSU)
2 days
Glad that the @scale_AI team specifically studies this "Search-Time Data Contamination" (STC) problem in existing agentic search benchmarks, where an agent might simply retrieve a source that directly leaks the answer, so no actual reasoning or complex web navigation is needed. In…
@_zifan_wang
Zifan (Sail) Wang
2 days
🧵 (1/9) New @scale_AI research paper: "Search-Time Data Contamination" (STC), which occurs in evaluating search-based LLM agents when the retrieval step contains clues about a question’s answer by virtue of being derived from the evaluation set itself.
3
5
17
@hhsun1
Huan Sun (OSU)
5 days
RT @xiangyue96: 20 months after the release of our multimodal reasoning benchmark MMMU, both frontier and open models ar…
0
18
0
@hhsun1
Huan Sun (OSU)
6 days
RT @abeirami: The main ingredient that led to GRPO's performance leap is the calibration of the reward/value via multiple rollouts per prom…
0
45
0
@hhsun1
Huan Sun (OSU)
7 days
RT @sayashk: How does GPT-5 compare against Claude Opus 4.1 on agentic tasks? Since their release, we have been evaluating these models o…
0
70
0
@hhsun1
Huan Sun (OSU)
8 days
RT @CaimingXiong: 🚀 Computer-using agents represent a powerful new paradigm for human-computer interaction. Over the past year, we’ve explo…
0
41
0
@hhsun1
Huan Sun (OSU)
12 days
RT @AnthropicAI: New Anthropic research: Building and evaluating alignment auditing agents. We developed three AI agents to autonomously c…
0
198
0
@hhsun1
Huan Sun (OSU)
14 days
The System Two Safety idea is very interesting. In environments that may contain maliciously injected instructions, deliberative reasoning seems particularly important: are the instructions in the current context out of place or suspicious? See, e.g., Indirect Prompt Injection.
@ShaneLegg
Shane Legg
14 days
I'm a co-author on this new paper on Chain of Thought Monitoring. It's closely related to the System Two Safety idea that I've been talking about for a few years.
1
0
8
@hhsun1
Huan Sun (OSU)
16 days
RT @AbrahamOwos: Yaaaay! Our paper got a best social impact award at ACL!!! 🎉🎉 I couldn't attend the conference, sadly.
0
15
0
@hhsun1
Huan Sun (OSU)
16 days
RT @nouhadziri: Come join us tomorrow at the 1st LLM agents workshop in ACL (REALM); amazing talks and oral presentations are ahead, with a…
0
17
0
@hhsun1
Huan Sun (OSU)
18 days
RT @boyuan__zheng: Remember “Son of Anton” from the Silicon Valley show (@SiliconHBO)? The experimental AI that “efficiently” orders 4,000 l…
0
28
0
@hhsun1
Huan Sun (OSU)
18 days
Check out our WebGuard led by @boyuan__zheng: the first large-scale dataset for training and evaluating guardrails to detect consequential web agent actions (actions with significant or irreversible consequences):
📊 4,939 human-labeled actions
📷 193 websites across 22 domains
@boyuan__zheng
Boyuan Zheng@ICML
18 days
Remember “Son of Anton” from the Silicon Valley show (@SiliconHBO)? The experimental AI that “efficiently” orders 4,000 lbs of meat while looking for a cheap burger and “fixes” a bug by deleting all the code? It’s starting to look a lot like reality. Even 18 months ago, my own…
0
3
20
@hhsun1
Huan Sun (OSU)
18 days
RT @scale_AI: WebGuard turns abstract safety into something measurable. Captured from 193 different websites across 22 diverse domains, it'…
0
3
0
@hhsun1
Huan Sun (OSU)
18 days
RT @scale_AI: As AI agents start taking real actions online, how do we prevent unintended harm? We teamed up with @OhioState and @UCBerkel…
0
22
0
@hhsun1
Huan Sun (OSU)
18 days
RT @ysu_nlp: Safety is one of the biggest blockers for computer use agents: how can I trust an agent won’t accidentally do something conseq…
0
19
0
@hhsun1
Huan Sun (OSU)
18 days
RT @AnaisHowland18: WebGuard is a big step for web-agent safety: ~5k human-tagged actions across 193 sites. Frontier LLMs hit <60% on high-…
0
3
0
@hhsun1
Huan Sun (OSU)
20 days
RT @niloofar_mire: I’m gonna be recruiting students thru both @LTIatCMU (NLP) and @CMU_EPP (Engineering and Public Policy) for fall 2026!…
0
50
0