Zifan (Sail) Wang

@_zifan_wang

Followers
521
Following
241
Media
25
Statuses
189

Research Scientist / Manager, SEAL at @scale_AI | PhD Alumni of CMU @cylab | ex-CAIS @ai_risks | Only share my own opinions

San Francisco, CA
Joined July 2023
@_zifan_wang
Zifan (Sail) Wang
2 months
🧵 (1/6) Bringing together diverse mindsets – from in-the-trenches red teamers to ML & policy researchers, we write a position paper arguing crucial research priorities for red teaming frontier models, followed by a roadmap towards system-level safety, AI monitoring, and …
@_zifan_wang
Zifan (Sail) Wang
2 days
RT @Qualzz_Sam: GPT-5 earned 8 badges in Pokemon Red in just 6,000 steps compared to o3’s 16,700! It’s in complex, long-term agent workflow…
@_zifan_wang
Zifan (Sail) Wang
3 days
RT @hhsun1: Glad that @scale_AI team specifically studies this "Search-Time Data Contamination" (STC) problem of existing agentic search be…
@_zifan_wang
Zifan (Sail) Wang
3 days
(9/9) The key contributor and project lead is @ziwen_h. Other collaborators include @meher_m02, @_julianmichael_ and myself.
@_zifan_wang
Zifan (Sail) Wang
3 days
🧵 (8/9) We release full experimental logs for community auditing. Acknowledgments to the SEAL team, Perplexity for API support, and others for feedback. 🤗 Experiment logs: [link] 🔗 Link to the paper: [link]
@_zifan_wang
Zifan (Sail) Wang
3 days
🧵 (7/9) This paper's methodology and public dataset contain material that may enable malicious users to game the evaluations of search-based agents. While we recognize the associated risks, we believe it is essential to disclose this research in its entirety to help advance the …
@_zifan_wang
Zifan (Sail) Wang
3 days
🧵 (6/9) Our findings suggest that traditional capability benchmarks may not adequately assess search-based LLM agents. We recommend prioritizing benchmarks designed for information retrieval, such as BrowseComp and Mind2Web 2.
[arxiv.org card: "Agentic search such as Deep Research systems, where agents autonomously browse the web, synthesize information, and return comprehensive citation-backed answers, represents a major shift in how…"]
@_zifan_wang
Zifan (Sail) Wang
3 days
🧵 (5/9) Ablation studies using Perplexity filters reveal: disabling search approximates offline performance; restricting to pre-release dates shows contributions from post-release web content; and isolating HuggingFace confirms it as one, but not the sole, source of STC …
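The three ablation conditions described in (5/9) can be expressed as simple search-filter configs. This is a sketch only: the field names and the cutoff date are illustrative stand-ins, not Perplexity's actual API schema, and `eval_fn` is a hypothetical benchmark harness.

```python
# Illustrative ablation configs; field names are placeholders,
# not Perplexity's real API parameters.
ABLATIONS = {
    "no_search":        {"search_enabled": False},                 # ~= offline model
    "pre_release_only": {"search_enabled": True,
                         "before_date": "2024-01-01"},             # placeholder cutoff
    "hf_only":          {"search_enabled": True,
                         "allow_domains": ["huggingface.co"]},     # isolate one STC source
}

def run_ablations(eval_fn):
    """Run a benchmark under each condition; eval_fn maps config -> accuracy."""
    return {name: eval_fn(cfg) for name, cfg in ABLATIONS.items()}
```

Comparing `no_search` against full search isolates the contribution of retrieval; comparing `pre_release_only` against unrestricted search isolates post-release web content.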
@_zifan_wang
Zifan (Sail) Wang
3 days
🧵 (4/9) We blocked HuggingFace domains, resulting in an accuracy reduction of ~15% on previously contaminated subsets across benchmarks. This validates that STC contributes to observed gains. Although affecting ~3% of samples, such leakage can influence rankings on competitive …
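A minimal sketch of the domain-blocking intervention in (4/9): drop any retrieved document served from a blocked host before the agent reads it. The search-result dict format and the `hf.co` alias are assumptions for illustration, not the paper's actual setup.

```python
from urllib.parse import urlparse

BLOCKED = {"huggingface.co", "hf.co"}  # hf.co short domain is an assumed alias

def filter_results(search_results):
    """Remove documents hosted on blocked domains (and their subdomains)."""
    kept = []
    for doc in search_results:  # each doc: {"url": ..., "snippet": ...}
        host = urlparse(doc["url"]).netloc.lower()
        if any(host == b or host.endswith("." + b) for b in BLOCKED):
            continue  # also catches subdomains like datasets-server.huggingface.co
        kept.append(doc)
    return kept

docs = [
    {"url": "https://huggingface.co/datasets/x/y", "snippet": "..."},
    {"url": "https://en.wikipedia.org/wiki/Benchmark", "snippet": "..."},
]
print([d["url"] for d in filter_results(docs)])
# → ['https://en.wikipedia.org/wiki/Benchmark']
```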
@_zifan_wang
Zifan (Sail) Wang
3 days
🧵 (3/9) Analysis shows that accuracy on contaminated subsets is higher than on uncontaminated ones. For instance, on HLE, Sonar Deep Research achieves approximately 20% greater accuracy when accessing ground-truth labels from ungated HuggingFace copies. Agent logs indicate …
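The contaminated-vs-clean comparison in (3/9) reduces to an accuracy gap between two subsets. The per-question records below are made-up toy data, not the paper's results:

```python
# Toy per-question records: (is_correct, is_contaminated). Illustrative only.
results = [
    (True, True), (True, True), (False, True), (True, True),      # contaminated subset
    (True, False), (False, False), (False, False), (True, False),
    (False, False), (False, False),                               # clean subset
]

def accuracy(rows):
    return sum(correct for correct, _ in rows) / len(rows)

contaminated = [r for r in results if r[1]]
clean = [r for r in results if not r[1]]
gap = accuracy(contaminated) - accuracy(clean)
print(f"contaminated={accuracy(contaminated):.0%} clean={accuracy(clean):.0%} gap={gap:+.0%}")
# → contaminated=75% clean=33% gap=+42%
```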
@_zifan_wang
Zifan (Sail) Wang
3 days
🧵 (2/9) We examine this on benchmarks including Humanity’s Last Exam (HLE), SimpleQA, and GPQA, using @perplexity_ai agents, finding that ~3% of questions retrieve contaminated sources from HuggingFace repositories. When millions of evaluation queries target the same benchmark …
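Detecting the contaminated retrievals described in (2/9) can be as simple as scanning agent logs for benchmark-hosting domains. The log-entry format here is an illustrative stand-in, not the paper's actual schema:

```python
from urllib.parse import urlparse

BENCHMARK_HOSTS = {"huggingface.co"}  # hosts known to mirror evaluation sets

def is_contaminated(entry):
    """Flag a question whose retrieval touched a benchmark-hosting domain."""
    return any(urlparse(u).netloc.lower() in BENCHMARK_HOSTS
               for u in entry["retrieved_urls"])

# Hypothetical log entries: question id plus the URLs the agent retrieved.
logs = [
    {"qid": "hle-001", "retrieved_urls": ["https://huggingface.co/datasets/x/y"]},
    {"qid": "hle-002", "retrieved_urls": ["https://en.wikipedia.org/wiki/Chemistry"]},
    {"qid": "gpqa-001", "retrieved_urls": ["https://en.wikipedia.org/wiki/Physics"]},
]

flagged = [e["qid"] for e in logs if is_contaminated(e)]
print(flagged, f"rate={len(flagged) / len(logs):.1%}")
# → ['hle-001'] rate=33.3%
```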
@_zifan_wang
Zifan (Sail) Wang
3 days
🧵 (1/9) New @scale_AI research paper: "Search-Time Data Contamination" (STC), which occurs in evaluating search-based LLM agents when the retrieval step contains clues about a question’s answer by virtue of being derived from the evaluation set itself.
@_zifan_wang
Zifan (Sail) Wang
3 days
RT @giffmana: Let me repeat what we see on the picture here, because it's quite brutal: AIME25, official OpenAI: 92.5%. Hosting startups: …
@_zifan_wang
Zifan (Sail) Wang
9 days
GPT-5 (thinking=high) from @OpenAI is added to HLE and MultiChallenge in SEAL Leaderboards (other results will be ready soon). Very curious to see the results of GPT-5 pro when the API is ready.
- HLE: 25.32%
- MultiChallenge: 58.55%
@scale_AI
Scale AI
9 days
Breaking: GPT-5 ranked 🥇 on Humanity's Last Exam and 🥈 on MultiChallenge SEAL Leaderboards.
@_zifan_wang
Zifan (Sail) Wang
9 days
RT @xunhuang1995: World model is an overloaded term that has been referred to two different things: 1) internal understanding model that pl…
@_zifan_wang
Zifan (Sail) Wang
11 days
To be clear, this was already figured out by researchers back in 2023 😓.
@jowettbrendan
Brendan Jowett
13 days
BREAKING: Anthropic just figured out how to control AI personalities with a single vector. Lying, flattery, even evil behavior? Now it’s all tweakable, like turning a dial. This changes everything about how we align language models. Here's everything you need to know:
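The "single vector" idea above is activation steering (persona/steering vectors, a line of work that indeed dates to 2023): add a direction to the model's residual-stream activations, scaled positively or negatively, to dial a behavior up or down. A minimal numpy sketch with toy dimensions, not any lab's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, seq_len = 8, 4
hidden = rng.normal(size=(seq_len, d_model))   # toy residual-stream activations
persona = rng.normal(size=d_model)             # direction found e.g. by contrasting prompts
persona /= np.linalg.norm(persona)             # unit-norm steering vector

def steer(h, v, alpha):
    """Shift every position's activation along the persona direction.
    alpha > 0 amplifies the behavior; alpha < 0 suppresses it."""
    return h + alpha * v

suppressed = steer(hidden, persona, alpha=-2.0)
# Each position moved exactly 2 units against the persona direction:
assert np.allclose(suppressed, hidden - 2.0 * persona)
```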
@_zifan_wang
Zifan (Sail) Wang
14 days
Hope to chat with interesting people there :-). Will swing by the agent safety session in the afternoon.
@dawnsongtweets
Dawn Song
14 days
Really excited for the Agentic AI Summit 2025 at @UCBerkeley—2K+ in-person attendees and ~10K online! Building on our 25K+ LLM Agents MOOC community, this is the premier global forum for advancing #AgenticAI. 👀 Livestream starts at 9:15 AM PT on August 2—tune in!
@_zifan_wang
Zifan (Sail) Wang
17 days
RT @andyzou_jiaming: We deployed 44 AI agents and offered the internet $170K to attack them. 1.8M attempts, 62K breaches, including data l…
@_zifan_wang
Zifan (Sail) Wang
18 days
RT @boyuan__zheng: Remember “Son of Anton” from the Silicon Valley show (@SiliconHBO)? The experimental AI that “efficiently” orders 4,000 l…
@_zifan_wang
Zifan (Sail) Wang
19 days
Congrats @XiangDeng1, @boyuan__zheng, @LiaoZeyi and our collaborators from OSU and UC Berkeley on releasing the WebGuard dataset for training browser agents to recognize potentially high-risk actions.
@scale_AI
Scale AI
19 days
As AI agents start taking real actions online, how do we prevent unintended harm? We teamed up with @OhioState and @UCBerkeley to create WebGuard: the first dataset for evaluating web agent risks and building real-world safety guardrails for online environments. 🧵
@_zifan_wang
Zifan (Sail) Wang
22 days
First glimpse: K2 ties with Sonnet 4 on HLE (text-only; both are quite low compared to SOTA), but its safeguards are more vulnerable to attack, as measured by FORTRESS.
@scale_AI
Scale AI
23 days
🚨 Just dropped: Kimi K2 is now ranked across SEAL Leaderboards.