Tao Yu

@taoyds

Followers
5K
Following
1K
Media
39
Statuses
450

@XLangNLP lab, asst. prof. @HKUniversity. author of OSWorld, Aguvis, Spider, OpenAgents, Text2Reward, Instructor. prev. postdoc @uwnlp; phd @Yale.

Seattle
Joined March 2016
@taoyds
Tao Yu
9 days
RT @FaZhou_998: 🐙Octothinker tech report is finally out! We also release the 70B math-focusing mid-training dataset -- MegaMath-Web-Pro-Max…
0
17
0
@taoyds
Tao Yu
13 days
RT @lmarena_ai: Check out the latest Computer Agent Arena leaderboard!
0
12
0
@taoyds
Tao Yu
18 days
RT @XLangNLP: 🔥New Computer Agent Arena Leaderboard Updates (2k+ user votes)! 🤔Which VLMs act better as computer use agents (CUAs)? 1, Cla…
0
23
0
@taoyds
Tao Yu
1 month
RT @CaimingXiong: Graphical user interface (GUI) grounding, one of the two key abilities (Grounding & Planning) for Computer-use Agent (e.g…
0
34
0
@taoyds
Tao Yu
1 month
Try out Claude 4 on Computer Agent Arena!
@XLangNLP
XLANG NLP Lab
1 month
💠Claude Opus 4 & Claude Sonnet 4. Welcome to the Computer Agent Arena🔥 Congratulations to the @AnthropicAI team for the great release!
0
0
3
@taoyds
Tao Yu
1 month
RT @XLangNLP: 💠Claude Opus 4 & Claude Sonnet 4. Welcome to the Computer Agent Arena🔥 Congratulations to the @AnthropicAI team for the great…
0
4
0
@taoyds
Tao Yu
1 month
RT @_akhaliq: Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis
0
33
0
@taoyds
Tao Yu
1 month
Big congrats, Wei-Lin!
@infwinston
Wei-Lin Chiang
1 month
Thrilled to announce our funding round — led by @a16z, @UofCalifornia, and a strong group of backers to grow @lmarena_ai! We're building the infrastructure for open, reliable AI evaluation, and your feedback drives us forward. Try out our new UI today. More updates coming soon!
0
0
2
@taoyds
Tao Yu
2 months
RT @workshopcua: We're excited to invite Victor Zhong (@hllo_wrld) as a speaker at the workshop on Computer Use Agents - @icmlconf 2025! 🤖…
0
3
0
@taoyds
Tao Yu
2 months
RT @ysu_nlp: New AI/LLM Agents Track at #EMNLP2025! In the past few years, it feels a bit odd to submit agent work to *CL venues because…
0
22
0
@taoyds
Tao Yu
2 months
RT @Diyi_Yang: 🚀 Introducing CAVA: The Comprehensive Assessment for Voice Assistants. A new benchmark for evaluating end-to-end, speech-in-…
0
32
0
@taoyds
Tao Yu
2 months
🤔Static CUA benchmarks enable fast model dev but lack task variety and risk overfitting. Computer Agent Arena tests crowdsourced real-world tasks. OSWorld: 🥇UI-TARS-1.5 🥈Operator 🥉Claude 3.7. CUA Arena: 🥇Claude 3.7 🥈Operator 🥉UI-TARS-1.5. 🚀Rankings likely to evolve quickly
@XLangNLP
XLANG NLP Lab
2 months
🏆 Leaderboard Update! 🚀 Claude 3.7 Sonnet from @AnthropicAI ties #1 in Computer Agent Arena, followed by Operator from @OpenAI & UI-TARS-1.5 from @BytedanceTalk, which is significantly different from prior benchmarks! Check the full rankings! 👉
0
11
36
@taoyds
Tao Yu
2 months
RT @BowenWangNLP: 😀Our initial leaderboard finally came out, here I'd like to share a few interesting findings based on our case study: 1,…
0
5
0
@taoyds
Tao Yu
2 months
RT @lmarena_ai: Check out the latest release from Computer Agent Arena!
0
12
0
@taoyds
Tao Yu
2 months
RT @XLangNLP: 🏆 Leaderboard Update! 🚀 Claude 3.7 Sonnet from @AnthropicAI ties #1 in Computer Agent Arena, followed by Operator from @OpenA…
0
23
0
@taoyds
Tao Yu
2 months
RT @Alibaba_Qwen: Introducing Qwen3! We release and open-weight Qwen3, our latest large language models, including 2 MoE models and 6 den…
0
2K
0
@taoyds
Tao Yu
2 months
RT @DL4Code: DL4C is going wild with @taoyds's talk on multimodal code gen and @xingyaow_'s talk on OpenHands agents. #ICLR #ICLR2025 ht…
0
5
0
@taoyds
Tao Yu
2 months
Computer use often involves long contexts, and users frequently tweak or follow up on requests. Though Claude 3.7/Operator aren’t perfect, this example shows their engaging and instruction-following abilities are growing (see the arena example):
@XLangNLP
XLANG NLP Lab
2 months
🚀 Exciting news! @OpenAI's o3 & o4-mini, the most capable reasoning models, are now live on Computer Agent Arena! Test, vote, and explore their full potential with CUAs at Join the community and dive in!
1
5
15
@taoyds
Tao Yu
2 months
RT @XLangNLP: 🚀 Exciting news! @OpenAI's o3 & o4-mini, the most capable reasoning models, are now live on Computer Agent Arena! Test, vote,…
0
4
0
@taoyds
Tao Yu
2 months
👉Try UI-TARS-1.5 and other computer use agents (Operator, Claude 3.7) at .
@XLangNLP
XLANG NLP Lab
2 months
🎉 UI-TARS-1.5 is now live on Computer Agent Arena! Currently the SOTA model across multiple GUI benchmarks, showcasing leading performance in computer use, browser use, and even gameplay. Want to try the most intelligent CUA so far? Go to
0
0
1