Tao Yu @taoyds X Profile

Tao Yu

@taoyds

Followers

5K

Following

1K

Media

39

Statuses

450

@XLangNLP lab, asst. prof. @HKUniversity. author of OSWorld, Aguvis, Spider, OpenAgents, Text2Reward, Instructor. prev. postdoc @uwnlp; phd @Yale.

Seattle

Joined March 2016


Tao Yu

@taoyds

9 days

RT @FaZhou_998: 🐙Octothinker tech report is finally out! We also release the 70B math-focusing mid-training dataset -- MegaMath-Web-Pro-Max…

0

17

0

Tao Yu

@taoyds

13 days

RT @lmarena_ai: Check out the latest Computer Agent Arena leaderboard!

0

12

0

Tao Yu

@taoyds

18 days

RT @XLangNLP: 🔥New Computer Agent Arena Leaderboard Updates (2k+ user votes)! 🤔Which VLMs act better as computer use agents (CUAs)? 1, Cla…

0

23

0

Tao Yu

@taoyds

1 month

RT @CaimingXiong: Graphical user interface (GUI) grounding, one of the two key abilities (Grounding & Planning) for Computer-use Agent (e.g…

0

34

0

Tao Yu

@taoyds

1 month

Try out Claude 4 on Computer Agent Arena!

XLANG NLP Lab

@XLangNLP

1 month

💠Claude Opus 4 & Claude Sonnet 4. Welcome to the Computer Agent Arena🔥 Congratulations on the @AnthropicAI team for the great release!

0

3

Tao Yu

@taoyds

1 month

RT @XLangNLP: 💠Claude Opus 4 & Claude Sonnet 4. Welcome to the Computer Agent Arena🔥 Congratulations on the @AnthropicAI team for the great…

0

4

0

Tao Yu

@taoyds

1 month

RT @_akhaliq: Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis

0

33

0

Tao Yu

@taoyds

1 month

Big congrats, Wei-Lin!

Wei-Lin Chiang

@infwinston

1 month

Thrilled to announce our funding round, led by @a16z, @UofCalifornia, and a strong group of backers, to grow @lmarena_ai! We're building the infrastructure for open, reliable AI evaluation, and your feedback drives us forward. Try out our new UI today. More updates coming soon!

0

2

Tao Yu

@taoyds

2 months

RT @workshopcua: We're excited to invite Victor Zhong (@hllo_wrld) as a speaker at the workshop on Computer Use Agents - @icmlconf 2025! 🤖…

0

3

0

Tao Yu

@taoyds

2 months

RT @ysu_nlp: New AI/LLM Agents Track at #EMNLP2025! In the past few years, it feels a bit odd to submit agent work to *CL venues because…

0

22

0

Tao Yu

@taoyds

2 months

RT @Diyi_Yang: 🚀 Introducing CAVA: The Comprehensive Assessment for Voice Assistants. A new benchmark for evaluating end-to-end, speech-in-…

0

32

0

Tao Yu

@taoyds

2 months

🤔Static CUA benchmarks enable fast model dev but lack task variety and risk overfitting. Computer Agent Arena tests crowdsourced real-world tasks. OSWorld: 🥇UI-Tars1.5🥈Operator🥉Claude 3.7. CUA Arena: 🥇Claude 3.7🥈Operator🥉UI-Tars1.5. 🚀Rankings likely to evolve quickly

XLANG NLP Lab

@XLangNLP

2 months

🏆 Leaderboard Update! 🚀 Claude 3.7 Sonnet from @AnthropicAI ties #1 in Computer Agent Arena, followed by Operator from @OpenAI & UI-TARS-1.5 from @BytedanceTalk, which is significantly different from prior benchmarks! Check the full rankings! 👉

0

11

36

Tao Yu

@taoyds

2 months

RT @BowenWangNLP: 😀Our initial leaderboard finally came out, here I'd like to share a few interesting findings based on our case study: 1,…

0

5

0

Tao Yu

@taoyds

2 months

RT @lmarena_ai: Check out the latest release from Computer Agent Arena!

0

12

0

Tao Yu

@taoyds

2 months

RT @XLangNLP: 🏆 Leaderboard Update! 🚀 Claude 3.7 Sonnet from @AnthropicAI ties #1 in Computer Agent Arena, followed by Operator from @OpenA…

0

23

0

Tao Yu

@taoyds

2 months

RT @Alibaba_Qwen: Introducing Qwen3! We release and open-weight Qwen3, our latest large language models, including 2 MoE models and 6 den…

0

2K

0

Tao Yu

@taoyds

2 months

RT @DL4Code: DL4C is going wild with @taoyds 's talk on multimodal code gen and @xingyaow_ 's talk on OpenHands agents. #ICLR #ICLR2025 ht…

0

5

0

Tao Yu

@taoyds

2 months

Computer use often involves long contexts, and users frequently tweak or follow up on requests. Though Claude 3.7/Operator aren’t perfect, this example shows their engaging and instruction-following abilities are growing (see the arena example):

XLANG NLP Lab

@XLangNLP

2 months

🚀 Exciting news! @OpenAI's o3 & o4-mini, the most capable reasoning models, are now live on Computer Agent Arena! Test, vote, and explore their full potential with CUAs at Join the community and dive in!

1

5

15

Tao Yu

@taoyds

2 months

RT @XLangNLP: 🚀 Exciting news! @OpenAI's o3 & o4-mini, the most capable reasoning models, are now live on Computer Agent Arena! Test, vote,…

0

4

0

Tao Yu

@taoyds

2 months

👉Try UI-TARS-1.5 and other computer use agents (Operator, Claude 3.7) at .

XLANG NLP Lab

@XLangNLP

2 months

🎉 UI-TARS-1.5 is now live on Computer Agent Arena! Currently the SOTA model across multiple GUI benchmarks, showcasing leading performance in computer use, browser use, and even gameplay. Want to try the most intelligent CUA so far? Go to

0

1