Bowen Wang @BowenWangNLP X Profile

Bowen Wang

@BowenWangNLP

Followers

437

Following

457

Media

8

Statuses

104

1st year Ph.D. student @XLangNLP @HKUniversity focusing on #NLP. Prev. @Tsinghua_Uni, passionate about computer-use agents.

Hong Kong

Joined July 2023


Bowen Wang

@BowenWangNLP

3 months

🎮 Computer Use Agent Arena is LIVE! 🚀 🔥 Easiest way to test computer-use agents in the wild without any setup. 🌟 Compare top VLMs: OpenAI Operator, Claude 3.7, Gemini 2.5 Pro, Qwen 2.5 VL, and more. 🕹️ Test agents on 100+ real apps & websites with one-click config. 🔒 Safe & free

14

104

333

Bowen Wang

@BowenWangNLP

13 days

RT @Kimi_Moonshot: Meet Kimi-Researcher - an autonomous agent that excels at multi-turn search and reasoning. Powered by k 1.5 and trained…

0

222

0

Bowen Wang

@BowenWangNLP

17 days

RT @XLangNLP: 🔥 New Computer Agent Arena Leaderboard Updates (2k+ user votes)! 🤔 Which VLMs act better as computer use agents (CUAs)? 1, Cla…

0

23

0

Bowen Wang

@BowenWangNLP

1 month

RT @OpenAI: Operator 🤝 OpenAI o3. Operator in ChatGPT has been updated with our latest reasoning model.

0

597

0

Bowen Wang

@BowenWangNLP

1 month

Based on my own testing, Claude 4 is even stronger at CUA than Claude 3.7 Sonnet, with enhanced agentic capabilities. Come and give it a try!

XLANG NLP Lab

@XLangNLP

1 month

💠 Claude Opus 4 & Claude Sonnet 4: welcome to the Computer Agent Arena 🔥 Congratulations to the @AnthropicAI team on the great release!

1

0

7

Bowen Wang

@BowenWangNLP

2 months

RT @taoyds: 🤔 Static CUA benchmarks enable fast model dev but lack task variety and risk overfitting. Computer Agent Arena tests crowdsour…

0

11

0

Bowen Wang

@BowenWangNLP

2 months

😀 Our initial leaderboard is finally out. Here I'd like to share a few interesting findings based on our case study: 1, Claude 3.7 Sonnet consistently performs best across diverse task types, particularly excelling at open-ended queries like “write a paper reading report.” 2,

XLANG NLP Lab

@XLangNLP

2 months

🏆 Leaderboard Update! 🚀 Claude 3.7 Sonnet from @AnthropicAI ties for #1 in Computer Agent Arena, followed by Operator from @OpenAI & UI-TARS-1.5 from @BytedanceTalk, which is significantly different from prior benchmarks! Check the full rankings! 👉

1

5

14

Bowen Wang

@BowenWangNLP

2 months

RT @trycua: Part 2 of Build Your Own Operator on macOS is now live! The new cua-agent framework cuts down complexity and accelerates CUA de…

0

18

0

Bowen Wang

@BowenWangNLP

2 months

Big congrats to @TsingYoga and their team for pushing the boundaries of CUAs! From a development perspective, UI-TARS-1.5 truly feels like the beginning of a new chapter, and the next episode is coming. Stay tuned for the leaderboard 🚀!

XLANG NLP Lab

@XLangNLP

2 months

🎉 UI-TARS-1.5 is now live on Computer Agent Arena! Currently the SOTA model across multiple GUI benchmarks, showcasing leading performance in computer use, browser use, and even gameplay. Want to try the most intelligent CUA so far? Go to

1

7

Bowen Wang

@BowenWangNLP

2 months

For folks working on CUAs, definitely give o3 and o4-mini from @OpenAI a try. Key takeaway: enhancing the image reasoning and tool-use abilities of foundation models (FMs) could significantly boost CUA performance.

XLANG NLP Lab

@XLangNLP

2 months

🚀 Exciting news! @OpenAI's o3 & o4-mini, the most capable reasoning models, are now live on Computer Agent Arena! Test, vote, and explore their full potential with CUAs at Join the community and dive in!

0

1

Bowen Wang

@BowenWangNLP

3 months

RT @TsingYoga: UI-TARS-1-5

0

14

0

Bowen Wang

@BowenWangNLP

3 months

RT @Kimi_Moonshot: 🚀 Meet Kimi-VL and Kimi-VL-Thinking! 🌟 Our latest open source lightweight yet powerful Vision-Language Model with reason…

0

213

0

Bowen Wang

@BowenWangNLP

3 months

RT @hllo_wrld: I want to highlight that this was an incredibly complex piece of work put together by @BowenWangNLP. We have been working on…

0

6

0

Bowen Wang

@BowenWangNLP

3 months

@xywang626 @jiaqideng07 @TianbaoX @RyanLi0802 @StevenyzZhang @nikushii_ @istoica05 @infwinston @Diyi_Yang @ysu_nlp @hllo_wrld @taoyds @gneubig @dan_fried 📊 Curious how your favorite computer use agent stacks up? Dive into the leaderboard, explore model performance, and share your feedback to help shape the future of computer-use agents! Data & code will be open-sourced in a few weeks! 👉 Platform: 🏆

0

10

Bowen Wang

@BowenWangNLP

3 months

👋 Acknowledgement. Thanks to the Computer Agent Arena team: @xywang626, @jiaqideng07, @TianbaoX, @RyanLi0802, Gavin Li, @StevenyzZhang, @nikushii_, @istoica05, @infwinston, @Diyi_Yang, @ysu_nlp, Yi Zhang, Zhiguo Wang, @hllo_wrld, @taoyds. Also thanks to @gneubig, @dan_fried,

1

16

Bowen Wang

@BowenWangNLP

3 months

Use Case #2. Personal Use Task: Can you help me export my homepage in Notion to a html file onto my desktop and open it in the browser to preview it? Battle: Gemini 2.0 Flash vs. Claude 3.5 Sonnet (New) - Computer-Use. Details at: [6/🧵]

1

0

13

Bowen Wang

@BowenWangNLP

3 months

Use Case #1. Web browser task: please help me find the cheapest man's long-sleeve t-shirt on Amazon, I need a new fit for summer. Battle: Gemini 2.5 Pro (Experimental) vs. OpenAI Computer-Use Preview. Details at: [5/🧵]

1

12

Bowen Wang

@BowenWangNLP

3 months

🏆 Leaderboard Highlights (tentative): 🥇 OpenAI Operator 🥈 Claude 3.7 Sonnet 🥉 Claude 3.5 Sonnet. Stay tuned for the official leaderboard in the coming weeks. 📊 Check the latest, real-time leaderboard (tentative): [4/🧵]

1

0

18

Bowen Wang

@BowenWangNLP

3 months

🛠️ Set up your environment with ease! 🔐 Free & safe access to agents on cloud-hosted machines, fully isolated. 🌟 Pre-installed apps & software: LibreOffice, Slack, VSCode, GIMP, PDF editors… 🌐 Web apps/domains: YouTube, Reddit… 🔧 Customize further: Upload files, Open URLs,

1

0

13

Bowen Wang

@BowenWangNLP

3 months

🖥️ How does it work? 1️⃣ Choose your OS (currently Windows and Ubuntu supported, macOS coming soon). 2️⃣ Set up the initial desktop environment with one-click configuration. 3️⃣ Write your task (e.g. "Upload a CV to Slack"). 4️⃣ Observe agents execute step-by-step. 5️⃣ Evaluate: Which agent

1

0

24