Bowen Wang

@BowenWangNLP

Followers: 437
Following: 457
Media: 8
Statuses: 104

1st year Ph.D. student @XLangNLP @HKUniversity focusing on #NLP. Prev. @Tsinghua_Uni, passionate about computer-use agents.

Hong Kong
Joined July 2023
@BowenWangNLP
Bowen Wang
3 months
🎮 Computer Use Agent Arena is LIVE! 🚀 🔥 Easiest way to test computer-use agents in the wild without any setup. 🌟 Compare top VLMs: OpenAI Operator, Claude 3.7, Gemini 2.5 Pro, Qwen 2.5 VL, and more. 🕹️ Test agents on 100+ real apps & websites with one-click config. 🔒 Safe & free
14
104
333
@BowenWangNLP
Bowen Wang
13 days
RT @Kimi_Moonshot: Meet Kimi-Researcher - an autonomous agent that excels at multi-turn search and reasoning. Powered by k 1.5 and trained…
0
222
0
@BowenWangNLP
Bowen Wang
17 days
RT @XLangNLP: 🔥 New Computer Agent Arena Leaderboard Updates (2k+ user votes)! 🤔 Which VLMs act better as computer use agents (CUAs)? 1, Cla…
0
23
0
@BowenWangNLP
Bowen Wang
1 month
RT @OpenAI: Operator 🤝 OpenAI o3. Operator in ChatGPT has been updated with our latest reasoning model.
0
597
0
@BowenWangNLP
Bowen Wang
1 month
Based on my own testing, Claude 4 is even stronger at CUA than Claude 3.7 Sonnet, with enhanced agentic capabilities. Come give it a try!
@XLangNLP
XLANG NLP Lab
1 month
💠 Claude Opus 4 & Claude Sonnet 4: welcome to the Computer Agent Arena 🔥 Congratulations to the @AnthropicAI team on the great release!
1
0
7
@BowenWangNLP
Bowen Wang
2 months
RT @taoyds: 🤔 Static CUA benchmarks enable fast model dev but lack task variety and risk overfitting. Computer Agent Arena tests crowdsour…
0
11
0
@BowenWangNLP
Bowen Wang
2 months
😀 Our initial leaderboard is finally out. Here I'd like to share a few interesting findings from our case study: 1. Claude 3.7 Sonnet consistently performs best across diverse task types, particularly excelling at open-ended queries like “write a paper reading report.” 2. …
@XLangNLP
XLANG NLP Lab
2 months
🏆 Leaderboard Update! 🚀 Claude 3.7 Sonnet from @AnthropicAI ties for #1 in Computer Agent Arena, followed by Operator from @OpenAI & UI-TARS-1.5 from @BytedanceTalk, a ranking that differs significantly from prior benchmarks! Check the full rankings! 👉
1
5
14
@BowenWangNLP
Bowen Wang
2 months
RT @trycua: Part 2 of Build Your Own Operator on macOS is now live! The new cua-agent framework cuts down complexity and accelerates CUA de…
0
18
0
@BowenWangNLP
Bowen Wang
2 months
Big congrats to @TsingYoga and their team for pushing the boundaries of CUAs! When developing, UI-TARS-1.5 truly feels like the beginning of a new chapter; the next episode is coming. Stay tuned for the leaderboard 🚀!
@XLangNLP
XLANG NLP Lab
2 months
🎉 UI-TARS-1.5 is now live on Computer Agent Arena! Currently the SOTA model across multiple GUI benchmarks, showcasing leading performance in computer use, browser use, and even gameplay. Want to try the most intelligent CUA so far? Go to
1
1
7
@BowenWangNLP
Bowen Wang
2 months
For folks working on CUAs, definitely give o3 and o4-mini from @OpenAI a try. Key takeaway: enhancing the image reasoning and tool-use abilities of foundation models (FMs) could significantly boost CUA performance.
@XLangNLP
XLANG NLP Lab
2 months
🚀 Exciting news! @OpenAI's o3 & o4-mini, the most capable reasoning models, are now live on Computer Agent Arena! Test, vote, and explore their full potential with CUAs at Join the community and dive in!
0
0
1
@BowenWangNLP
Bowen Wang
3 months
RT @TsingYoga: UI-TARS-1-5
0
14
0
@BowenWangNLP
Bowen Wang
3 months
RT @Kimi_Moonshot: 🚀 Meet Kimi-VL and Kimi-VL-Thinking! 🌟 Our latest open source lightweight yet powerful Vision-Language Model with reason…
0
213
0
@BowenWangNLP
Bowen Wang
3 months
RT @hllo_wrld: I want to highlight that this was an incredibly complex piece of work put together by @BowenWangNLP. We have been working on…
0
6
0
@BowenWangNLP
Bowen Wang
3 months
@xywang626 @jiaqideng07 @TianbaoX @RyanLi0802 @StevenyzZhang @nikushii_ @istoica05 @infwinston @Diyi_Yang @ysu_nlp @hllo_wrld @taoyds @gneubig @dan_fried 📊 Curious how your favorite computer use agent stacks up? Dive into the leaderboard, explore model performance, and share your feedback to help shape the future of computer-use agents! Data & code will be open-sourced in a few weeks! 👉 Platform: 🏆
0
0
10
@BowenWangNLP
Bowen Wang
3 months
👋 Acknowledgement. Thanks to the Computer Agent Arena team: @xywang626, @jiaqideng07, @TianbaoX, @RyanLi0802, Gavin Li, @StevenyzZhang, @nikushii_, @istoica05, @infwinston, @Diyi_Yang, @ysu_nlp, Yi Zhang, Zhiguo Wang, @hllo_wrld, @taoyds. Also thanks to @gneubig, @dan_fried, …
1
1
16
@BowenWangNLP
Bowen Wang
3 months
Use Case #2. Personal use task: Can you help me export my homepage in Notion to an HTML file on my desktop and open it in the browser to preview it? Battle: Gemini 2.0 Flash vs. Claude 3.5 Sonnet (New) - Computer-Use. Details at: [6/🧵]
1
0
13
@BowenWangNLP
Bowen Wang
3 months
Use Case #1. Web browser task: Please help me find the cheapest men's long-sleeve t-shirt on Amazon, I need a new fit for summer. Battle: Gemini 2.5 Pro (Experimental) vs. OpenAI Computer-Use Preview. Details at: [5/🧵]
1
1
12
@BowenWangNLP
Bowen Wang
3 months
🏆 Leaderboard Highlights (tentative). 🥇 OpenAI Operator. 🥈 Claude 3.7 Sonnet. 🥉 Claude 3.5 Sonnet. Stay tuned for the official leaderboard in the following weeks. 📊 Check the latest, real-time leaderboard (tentative): [4/🧵]
1
0
18
@BowenWangNLP
Bowen Wang
3 months
🛠️ Set up your environment with ease! 🔐 Free & safe access to agents on cloud-hosted machines, fully isolated. 🌟 Pre-installed apps & software: LibreOffice, Slack, VSCode, GIMP, PDF editors… 🌐 Web apps/domains: YouTube, Reddit… 🔧 Customize further: Upload files, Open URLs,
1
0
13
@BowenWangNLP
Bowen Wang
3 months
🖥️ How does it work? 1️⃣ Choose your OS (currently Windows and Ubuntu are supported; macOS is coming soon). 2️⃣ Set up the initial desktop environment with one-click configuration. 3️⃣ Write your task (e.g. "Upload a CV to Slack"). 4️⃣ Observe agents execute step-by-step. 5️⃣ Evaluate: Which agent
1
0
24