
Bowen Wang
@BowenWangNLP
Followers: 437 · Following: 457 · Media: 8 · Statuses: 104
1st year Ph.D. student @XLangNLP @HKUniversity focusing on #NLP. Prev. @Tsinghua_Uni, passionate about computer-use agents.
Hong Kong
Joined July 2023
RT @Kimi_Moonshot: Meet Kimi-Researcher - an autonomous agent that excels at multi-turn search and reasoning. Powered by k 1.5 and trained…
0
222
0
RT @XLangNLP: 🔥 New Computer Agent Arena Leaderboard Updates (2k+ user votes)! 🤔 Which VLMs act better as computer use agents (CUAs)? 1, Cla…
0
23
0
RT @OpenAI: Operator 🤝 OpenAI o3. Operator in ChatGPT has been updated with our latest reasoning model.
0
597
0
Based on my own testing, Claude 4 is even stronger at CUA than Claude 3.7 Sonnet, with enhanced agentic capabilities. Come and give it a try!
💠 Claude Opus 4 & Claude Sonnet 4. Welcome to the Computer Agent Arena 🔥. Congratulations to the @AnthropicAI team on the great release!
1
0
7
RT @taoyds: 🤔 Static CUA benchmarks enable fast model dev but lack task variety and risk overfitting. Computer Agent Arena tests crowdsour…
0
11
0
😀 Our initial leaderboard is finally out. Here I'd like to share a few interesting findings from our case study: 1, Claude 3.7 Sonnet consistently performs best across diverse task types, particularly excelling at open-ended queries like “write a paper reading report.” 2,
🏆 Leaderboard Update! 🚀 Claude 3.7 Sonnet from @AnthropicAI ties for #1 in Computer Agent Arena, followed by Operator from @OpenAI & UI-TARS-1.5 from @BytedanceTalk, which differs significantly from prior benchmarks! Check the full rankings! 👉
1
5
14
RT @trycua: Part 2 of Build Your Own Operator on macOS is now live! The new cua-agent framework cuts down complexity and accelerates CUA de…
0
18
0
Big congrats to @TsingYoga and their team for pushing the boundaries of CUAs! When developing, UI-TARS-1.5 truly feels like the beginning of a new chapter: the next episode is coming. Stay tuned for the leaderboard 🚀!
🎉 UI-TARS-1.5 is now live on Computer Agent Arena! Currently the SOTA model across multiple GUI benchmarks, showcasing leading performance in computer use, browser use, and even gameplay. Want to try the most intelligent CUA so far? Go to
1
1
7
For folks working on CUAs, definitely give o3 and o4-mini from @OpenAI a try. Key takeaway: enhancing the image reasoning and tool-use abilities of foundation models (FMs) could significantly boost CUA performance.
🚀 Exciting news! @OpenAI's o3 & o4-mini, the most capable reasoning models, are now live on Computer Agent Arena! Test, vote, and explore their full potential with CUAs at Join the community and dive in!
0
0
1
RT @Kimi_Moonshot: 🚀 Meet Kimi-VL and Kimi-VL-Thinking! 🌟 Our latest open source lightweight yet powerful Vision-Language Model with reason…
0
213
0
RT @hllo_wrld: I want to highlight that this was an incredibly complex piece of work put together by @BowenWangNLP. We have been working on…
0
6
0
@xywang626 @jiaqideng07 @TianbaoX @RyanLi0802 @StevenyzZhang @nikushii_ @istoica05 @infwinston @Diyi_Yang @ysu_nlp @hllo_wrld @taoyds @gneubig @dan_fried 📊 Curious how your favorite computer use agent stacks up? Dive into the leaderboard, explore model performance, and share your feedback to help shape the future of computer-use agents! Data & code will be open-sourced in a few weeks! 👉 Platform: 🏆
0
0
10
👋Acknowledgement. Thanks to the Computer Agent Arena team: @xywang626, @jiaqideng07, @TianbaoX, @RyanLi0802, Gavin Li, @StevenyzZhang, @nikushii_, @istoica05, @infwinston, @Diyi_Yang, @ysu_nlp, Yi Zhang, Zhiguo Wang, @hllo_wrld, @taoyds. Also thanks to @gneubig, @dan_fried,.
1
1
16
Use Case #2. Personal Use Task: Can you help me export my homepage in Notion to a html file onto my desktop and open it in the browser to preview it? Battle: Gemini 2.0 Flash vs. Claude 3.5 Sonnet (New) - Computer-Use. Details at: [6/🧵]
1
0
13
Use Case #1. Web browser task: please help me find the cheapest man's long-sleeve t-shirt on Amazon, I need a new fit for summer. Battle: Gemini 2.5 Pro (Experimental) vs. OpenAI Computer-Use Preview. Details at: [5/🧵]
1
1
12