Xing Han Lu

@xhluca

Followers
2K
Following
6K
Media
214
Statuses
2K

Vibe agents @Mila_Quebec @McGill_NLP

The Wired
Joined December 2017
@xhluca
Xing Han Lu
26 days
"Build the web for agents, not agents for the web". This position paper argues that rather than forcing web agents to adapt to UIs designed for humans, we should develop a new interface optimized for web agents, which we call Agentic Web Interface (AWI).
9
54
194
@xhluca
Xing Han Lu
8 hours
RT @MassCaccia: 🎉 Our paper “How to Train Your LLM Web Agent: A Statistical Diagnosis” got an oral at next week’s ICML Workshop on Computer…
0
26
0
@xhluca
Xing Han Lu
10 hours
Website: Paper:
0
0
0
@xhluca
Xing Han Lu
10 hours
AgentRewardBench will be presented at @COLM_conf 2025 in Montreal! See you soon and ping me if you want to meet up!
@xhluca
Xing Han Lu
3 months
AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories. We are releasing the first benchmark to evaluate how well automatic evaluators, such as LLM judges, can evaluate web agent trajectories. We find that rule-based evals underreport success rates, and
2
7
31
@xhluca
Xing Han Lu
2 days
RT @yoavartzi: @COLM_conf decisions are out, and so are we. The strength of submissions this year amazed us! Many many hard decisions 😩…
0
8
0
@xhluca
Xing Han Lu
3 days
RT @kyutai_labs: Kyutai TTS and Unmute are now open source! The text-to-speech is natural, customizable, and fast: it can serve 32 users wi…
0
172
0
@xhluca
Xing Han Lu
5 days
RT @BlackboxNLP: 🚨 Excited to announce two invited speakers at #BlackboxNLP 2025! Join us to hear from two leading voices in interpretabil…
0
10
0
@xhluca
Xing Han Lu
8 days
RT @vernadankers: I miss Edinburgh and its wonderful people already!! Thanks to @tallinzen and @PontiEdoardo for inspiring discussions duri…
0
8
0
@xhluca
Xing Han Lu
12 days
RT @ysu_nlp: 🔎 Agentic search like Deep Research is fundamentally changing web search, but it also brings an evaluation crisis ⚠️. Introducin…
0
46
0
@xhluca
Xing Han Lu
12 days
RT @xhluca: @webagentlab Would appreciate if the authors could avoid copying the title of our paper, which was released more than 2 months a…
0
1
0
@xhluca
Xing Han Lu
13 days
RT @benno_krojer: Started a new podcast with @tvergarabrowne! Behind the Research of AI: We look behind the scenes, beyond the polished…
0
13
0
@xhluca
Xing Han Lu
13 days
RT @cesare_spinoso: A blizzard is raging in Montreal when your friend says “Wow, the weather is amazing!” Humans easily interpret irony, wh…
0
11
0
@xhluca
Xing Han Lu
16 days
RT @benno_krojer: The video is online now! 3min speed science talk on "From a soup of raw pixels to abstract meaning".
0
6
0
@xhluca
Xing Han Lu
16 days
RT @ReviewAcl: Dear ACL community, We are seeking emergency reviewers for the May cycle. Please indicate your availability (ASAP) if you ca…
0
16
0
@xhluca
Xing Han Lu
18 days
RT @XLangNLP: 🔥 New Computer Agent Arena Leaderboard Updates (2k+ user votes)! 🤔 Which VLMs act better as computer use agents (CUAs)? 1, Cla…
0
23
0
@xhluca
Xing Han Lu
20 days
Very important benchmark about the safety of computer use agents. Validates our findings in SafeArena that agents can complete harmful tasks - now with reasoning models and on OS tasks. We need safer digital agents asap before more productization.
@maksym_andr
Maksym Andriushchenko
20 days
🚨 Excited to release OS-Harm! 🚨 The safety of computer use agents has been largely overlooked. We created a new safety benchmark based on OSWorld for measuring 3 broad categories of harm: 1. deliberate user misuse, 2. prompt injections, 3. model misbehavior.
0
7
25
@xhluca
Xing Han Lu
20 days
RT @maksym_andr: 🚨 Excited to release OS-Harm! 🚨 The safety of computer use agents has been largely overlooked. We created a new safety b…
0
27
0
@xhluca
Xing Han Lu
21 days
RT @julien_c: every @Gradio space is now a MCP tool you can add to our MCP server in 1 click 🤯
0
14
0
@xhluca
Xing Han Lu
22 days
RT @ryan_tzr: The biggest issue web agents face: AUTHENTICATION WALLS 🔐. Twitter, Instagram, LinkedIn, news sites - everything requires log…
0
1
0
@xhluca
Xing Han Lu
23 days
RT @hanseok_oh: Life update: I am joining as visiting researcher at @Mila_Quebec 🇨🇦. I returned to academia to deepen my understanding of h…
0
4
0
@xhluca
Xing Han Lu
25 days
0
0
1