Xing Han Lu
@xhluca
Followers: 2K
Following: 6K
Media: 220
Statuses: 2K
Vibe agents @Mila_Quebec @McGill_NLP
The Wired
Joined December 2017
"Build the web for agents, not agents for the web" This position paper argues that rather than forcing web agents to adapt to UIs designed for humans, we should develop a new interface optimized for web agents, which we call Agentic Web Interface (AWI).
9
58
197
Appreciate @EpochAIResearch's thoughtful review of OSWorld 👇 A few clarifications from our side as maintainers: OSWorld tasks span the full difficulty spectrum, from simple GUI edits to long multi-app workflows. Abhyankar et al.’s “step count” uses merged custom atomic
We looked at OSWorld, a popular evaluation of AI computer use capabilities. Our findings: tasks are simple, many don't require GUIs, and success often hinges on interpreting ambiguous instructions. The benchmark is also not stable over time. See thread for details!
4
5
34
🚨How do LLMs acquire human values?🤔 We often point to preference optimization. However, in our new work, we trace how and when model values shift during post-training and uncover surprising dynamics. We ask: How do data, algorithms, and their interaction shape model values?🧵
1
44
109
I will give a Tea Talk about Language Modelling from Pixels at @Mila_Quebec on Friday 7th November at 10:30 EST. It will also be live-streamed over Google Meet https://t.co/dAmkWeyNGh.
0
5
23
I am hiring for fully funded (up to 3 years) postdoc positions in AI for science at Waterloo/Vector: multimodal deep research, agents, tool-use. You'll work closely w/ industry partners & lead projects. Please share! Apply at https://t.co/ShYRDnnjIu or email me directly!
8
80
354
Looking forward to talking about Efficient Test-Time Scaling for Small Vision-Language Models at the University of Waterloo in the Davis Center 3301 at 4:30pm today. This is joint work with @monurcan55 and @dim_p_papa
https://t.co/Zx0YJ3HAtY
0
8
31
Great resource! I've been thinking of doing something like that for a while, but glad to see a great team putting it together so nicely :)
We just built and released the largest dataset for supervised fine-tuning of agentic LMs, 1.27M trajectories (~36B tokens)! Up until now, large-scale SFT for agents is rare - not for lack of data, but because of fragmentation across heterogeneous formats, tools, and interfaces.
0
1
6
📣Introducing VideoAgentTrek: a human-free, web-scale pipeline that turns screen-recorded tutorials into training data for computer-use agents, powered by specially trained VLMs. 🔗 [Website] https://t.co/rxTDwNxgtw 📄 [Paper] https://t.co/SVgjCGUWhF
5
35
150
booking an event on google calendar went from 8min with chatgpt agent to 2min with atlas. huge improvements in the span of a few months
0
0
4
For people thinking that DeepSeek-OCR is the first model to render text as images, the University of Copenhagen already did this in 2023. The paper is called "Language Modelling with Pixels". They trained a Masked AutoEncoder (MAE) by rendering text as images and masking patches
25
56
544
The MTEB team has just released MTEB v2, an upgrade to their evaluation suite for embedding models! Their blogpost covers all changes, including easier evaluation, multimodal support, rerankers, new interfaces, documentation, dataset statistics, a migration guide, etc. 🧵
4
13
94
I’m looking forward to co-supervising students in the upcoming academic year at Mila. There is much to explore in the space of action-conditioned video modeling and long-context multimodal reasoning. We are advancing & if this aligns with your interests, please apply 👇
Mila's annual supervision request process is now open to receive MSc and PhD applications for Fall 2026 admission! For more information, visit https://t.co/r01eLcY1P4
0
3
15
Mila's annual supervision request process is now open to receive MSc and PhD applications for Fall 2026 admission! For more information, visit https://t.co/r01eLcY1P4
3
64
119
had so much fun podcasting with some other AI researchers in Montreal last week 😎 this is how McGill looks in the fall btw
@tvergarabrowne and I were quiet over the summer with our podcast "Behind the Research of AI"... But now we're back! And with an awesome guest! We interviewed @jxmnop during @COLM_conf and had a blast chatting, eating snacks together and reflecting on phd life/research ideas
9
5
156
@tvergarabrowne and I were quiet over the summer with our podcast "Behind the Research of AI"... But now we're back! And with an awesome guest! We interviewed @jxmnop during @COLM_conf and had a blast chatting, eating snacks together and reflecting on phd life/research ideas
1
12
33
🚀 Releasing DRBench, an Enterprise-Grade Deep Research Benchmark Paper! 📄 Paper: https://t.co/meWM3Qj77w 💻 Code: https://t.co/RfhQ1mWayc We’re excited to introduce DRBench, the first benchmark designed to evaluate deep research agents on open-ended enterprise research tasks,
0
19
41
Original thread:
🚀 Releasing DRBench, an Enterprise-Grade Deep Research Benchmark Paper! 📄 Paper: https://t.co/meWM3Qj77w 💻 Code: https://t.co/RfhQ1mWayc We’re excited to introduce DRBench, the first benchmark designed to evaluate deep research agents on open-ended enterprise research tasks,
0
1
0
"DRBench: A Realistic Benchmark for Enterprise Deep Research" The first benchmark testing Deep Research agents on real enterprise tasks across domains like sales, cybersecurity and compliance.
4
15
69
📃 New Paper Alert! ✨ A Generative Approach to LLM Harmfulness Mitigation with Red Flag Tokens🚩 What do you think are some major limitations in current safety training approaches? ➡️ We think it's in their design: they rely on completely changing the model's distribution by
1
25
50
Can we please do license cards for computer agents so they can pass bot detection? Like if my computer agent has gone through safety training, issue it a license and let it bypass captcha.
2
1
8