Xing Han Lu

@xhluca

Followers: 2K
Following: 6K
Media: 220
Statuses: 2K

Vibe agents @Mila_Quebec @McGill_NLP

The Wired
Joined December 2017
@xhluca
Xing Han Lu
5 months
"Build the web for agents, not agents for the web" This position paper argues that rather than forcing web agents to adapt to UIs designed for humans, we should develop a new interface optimized for web agents, which we call Agentic Web Interface (AWI).
9
58
197
@TianbaoX
Tianbao Xie
3 days
Appreciate @EpochAIResearch 's thoughtful review of OSWorld 👇 A few clarifications from our side as maintainers: OSWorld tasks span the full difficulty spectrum — from simple GUI edits to long multi-app workflows. Abhyankar et al.’s “step count” uses merged custom atomic
@EpochAIResearch
Epoch AI
4 days
We looked at OSWorld, a popular evaluation of AI computer use capabilities. Our findings: tasks are simple, many don't require GUIs, and success often hinges on interpreting ambiguous instructions. The benchmark is also not stable over time. See thread for details!
4
5
34
@bhatia_mehar
Mehar Bhatia
3 days
🚨How do LLMs acquire human values?🤔 We often point to preference optimization. However, in our new work, we trace how and when model values shift during post-training and uncover surprising dynamics. We ask: How do data, algorithms, and their interaction shape model values?🧵
1
44
109
@delliott
Desmond Elliott
3 days
I will give a Tea Talk about Language Modelling from Pixels at @Mila_Quebec on Friday 7th November at 10:30 EST. It will also be live-streamed over Google Meet https://t.co/dAmkWeyNGh.
0
5
23
@hllo_wrld
Victor Zhong
4 days
I am hiring for fully funded (up to 3 years) postdoc positions in AI for science at Waterloo/Vector: multimodal deep research, agents, tool-use. You'll work closely w/ industry partners & lead projects. Please share! Apply at https://t.co/ShYRDnnjIu or email me directly!
8
80
354
@delliott
Desmond Elliott
7 days
Looking forward to talking about Efficient Test-Time Scaling for Small Vision-Language Models at the University of Waterloo in the Davis Center 3301 at 4:30pm today. This is joint work with @monurcan55 and @dim_p_papa https://t.co/Zx0YJ3HAtY
0
8
31
@xhluca
Xing Han Lu
8 days
Great resource! I've been thinking of doing something like that for a while, but glad to see a great team putting it together so nicely :)
@yueqi_song
Yueqi Song @ EMNLP2025
9 days
We just built and released the largest dataset for supervised fine-tuning of agentic LMs, 1.27M trajectories (~36B tokens)! Up until now, large-scale SFT for agents is rare - not for lack of data, but because of fragmentation across heterogeneous formats, tools, and interfaces.
0
1
6
@DunjieLu1219
Dunjie Lu
14 days
📣Introducing VideoAgentTrek: a human-free, web-scale pipeline that turns screen-recorded tutorials into training data for computer-use agents, powered by specially trained VLMs. 🔗 [Website] https://t.co/rxTDwNxgtw 📄 [Paper] https://t.co/SVgjCGUWhF
5
35
150
@xhluca
Xing Han Lu
16 days
booking an event on google calendar went from 8min with chatgpt agent to 2min with atlas. huge improvements in the span of a few months
0
0
4
@NielsRogge
Niels Rogge
17 days
For people thinking that DeepSeek-OCR is the first model to render text as images, the University of Copenhagen already did this in 2023 Paper is called "Language Modelling with Pixels". They trained a Masked AutoEncoder (MAE) by rendering text as images and masking patches
25
56
544
@tomaarsen
tomaarsen
18 days
The MTEB team has just released MTEB v2, an upgrade to their evaluation suite for embedding models! Their blogpost covers all changes, including easier evaluation, multimodal support, rerankers, new interfaces, documentation, dataset statistics, a migration guide, etc. 🧵
4
13
94
@RajeswarSai
Sai Rajeswar
23 days
I’m looking forward to co-supervising students in the upcoming academic year at Mila. There is much to explore in the space of action-conditioned video modeling and long-context multimodal reasoning. We are advancing & if this aligns with your interests, please apply 👇
@Mila_Quebec
Mila - Institut québécois d'IA
23 days
Mila's annual supervision request process is now open to receive MSc and PhD applications for Fall 2026 admission! For more information, visit https://t.co/r01eLcY1P4
0
3
15
@Mila_Quebec
Mila - Institut québécois d'IA
23 days
Mila's annual supervision request process is now open to receive MSc and PhD applications for Fall 2026 admission! For more information, visit https://t.co/r01eLcY1P4
3
64
119
@jxmnop
dr. jack morris
23 days
had so much fun podcasting with some other AI researchers in Montreal last week 😎 this is how McGill looks in the fall btw
@benno_krojer
Benno Krojer
23 days
@tvergarabrowne and I were quiet over the summer with our podcast "Behind the Research of AI"... But now we're back! And with an awesome guest! We interviewed @jxmnop during @COLM_conf and had a blast chatting, eating snacks together and reflecting on phd life/research ideas
9
5
156
@benno_krojer
Benno Krojer
23 days
@tvergarabrowne and I were quiet over the summer with our podcast "Behind the Research of AI"... But now we're back! And with an awesome guest! We interviewed @jxmnop during @COLM_conf and had a blast chatting, eating snacks together and reflecting on phd life/research ideas
1
12
33
@ILaradji
Issam Laradji
25 days
🚀 Releasing DRBench, an Enterprise-Grade Deep Research Benchmark Paper! 📄 Paper: https://t.co/meWM3Qj77w 💻 Code: https://t.co/RfhQ1mWayc We’re excited to introduce DRBench, the first benchmark designed to evaluate deep research agents on open-ended enterprise research tasks,
0
19
41
@xhluca
Xing Han Lu
24 days
Original thread:
@ILaradji
Issam Laradji
25 days
🚀 Releasing DRBench, an Enterprise-Grade Deep Research Benchmark Paper! 📄 Paper: https://t.co/meWM3Qj77w 💻 Code: https://t.co/RfhQ1mWayc We’re excited to introduce DRBench, the first benchmark designed to evaluate deep research agents on open-ended enterprise research tasks,
0
1
0
@xhluca
Xing Han Lu
24 days
"DRBench: A Realistic Benchmark for Enterprise Deep Research" The first benchmark testing Deep Research agents on real enterprise tasks across domains like sales, cybersecurity and compliance.
4
15
69
@mhrnz_m
Mehrnaz Mofakhami
29 days
📃 New Paper Alert! ✨ A Generative Approach to LLM Harmfulness Mitigation with Red Flag Tokens🚩 What do you think are some major limitations in current safety training approaches? ➡️ We think it's in their design: they rely on completely changing the model's distribution by
1
25
50
@ShikharMurty
Shikhar
28 days
Can we please do license cards for computer agents so they can pass bot detection? Like if my computer agent has gone through safety training, issue it a license and let it bypass captcha.
2
1
8