Xing Han Lu @xhluca X Profile

Xing Han Lu

@xhluca

Followers

2K

Following

6K

Media

220

Statuses

2K

Vibe agents @Mila_Quebec @McGill_NLP

https://t.co/yY1uwTXNu4

The Wired

Joined December 2017

Xing Han Lu

@xhluca

5 months

"Build the web for agents, not agents for the web" This position paper argues that rather than forcing web agents to adapt to UIs designed for humans, we should develop a new interface optimized for web agents, which we call Agentic Web Interface (AWI).

9

58

197

Tianbao Xie

@TianbaoX

3 days

Appreciate @EpochAIResearch 's thoughtful review of OSWorld 👇 A few clarifications from our side as maintainers: OSWorld tasks span the full difficulty spectrum — from simple GUI edits to long multi-app workflows. Abhyankar et al.’s “step count” uses merged custom atomic

Epoch AI

@EpochAIResearch

4 days

We looked at OSWorld, a popular evaluation of AI computer use capabilities. Our findings: tasks are simple, many don't require GUIs, and success often hinges on interpreting ambiguous instructions. The benchmark is also not stable over time. See thread for details!

4

5

34

Mehar Bhatia

@bhatia_mehar

3 days

🚨How do LLMs acquire human values?🤔 We often point to preference optimization. However, in our new work, we trace how and when model values shift during post-training and uncover surprising dynamics. We ask: How do data, algorithms, and their interaction shape model values?🧵

1

44

109

Desmond Elliott

@delliott

3 days

I will give a Tea Talk about Language Modelling from Pixels at @Mila_Quebec on Friday 7th November at 10:30 EST. It will also be live-streamed over Google Meet https://t.co/dAmkWeyNGh.

0

5

23

Victor Zhong

@hllo_wrld

4 days

I am hiring for fully funded (up to 3 years) postdoc positions in AI for science at Waterloo/Vector: multimodal deep research, agents, tool-use. You'll work closely w/ industry partners & lead projects. Please share! Apply at https://t.co/ShYRDnnjIu or email me directly!

8

80

354

Desmond Elliott

@delliott

7 days

Looking forward to talking about Efficient Test-Time Scaling for Small Vision-Language Models at the University of Waterloo in the Davis Center 3301 at 4:30pm today. This is joint work with @monurcan55 and @dim_p_papa https://t.co/Zx0YJ3HAtY

0

8

31

Xing Han Lu

@xhluca

8 days

Great resource! I've been thinking of doing something like that for a while, but glad to see a great team putting it together so nicely :)

Yueqi Song @ EMNLP2025

@yueqi_song

9 days

We just built and released the largest dataset for supervised fine-tuning of agentic LMs, 1.27M trajectories (~36B tokens)! Up until now, large-scale SFT for agents is rare - not for lack of data, but because of fragmentation across heterogeneous formats, tools, and interfaces.

0

1

6

Dunjie Lu

@DunjieLu1219

14 days

📣Introducing VideoAgentTrek: a human-free, web-scale pipeline that turns screen-recorded tutorials into training data for computer-use agents, powered by specially trained VLMs. 🔗 [Website] https://t.co/rxTDwNxgtw 📄 [Paper] https://t.co/SVgjCGUWhF

5

35

150

Xing Han Lu

@xhluca

16 days

booking an event on google calendar went from 8min with chatgpt agent to 2min with atlas. huge improvements in the span of a few months

0

4

Niels Rogge

@NielsRogge

17 days

For people thinking that DeepSeek-OCR is the first model to render text as images, the University of Copenhagen already did this in 2023 Paper is called "Language Modelling with Pixels". They trained a Masked AutoEncoder (MAE) by rendering text as images and masking patches

25

56

544

tomaarsen

@tomaarsen

18 days

The MTEB team has just released MTEB v2, an upgrade to their evaluation suite for embedding models! Their blogpost covers all changes, including easier evaluation, multimodal support, rerankers, new interfaces, documentation, dataset statistics, a migration guide, etc. 🧵

4

13

94

Sai Rajeswar

@RajeswarSai

23 days

I’m looking forward to co-supervising students in the upcoming academic year at Mila. There is much to explore in the space of action-conditioned video modeling and long-context multimodal reasoning. We are advancing & if this aligns with your interests, please apply 👇

Mila - Institut québécois d'IA

@Mila_Quebec

23 days

Mila's annual supervision request process is now open to receive MSc and PhD applications for Fall 2026 admission! For more information, visit https://t.co/r01eLcY1P4

0

3

15

Mila - Institut québécois d'IA

@Mila_Quebec

23 days

Mila's annual supervision request process is now open to receive MSc and PhD applications for Fall 2026 admission! For more information, visit https://t.co/r01eLcY1P4

3

64

119

dr. jack morris

@jxmnop

23 days

had so much fun podcasting with some other AI researchers in Montreal last week 😎 this is how McGill looks in the fall btw

Benno Krojer

@benno_krojer

23 days

@tvergarabrowne and I were quiet over the summer with our podcast "Behind the Research of AI"... But now we're back! And with an awesome guest! We interviewed @jxmnop during @COLM_conf and had a blast chatting, eating snacks together and reflecting on phd life/research ideas

9

5

156

Benno Krojer

@benno_krojer

23 days

@tvergarabrowne and I were quiet over the summer with our podcast "Behind the Research of AI"... But now we're back! And with an awesome guest! We interviewed @jxmnop during @COLM_conf and had a blast chatting, eating snacks together and reflecting on phd life/research ideas

1

12

33

Issam Laradji

@ILaradji

25 days

🚀 Releasing DRBench, an Enterprise-Grade Deep Research Benchmark Paper! 📄 Paper: https://t.co/meWM3Qj77w 💻 Code: https://t.co/RfhQ1mWayc We’re excited to introduce DRBench, the first benchmark designed to evaluate deep research agents on open-ended enterprise research tasks,

0

19

41

Xing Han Lu

@xhluca

24 days

Original thread:

Issam Laradji

@ILaradji

25 days

🚀 Releasing DRBench, an Enterprise-Grade Deep Research Benchmark Paper! 📄 Paper: https://t.co/meWM3Qj77w 💻 Code: https://t.co/RfhQ1mWayc We’re excited to introduce DRBench, the first benchmark designed to evaluate deep research agents on open-ended enterprise research tasks,

0

1

0

Xing Han Lu

@xhluca

24 days

Code: https://t.co/xDmnhBon9u Paper:

arxiv.org

We introduce DRBench, a benchmark for evaluating AI agents on complex, open-ended deep research tasks in enterprise settings. Unlike prior benchmarks that focus on simple questions or web-only...

1

0

Xing Han Lu

@xhluca

24 days

"DRBench: A Realistic Benchmark for Enterprise Deep Research" The first benchmark testing Deep Research agents on real enterprise tasks across domains like sales, cybersecurity and compliance.

4

15

69

Mehrnaz Mofakhami

@mhrnz_m

29 days

📃 New Paper Alert! ✨ A Generative Approach to LLM Harmfulness Mitigation with Red Flag Tokens🚩 What do you think are some major limitations in current safety training approaches? ➡️ We think it's in their design: they rely on completely changing the model's distribution by

1

25

50

Shikhar

@ShikharMurty

28 days

Can we please do license cards for computer agents so they can pass bot detection? Like if my computer agent has gone through safety training, issue it a license and let it bypass captcha.

2

1

8