Alexandre L.-Piché Profile
Alexandre L.-Piché

@alexpiche_

Followers
1K
Following
15K
Media
18
Statuses
126

Searching for Q* at @ServiceNowRSRCH

Montreal, Qc
Joined October 2011
@alexpiche_
Alexandre L.-Piché
1 year
Introducing ReSearch: an iterative self-reflection algorithm that enhances LLMs' self-restraint abilities:
• Encouraging abstention when uncertain.
• Producing accurate, informative content when confident.
Result: significant accuracy boost for Llama2 7B Chat and Mistral 7B! 🚀
1
45
102
@alexpiche_
Alexandre L.-Piché
1 month
RT @GabrielHuang9: As #ICML2025 kicks off in Vancouver, our AI talent is being quietly pushed out. 🇨🇦. We've been waiting 28 months for per….
0
10
0
@alexpiche_
Alexandre L.-Piché
1 month
RT @MassCaccia: 🎉 Our paper “𝐻𝑜𝑤 𝑡𝑜 𝑇𝑟𝑎𝑖𝑛 𝑌𝑜𝑢𝑟 𝐿𝐿𝑀 𝑊𝑒𝑏 𝐴𝑔𝑒𝑛𝑡: 𝐴 𝑆𝑡𝑎𝑡𝑖𝑠𝑡𝑖𝑐𝑎𝑙 𝐷𝑖𝑎𝑔𝑛𝑜𝑠𝑖𝑠” got an 𝐨𝐫𝐚𝐥 at next week’s 𝗜𝗖𝗠𝗟 𝗪𝗼𝗿𝗸𝘀𝗵𝗼𝗽 𝗼𝗻 𝗖𝗼𝗺𝗽𝘂𝘁𝗲𝗿….
0
50
0
@alexpiche_
Alexandre L.-Piché
4 months
RT @DBahdanau: I am excited to open-source PipelineRL - a scalable async RL implementation with in-flight weight updates. Why wait until yo….
0
115
0
@alexpiche_
Alexandre L.-Piché
10 months
RT @alex_lacoste_: @AnthropicAI Early results with Claude 3.5 sonnet for our new paper. We're probably not even using it right yet and its….
0
7
0
@alexpiche_
Alexandre L.-Piché
10 months
RT @DjDvij: I am also hiring for my new team at @ServiceNowRSRCH, please reach out if you are at the conference and interested in building….
0
6
0
@alexpiche_
Alexandre L.-Piché
10 months
RT @DjDvij: The dominant paradigm in AI alignment is to learn from human feedback. But what form should this feedback take? A simple thumbs….
0
12
0
@alexpiche_
Alexandre L.-Piché
10 months
RT @DBahdanau: 🚨 New agent framework! 🚨. My team at @ServiceNowRSRCH is releasing TapeAgents: a holistic framework for agent development a….
0
40
0
@alexpiche_
Alexandre L.-Piché
11 months
RT @alexandredrouin: Interested in time series forecasting and LLMs?. We are looking for visiting researchers to work on context-aided fore….
0
21
0
@alexpiche_
Alexandre L.-Piché
1 year
RT @rosieyzh: In our new work on evaluating optimizers for LLM training, we perform a series of experiments to investigate the role of adap….
0
31
0
@alexpiche_
Alexandre L.-Piché
1 year
RT @alexpiche_: We can tweak the target accuracy to obtain different behaviors. High target accuracy: ReSearch is very cautious and produce….
0
1
0
@alexpiche_
Alexandre L.-Piché
1 year
Best of all, you don’t have to pay for the expensive search procedure at test time! ReSearch can be distilled and results in similar (if not better!) accuracy without any computational overhead!
1
0
3
@alexpiche_
Alexandre L.-Piché
1 year
We can tweak the target accuracy to obtain different behaviors. High target accuracy: ReSearch is very cautious and produces fewer claims on average. Low target accuracy: ReSearch is less cautious, produces more claims, and yet is *still* more accurate than the default behavior.
1
1
2
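The target-accuracy knob above can be sketched as a simple greedy filter: keep claims in decreasing order of self-assessed confidence while the running mean confidence stays at or above the target. This is an illustrative guess at the mechanism, not the paper's actual procedure; `filter_claims` and its inputs are hypothetical.

```python
def filter_claims(claims_with_conf, target_accuracy):
    """Keep the largest prefix of confidence-ranked claims whose mean
    self-assessed confidence still meets the target accuracy. A higher
    target keeps fewer, safer claims; a lower target keeps more."""
    ranked = sorted(claims_with_conf, key=lambda cc: cc[1], reverse=True)
    kept, total = [], 0.0
    for claim, conf in ranked:
        # Adding this claim would drag the expected accuracy below target.
        if (total + conf) / (len(kept) + 1) < target_accuracy:
            break
        kept.append(claim)
        total += conf
    return kept
```

With claims scored 0.9, 0.8, and 0.4, a target of 0.85 keeps only the first two, while a target of 0.6 keeps all three, matching the cautious/permissive behaviors described in the tweet.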
@alexpiche_
Alexandre L.-Piché
1 year
2) ReSearch adapts the level of detail (scored out of 10 by Llama2 70B Chat) to the model's level of knowledge: fewer details for lesser-known entities and more for popular ones. Llama2 7B, by contrast, provides detailed biographies for every tier, which results in hallucinations.
1
0
3
@alexpiche_
Alexandre L.-Piché
1 year
1) ReSearch can abstain when the model is uncertain about an entity. We can see that on an invented tier (invented names with no significant internet presence), abstention increases from 67% to 83%! Abstaining when unsure is one way that our model can achieve higher accuracy.
1
0
2
@alexpiche_
Alexandre L.-Piché
1 year
Llama2 7b chat obtains 26% accuracy on the bottom popularity tier; when combined with ReSearch it obtains over 70% accuracy! We see improvement for all popularity tiers, without access to any external references. How is that even possible?
1
0
3
@alexpiche_
Alexandre L.-Piché
1 year
We tested ReSearch by writing biographies of people with Wiki pages. We grouped people in tiers based on the length of their pages (shortest pages are at the bottom popularity tier, etc). We evaluated the factual accuracy and number of claims using Llama2 70B with access to Wiki.
1
0
1
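The tiering setup described above (group entities by the length of their wiki page, shortest pages in the bottom popularity tier) can be sketched as follows; `popularity_tiers` is a hypothetical helper, not code from the paper.

```python
def popularity_tiers(page_lengths, n_tiers=4):
    """Split entities into popularity tiers by wiki page length.

    page_lengths: dict mapping entity name -> page length.
    Returns a list of tiers, bottom (shortest pages) first.
    """
    ranked = sorted(page_lengths, key=page_lengths.get)
    size = -(-len(ranked) // n_tiers)  # ceiling division
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]
```

For example, with four entities and two tiers, the two shortest pages form the bottom tier and the two longest form the top tier.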
@alexpiche_
Alexandre L.-Piché
1 year
How?
• For a given prompt, draw many potential responses.
• Break responses into claims.
• Self-assess the LLM's confidence in each claim.
• Add high-confidence claims back into the prompt to guide the LLM.
• Repeat!
Choose the best response. If all are full of lies, abstain.
1
1
4
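The loop in that last tweet can be sketched roughly as below. This is a minimal illustration, not the paper's implementation: `generate`, `extract_claims`, and `self_assess` are hypothetical stand-ins for LLM calls, and all thresholds are made-up defaults.

```python
def research(prompt, generate, extract_claims, self_assess,
             n_samples=5, n_rounds=3, conf_threshold=0.8, min_conf=0.5):
    """Illustrative iterative self-reflection loop: sample responses,
    score their claims, and feed high-confidence claims back into the
    prompt. Abstains (returns None) if no candidate is trustworthy."""
    context = prompt
    best = None  # (response, mean claim confidence)
    for _ in range(n_rounds):
        candidates = [generate(context) for _ in range(n_samples)]
        trusted = []
        for resp in candidates:
            claims = extract_claims(resp)
            scores = [self_assess(c) for c in claims]
            # Keep only claims the model is confident about.
            trusted += [c for c, s in zip(claims, scores) if s >= conf_threshold]
            avg = sum(scores) / len(scores) if scores else 0.0
            if best is None or avg > best[1]:
                best = (resp, avg)
        # Guide the next round with the claims judged reliable.
        context = prompt + "\nKnown facts: " + "; ".join(trusted)
    if best is None or best[1] < min_conf:
        return None  # abstain: every candidate was full of low-confidence claims
    return best[0]
```

Passing the LLM calls in as functions keeps the sketch self-contained; with a confident self-assessor the loop returns the best response, and with a uniformly unconfident one it abstains.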