Alexandre L.-Piché Profile
Alexandre L.-Piché

@alexpiche_

Followers: 1K · Following: 15K · Media: 21 · Statuses: 143

Making RL go fast at @ServiceNowRSRCH

Montreal, Qc
Joined October 2011
@alexpiche_
Alexandre L.-Piché
1 month
Very excited to see vLLM supports Pipeline RL’s in-flight weight updates! It allowed our team to quickly and reliably train Qwen base 7B to reason from scratch! Want to hear more? Join us at our Pipeline RL expo talk at CoLM this Thursday 1PM room 524C.
@vllm_project
vLLM
1 month
🚀 The RL community keeps pushing boundaries — from better on-policy data and partial rollouts to in-flight weight updates that mix KV caches across models during inference. Continuing inference while weights change and KV states stay stale sounds wild — but that’s exactly what
1
10
26
@hamishivi
Hamish Ivison
10 days
to continue the PipelineRL glazing, @finbarrtimbers implemented PipelineRL for open-instruct a little bit ago and it ended up being probably the single biggest speedup to our overall pipeline. We went from 2-week long RL runs to 5-day runs, without sacrificing performance
@agarwl_
Rishabh Agarwal
11 days
Don't sleep on PipelineRL -- this is one of the biggest jumps in compute efficiency of RL setups that we found in the ScaleRL paper (also validated by Magistral & others before)! What's the problem PipelineRL solves? In RL for LLMs, we need to send weight updates from trainer to
6
35
229
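The tweet above describes the core idea behind in-flight weight updates: the trainer ships new weights to the inference servers mid-generation, and decoding continues over a KV cache that was partly built under the old weights. A minimal toy sketch of that trade-off, with entirely hypothetical names (`ToyInferenceServer`, `update_weights`, `step` are illustrative, not the PipelineRL or vLLM API):

```python
# Toy sketch of an in-flight weight update (hypothetical names; not the
# actual PipelineRL or vLLM API). The trainer swaps in new weights
# mid-generation, and the same sequence keeps decoding over a KV cache
# that was partly written under the old weights ("stale" KV).

class ToyInferenceServer:
    def __init__(self):
        self.weights_version = 0
        self.kv_cache = []  # kept across weight updates, so it can go stale

    def update_weights(self, new_version):
        # In-flight update: swap weights WITHOUT flushing the KV cache
        # or restarting in-progress sequences.
        self.weights_version = new_version

    def step(self):
        # One decoding step: the new token is computed with the current
        # weights, attending over KV entries possibly written under
        # older weights.
        self.kv_cache.append(self.weights_version)
        return f"tok@w{self.weights_version}"

server = ToyInferenceServer()
before = [server.step() for _ in range(2)]  # tokens under weights v0
server.update_weights(1)                    # trainer pushes fresh weights
after = [server.step() for _ in range(2)]   # same sequence continues under v1
print(server.kv_cache)                      # mixed-version ("stale") cache
```

The surprising part the vLLM tweet highlights is that this mixed-version cache works in practice, so generation never has to stop and restart around a weight sync.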
@alexpiche_
Alexandre L.-Piché
12 days
In-flight weight updates have gone from a “weird trick” to a must-have for training LLMs with RL in the last few weeks. If you want to understand the on-policy and throughput benefits, here’s the CoLM talk @DBahdanau and I gave:
12
62
479
@_lewtun
Lewis Tunstall
12 days
In the Smol Training Playbook, I tried to survey the state of popular post-training frameworks. Let me know if I missed any and I'll add them to the list!
20
15
194
@mgoin_
Michael Goin
1 month
@natolambert Shout out to PipelineRL for open sourcing this in April based on vLLM https://t.co/e2oAK3eKof
0
6
31
@alexpiche_
Alexandre L.-Piché
1 month
Room change again. We are now in 522!
@alexpiche_
Alexandre L.-Piché
1 month
If, like Yuki, you are tired of slow RL, join us in 15 minutes to learn more about Pipeline RL in room 523AB.
0
0
6
@DBahdanau
🇺🇦 Dzmitry Bahdanau
1 month
We did lots of good work since PipelineRL release in May: ⚙️ higher throughput, seq parallel training, multimodal, agentic RL 📜 white paper with great explanations and results: https://t.co/F3YsIbNRUy We'll present today at CoLM EXPO, room 524C, 1pm!
0
1
8
@alexpiche_
Alexandre L.-Piché
1 month
Very excited to be presenting Pipeline RL this afternoon at CoLM. Join us if you are interested in fast on-policy RL training for LLMs 🚀
@tscholak
Torsten Scholak
1 month
🧠 Call for Interns – ServiceNow AI Research (Montreal) Our Foundation Models Lab is recruiting interns for 2026! We train & optimize LLMs, from diffusion-based generation to state-space hybrids. If you care about efficient LLMs, diffusion or reasoning → this is for you. 🧵👇
5
22
142
@MassCaccia
Massimo Caccia
1 month
Come join us in beautiful Montreal! 🇨🇦✨ Included: 🏓 Ping pong 🧠 Top-tier publications (and deadlines 👿) 💻 lots of Compute 🍪 okay Snacks 💸 Stipend
@alex_lacoste_
Alexandre Lacoste
1 month
🚨 Call for Interns – ServiceNow AI Research (Montreal) Our Computer-Use Agents team (Frontier AI Research) is recruiting interns for 2026! We work on LLMs and VLMs that can reliably use software and publishing at top venues (NeurIPS, ICML, ICLR) and developing open-source
0
2
28
@DBahdanau
🇺🇦 Dzmitry Bahdanau
7 months
I am excited to open-source PipelineRL - a scalable async RL implementation with in-flight weight updates. Why wait until your bored GPUs finish all sequences? Just update the weights and continue inference! Code: https://t.co/AgEyxXb7Xi Blog: https://t.co/n4FRxiEcrr
8
66
471
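The "why wait until your bored GPUs finish all sequences?" point can be made concrete with a toy timing model. This is an illustrative back-of-the-envelope sketch under simplified assumptions (fixed rollout lengths, one training step per batch), not PipelineRL's actual scheduler:

```python
# Toy timing model (illustrative assumptions, not PipelineRL's real
# scheduler): a batch of rollouts with uneven lengths, plus one training
# step per iteration.

def synchronous_time(rollout_times, train_time, n_iters):
    # Synchronous RL: generation waits for the slowest sequence, then
    # training runs while the inference GPUs sit idle ("bored").
    per_iter = max(rollout_times) + train_time
    return per_iter * n_iters

def pipelined_time(rollout_times, train_time, n_iters):
    # With in-flight weight updates, inference keeps running while the
    # trainer updates, so the two stages overlap: wall clock is bounded
    # by the slower stage, plus one pipeline fill and one drain.
    gen = max(rollout_times)
    per_iter = max(gen, train_time)
    return gen + per_iter * (n_iters - 1) + train_time

times = [3.0, 5.0, 9.0]  # uneven sequence lengths drive the idle time
sync = synchronous_time(times, train_time=4.0, n_iters=10)  # (9+4)*10 = 130
pipe = pipelined_time(times, train_time=4.0, n_iters=10)    # 9 + 9*9 + 4 = 94
```

Under this toy model the overlap alone cuts wall-clock time by roughly the full training cost per iteration, which is the flavor of speedup the threads above report (e.g. two-week runs dropping to five days), though the real gains also depend on rollout-length skew and update frequency.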
@RajeswarSai
Sai Rajeswar
2 months
💡So far, I have been sharing our multimodal AI research at @ServiceNow focused on reasoning over pixels. Today, we share a new chapter with an open-source release of our big initiative in the voice and speech domain.🚀 🎧 AU-Harness: Holistic Evaluation of Audio LLM Responses
1
6
19
@alexpiche_
Alexandre L.-Piché
2 months
Glad to see OpenAI prioritizing abstention responses in their paper! That's a great intro to our TMLR paper, in which we developed an iterative self-reflection method for LLMs to know when to abstain, without ground truth and at no additional cost at test time. https://t.co/xwNT68ejqm
@adamfungi
Adam Tauman Kalai
2 months
New research explains why LLMs hallucinate, through a connection between supervised and self-supervised learning. We also describe a key obstacle that can be removed to reduce them. 🧵 https://t.co/6Lb6xlg0SZ
1
10
18
@GabrielHuang9
Gabriel Huang
4 months
As #ICML2025 kicks off in Vancouver, our AI talent is being quietly pushed out. 🇨🇦 We've been waiting 28 months for permanent residency, but @CitImmCanada won’t budge. Please read and share our story https://t.co/NkiH483OIh https://t.co/kM2BpfxUyh #IRCC #AI #Immigration #AI
3
9
26
@MassCaccia
Massimo Caccia
4 months
🎉 Our paper “𝐻𝑜𝑤 𝑡𝑜 𝑇𝑟𝑎𝑖𝑛 𝑌𝑜𝑢𝑟 𝐿𝐿𝑀 𝑊𝑒𝑏 𝐴𝑔𝑒𝑛𝑡: 𝐴 𝑆𝑡𝑎𝑡𝑖𝑠𝑡𝑖𝑐𝑎𝑙 𝐷𝑖𝑎𝑔𝑛𝑜𝑠𝑖𝑠” got an 𝐨𝐫𝐚𝐥 at next week’s 𝗜𝗖𝗠𝗟 𝗪𝗼𝗿𝗸𝘀𝗵𝗼𝗽 𝗼𝗻 𝗖𝗼𝗺𝗽𝘂𝘁𝗲𝗿 𝗨𝘀𝗲 𝗔𝗴𝗲𝗻𝘁𝘀! 🖥️🧠 We present the 𝐟𝐢𝐫𝐬𝐭 𝐥𝐚𝐫𝐠𝐞-𝐬𝐜𝐚𝐥𝐞
6
53
218
@alex_lacoste_
Alexandre Lacoste
1 year
@AnthropicAI Early results with Claude 3.5 sonnet for our new paper. We're probably not even using it right yet and its performance is through the roof, leaving o1-mini in the dust (o1-preview results are coming). See https://t.co/itubXoneh3 for a growing amount of web-ui benchmarks.
0
7
19