Alexandre L.-Piché Profile
Alexandre L.-Piché

@alexpiche_

Followers
1K
Following
15K
Media
18
Statuses
126

Searching for Q* at @ServiceNowRSRCH

Montreal, Qc
Joined October 2011
@alexpiche_
Alexandre L.-Piché
1 year
Introducing ReSearch: an iterative self-reflection algorithm that enhances LLMs' self-restraint abilities:
• Encouraging abstention when uncertain.
• Producing accurate, informative content when confident.
Result: significant accuracy boost for Llama2 7B Chat and Mistral 7B! 🚀
1
45
102
@alexpiche_
Alexandre L.-Piché
1 month
RT @GabrielHuang9: As #ICML2025 kicks off in Vancouver, our AI talent is being quietly pushed out. 🇨🇦. We've been waiting 28 months for per….
0
10
0
@alexpiche_
Alexandre L.-Piché
1 month
RT @MassCaccia: 🎉 Our paper “𝐻𝑜𝑤 𝑡𝑜 𝑇𝑟𝑎𝑖𝑛 𝑌𝑜𝑢𝑟 𝐿𝐿𝑀 𝑊𝑒𝑏 𝐴𝑔𝑒𝑛𝑡: 𝐴 𝑆𝑡𝑎𝑡𝑖𝑠𝑡𝑖𝑐𝑎𝑙 𝐷𝑖𝑎𝑔𝑛𝑜𝑠𝑖𝑠” got an 𝐨𝐫𝐚𝐥 at next week’s 𝗜𝗖𝗠𝗟 𝗪𝗼𝗿𝗸𝘀𝗵𝗼𝗽 𝗼𝗻 𝗖𝗼𝗺𝗽𝘂𝘁𝗲𝗿….
0
50
0
@alexpiche_
Alexandre L.-Piché
4 months
RT @DBahdanau: I am excited to open-source PipelineRL - a scalable async RL implementation with in-flight weight updates. Why wait until yo….
0
115
0
@alexpiche_
Alexandre L.-Piché
10 months
RT @alex_lacoste_: @AnthropicAI Early results with Claude 3.5 sonnet for our new paper. We're probably not even using it right yet and its….
0
7
0
@alexpiche_
Alexandre L.-Piché
10 months
RT @DjDvij: I am also hiring for my new team at @ServiceNowRSRCH, please reach out if you are at the conference and interested in building….
0
6
0
@alexpiche_
Alexandre L.-Piché
10 months
RT @DjDvij: The dominant paradigm in AI alignment is to learn from human feedback. But what form should this feedback take? A simple thumbs….
0
12
0
@alexpiche_
Alexandre L.-Piché
10 months
RT @DBahdanau: 🚨 New agent framework! 🚨. My team at @ServiceNowRSRCH is releasing TapeAgents: a holistic framework for agent development a….
0
40
0
@alexpiche_
Alexandre L.-Piché
11 months
RT @alexandredrouin: Interested in time series forecasting and LLMs?. We are looking for visiting researchers to work on context-aided fore….
0
21
0
@alexpiche_
Alexandre L.-Piché
1 year
RT @rosieyzh: In our new work on evaluating optimizers for LLM training, we perform a series of experiments to investigate the role of adap….
0
31
0
@alexpiche_
Alexandre L.-Piché
1 year
RT @alexpiche_: We can tweak the target accuracy to obtain different behaviors. High target accuracy: ReSearch is very cautious and produce….
0
1
0
@alexpiche_
Alexandre L.-Piché
1 year
Best of all, you don’t have to pay for the expensive search procedure at test time! ReSearch can be distilled and results in similar (if not better!) accuracy without any computational overhead!
1
0
3
@alexpiche_
Alexandre L.-Piché
1 year
We can tweak the target accuracy to obtain different behaviors. High target accuracy: ReSearch is very cautious and produces fewer claims on average. Low target accuracy: ReSearch is less cautious, produces more claims, and yet is *still* more accurate than the default behavior.
1
1
2
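The target-accuracy knob above can be sketched as a simple greedy filter: keep claims in decreasing order of self-assessed confidence while the running mean confidence stays at or above the target. This is an illustrative guess at the mechanism, not the paper's actual procedure; `filter_claims` and its inputs are hypothetical.

```python
def filter_claims(claims_with_conf, target_accuracy):
    """Keep the largest prefix of confidence-ranked claims whose mean
    self-assessed confidence still meets the target accuracy. A higher
    target keeps fewer, safer claims; a lower target keeps more."""
    ranked = sorted(claims_with_conf, key=lambda cc: cc[1], reverse=True)
    kept, total = [], 0.0
    for claim, conf in ranked:
        # Adding this claim would drag the expected accuracy below target.
        if (total + conf) / (len(kept) + 1) < target_accuracy:
            break
        kept.append(claim)
        total += conf
    return kept
```

With claims scored 0.9, 0.8, and 0.4, a target of 0.85 keeps only the first two, while a target of 0.6 keeps all three, matching the cautious/permissive behaviors described in the tweet.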
@alexpiche_
Alexandre L.-Piché
1 year
2) ReSearch adapts the level of detail (scored out of 10 by Llama2 70B Chat) to the model's level of knowledge: fewer details for lesser-known entities and more for popular ones. Llama2 7B, by contrast, provides detailed biographies for every tier, which results in hallucinations.
1
0
3
@alexpiche_
Alexandre L.-Piché
1 year
1) ReSearch can abstain when the model is uncertain about an entity. We can see that on an invented tier (invented names with no significant internet presence), abstention increases from 67% to 83%! Abstaining when unsure is one way that our model can achieve higher accuracy.
1
0
2
@alexpiche_
Alexandre L.-Piché
1 year
Llama2 7b chat obtains 26% accuracy on the bottom popularity tier; when combined with ReSearch it obtains over 70% accuracy! We see improvement for all popularity tiers, without access to any external references. How is that even possible?
1
0
3
@alexpiche_
Alexandre L.-Piché
1 year
We tested ReSearch by writing biographies of people with Wiki pages. We grouped people in tiers based on the length of their pages (shortest pages are at the bottom popularity tier, etc). We evaluated the factual accuracy and number of claims using Llama2 70B with access to Wiki.
1
0
1
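The tiering setup described above (group entities by the length of their wiki page, shortest pages in the bottom popularity tier) can be sketched as follows; `popularity_tiers` is a hypothetical helper, not code from the paper.

```python
def popularity_tiers(page_lengths, n_tiers=4):
    """Split entities into popularity tiers by wiki page length.

    page_lengths: dict mapping entity name -> page length.
    Returns a list of tiers, bottom (shortest pages) first.
    """
    ranked = sorted(page_lengths, key=page_lengths.get)
    size = -(-len(ranked) // n_tiers)  # ceiling division
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]
```

For example, with four entities and two tiers, the two shortest pages form the bottom tier and the two longest form the top tier.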
@alexpiche_
Alexandre L.-Piché
1 year
How?
• For a given prompt, draw many potential responses.
• Break responses into claims.
• Self-assess the LLM's confidence in each claim.
• Add high-confidence claims back into the prompt to guide the LLM.
• Repeat!
Choose the best response. If all are full of lies, abstain.
1
1
4
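The loop in that last tweet can be sketched roughly as below. This is a minimal illustration, not the paper's implementation: `generate`, `extract_claims`, and `self_assess` are hypothetical stand-ins for LLM calls, and all thresholds are made-up defaults.

```python
def research(prompt, generate, extract_claims, self_assess,
             n_samples=5, n_rounds=3, conf_threshold=0.8, min_conf=0.5):
    """Illustrative iterative self-reflection loop: sample responses,
    score their claims, and feed high-confidence claims back into the
    prompt. Abstains (returns None) if no candidate is trustworthy."""
    context = prompt
    best = None  # (response, mean claim confidence)
    for _ in range(n_rounds):
        candidates = [generate(context) for _ in range(n_samples)]
        trusted = []
        for resp in candidates:
            claims = extract_claims(resp)
            scores = [self_assess(c) for c in claims]
            # Keep only claims the model is confident about.
            trusted += [c for c, s in zip(claims, scores) if s >= conf_threshold]
            avg = sum(scores) / len(scores) if scores else 0.0
            if best is None or avg > best[1]:
                best = (resp, avg)
        # Guide the next round with the claims judged reliable.
        context = prompt + "\nKnown facts: " + "; ".join(trusted)
    if best is None or best[1] < min_conf:
        return None  # abstain: every candidate was full of low-confidence claims
    return best[0]
```

Passing the LLM calls in as functions keeps the sketch self-contained; with a confident self-assessor the loop returns the best response, and with a uniformly unconfident one it abstains.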