Charles Arnal Profile
Charles Arnal

@arnal_charles

Followers 91 · Following 71 · Media 11 · Statuses 24

Postdoc at @MetaAI, mathematician ENS, Cambridge, Inria, FAIR at Meta

Paris
Joined January 2023
@arnal_charles
Charles Arnal
2 days
TL;DR: Our work deciphers what makes tool use so effective for LLMs, improving our understanding of its widely observed practical benefits. πŸ“œ πŸ–₯️ 🧡/🧡
@arnal_charles
Charles Arnal
2 days
🐳We then scale our experiments by finetuning Llama3 and SmolLM instruct models and show that introducing new knowledge with in-weight learning severely impacts models' existing capabilities, while tool-augmented learning promises scalability without forgetting. 10/🧡
@arnal_charles
Charles Arnal
2 days
πŸ”ŽWe validate our theoretical insights in a controlled setting by pretraining small Llama-3 models from scratch using in-weight and in-tool learning on factual databases. Tool-augmented recall outperforms in-weight memorization in terms of parameter requirements. 9/🧡
@arnal_charles
Charles Arnal
2 days
πŸŽ“Theory: 1) We demonstrate that the number of facts a model can store in its weights is fundamentally limited by its number of parameters; 2) We derive an upper bound showing that tool-augmented models can, in principle, retrieve an unbounded number of facts. 8/🧡
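For intuition only (the formal statements are in the paper), a back-of-the-envelope version of this kind of counting argument might look as follows; the symbols P, b, c, N below are illustrative assumptions, not the paper's notation:

```latex
% Rough capacity sketch (illustrative, not the paper's theorem):
% P parameters stored with b bits of precision encode at most bP bits,
% so if distinguishing each of N facts costs at least c bits, in-weight recall requires
%   cN <= bP, i.e. N = O(P).
% A tool-augmented model only has to represent the fixed query-emitting behaviour,
% so the same parameter budget can, in principle, serve a database of any size.
\[
  N_{\text{in-weight}} \;\lesssim\; \frac{b\,P}{c} \;=\; O(P),
  \qquad
  N_{\text{in-tool}} \ \text{unbounded for fixed } P.
\]
```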
@arnal_charles
Charles Arnal
2 days
πŸ‹πŸ½In-weight learning: the model is trained to directly generate the answer from its parameters. πŸ› οΈIn-tool learning: the model learns to issue a structured tool query that retrieves the value from an external database. 7/🧡.
@arnal_charles
Charles Arnal
2 days
To highlight the differences between in-weight memorization and tool-augmented reasoning, we introduce a family of factual recall tasks inspired by Physics of LLMs (@ZeyuanAllenZhu), where datasets are finite collections of facts to be retrieved upon query. 6/🧡
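A minimal stand-in for such a factual dataset, with attribute names and sizes made up purely for illustration:

```python
import random

# Toy synthetic factual-recall data: (person, attribute) -> value pairs to be retrieved on query.
def make_database(num_people: int, seed: int = 0) -> dict:
    rng = random.Random(seed)
    db = {}
    for i in range(num_people):
        person = f"person_{i}"
        db[(person, "birth_year")] = str(rng.randint(1900, 2000))
        db[(person, "birth_city")] = f"city_{rng.randint(0, 99)}"
        db[(person, "employer")] = f"company_{rng.randint(0, 99)}"
    return db

def make_queries(db: dict) -> list:
    """Turn every stored fact into a (question, answer) pair for training or evaluation."""
    return [(f"What is the {attr} of {person}?", value)
            for (person, attr), value in db.items()]

queries = make_queries(make_database(num_people=1000))
```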
@arnal_charles
Charles Arnal
2 days
While the former is bounded by the model’s capacity and sensitive to forgetting, the latter offers the potential for open-ended knowledge access and generalization. In our work, we provide a rigorous theoretical framework for understanding the benefits of tool use for LLMs. 5/🧡
@arnal_charles
Charles Arnal
2 days
These capabilities mark a shift away from 𝐒𝐧-𝐰𝐞𝐒𝐠𝐑𝐭 π₯𝐞𝐚𝐫𝐧𝐒𝐧𝐠 (memorizing the solution to a problem within the model's weights) towards 𝐒𝐧-𝐭𝐨𝐨π₯ π₯𝐞𝐚𝐫𝐧𝐒𝐧𝐠 (learning to use a tool, e.g., a calculator or a request to a database, to solve a problem). 4/🧡
@arnal_charles
Charles Arnal
2 days
Recently, LLMs have evolved from static predictors into dynamic agents capable of reasoning, adapting, and acting over time. This has been enabled by advances in architecture and interaction design such as RAG (@PSH_Lewis et al., 2021) and Toolformer (@timo_schick et al., 2023). 3/🧡
@arnal_charles
Charles Arnal
2 days
πŸ€— Joint work with @AmbroiseOdonnat, Sam Houliston and Vivien Cabannes at @ETH_en and @AIatMeta. 2/🧡
@arnal_charles
Charles Arnal
2 days
πŸ€”Why is tool use so effective for LLMs? In our new work, we provide theoretical and empirical evidence that tool-augmented workflows are not just practical but also provably more scalable. πŸ“œπŸ–₯️ 1/🧡
@arnal_charles
Charles Arnal
2 months
RT @KempeLab: Black-box Optimization for LLM Post-Training πŸ’ͺ
Strong non-vacuous generalization bounds βœ”οΈ
Privacy by design βœ”οΈ
Robustness to…
@arnal_charles
Charles Arnal
2 months
RT @gary_shiu: Excited to share with you an exciting project with Jacky Yip @UWMadPhysics and @arnal_charles and @f_charton @Meta where we…
@arnal_charles
Charles Arnal
2 months
Shout out to @syhw, @jadecopet, @TacoCohen, @KunhaoZ, @FabianGloeckle and @PierreChambon6 for their help with the code!
@arnal_charles
Charles Arnal
2 months
(8/8) Our paper also offers a complete theoretical analysis of these phenomena in a simplified setting πŸ“–, along with experiments in a controlled bandits setup that illustrate our findings.
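A minimal sketch of what such a controlled bandit experiment could look like, assuming a softmax policy trained off-policy from a fixed uniform behavior policy with the baseline-shifted update described in the (5/8) tweet below; the arm count, reward model and learning rate are made up for illustration, and this is not the paper's experimental code:

```python
import numpy as np

# Toy off-policy bandit in the spirit of the controlled setup mentioned above (illustration only).
rng = np.random.default_rng(0)
K = 5                                   # number of arms
true_means = rng.uniform(0.0, 1.0, K)   # expected reward of each arm

def softmax(logits):
    z = logits - logits.max()
    p = np.exp(z)
    return p / p.sum()

def train_asymre(baseline_v: float, steps: int = 20_000, lr: float = 0.1) -> np.ndarray:
    """Off-policy REINFORCE with a constant additive baseline (AsymRE-style update)."""
    theta = np.zeros(K)                  # softmax policy parameters
    behavior = np.full(K, 1.0 / K)       # fixed uniform behavior policy generating the data
    for _ in range(steps):
        a = rng.choice(K, p=behavior)            # action drawn from the behavior policy
        r = float(rng.random() < true_means[a])  # Bernoulli reward
        pi = softmax(theta)
        grad_logp = -pi
        grad_logp[a] += 1.0              # gradient of log pi(a) for a softmax policy
        theta += lr * (r - baseline_v) * grad_logp
    return softmax(theta)

for v in (-0.2, 0.0, 0.2):
    pi = train_asymre(v)
    print(f"V={v:+.1f}  learned prob. of best arm: {pi[true_means.argmax()]:.2f}")
```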
@arnal_charles
Charles Arnal
2 months
(7/8) In other words, one should learn more from others’ successes than from their mistakes.
@arnal_charles
Charles Arnal
2 months
(6/8) Our experiments show that V < 0 (slightly more emphasis on good trajectories) leads to stable & efficient training in the off-policy setting, while letting V be 0 or positive leads to crashes:
@arnal_charles
Charles Arnal
2 months
(5/8) Our solution: **Asymmetric REINFORCE** (AsymRE). We add a reward baseline V:
- V < 0: more emphasis on rewarding good trajectories.
- V > 0: more emphasis on punishing bad trajectories.
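Read literally, this describes the usual REINFORCE gradient with the baseline V subtracted from the reward, applied to data from a behavior policy without importance weights; a sketch with assumed notation (τ a trajectory, R its reward, μ the behavior policy, η the step size), not necessarily the paper's exact formulation:

```latex
% Sketch of the baseline-shifted off-policy update (notation assumed):
\[
  \theta \;\leftarrow\; \theta \;+\; \eta\,
  \mathbb{E}_{\tau \sim \mu}\!\Big[ \big(R(\tau) - V\big)\, \nabla_\theta \log \pi_\theta(\tau) \Big].
\]
% V = 0 recovers vanilla REINFORCE. With V < 0, the coefficient R(tau) - V is shifted upward,
% so sampled trajectories are mostly reinforced, the good ones most strongly; with V > 0,
% low-reward trajectories receive a negative coefficient and are actively pushed down.
```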
@arnal_charles
Charles Arnal
2 months
(4/8) However, standard REINFORCE (the simplest RL loss) often leads to instability & crashes in off-policy settings!
@arnal_charles
Charles Arnal
2 months
(3/8) Why off-policy RL? It's often simpler to implement than on-policy, especially with delayed rewards, & offers potential for greater data efficiency by allowing multiple passes over data.