
Charles Arnal
@arnal_charles
Followers: 91 · Following: 71 · Media: 11 · Statuses: 24
Postdoc at @MetaAI, mathematician ENS, Cambridge, Inria, FAIR at Meta
Paris
Joined January 2023
TL;DR: Our work deciphers what makes tool use so effective for LLMs, improving our understanding of its widely observed practical benefits. 🧵/🧵
We then scale up our experiments by finetuning Llama3 and SmolLM instruct models and show that introducing new knowledge through in-weight learning severely degrades the models' existing capabilities, while tool-augmented learning promises scalability without forgetting. 10/🧵
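To make the comparison concrete, here is a minimal sketch of the protocol this describes; `finetune` and `evaluate` are placeholder callables and the benchmark names are illustrative, not the paper's exact setup.

```python
from typing import Callable, Iterable

def compare_forgetting(
    base_model,
    new_facts_in_weight: Iterable,   # (question, answer) pairs
    new_facts_in_tool: Iterable,     # (question, tool-query) pairs
    finetune: Callable,              # (model, data) -> finetuned model
    evaluate: Callable,              # (model, benchmark_name) -> score
):
    """Finetune the same base model in both formats and measure what is
    gained on the new facts vs. what is lost on existing capabilities."""
    results = {}
    for name, data in [("in_weight", new_facts_in_weight),
                       ("in_tool", new_facts_in_tool)]:
        model = finetune(base_model, data)
        results[name] = {
            "new_fact_accuracy": evaluate(model, "new_facts"),
            "prior_capability": evaluate(model, "general_benchmark"),
        }
    return results

# Toy usage with stub functions (a real harness would wrap actual models):
stub_finetune = lambda model, data: model
stub_evaluate = lambda model, benchmark: 0.5
print(compare_forgetting("base-model", [], [], stub_finetune, stub_evaluate))
```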
We validate our theoretical insights in a controlled setting by pretraining small Llama-3 models from scratch with in-weight and in-tool learning on factual databases. Tool-augmented recall requires fewer parameters than in-weight memorization. 9/🧵
Theory: 1) We demonstrate that the number of facts a model can store in its weights is fundamentally limited by its number of parameters; 2) we derive an upper bound showing that tool-augmented models can, in principle, retrieve an unbounded number of facts. 8/🧵
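A back-of-the-envelope version of the contrast between the two results; the constants and exact statements below are illustrative, not the paper's theorems.

```latex
% Illustrative counting argument, not the paper's exact bound.
% In-weight learning: P parameters stored with b bits each encode at most bP bits,
% so exact recall of N independent facts carrying H bits of entropy each requires
\[
  N\,H \;\le\; b\,P
  \qquad\Longrightarrow\qquad
  N \;\le\; \frac{b\,P}{H},
\]
% i.e. the number of storable facts grows at most linearly with the parameter count.
% In-tool learning: the model only needs to learn a fixed query format, so the
% number of retrievable facts is limited by the external database, not by P.
```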
In-weight learning: the model is trained to generate the answer directly from its parameters. 🛠️ In-tool learning: the model learns to issue a structured tool query that retrieves the value from an external database. 7/🧵
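Here is a toy illustration of the two supervision formats; the `<tool>lookup(...)</tool>` syntax and the field names are made up for this example, not the paper's templates.

```python
import re

# One fact and an external database holding it.
fact = {"subject": "Marie Curie", "attribute": "birth_year", "value": "1867"}
database = {("Marie Curie", "birth_year"): "1867"}

prompt = f"What is the {fact['attribute']} of {fact['subject']}?"

# In-weight learning: the supervision target is the answer itself,
# so the value has to be memorized in the model's parameters.
in_weight_target = fact["value"]

# In-tool learning: the supervision target is a structured query;
# the value stays in the external database and is fetched at inference time.
in_tool_target = f"<tool>lookup('{fact['subject']}', '{fact['attribute']}')</tool>"

def execute(model_output: str) -> str:
    """Toy executor: if the model emitted a tool call, answer from the database."""
    m = re.match(r"<tool>lookup\('(.+)', '(.+)'\)</tool>", model_output)
    if m:
        return database[(m.group(1), m.group(2))]
    return model_output  # in-weight case: the generated text is the answer

assert execute(in_tool_target) == execute(in_weight_target) == "1867"
```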
To highlight the differences between in-weight memorization and tool-augmented reasoning, we introduce a family of factual recall tasks inspired by Physics of LLMs (@ZeyuanAllenZhu), where datasets are finite collections of facts to be retrieved upon query. 6/🧵
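A minimal sketch of this kind of synthetic factual-recall dataset; the attributes, sizes, and question template are placeholders, and the paper's task family may differ.

```python
import random
import string

def random_name(rng: random.Random) -> str:
    return "".join(rng.choices(string.ascii_lowercase, k=8)).capitalize()

def make_database(num_entities: int, seed: int = 0) -> dict:
    """Finite collection of (entity, attribute) -> value facts."""
    rng = random.Random(seed)
    db = {}
    for _ in range(num_entities):
        name = random_name(rng)
        db[(name, "birth_year")] = str(rng.randint(1900, 2000))
        db[(name, "birth_city")] = random_name(rng)
        db[(name, "employer")] = random_name(rng)
        db[(name, "major")] = rng.choice(["math", "physics", "biology", "CS"])
    return db

def to_qa_pairs(db: dict) -> list[tuple[str, str]]:
    """Each fact becomes a query/answer pair to be recalled at test time."""
    return [(f"What is the {attr} of {name}?", value)
            for (name, attr), value in db.items()]

db = make_database(num_entities=1000)
print(len(to_qa_pairs(db)), "facts, e.g.", to_qa_pairs(db)[0])
```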
While the former is bounded by the model's capacity and sensitive to forgetting, the latter offers the potential for open-ended knowledge access and generalization. In our work, we provide a rigorous theoretical framework for understanding the benefits of tool use for LLMs. 5/🧵
These capabilities mark a shift away from in-weight learning (memorizing the solution to a problem within the model's weights) towards in-tool learning (learning to use a tool, e.g., a calculator or a database query, to solve a problem). 4/🧵
Recently, LLMs have evolved from static predictors into dynamic agents capable of reasoning, adapting, and acting over time. This has been enabled by advances in architecture and interaction design such as RAG (@PSH_Lewis et al., 2021) and Toolformer (@timo_schick et al., 2023). 3/🧵
🤝 Joint work with @AmbroiseOdonnat, Sam Houliston and Vivien Cabannes at @ETH_en and @AIatMeta. 2/🧵
🤔 Why is tool use so effective for LLMs? In our new work, we provide theoretical and empirical evidence that tool-augmented workflows are not just practical but also provably more scalable. 1/🧵
RT @KempeLab: Black-box Optimization for LLM Post-Training 💪. Strong non-vacuous generalization bounds ✔️. Privacy by design ✔️. Robustness to…
RT @gary_shiu: Excited to share with you an exciting project with Jacky Yip @UWMadPhysics and @arnal_charles and @f_charton @Meta where we…
Shout out to @syhw, @jadecopet, @TacoCohen, @KunhaoZ, @FabianGloeckle and @PierreChambon6 for their help with the code!
(8/8) Our paper also offers a complete theoretical analysis of these phenomena in a simplified setting, along with experiments in a controlled bandit setup that illustrate our findings.
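For intuition, here is a toy off-policy bandit experiment in the same spirit; the arm count, behavior policy, and baseline values are illustrative, and this tiny setup is not guaranteed to reproduce the paper's qualitative findings.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 5
true_means = rng.uniform(0.0, 1.0, size=K)      # Bernoulli reward probabilities

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Off-policy data: actions drawn once from a fixed uniform behavior policy.
actions = rng.integers(0, K, size=5000)
rewards = rng.binomial(1, true_means[actions]).astype(float)

def train(baseline, lr=0.1, batch=50):
    """REINFORCE with a constant baseline V on the fixed off-policy batch."""
    theta = np.zeros(K)
    for start in range(0, len(actions), batch):
        a = actions[start:start + batch]
        r = rewards[start:start + batch]
        pi = softmax(theta)
        grad = np.zeros(K)
        adv = r - baseline
        for ai, advi in zip(a, adv):
            # grad of log pi(a) wrt theta for a softmax policy: onehot(a) - pi
            grad += advi * (np.eye(K)[ai] - pi)
        theta += lr * grad / batch
    return softmax(theta)

for V in (-1.0, 0.0, 1.0):
    pi = train(V)
    print(f"V={V:+.1f}  greedy arm={pi.argmax()}  best arm={true_means.argmax()}")
```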
(7/8) In other words, one should learn more from others' successes than from their mistakes.
(6/8) Our experiments show that V < 0 (slightly more emphasis on good trajectories) leads to stable & efficient training in the off-policy setting, while letting V be 0 or positive leads to crashes.
(5/8) Our solution: **Asymmetric REINFORCE** (AsymRE). We add a reward baseline V:
- V < 0: more emphasis on rewarding good trajectories.
- V > 0: more emphasis on punishing bad trajectories.
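A minimal sketch of the corresponding loss, assuming a standard trajectory-level REINFORCE objective with a constant baseline; tensor shapes and the default V below are illustrative, not the paper's exact configuration.

```python
import torch

def asymmetric_reinforce_loss(logprobs: torch.Tensor,
                              rewards: torch.Tensor,
                              baseline: float = -0.1) -> torch.Tensor:
    """REINFORCE with a constant reward baseline V (illustrative sketch).

    logprobs: (batch, seq_len) log-probabilities of the sampled tokens
              under the trained policy.
    rewards:  (batch,) scalar reward per trajectory.
    baseline: the V of the thread. V < 0 shifts advantages upward, putting
              more weight on reinforcing good trajectories; V > 0 puts more
              weight on pushing down bad ones.
    """
    advantages = rewards - baseline                  # (batch,)
    seq_logprob = logprobs.sum(dim=-1)               # log-prob of each trajectory
    # Gradient ascent on E[(R - V) * log pi(trajectory)], written as a loss.
    return -(advantages.detach() * seq_logprob).mean()

# Toy usage with stand-in values:
logprobs = torch.randn(4, 16, requires_grad=True)   # placeholder for policy log-probs
rewards = torch.tensor([1.0, 0.0, 1.0, 0.0])
loss = asymmetric_reinforce_loss(logprobs, rewards, baseline=-0.1)
loss.backward()
```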
(4/8) However, standard REINFORCE (the simplest RL loss) often leads to instability & crashes in off-policy settings!
(3/8) Why off-policy RL? It's often simpler to implement than on-policy, especially with delayed rewards, & offers potential for greater data efficiency by allowing multiple passes over data.