
Dylan Foster 🐢
@canondetortugas
Followers: 3K · Following: 425 · Media: 46 · Statuses: 262
Foundations of RL/AI @MSFTResearch. Previously @MIT @Cornell_CS. RL Theory Lecture Notes: https://t.co/bhgL3aKIk0
Joined January 2012
Now that I have started using twitter somewhat regularly, let me take a minute to advertise the RL theory lecture notes I have been developing with Sasha Rakhlin: https://t.co/x16aGvE4tr
Really nice set of results from Yuda and Dhruv! Great step toward a deeper understanding of the tradeoffs of sim-to-real transfer
🤖 Robots rarely see the true world's state—they operate on partial, noisy visual observations. How should we design algorithms under this partial observability? Should we decide (end-to-end RL) or distill (from a privileged expert)? We study this trade-off in locomotion. 🧵(1/n)
7/ For post-training, we compare the test-time sample efficiency improvement for pass@256 of RepExp over GRPO (blue) and Unlikeliness (orange), an exploration baseline. RepExp is 2.1-4.1x more sample efficient than Unlikeliness and 3.2-13.4x more sample efficient than GRPO.
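(For context on how pass@k numbers like these are typically computed: below is the standard unbiased pass@k estimator from Chen et al., 2021. This is the textbook formula, not code from the paper, and the example numbers are made up.)

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021): probability that at least
    one of k samples, drawn from n generations of which c are correct,
    solves the problem."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example with made-up numbers: 256 samples per problem, 12 correct -> pass@8 estimate
print(round(pass_at_k(n=256, c=12, k=8), 3))
```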
Really excited about this new paper with Jens! I believe exploration (beyond being a topic that is close to my heart) is a super promising direction for language modeling as we look toward systems/agents that can design their own data
Can the knowledge in language model representations guide the search for novel behaviors? We find that exploration with a simple, principled, representation-based bonus improves diversity and pass@k rates for inference-time and post-training!
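(As a concrete point of reference, here is one standard "principled, representation-based" exploration bonus, an elliptical bonus computed on representation vectors; this is a hedged sketch for illustration and not necessarily the exact bonus used in the paper.)

```python
import torch

def elliptical_bonus(phi: torch.Tensor, history: torch.Tensor, lam: float = 1.0) -> torch.Tensor:
    """Illustrative elliptical bonus sqrt(phi^T (lam*I + sum_i h_i h_i^T)^{-1} phi):
    large when the candidate's representation phi points in a direction not yet
    covered by previously sampled generations. Hypothetical sketch, not the
    paper's exact objective.

    phi:     (d,) representation of a candidate generation.
    history: (N, d) representations of generations sampled so far.
    """
    d = phi.shape[0]
    cov = lam * torch.eye(d) + history.T @ history   # regularized feature covariance
    bonus_sq = phi @ torch.linalg.solve(cov, phi)    # phi^T cov^{-1} phi
    return torch.sqrt(bonus_sq)

# Usage sketch: rank candidates by reward + beta * elliptical_bonus(...) when sampling.
```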
9/ With fantastic collaborators Dylan Foster (@canondetortugas), Akshay Krishnamurthy, and Jordan Ash (@jordan_t_ash).
Paper: https://t.co/Q6rzJO9bMf
Website: https://t.co/8gmGNvO9IK
Code: coming soon!
arxiv.org
Reinforcement learning (RL) promises to expand the capabilities of language models, but it is unclear if current RL techniques promote the discovery of novel behaviors, or simply sharpen those...
With awesome team: Dhruv Rohatgi, Abhishek Shetty (@AShettyV), Donya Saless (@DonyaSaless), Yuchen Li (@_Yuchen_Li_), Ankur Moitra, and Dylan Foster (@canondetortugas). Dhruv and Yuchen are both on the (postdoc & job) market this year --- grab them while you can!!
I have been thinking a lot recently about framing a variety of inference-time tasks as doing algorithm design with access to strong oracles (e.g. generators, different types of verifiers, convolved scores, ...) --- as an alternative to "end-to-end" analyses.
New paper we're excited to get online! Taming Imperfect Process Verifiers: A Sampling Perspective on Backtracking. A totally new framework based on ~backtracking~ for using process verifiers to guide inference, w/ connections to approximate counting/sampling in theoretical CS.
With amazing team: Dhruv Rohatgi, Abhishek Shetty (@AShettyV), Donya Saless (@DonyaSaless), Yuchen Li (@_Yuchen_Li_), Ankur Moitra, and Andrej Risteski (@risteski_a). Paper link:
arxiv.org
Test-time algorithms that combine the generative power of language models with process verifiers that assess the quality of partial generations offer a promising lever for eliciting new reasoning...
Lots of interesting directions here! We think there is a lot more to do building on the connection to the discrete sampling/TCS literature and algos from this space, as well as moving beyond autoregressive generation.
Empirically, we have only tried this at small-ish scale so far, but find consistently that VGB outperforms textbook algos on either (1) accuracy or (2) diversity when compute-normalized. Ex: for the Dyck language, VGB escapes the accuracy-diversity frontier of the baseline algos.
Main guarantee:
- As long as you have exact/verifiable outcome rewards, always converges to the optimal distribution.
- Runtime depends on process verifier quality, gracefully degrading as quality gets worse.
VGB generalizes the Sinclair–Jerrum '89 random walk (https://t.co/hTjxI5W2NA) from TCS (used to prove equivalence of apx. counting & sampling for self-reducible problems), linking test-time RL/alignment with discrete sampling theory. We are super excited about this connection.
We give a new algo, Value-Guided Backtracking (VGB), where the idea is to view autoregressive generation as a random walk on the tree of partial outputs, and add a *stochastic backtracking* step—occasionally erasing tokens in a principled way—to counter error amplification.
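(A minimal sketch of the idea as described in the tweet, with a hypothetical model/verifier interface and an illustrative backtracking rule; the paper's actual algorithm and backtracking probabilities will differ.)

```python
import random

def vgb_sketch(model, verifier, max_len=128):
    """Illustrative sketch of value-guided backtracking (hypothetical interface,
    not the paper's exact algorithm). Generation is a walk on the tree of
    prefixes: usually step forward by sampling a token, but occasionally step
    back (erase the last token), more often when the process verifier assigns
    the current prefix a low value."""
    prefix = []
    while len(prefix) < max_len and not model.is_terminal(prefix):
        value = verifier.score(prefix)               # assumed to lie in [0, 1]
        backtrack_prob = 0.5 * (1.0 - value)         # illustrative rule, not the paper's
        if prefix and random.random() < backtrack_prob:
            prefix.pop()                             # stochastic backtracking step
        else:
            prefix.append(model.sample_next(prefix)) # forward step of the random walk
    return prefix
```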
Test-time guidance with learned process verifiers has the potential to enhance LLM reasoning, but one of the issues with getting this to actually work is that small verifier mistakes are amplified by textbook algos (e.g., block-wise BoN), w/ errors compounding as length increases.
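(For contrast, a hedged sketch of the block-wise best-of-N baseline mentioned above, again with a hypothetical model/verifier interface; it illustrates why greedy, non-revocable choices let per-block verifier errors compound with length.)

```python
def blockwise_best_of_n(model, verifier, num_blocks, n=8):
    """Sketch of block-wise best-of-N (textbook baseline, hypothetical interface):
    at each step sample n candidate blocks and greedily keep the one the process
    verifier ranks highest. A greedy choice can never be undone, so if the
    verifier keeps a bad prefix with probability eps per block, the chance of
    staying on a good path decays roughly like (1 - eps) ** num_blocks."""
    prefix = []
    for _ in range(num_blocks):
        candidates = [prefix + model.sample_block(prefix) for _ in range(n)]
        prefix = max(candidates, key=verifier.score)  # greedy: no backtracking
    return prefix
```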
Consistency models, CTMs, shortcut models, align your flow, mean flow... What's the connection, and how should you learn them in practice? We show they're all different sides of the same coin connected by one central object: the flow map. https://t.co/QBp1kELVhF 🧵(1/n)
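(For readers unfamiliar with the term, the usual definition of the flow map of a probability-flow ODE with velocity field v is given below; this is standard notation added for context, and the thread's own notation may differ.)

```latex
% Flow map X_{s,t}: transports x along the ODE dx/dt = v_t(x) from time s to time t.
% Consistency/shortcut/mean-flow-style models can be read as learning X_{s,t} directly
% rather than integrating v step by step.
\[
  \frac{\partial}{\partial t}\, X_{s,t}(x) \;=\; v_t\bigl(X_{s,t}(x)\bigr),
  \qquad X_{s,s}(x) \;=\; x .
\]
```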