Dylan Foster 🐢

@canondetortugas

Followers
3K
Following
425
Media
46
Statuses
262

Foundations of RL/AI @MSFTResearch. Previously @MIT @Cornell_CS. RL Theory Lecture Notes: https://t.co/bhgL3aKIk0

Joined January 2012
@canondetortugas
Dylan Foster 🐢
1 year
Now that I have started using twitter somewhat regularly, let me take a minute to advertise the RL theory lecture notes I have been developing with Sasha Rakhlin:  https://t.co/x16aGvE4tr
5
89
639
@canondetortugas
Dylan Foster 🐢
4 hours
Really nice set of results from Yuda and Dhruv! Great step toward a deeper understanding of the tradeoffs of sim-to-real transfer
@yus167
Yuda Song
17 hours
🤖 Robots rarely see the true world's state—they operate on partial, noisy visual observations. How should we design algorithms under this partial observability? Should we decide (end-to-end RL) or distill (from a privileged expert)? We study this trade-off in locomotion. 🧵(1/n)
0
3
17
@JensTuyls
Jens Tuyls
20 hours
7/ For post-training, we compare the test-time sample efficiency improvement for pass@256 of RepExp over GRPO (blue) and Unlikeliness (orange), an exploration baseline. RepExp is 2.1-4.1x more sample efficient than Unlikeliness and 3.2-13.4x more sample efficient than GRPO.
1
2
3
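The thread reports pass@256 but does not say how it is estimated. A common choice, assumed here rather than taken from the paper, is the unbiased estimator 1 - C(n-c, k) / C(n, k), computed from n sampled generations of which c pass the verifier. The function name `pass_at_k` is illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n samples, c of which are correct.

    Assumption: this is the standard combinatorial estimator, not
    necessarily the exact evaluation code used in the thread.
    """
    if not 0 <= c <= n:
        raise ValueError("need 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    # If fewer than k samples are incorrect, every size-k subset
    # contains at least one correct sample.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset has no correct sample,
    # subtracted from one.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

Under this estimator, comparing "sample efficiency" amounts to asking how many generations each method needs to reach the same pass@k value.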
@JensTuyls
Jens Tuyls
20 hours
6c/ Finding #4: RepExp improves verifier efficiency over standard generation modifications
1
2
3
@JensTuyls
Jens Tuyls
20 hours
6b/ Finding #2: The benefits of RepExp grow with model strength (left)
Finding #3: RepExp provides more improvement for harder questions (right)
1
2
3
@canondetortugas
Dylan Foster 🐢
20 hours
Really excited about this new paper with Jens! I believe exploration (beyond being a topic that is close to my heart) is a super promising direction for language modeling as we look toward systems/agents that can design their own data
@JensTuyls
Jens Tuyls
20 hours
Can the knowledge in language model representations guide the search for novel behaviors? We find that exploration with a simple, principled, representation-based bonus improves diversity and pass@k rates for inference-time and post-training!
1
5
75
@JensTuyls
Jens Tuyls
20 hours
Can the knowledge in language model representations guide the search for novel behaviors? We find that exploration with a simple, principled, representation-based bonus improves diversity and pass@k rates for inference-time and post-training!
1
18
76
@risteski_a
Andrej Risteski
1 day
With awesome team: Dhruv Rohatgi, Abhishek Shetty (@AShettyV), Donya Saless (@DonyaSaless), Yuchen Li (@_Yuchen_Li_), Ankur Moitra, and Dylan Foster (@canondetortugas). Dhruv and Yuchen are both on the (postdoc & job) market this year --- grab them while you can!!
2
2
4
@risteski_a
Andrej Risteski
1 day
I have been thinking a lot recently about framing a variety of inference-time tasks as doing algorithm design with access to strong oracles (e.g. generators, different types of verifiers, convolved scores, ...) --- as an alternative to "end-to-end" analyses.
@canondetortugas
Dylan Foster 🐢
4 days
New paper we're excited to get online! Taming Imperfect Process Verifiers: A Sampling Perspective on Backtracking. A totally new framework based on ~backtracking~ for using process verifiers to guide inference, w/ connections to approximate counting/sampling in theoretical CS.
2
7
41
@canondetortugas
Dylan Foster 🐢
4 days
New paper we're excited to get online! Taming Imperfect Process Verifiers: A Sampling Perspective on Backtracking. A totally new framework based on ~backtracking~ for using process verifiers to guide inference, w/ connections to approximate counting/sampling in theoretical CS.
8
38
240
@canondetortugas
Dylan Foster 🐢
4 days
Lots of interesting directions here! We think there is a lot more to do building on the connection to the discrete sampling/TCS literature and algos from this space, as well as moving beyond autoregressive generation.
1
0
7
@canondetortugas
Dylan Foster 🐢
4 days
Empirically, we have only tried this at small-ish scale so far, but find consistently that VGB outperforms textbook algos on either (1) accuracy; or (2) diversity when compute-normalized. Ex: for Dyck language, VGB escapes the accuracy-diversity frontier for baseline algos.
1
1
7
@canondetortugas
Dylan Foster 🐢
4 days
Main guarantee:
- As long as you have exact/verifiable outcome rewards, always converges to optimal distribution.
- Runtime depends on process verifier quality, gracefully degrading as quality gets worse.
1
3
8
@canondetortugas
Dylan Foster 🐢
4 days
VGB generalizes the Sinclair–Jerrum '89 random walk (https://t.co/hTjxI5W2NA) from TCS (used to prove equivalence of apx. counting & sampling for self-reducible problems), linking test-time RL/alignment with discrete sampling theory. We are super excited about this connection.
1
2
13
@canondetortugas
Dylan Foster 🐢
4 days
We give a new algo, Value-Guided Backtracking (VGB), where the idea is to view autoregressive generation as a random walk on the tree of partial outputs, and add a *stochastic backtracking* step—occasionally erasing tokens in a principled way—to counter error amplification.
1
2
13
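To make the "random walk on the tree of partial outputs" picture concrete, here is a toy sketch. It is not the paper's VGB algorithm: the transition weights (child weight = base-policy probability times estimated value, parent weight = estimated value of the current node), the uniform base policy, the "no `11` substring" task, the stopping rule, and all names (`value`, `step`, `sample`) are assumptions chosen for illustration in the spirit of a Sinclair–Jerrum style walk.

```python
import random

ALPHABET = "01"   # toy token vocabulary (assumption)
HORIZON = 6       # length of a complete output (assumption)


def reward(output: str) -> float:
    """Exact outcome reward: 1 if the full string avoids '11'."""
    return 0.0 if "11" in output else 1.0


def value(prefix: str) -> float:
    """Hypothetical imperfect process verifier.

    Exact at full length, crude on partial outputs: it gives bad
    prefixes a small positive score instead of zero.
    """
    if len(prefix) == HORIZON:
        return reward(prefix)
    return 0.05 if "11" in prefix else 1.0


def step(prefix: str, rng: random.Random) -> str:
    """One move of the walk: extend by a token or erase the last one."""
    moves, weights = [], []
    if len(prefix) < HORIZON:
        for tok in ALPHABET:
            child = prefix + tok
            moves.append(child)
            # Uniform base policy times estimated value of the child.
            weights.append(value(child) / len(ALPHABET))
    if prefix:
        # Backtracking move, weighted by the value of the current node.
        moves.append(prefix[:-1])
        weights.append(value(prefix))
    return rng.choices(moves, weights=weights, k=1)[0]


def sample(n_steps: int, rng: random.Random) -> str:
    """Return the first full-length string reached after n_steps moves."""
    prefix, steps = "", 0
    while steps < n_steps or len(prefix) < HORIZON:
        prefix = step(prefix, rng)
        steps += 1
    return prefix
```

In this toy, a walk that wanders into a prefix containing `11` can erase tokens and recover instead of being stuck with the mistake, and zero-reward leaves are never entered because their weight is exactly zero.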
@canondetortugas
Dylan Foster 🐢
4 days
Test-time guidance with learned process verifiers has potential to enhance LLM reasoning, but one of the issues with getting this to actually work is that small verifier mistakes are amplified by textbook algos (e.g., block-wise BoN), w/ errors compounding as length increases.
1
2
8
@nmboffi
Nicholas Boffi
8 days
Consistency models, CTMs, shortcut models, align your flow, mean flow... What's the connection, and how should you learn them in practice? We show they're all different sides of the same coin connected by one central object: the flow map. https://t.co/QBp1kELVhF 🧵(1/n)
5
68
336
@canondetortugas
Dylan Foster 🐢
9 days
Website for details/updates:
0
0
7