Bryan Chan
@chanpyb
Followers
185
Following
4K
Media
1
Statuses
50
PhD student @rlai_lab. Prev: @GoogleDeepMind, @OcadoTechnology, @kindredai, @UofTCompSci
Edmonton
Joined October 2020
Meet the recipients of the 2024 ACM A.M. Turing Award, Andrew G. Barto and Richard S. Sutton! They are recognized for developing the conceptual and algorithmic foundations of reinforcement learning. Please join us in congratulating the two recipients! https://t.co/GrDfgzW1fL
34
474
2K
Excited to share that our work on understanding when ICL emerges has been accepted to #ICLR2025! Submission available for preview:
openreview.net
It has recently been demonstrated empirically that in-context learning emerges in transformers when certain distributional properties are present in the training data, but this ability can also...
LLMs can leverage context information, i.e., in-context learning (ICL), or memorize solutions, i.e., in-weight learning (IWL), for prediction. But when does each happen? 1/N
0
0
4
Thanks @m_wulfmeier! We were surprised to see that SAC-X is just very robust. Something interesting that we didn't investigate further: learning from examples ended up being more efficient than using reward. Let's chat at #NeurIPS2024 if there's a chance?
Here's a fascinating paper by @domo_mr_roboto's group linking hierarchical reinforcement learning and cheaply-obtainable auxiliary tasks https://t.co/n7dC8ifUNr Better exploration with minimal engineering effort remains a critical challenge (even for RLHF/AIF) - reminiscent of
0
0
6
Would you believe that deep RL can work without replay buffers, target networks, or batch updates? Our recent work gets deep RL agents to learn from a continuous stream of data, one sample at a time, without storing any samples. Joint work with @Gautham529 and @rupammahmood.
9
106
629
Our NeurIPS paper is now on arXiv: We introduce Action Value Gradient (AVG), a novel incremental deep RL method that learns in real time, one sample at a time, with no batch updates, target networks, or replay buffer! Co-authors @mhmd_elsaye @bellingerc @white_martha @rupammahmood
2
23
94
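For readers who want a concrete picture of what "no batch updates, target networks, or replay buffer" can look like in practice, here is a minimal PyTorch sketch of a fully incremental actor-critic update. It illustrates the general recipe only and is not the authors' AVG implementation; the network sizes, learning rates, and function names are assumptions.

```python
# Hypothetical sketch of a fully incremental actor-critic update: one transition
# at a time, no replay buffer, no target network, no batching.
# Illustration of the general recipe only, not the authors' AVG algorithm.
import torch
import torch.nn as nn

obs_dim, act_dim, gamma = 8, 2, 0.99  # made-up problem sizes

critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, 1))
actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim), nn.Tanh())
critic_opt = torch.optim.SGD(critic.parameters(), lr=1e-3)
actor_opt = torch.optim.SGD(actor.parameters(), lr=1e-4)

def incremental_update(s, a, r, s_next, done):
    """Update critic and actor from a single transition, then discard it."""
    s, a, s_next = (torch.as_tensor(x, dtype=torch.float32) for x in (s, a, s_next))

    # Critic: one-step TD target bootstrapped from the *online* networks
    with torch.no_grad():
        a_next = actor(s_next)
        target = r + gamma * (1.0 - done) * critic(torch.cat([s_next, a_next]))
    q = critic(torch.cat([s, a]))
    critic_loss = (q - target).pow(2).squeeze()
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: follow the gradient of the action value at the current state
    actor_loss = -critic(torch.cat([s, actor(s)])).squeeze()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
```

Each transition is used once for a critic TD step and an actor step, then thrown away; nothing is ever stored.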
Hey all! We are thrilled to have @chanpyb from @UAlberta for this week's seminar! The talk is titled: "Why can't we use reinforcement learning for image-based robotic manipulation?". See you at 11:30AM ET! https://t.co/vki05SSZgx
#rl #manipulation #imitationLearning
0
4
35
@XinyiChen2 I think this line of work will lead to a better understanding of how LLMs work and, in turn, to new ideas for designing training algorithms for LLMs. N/N arXiv link:
arxiv.org
It has recently been demonstrated empirically that in-context learning emerges in transformers when certain distributional properties are present in the training data, but this ability can also...
0
0
1
@XinyiChen2 Of course, we have also conducted experiments on a synthetic dataset, Omniglot, and fine-tuned an LLM with a small number of prompts to corroborate our theoretical findings. 7/N
1
0
1
@XinyiChen2 In practice the models don't know the test errors. To bridge this gap we provide a regret analysis, showing that the training sample observed at each iteration can be viewed as a test sample, so its loss provides a quantity similar to the test error. 6/N
1
0
0
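A rough way to read the regret argument sketched in the tweet above (a paraphrase, not the paper's exact statement): if the training example at iteration t is drawn fresh from the data distribution D, its loss is an unbiased estimate of the current model's test error, so the average training loss over iterations tracks the average test error.

```latex
\[
  \mathbb{E}_{(x_t, y_t) \sim \mathcal{D}}\!\left[\, \ell\big(f_{\theta_t}(x_t), y_t\big) \,\middle|\, \theta_t \right]
  = \mathrm{TestErr}(\theta_t),
  \qquad
  \frac{1}{T} \sum_{t=1}^{T} \ell\big(f_{\theta_t}(x_t), y_t\big)
  \;\approx\; \frac{1}{T} \sum_{t=1}^{T} \mathrm{TestErr}(\theta_t).
\]
```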
@XinyiChen2 When we see each data point sufficiently many times, the model will eventually perform IWL only, demonstrating the transience of ICL! This characterization also suggests that in some cases ICL is never transient, because IWL has higher error than ICL. 5/N
1
0
0
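A toy illustration of the transience claim above, under the assumption (an illustration, not the paper's model) that the in-weight error on a data point shrinks with the number of times it is seen while the in-context error stays fixed: IWL eventually wins, unless IWL's error floor stays above the ICL error.

```python
# Toy illustration (not the paper's model): if in-weight error on a data point
# shrinks with the number of times it is seen while in-context error stays fixed,
# ICL wins early and IWL wins late, unless IWL's error floor stays above ICL's.
def preferred_strategy(n_seen, err_icl=0.1, iwl_floor=0.0):
    err_iwl = iwl_floor + 1.0 / (1.0 + n_seen)  # hypothetical error curve
    return "ICL" if err_icl < err_iwl else "IWL"

print([preferred_strategy(n) for n in (0, 5, 50)])                  # ICL -> IWL over time
print([preferred_strategy(n, iwl_floor=0.2) for n in (0, 5, 50)])   # ICL never transient
```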
@XinyiChen2 With imbalanced datasets, we can expect the model to exhibit ICL on rare classes early in training while exhibiting IWL on common classes, showing that a model can perform both ICL and IWL simultaneously. 4/N
1
0
0
@XinyiChen2 Our result suggests that the model will perform ICL or IWL based on their corresponding test errors! In short, the model performs ICL for data points that appear rarely and are predictable from the context, while IWL happens for data points that appear frequently. 3/N
1
0
0
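Schematically, the rule described in the tweet above can be read as follows (a paraphrase, with err_ICL and err_IWL as informal placeholders rather than the paper's notation):

```latex
\[
  \text{behaviour}(x) \;=\;
  \begin{cases}
    \text{ICL} & \text{if } \mathrm{err}_{\mathrm{ICL}}(x) < \mathrm{err}_{\mathrm{IWL}}(x),\\[2pt]
    \text{IWL} & \text{otherwise,}
  \end{cases}
\]
```

where err_IWL(x) tends to shrink as x is seen more often during training, while err_ICL(x) depends on how predictable x is from its context.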
In this work with my collaborators @XinyiChen2, András György, and Dale Schuurmans, we provide a theory to characterize the emergence and transience of ICL through a simplified model. 2/N
1
0
1
LLMs can leverage context information, i.e., in-context learning (ICL), or memorize solutions, i.e., in-weight learning (IWL), for prediction. But when does each happen? 1/N
1
0
1
Finally, what's cool about this is that we actually compared these algorithms over multiple seeds, which many papers don't do when it comes to real-life robotic experiments! 7/N
1
0
1
With this simple change, we can do hybrid RL with just 50 human demonstrations, and the agent achieves around 75% with only 20 minutes of interaction time. With the same amount of data, BC can't even reach this performance! 6/N
1
0
0
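For context, one common hybrid-RL recipe is to mix demonstration transitions with online experience in every update batch; a minimal sketch is below. The 50/50 ratio, batch size, and buffer names are assumptions for illustration, not details taken from the paper.

```python
# Hedged sketch of a standard hybrid-RL batch: mix a handful of human
# demonstrations with the agent's own online experience at every update.
# General recipe only; ratios and names are made up for illustration.
import random

demo_buffer = []      # ~50 human demonstration transitions, loaded once
online_buffer = []    # transitions collected by the agent during interaction

def sample_hybrid_batch(batch_size=256, demo_fraction=0.5):
    """Draw a mixed batch: part demonstrations, part online experience."""
    n_demo = int(batch_size * demo_fraction)
    demo = random.choices(demo_buffer, k=n_demo) if demo_buffer else []
    online = random.choices(online_buffer, k=batch_size - n_demo) if online_buffer else []
    return demo + online
```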
It turns out the regularizer is essentially decorrelating the latent representation, which also lets us remove the target network entirely, since the target network was partly introduced to decorrelate consecutive state-action pairs. 5/N
1
0
0
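One standard way to decorrelate a latent representation is to penalize the off-diagonal entries of the features' empirical covariance; a minimal sketch follows. This illustrates the idea in the tweet and is not necessarily the exact regularizer used in the work.

```python
# Sketch of a decorrelation penalty on latent features: push the off-diagonal
# entries of the batch covariance toward zero. Illustration of the idea only.
import torch

def decorrelation_penalty(z: torch.Tensor) -> torch.Tensor:
    """z: (batch, dim) latent features; returns a scalar penalty."""
    z = z - z.mean(dim=0, keepdim=True)               # center each feature
    cov = (z.T @ z) / (z.shape[0] - 1)                # empirical covariance
    off_diag = cov - torch.diag(torch.diagonal(cov))  # keep only cross-correlations
    return (off_diag ** 2).sum() / z.shape[1]

# Usage: total_loss = td_loss + reg_coef * decorrelation_penalty(latent_features)
```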
This can actually be explained through the learning dynamics of the Q-function under TD learning. Previous work has looked at this through the neural tangent kernel and found that the similarity between state-action pairs dictates how all Q-values change after an SGD step. 4/N
1
0
0
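A paraphrase of the neural-tangent-kernel view mentioned above (schematic, not the exact statement from prior work): one semi-gradient TD step on the pair (s, a) moves every other Q-value in proportion to the kernel similarity between its state-action pair and (s, a).

```latex
\[
  Q_{\theta_{t+1}}(\bar{s}, \bar{a}) \;\approx\;
  Q_{\theta_t}(\bar{s}, \bar{a})
  \;+\; \alpha \, k_{\theta_t}\!\big((\bar{s}, \bar{a}), (s, a)\big)\, \delta_t,
  \qquad
  k_{\theta}(x, x') \;=\; \nabla_\theta Q_\theta(x)^{\top} \nabla_\theta Q_\theta(x'),
\]
```

where alpha is the step size and delta_t is the TD error on (s, a). Pairs that look similar under the kernel move together, which is one reason consecutive, highly correlated transitions can destabilize learning without some form of decorrelation.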