Kamil Ciosek
@MLciosek
Followers: 466 · Following: 280 · Media: 1 · Statuses: 27
Research Scientist @Spotify. A machine learning generalist, with an interest in LLMs and a focus on RL(HF).
Joined December 2014
For anyone worried their LLM might be making stuff up, we made a budget-friendly truth serum (semantic entropy + Bayesian). See for yourself: https://t.co/gq8oFP5Eqr Paper:
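A minimal sketch of the semantic-entropy half of the idea (my assumption of the general recipe; the paper has the actual estimator, including the Bayesian part): sample several answers, group the ones that mean the same thing, and compute the entropy over those meaning clusters.

```python
import math
from collections import Counter

# Illustrative sketch only -- my assumption of the general recipe, not the
# paper's exact method. High entropy over meaning clusters means the model
# keeps changing its answer, which is a red flag for confabulation.

def cluster_id(answer: str) -> str:
    """Stand-in for semantic clustering; real systems typically use
    bidirectional entailment between answers, not string normalization."""
    return answer.strip().lower().rstrip(".")

def semantic_entropy(answers: list[str]) -> float:
    """Entropy (in nats) of the empirical distribution over meaning clusters."""
    counts = Counter(cluster_id(a) for a in answers)
    n = sum(counts.values())
    return sum(-(c / n) * math.log(c / n) for c in counts.values())

# Hypothetical answers sampled from an LLM at nonzero temperature.
print(semantic_entropy(["Paris.", "paris", "Paris"]))      # 0.0: stable answer
print(semantic_entropy(["Paris.", "Lyon", "Marseille"]))   # ~1.1: likely making stuff up
```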
Interested in working on exciting ML ideas for Spotify? Join us! We are looking for a Research Scientist Intern to join our research lab in London for summer 2023. https://t.co/62V4NvEddS
@SpotifyResearch
On the other hand, in “traditional” dynamic programming we have to carefully design the order of updates, which is problem-specific. (3/3)
In fact, these meanings are closely related. Q-learning is a type of dynamic programming. The main insight is that, in Q-learning, updates can be made in a random order, but we are still able to solve arbitrary MDPs. (2/3)
People sometimes say that the term “dynamic programming” has two distinct meanings. The first meaning is what you typically encounter in an algorithms textbook, for things like Levenshtein distance etc. The second meaning is about RL and things like Q-learning. (1/3)
Kullback-Leibler divergence is not the same as Leibler-Kullback divergence
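A quick numerical check of the asymmetry behind the joke, using two arbitrary example distributions:

```python
from math import log

# Two discrete distributions over the same two outcomes.
P, Q = [0.5, 0.5], [0.9, 0.1]

def kl(p, q):
    """KL divergence D(p || q) in nats."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

print(kl(P, Q))   # ~0.511
print(kl(Q, P))   # ~0.368 -- not the same: KL is not symmetric
```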
Since the acceptance rate at major conferences is approximately constant ( https://t.co/EOIRkXYCAt), conferences are in a way zero-sum games. #NeurIPS2022 #iclr2022. (1/2)
github.com
Acceptance rates for the major AI conferences (lixin4ever/Conference-Acceptance-Rate).
Want to do imitation learning in a simple and efficient way? We released code for the ICLR 2022 paper “Imitation Learning by Reinforcement Learning”. See https://t.co/pL8beWndyR.
github.com
Source code for the paper "Imitation Learning by Reinforcement Learning" (ICLR 2022). - spotify-research/il-by-rl
Interested in Imitation Learning? You can do it using a single call to a Reinforcement Learning oracle. See our #ICLR2022 paper “Imitation Learning by Reinforcement Learning” ( https://t.co/o8X6UIMO7n).
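A hedged sketch of the "single RL call" idea, under my own assumptions about the construction (the paper and repo have the exact reduction and its guarantees): reward the agent only for reproducing demonstrated (state, action) pairs, then hand that surrogate MDP to an ordinary RL oracle once.

```python
import random

# Hypothetical toy setup: a 4-state chain where the expert demonstrations
# are "move right" in states 0, 1 and 2. My reading of the general idea,
# not necessarily the paper's exact construction.
N_STATES, N_ACTIONS, GAMMA = 4, 2, 0.9
demos = {(0, 1), (1, 1), (2, 1)}              # demonstrated (state, action) pairs

def step(s, a):
    """Deterministic chain: action 1 moves right, action 0 moves left."""
    return min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)

def surrogate_reward(s, a):
    """Indicator reward on the expert's demonstrated behaviour."""
    return 1.0 if (s, a) in demos else 0.0

def rl_oracle(reward_fn, n_updates=20_000):
    """A plain tabular Q-learning 'oracle'; any RL algorithm would do here."""
    Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(n_updates):
        s, a = random.randrange(N_STATES), random.randrange(N_ACTIONS)
        s2 = step(s, a)
        Q[s][a] += 0.2 * (reward_fn(s, a) + GAMMA * max(Q[s2]) - Q[s][a])
    return [max(range(N_ACTIONS), key=lambda a: Q[s][a]) for s in range(N_STATES)]

policy = rl_oracle(surrogate_reward)          # one call to the RL oracle
print(policy)                                 # moves right in states 0..2, like the expert
```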
I'm excited to present our work on active reward learning at #NeurIPS2021! We propose a general way to make queries that are informative about the optimal policy. Joint work with @MatteoTurchetta, Sebastian Tschiatschek, @MLciosek, and @arkrause: https://t.co/wSQLoWppHm 👇(1/6)
Like policy gradients? In "Expected Policy Gradients for Reinforcement Learning", we study various quadrature schemes to decrease variance in gradient estimates. The final version is now published in JMLR (@JmlrOrg). See https://t.co/a9gljfYJDf.
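An illustrative toy example, not code from the paper: for a one-dimensional Gaussian policy, replacing the single-sample Monte Carlo estimate of the inner action integral with Gauss-Hermite quadrature removes the sampling noise of that integral entirely.

```python
import numpy as np

# Hypothetical 1-D setup: Gaussian policy pi(a) = N(mu, sigma^2), toy critic Q(a).
mu, sigma = 0.3, 0.5
rng = np.random.default_rng(0)

def Q(a):
    """Arbitrary toy critic, just to have something to integrate."""
    return np.sin(3 * a) + a ** 2

def grad_term(a):
    """Integrand of the inner action integral: d/dmu log pi(a) * Q(a)."""
    return (a - mu) / sigma ** 2 * Q(a)

# (1) Classic stochastic policy gradient: one sampled action per estimate.
mc = grad_term(rng.normal(mu, sigma, size=10_000))   # 10k single-sample estimates

# (2) Quadrature over actions (Gauss-Hermite with 20 nodes), as in
#     expected-policy-gradient-style estimators.
x, w = np.polynomial.hermite.hermgauss(20)
quad = np.sum(w * grad_term(mu + np.sqrt(2) * sigma * x)) / np.sqrt(np.pi)

print(mc.mean(), mc.std())   # same mean, but each single sample is very noisy
print(quad)                  # deterministic: the inner integral has zero variance
```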
Our work on the reproducibility of meta-RL baselines (Bandits + MDPs) with MAML and Reptile is at the Reproducibility in ML workshop (C2) #ICML2018 together with @arianTBD & Simon Guiroy @MILAMontreal
Learn about our work on "Expected Policy Gradients" in this 13-minute video by Kamil Ciosek - with @MLciosek @shimon8282
https://t.co/G296BlOS48
"Fourier Policy Gradients" by Matthew Fellows, Kamil Ciosek and Shimon Whiteson https://t.co/v2Q5xd9HLh
Our new paper uses Fourier analysis to derive policy gradients: we recast the integrals as convolutions, which a Fourier transform turns into multiplications. The resulting analysis unifies existing policy gradient results.
arxiv.org
We propose a new way of deriving policy gradient updates for reinforcement learning. Our technique, based on Fourier analysis, recasts integrals that arise with expected policy gradients as...
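A sketch of the "integrals as convolutions" step for a one-dimensional Gaussian policy (standard facts, written as a sketch rather than the paper's full derivation): for $\pi_\theta(a\mid s)=\mathcal{N}(a;\mu_\theta(s),\sigma^2)$, the action integral in the policy gradient is a convolution of the critic with a Gaussian kernel, evaluated at the policy mean,

$$\int \mathcal{N}(a;\mu,\sigma^2)\, Q(s,a)\, da \;=\; \big(\varphi_\sigma * Q(s,\cdot)\big)(\mu), \qquad \varphi_\sigma(x)=\mathcal{N}(x;0,\sigma^2),$$

and the convolution theorem turns its Fourier transform in $a$ into a pointwise product,

$$\mathcal{F}\big[\varphi_\sigma * Q(s,\cdot)\big](\omega) \;=\; e^{-\sigma^2\omega^2/2}\,\hat{Q}(\omega).$$

Differentiating with respect to $\mu$ (the policy-gradient step) is then just another multiplication, by $i\omega$, in the frequency domain.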