Kamil Ciosek Profile
Kamil Ciosek

@MLciosek

Followers
466
Following
280
Media
1
Statuses
27

Research Scientist @Spotify. A machine learning generalist, with an interest in LLMs and a focus on RL(HF).

Joined December 2014
@MLciosek
Kamil Ciosek
2 months
For anyone worried their LLM might be making stuff up, we made a budget‐friendly truth serum (semantic entropy + Bayesian). See for yourself: https://t.co/gq8oFP5Eqr Paper:
0
7
3
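A minimal sketch of the semantic-entropy idea mentioned in the tweet above (my reconstruction, not the paper's estimator; the Bayesian component is omitted and the exact-match check below is only a placeholder for a proper NLI-based equivalence test): sample several answers, cluster the ones that mean the same thing, and take the entropy over clusters.

import math
from typing import Callable, List

def semantic_entropy(answers: List[str],
                     equivalent: Callable[[str, str], bool]) -> float:
    # Greedily cluster answers into meaning-equivalent groups.
    clusters: List[List[str]] = []
    for a in answers:
        for c in clusters:
            if equivalent(a, c[0]):
                c.append(a)
                break
        else:
            clusters.append([a])
    # Entropy of the empirical distribution over clusters; high entropy
    # suggests the model is unsure and may be making things up.
    n = len(answers)
    return -sum((len(c) / n) * math.log(len(c) / n) for c in clusters)

# Toy usage: lowercase exact match stands in for semantic equivalence.
samples = ["Paris", "paris", "Lyon", "Paris"]
print(semantic_entropy(samples, lambda x, y: x.lower() == y.lower()))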
@ZhenwenDai
Zhenwen Dai
3 years
Interested in working on exciting ML ideas for Spotify? Join us! We are looking for a Research Scientist Intern to join our research lab in London for summer 2023. https://t.co/62V4NvEddS @SpotifyResearch
1
3
20
@MLciosek
Kamil Ciosek
3 years
On the other hand, in “traditional” dynamic programming we have to carefully design the order of updates, which is problem-specific. (3/3)
0
0
3
@MLciosek
Kamil Ciosek
3 years
In fact, these meanings are closely related. Q-learning is a type of dynamic programming. The main insight is that, in Q-learning, updates can be made in a random order, but we are still able to solve arbitrary MDPs. (2/3)
1
0
3
@MLciosek
Kamil Ciosek
3 years
People sometimes say that the term “dynamic programming” has two distinct meanings. The first meaning is what you typically encounter in an algorithms textbook, for things like Levenshtein distance etc. The second meaning is about RL and things like Q-learning. (1/3)
1
2
6
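A toy illustration of the point made in this thread (the MDP and constants below are made up for illustration): tabular Q-learning sweeps the state-action pairs in a freshly randomised order every pass and still converges to the optimal values, whereas a textbook dynamic program such as edit distance must fill its table in a problem-specific order.

import random
import numpy as np

# Toy deterministic chain MDP: states 0..4, action 0 moves left, action 1
# moves right; reaching state 4 is terminal and yields reward 1.
N_S, N_A, GAMMA, LR = 5, 2, 0.9, 0.5

def step(s, a):
    s2 = min(s + 1, N_S - 1) if a == 1 else max(s - 1, 0)
    return s2, float(s2 == N_S - 1)

Q = np.zeros((N_S, N_A))
pairs = [(s, a) for s in range(N_S - 1) for a in range(N_A)]
for sweep in range(200):
    random.shuffle(pairs)              # update order is randomised every sweep
    for s, a in pairs:
        s2, r = step(s, a)
        bootstrap = 0.0 if s2 == N_S - 1 else GAMMA * Q[s2].max()
        Q[s, a] += LR * (r + bootstrap - Q[s, a])

print(np.round(Q.max(axis=1), 3))      # optimal values despite the random order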
@kritipraks
Kritika Prakash
4 years
Kullback-Leibler divergence is not the same as Leibler-Kullback divergence
49
285
3K
@MLciosek
Kamil Ciosek
3 years
The competition aspect drives people to do well and improves science, but may not be the optimal way to do so. Outcomes are also often pretty noisy. @TmlrOrg looks like an interesting alternative path. (2/2)
0
0
1
@MLciosek
Kamil Ciosek
3 years
Since the acceptance rate at major conferences is approximately constant ( https://t.co/EOIRkXYCAt), conferences are in a way zero-sum games. #NeurIPS2022 #iclr2022. (1/2)
github.com
Acceptance rates for the major AI conferences. Contribute to lixin4ever/Conference-Acceptance-Rate development by creating an account on GitHub.
1
0
1
@SpotifyResearch
Spotify Research
4 years
Want to do imitation learning in a simple and efficient way? We released code for the ICLR 2022 paper “Imitation Learning by Reinforcement Learning”. See https://t.co/pL8beWndyR.
github.com
Source code for the paper "Imitation Learning by Reinforcement Learning" (ICLR 2022). - spotify-research/il-by-rl
0
1
10
@SpotifyResearch
Spotify Research
4 years
Interested in Imitation Learning? You can do it using a single call to a Reinforcement Learning oracle. See our #ICLR2022 paper “Imitation Learning by Reinforcement Learning” ( https://t.co/o8X6UIMO7n).
0
3
23
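A schematic sketch of the reduction advertised above, under my own simplifying assumptions (not the paper's exact construction): reward agreement with the expert's state-action pairs and make one call to any off-the-shelf RL solver; rl_oracle below is a hypothetical stand-in for that solver.

from typing import Callable, Dict, Set, Tuple

State, Action = int, int

def imitation_via_rl(expert_pairs: Set[Tuple[State, Action]],
                     rl_oracle: Callable[[Callable[[State, Action], float]], Dict]):
    # Reward 1 for matching an expert state-action pair, 0 otherwise.
    def reward(s: State, a: Action) -> float:
        return 1.0 if (s, a) in expert_pairs else 0.0
    # Single call to the RL oracle on the MDP with this induced reward.
    return rl_oracle(reward)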
@davlindner
David Lindner
4 years
I'm excited to present our work on active reward learning at #NeurIPS2021! We propose a general way to make queries that are informative about the optimal policy. Joint work with @MatteoTurchetta, Sebastian Tschiatschek, @MLciosek, and @arkrause: https://t.co/wSQLoWppHm 👇(1/6)
2
2
14
@MLciosek
Kamil Ciosek
6 years
Like policy gradients? In "Expected Policy Gradients for Reinforcement Learning", we study various quadrature schemes to decrease variance in gradient estimates. Final version is now published in JMLR (@JmlrOrg). See https://t.co/a9gljfYJDf.
0
12
53
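A minimal sketch of what a quadrature scheme buys here, with a made-up 1-D critic and a Gaussian policy (illustrative only, not the estimator from the paper): the single-sample score-function estimate of the inner integral over actions is noisy, while a Gauss-Hermite quadrature of the same integral is deterministic given the critic.

import numpy as np

def Q(s, a):                              # toy critic, made up for illustration
    return -(a - s) ** 2

def mc_grad_mu(s, mu, sigma, rng):
    # Single-sample score-function estimate of d/dmu E_{a~N(mu,sigma^2)}[Q(s,a)].
    a = rng.normal(mu, sigma)
    return Q(s, a) * (a - mu) / sigma ** 2

def quadrature_grad_mu(s, mu, sigma, n=32):
    # Gauss-Hermite quadrature of the same expectation: no sampling noise,
    # at the cost of evaluating the critic at several actions per state.
    x, w = np.polynomial.hermite_e.hermegauss(n)
    a = mu + sigma * x
    score = (a - mu) / sigma ** 2
    return np.sum(w * Q(s, a) * score) / np.sqrt(2 * np.pi)

rng = np.random.default_rng(0)
print(np.std([mc_grad_mu(0.5, 0.0, 1.0, rng) for _ in range(1000)]))  # noisy
print(quadrature_grad_mu(0.5, 0.0, 1.0))                              # exact-ish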
@TristanDeleu
Tristan Deleu
7 years
Our work on the reproducibility of meta-RL baselines (Bandits + MDPs) with MAML and Reptile is at the Reproducibility in ML workshop (C2) #ICML2018 together with @arianTBD & Simon Guiroy @MILAMontreal
2
21
88
@whi_rl
WhiRL
8 years
Learn about our work on "Expected Policy Gradients" in this 13min video by Kamil Ciosek - with @MLciosek @shimon8282 https://t.co/G296BlOS48
0
4
12
@whi_rl
WhiRL
8 years
"Fourier Policy Gradients" by Matthew Fellows, Kamil Ciosek and Shimon Whiteson https://t.co/v2Q5xd9HLh
0
6
10
@shimon8282
Shimon Whiteson
8 years
Our new paper: using Fourier analysis to derive policy gradients: we recast the integrals as convolutions, which a Fourier transform turns into multiplications. The resulting analysis unifies existing policy gradient results.
arxiv.org
We propose a new way of deriving policy gradient updates for reinforcement learning. Our technique, based on Fourier analysis, recasts integrals that arise with expected policy gradients as...
1
24
103
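A schematic version of the identity alluded to here, in my own notation (not the paper's exact statement): for a Gaussian policy, the inner integral of the expected policy gradient is a convolution of the critic with the Gaussian kernel, and the convolution theorem turns that convolution into a pointwise product of Fourier transforms.

\int \pi(a \mid s)\, Q(s,a)\, da
  \;=\; \int \mathcal{N}(a;\, \mu(s), \Sigma)\, Q(s,a)\, da
  \;=\; (\mathcal{N}_{\Sigma} * Q(s,\cdot))(\mu(s)),
\qquad
\mathcal{F}\{\mathcal{N}_{\Sigma} * Q(s,\cdot)\}
  \;=\; \mathcal{F}\{\mathcal{N}_{\Sigma}\} \cdot \mathcal{F}\{Q(s,\cdot)\}.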