Kamil Ciosek Profile
Kamil Ciosek

@MLciosek

Followers
466
Following
280
Media
1
Statuses
27

Research Scientist @Spotify. A machine learning generalist, with an interest in LLMs and a focus on RL(HF).

Joined December 2014
@MLciosek
Kamil Ciosek
2 months
For anyone worried their LLM might be making stuff up, we made a budget‐friendly truth serum (semantic entropy + Bayesian). See for yourself: https://t.co/gq8oFP5Eqr Paper:
0
7
3
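A minimal sketch of the semantic-entropy idea mentioned in the tweet above (my reconstruction, not the paper's estimator; the Bayesian component is omitted and the exact-match check below is only a placeholder for a proper NLI-based equivalence test): sample several answers, cluster the ones that mean the same thing, and take the entropy over clusters.

import math
from typing import Callable, List

def semantic_entropy(answers: List[str],
                     equivalent: Callable[[str, str], bool]) -> float:
    # Greedily cluster answers into meaning-equivalent groups.
    clusters: List[List[str]] = []
    for a in answers:
        for c in clusters:
            if equivalent(a, c[0]):
                c.append(a)
                break
        else:
            clusters.append([a])
    # Entropy of the empirical distribution over clusters; high entropy
    # suggests the model is unsure and may be making things up.
    n = len(answers)
    return -sum((len(c) / n) * math.log(len(c) / n) for c in clusters)

# Toy usage: lowercase exact match stands in for semantic equivalence.
samples = ["Paris", "paris", "Lyon", "Paris"]
print(semantic_entropy(samples, lambda x, y: x.lower() == y.lower()))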
@ZhenwenDai
Zhenwen Dai
3 years
Interested in working on exciting ML ideas for Spotify? Join us! We are looking for a Research Scientist Intern to join our research lab in London for summer 2023. https://t.co/62V4NvEddS @SpotifyResearch
1
3
20
@MLciosek
Kamil Ciosek
3 years
On the other hand, in “traditional” dynamic programming we have to carefully design the order of updates, which is problem-specific. (3/3)
0
0
3
@MLciosek
Kamil Ciosek
3 years
In fact, these meanings are closely related. Q-learning is a type of dynamic programming. The main insight is that, in Q-learning, updates can be made in a random order, but we are still able to solve arbitrary MDPs. (2/3)
1
0
3
@MLciosek
Kamil Ciosek
3 years
People sometimes say that the term “dynamic programming” has two distinct meanings. The first meaning is what you typically encounter in an algorithms textbook, for things like Levenshtein distance etc. The second meaning is about RL and things like Q-learning. (1/3)
1
2
6
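A toy illustration of the point made in this thread (the MDP and constants below are made up for illustration): tabular Q-learning sweeps the state-action pairs in a freshly randomised order every pass and still converges to the optimal values, whereas a textbook dynamic program such as edit distance must fill its table in a problem-specific order.

import random
import numpy as np

# Toy deterministic chain MDP: states 0..4, action 0 moves left, action 1
# moves right; reaching state 4 is terminal and yields reward 1.
N_S, N_A, GAMMA, LR = 5, 2, 0.9, 0.5

def step(s, a):
    s2 = min(s + 1, N_S - 1) if a == 1 else max(s - 1, 0)
    return s2, float(s2 == N_S - 1)

Q = np.zeros((N_S, N_A))
pairs = [(s, a) for s in range(N_S - 1) for a in range(N_A)]
for sweep in range(200):
    random.shuffle(pairs)              # update order is randomised every sweep
    for s, a in pairs:
        s2, r = step(s, a)
        bootstrap = 0.0 if s2 == N_S - 1 else GAMMA * Q[s2].max()
        Q[s, a] += LR * (r + bootstrap - Q[s, a])

print(np.round(Q.max(axis=1), 3))      # optimal values despite the random order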
@kritipraks
Kritika Prakash
4 years
Kullback-Leibler divergence is not the same as Leibler-Kullback divergence
49
285
3K
@MLciosek
Kamil Ciosek
3 years
The competition aspect drives people to do well and improves science, but may not be the optimal way to do so. Outcomes are also often pretty noisy. @TmlrOrg looks like an interesting alternative path. (2/2)
0
0
1
@MLciosek
Kamil Ciosek
3 years
Since the acceptance rate at major conferences is approximately constant ( https://t.co/EOIRkXYCAt), conferences are in a way zero-sum games. #NeurIPS2022 #iclr2022. (1/2)
github.com
Acceptance rates for the major AI conferences. Contribute to lixin4ever/Conference-Acceptance-Rate development by creating an account on GitHub.
1
0
1
@SpotifyResearch
Spotify Research
4 years
Want to do imitation learning in a simple and efficient way? We released code for the ICLR 2022 paper “Imitation Learning by Reinforcement Learning”. See https://t.co/pL8beWndyR.
github.com
Source code for the paper "Imitation Learning by Reinforcement Learning" (ICLR 2022). - spotify-research/il-by-rl
0
1
10
@SpotifyResearch
Spotify Research
4 years
Interested in Imitation Learning? You can do it using a single call to a Reinforcement Learning oracle. See our #ICLR2022 paper “Imitation Learning by Reinforcement Learning” ( https://t.co/o8X6UIMO7n).
0
3
23
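A schematic sketch of the reduction advertised above, under my own simplifying assumptions (not the paper's exact construction): reward agreement with the expert's state-action pairs and make one call to any off-the-shelf RL solver; rl_oracle below is a hypothetical stand-in for that solver.

from typing import Callable, Dict, Set, Tuple

State, Action = int, int

def imitation_via_rl(expert_pairs: Set[Tuple[State, Action]],
                     rl_oracle: Callable[[Callable[[State, Action], float]], Dict]):
    # Reward 1 for matching an expert state-action pair, 0 otherwise.
    def reward(s: State, a: Action) -> float:
        return 1.0 if (s, a) in expert_pairs else 0.0
    # Single call to the RL oracle on the MDP with this induced reward.
    return rl_oracle(reward)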
@davlindner
David Lindner
4 years
I'm excited to present our work on active reward learning at #NeurIPS2021! We propose a general way to make queries that are informative about the optimal policy. Joint work with @MatteoTurchetta, Sebastian Tschiatschek, @MLciosek, and @arkrause: https://t.co/wSQLoWppHm 👇(1/6)
2
2
14
@MLciosek
Kamil Ciosek
6 years
Like policy gradients? In "Expected Policy Gradients for Reinforcement Learning", we study various quadrature schemes to decrease variance in gradient estimates. Final version is now published in JMLR (@JmlrOrg). See https://t.co/a9gljfYJDf.
0
12
53
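A minimal sketch of what a quadrature scheme buys here, with a made-up 1-D critic and a Gaussian policy (illustrative only, not the estimator from the paper): the single-sample score-function estimate of the inner integral over actions is noisy, while a Gauss-Hermite quadrature of the same integral is deterministic given the critic.

import numpy as np

def Q(s, a):                              # toy critic, made up for illustration
    return -(a - s) ** 2

def mc_grad_mu(s, mu, sigma, rng):
    # Single-sample score-function estimate of d/dmu E_{a~N(mu,sigma^2)}[Q(s,a)].
    a = rng.normal(mu, sigma)
    return Q(s, a) * (a - mu) / sigma ** 2

def quadrature_grad_mu(s, mu, sigma, n=32):
    # Gauss-Hermite quadrature of the same expectation: no sampling noise,
    # at the cost of evaluating the critic at several actions per state.
    x, w = np.polynomial.hermite_e.hermegauss(n)
    a = mu + sigma * x
    score = (a - mu) / sigma ** 2
    return np.sum(w * Q(s, a) * score) / np.sqrt(2 * np.pi)

rng = np.random.default_rng(0)
print(np.std([mc_grad_mu(0.5, 0.0, 1.0, rng) for _ in range(1000)]))  # noisy
print(quadrature_grad_mu(0.5, 0.0, 1.0))                              # exact-ish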
@TristanDeleu
Tristan Deleu
7 years
Our work on the reproducibility of meta-RL baselines (Bandits + MDPs) with MAML and Reptile is at the Reproducibility in ML workshop (C2) #ICML2018 together with @arianTBD & Simon Guiroy @MILAMontreal
2
21
88
@whi_rl
WhiRL
8 years
Learn about our work on "Expected Policy Gradients" in this 13min video by Kamil Ciosek - with @MLciosek @shimon8282 https://t.co/G296BlOS48
0
4
12
@whi_rl
WhiRL
8 years
"Fourier Policy Gradients" by Matthew Fellows, Kamil Ciosek and Shimon Whiteson https://t.co/v2Q5xd9HLh
0
6
10
@shimon8282
Shimon Whiteson
8 years
Our new paper: using Fourier analysis to derive policy gradients: we recast the integrals as convolutions, which a Fourier transform turns into multiplications. The resulting analysis unifies existing policy gradient results.
arxiv.org
We propose a new way of deriving policy gradient updates for reinforcement learning. Our technique, based on Fourier analysis, recasts integrals that arise with expected policy gradients as...
1
24
103
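A schematic version of the identity alluded to here, in my own notation (not the paper's exact statement): for a Gaussian policy, the inner integral of the expected policy gradient is a convolution of the critic with the Gaussian kernel, and the convolution theorem turns that convolution into a pointwise product of Fourier transforms.

\int \pi(a \mid s)\, Q(s,a)\, da
  \;=\; \int \mathcal{N}(a;\, \mu(s), \Sigma)\, Q(s,a)\, da
  \;=\; (\mathcal{N}_{\Sigma} * Q(s,\cdot))(\mu(s)),
\qquad
\mathcal{F}\{\mathcal{N}_{\Sigma} * Q(s,\cdot)\}
  \;=\; \mathcal{F}\{\mathcal{N}_{\Sigma}\} \cdot \mathcal{F}\{Q(s,\cdot)\}.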