Kevin Patrick Murphy Profile
Kevin Patrick Murphy

@sirbayes

Followers
65K
Following
4K
Media
103
Statuses
1K

Research Scientist at Google DeepMind. Interested in Bayesian Machine Learning.

Joined October 2016
@sirbayes
Kevin Patrick Murphy
11 days
I am pleased to announce another update to my RL tutorial ( https://t.co/SjMdabksJo). This time I have added code for RLFT for multi-turn LLM agents, using the awesome Tinker library from @thinkymachines, and the simple ReBN training loop from GEM by @zzlccc et al. With ~100
15
151
1K
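The tweet above is cut off, but the training-loop idea it mentions can be illustrated generically. Below is a minimal, hypothetical sketch of REINFORCE with batch-normalized returns (one common reading of "ReBN"), on a toy two-armed bandit rather than an LLM agent; it is not the tutorial's actual code and does not use Tinker or GEM.

```python
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def collect_batch(theta, n=64):
    """Sample a batch of (action, return) pairs from a Bernoulli policy."""
    batch = []
    for _ in range(n):
        p = sigmoid(theta)
        a = 1 if random.random() < p else 0
        r = 1.0 if a == 1 else 0.0   # arm 1 always pays off in this toy
        batch.append((a, r))
    return batch

def rebn_step(theta, batch, lr=0.5):
    """REINFORCE update with returns normalized across the batch."""
    returns = [r for _, r in batch]
    mu = sum(returns) / len(returns)
    sd = (sum((r - mu) ** 2 for r in returns) / len(returns)) ** 0.5
    p = sigmoid(theta)
    grad = 0.0
    for a, r in batch:
        adv = (r - mu) / (sd + 1e-8)   # batch-normalized return as advantage
        grad += adv * (a - p)          # d/dtheta log pi(a) for a Bernoulli logit
    return theta + lr * grad / len(batch)

random.seed(0)
theta = 0.0
for _ in range(100):
    theta = rebn_step(theta, collect_batch(theta))
final_p = sigmoid(theta)   # policy should strongly prefer the paying arm
```

Normalizing returns within the batch plays the role of a baseline, so no separate value network is needed.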
@sirbayes
Kevin Patrick Murphy
12 days
This is a cool Julia version of my Jax library for Bayesian Structural Time Series modeling ( https://t.co/T5gQOQPN9o) from the folks at @ReactiveBayes. It can easily handle non-linear and non-conjugate likelihoods (eg Poisson distribution for integer count observations). For
@ReactiveBayes
ReactiveBayes
19 days
New Bayesian Structured Time Series example: predicting taco demand during #NeurIPS 2025 ⚡️ Learnable Dynamics 🔢 Non-Conjugacies 🏎️ Blazing Speed https://t.co/t23bDp7Lab #JuliaLang #BayesianInference #DataScience
1
21
217
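The non-conjugate case mentioned above (Poisson counts over a latent level) is easy to handle with sampling-based inference. This is not the ReactiveBayes/Julia code, which uses message passing; it is just a generic bootstrap particle filter for a local-level model with a Poisson likelihood, in plain Python, to show why non-conjugacy is no obstacle.

```python
import math, random

def rpois(lam):
    """Knuth's Poisson sampler (fine for small rates)."""
    L, k, p = math.exp(-lam), 0, 1.0
    while p > L:
        k += 1
        p *= random.random()
    return k - 1

def particle_filter(ys, n=1000, q=0.1):
    """Bootstrap filter for x_t = x_{t-1} + N(0, q^2), y_t ~ Poisson(exp(x_t))."""
    parts = [random.gauss(0.0, 0.5) for _ in range(n)]
    means = []
    for y in ys:
        parts = [x + random.gauss(0.0, q) for x in parts]   # propagate
        logw = [y * x - math.exp(x) for x in parts]         # Poisson log-lik (up to a const)
        m = max(logw)
        w = [math.exp(l - m) for l in logw]                 # stabilized weights
        tot = sum(w)
        means.append(sum(wi * xi for wi, xi in zip(w, parts)) / tot)
        parts = random.choices(parts, weights=w, k=n)       # resample
    return means

random.seed(1)
xs, x = [], 0.0
for _ in range(50):                     # simulate a ground-truth random walk
    x += random.gauss(0.0, 0.1)
    xs.append(x)
ys = [rpois(math.exp(v)) for v in xs]
est = particle_filter(ys)
mae = sum(abs(a - b) for a, b in zip(est, xs)) / len(xs)
```

Swapping the Poisson log-likelihood for any other computable density changes nothing else in the filter.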
@sirbayes
Kevin Patrick Murphy
27 days
This is a much more intuitive version of the prisoner's dilemma from Jon Kleinberg’s excellent book, https://t.co/EsoYlFhGws, that covers the holy trifecta that Michael Jordan keeps discussing, namely Econ + CS + stats. (Very relevant for a future with human and AI agents …)
3
46
349
@sirbayes
Kevin Patrick Murphy
2 months
Maybe Thinking Machines should be renamed LoRAS - LoRA as a service? ;) This seems to be their business model, at least for now. Since training is cheap, a whole ecosystem of specialized models will bloom, and they can charge for serving them all in parallel. Smart.
5
6
117
@sirbayes
Kevin Patrick Murphy
2 months
I am pleased to share our new NeurIPS paper for online Bayesian inference for neural networks. Instead of focusing on updating the parameter posterior, we work with the predictive posterior (which makes much more sense for non-identifiable models, and gives us more algorithmic
@grrddm
Gerardo Duran-Martin
2 months
Our paper “Martingale Posterior Neural Networks for Fast Sequential Decision Making” has been accepted at #neurips2025! Joint work with @l_sbetancourt, @AlvaroCartea and @sirbayes Blog: https://t.co/yVzIkXvnNZ Paper: https://t.co/mSYkxygIKb Code: https://t.co/N3fwdykxy1
8
42
533
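The "work with the predictive posterior" idea can be seen already in the classical conjugate special case that martingale-posterior methods generalize. The sketch below is not the paper's algorithm; it is just scalar online Bayesian linear regression (a Kalman-style recursion) where each step reports the one-step-ahead predictive mean and variance rather than dwelling on the parameter posterior.

```python
import random

def online_blr(data, m=0.0, P=10.0, noise_var=0.01):
    """Scalar Bayesian linear regression y = w*x + eps, updated one point at a time.
    Returns one-step-ahead predictive (mean, variance) pairs plus the final w belief."""
    preds = []
    for x, y in data:
        pred_mean = m * x
        pred_var = P * x * x + noise_var   # predictive, not parameter, uncertainty
        preds.append((pred_mean, pred_var))
        k = P * x / pred_var               # Kalman gain
        m = m + k * (y - pred_mean)
        P = (1.0 - k * x) * P
    return preds, m, P

random.seed(2)
data = []
for _ in range(200):
    x = random.uniform(-1, 1)
    data.append((x, 2.0 * x + random.gauss(0, 0.1)))
preds, m, P = online_blr(data)
```

The predictive quantities stay well defined even when different parameter settings give the same predictions, which is the non-identifiability point the tweet makes about neural networks.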
@sirbayes
Kevin Patrick Murphy
2 months
Hi @karpathy . I loved your interview! However, you said there is no work on LLM self-play. Not true. See eg "Spiral" from @natashajaques et al (agent-v-agent) and "Absolute Zero Reasoner" from @_AndrewZhao et al. (agent-v-env). Probably others.
@karpathy
Andrej Karpathy
2 months
My pleasure to come on Dwarkesh last week, I thought the questions and conversation were really good. I re-watched the pod just now too. First of all, yes I know, and I'm sorry that I speak so fast :). It's to my detriment because sometimes my speaking thread out-executes my
15
14
449
@sirbayes
Kevin Patrick Murphy
2 months
Video summary of our paper from @AIResearchRoundup https://t.co/ilESpUaytQ
0
1
19
@sirbayes
Kevin Patrick Murphy
2 months
Good article on AI / data center boom / bubble. The costs will outweigh returns for many years, until we make models much more efficient to train and serve (thinking tokens, I’m looking at you), and we reduce energy costs, and the tech becomes reliable enough to be integrated
noahpinion.blog
If the economy's single pillar goes down, Trump's presidency will be seen as a disaster.
9
17
215
@sirbayes
Kevin Patrick Murphy
2 months
beautiful work on (self) distilling diffusion / flow models!
@msalbergo
Michael Albergo
2 months
We've cleaned up the story big time on flow maps. Check out @nmboffi's slick repo implementing all the many ways to go about them, and stay tuned for a bigger release 🤠 https://t.co/7WygKSpbZP https://t.co/Juucy5l844
2
3
56
@sirbayes
Kevin Patrick Murphy
2 months
For more details (including how to train an RNN-based policy using PPO inside the "imagination" of the learned CWM), please see our paper: https://t.co/L9N7CeAB9s Joint work with @WLehrach, Daniel Hennes, @lazarox8, @heroesneverdie, Carter Wendelken, Zun Li, @antoine_dedieu,
0
8
43
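The paper trains an RNN policy with PPO inside the learned code world model; the sketch below deliberately substitutes something much simpler (tabular Q-learning on a hand-written chain model) to show the "train entirely on imagined rollouts, never touch the real environment" pattern. The environment and hyperparameters here are illustrative inventions, not from the paper.

```python
import random

# Stand-in "code world model": a deterministic chain of 6 states, reward at the end.
def model_step(s, a):                 # a in {-1, +1}
    s2 = max(0, min(5, s + a))
    return s2, (1.0 if s2 == 5 else 0.0), s2 == 5

def train_in_imagination(episodes=2000, eps=0.2, lr=0.5, gamma=0.9):
    Q = {(s, a): 0.0 for s in range(6) for a in (-1, 1)}
    for _ in range(episodes):
        s = random.randint(0, 4)      # imagined rollouts can start anywhere
        for _ in range(20):           # rollout happens inside the model only
            if random.random() < eps:
                a = random.choice((-1, 1))
            else:
                a = max((-1, 1), key=lambda b: Q[(s, b)])
            s2, r, done = model_step(s, a)
            best = max(Q[(s2, -1)], Q[(s2, 1)])
            Q[(s, a)] += lr * (r + (0.0 if done else gamma * best) - Q[(s, a)])
            s = s2
            if done:
                break
    return Q

random.seed(3)
Q = train_in_imagination()
policy = {s: max((-1, 1), key=lambda a: Q[(s, a)]) for s in range(5)}
```

Because the model is cheap to query, the agent can afford far more (imagined) experience than real play would allow, which is the sample-efficiency argument of the thread.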
@sirbayes
Kevin Patrick Murphy
2 months
Below we evaluate our method (in the perfect information setting) when playing against 3 different kinds of opponents: Gemini 2.5 Pro as a policy (using the same data as our method), MCTS with the ground truth world model (an upper bound), and a random policy (a lower bound). We
1
3
7
@sirbayes
Kevin Patrick Murphy
2 months
We apply our method to 10 different two-player games, 5 of which have perfect information (full observability of the world state), and 5 of which have imperfect information (partially observed state). 6 of the games are well-known (e.g., Tic-tac-toe, Backgammon, Connect Four;
1
2
7
@sirbayes
Kevin Patrick Murphy
2 months
In addition to learning the world model M=(T,O) and inference function I, we can also learn a value function V, which can be used to estimate reward-to-go at the leaf nodes explored by the MCTS planner. In the case of imperfect information games, we use Information Set MCTS,
1
2
8
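The role of a value function at MCTS leaf nodes can be shown on a toy game. This sketch omits Information Set MCTS and uses a hand-coded V (for this take-away game the exact win/loss value is known in closed form) in place of the paper's learned one; everything else is plain UCT.

```python
import math

# Take-away game: remove 1 or 2 stones; whoever takes the last stone wins.
def actions(n):
    return [a for a in (1, 2) if a <= n]

def leaf_value(n):
    # stand-in for a learned value function V: reward-to-go for the player to move
    return 1.0 if n % 3 != 0 else -1.0

def mcts_best(root, sims=500, depth=3, c=1.4):
    N, W = {}, {}                           # visit counts, total backed-up values
    def simulate(n, d):
        if n == 0:
            return -1.0                     # player to move has already lost
        if d == 0:
            return leaf_value(n)            # V replaces a random rollout here
        untried = [a for a in actions(n) if (n, a) not in N]
        if untried:
            a = untried[0]
            v = -simulate(n - a, 0)         # evaluate the fresh child at its leaf
        else:
            total = sum(N[(n, b)] for b in actions(n))
            a = max(actions(n), key=lambda b: W[(n, b)] / N[(n, b)]
                    + c * math.sqrt(math.log(total) / N[(n, b)]))
            v = -simulate(n - a, d - 1)     # negamax sign flip between players
        N[(n, a)] = N.get((n, a), 0) + 1
        W[(n, a)] = W.get((n, a), 0.0) + v
        return v
    for _ in range(sims):
        simulate(root, depth)
    return max(actions(root), key=lambda a: N[(root, a)])
```

For example, `mcts_best(4)` should take 1 stone, leaving the opponent the losing position 3. Truncating search at a value estimate is what lets a shallow tree stand in for full rollouts.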
@sirbayes
Kevin Patrick Murphy
2 months
For imperfect-information games, the LLM must also synthesize an inference function I that maps the observation history for each player i, o(1:t, i), to a plausible sequence of actions taken by all the players, a(1:t,0:N), including the chance player (number 0). Since the
1
2
8
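In the paper the inference function I is synthesized as code by the LLM; the sketch below only illustrates the *contract* of I with a rejection sampler on an invented two-bit game: map one player's observations back to full action histories, including the chance player's hidden action, that are consistent with what was seen.

```python
import random

def generative_model():
    """Toy game: chance player 0 draws a hidden bit; player 2 acts on it noisily."""
    c = random.randint(0, 1)                       # hidden chance action
    a2 = c if random.random() < 0.8 else 1 - c     # player 2's observable action
    return (c, a2)

def infer_histories(observed_a2, n=1000):
    """Rejection-sampling stand-in for the inference function I:
    player 1's observation -> plausible full action histories."""
    samples = []
    while len(samples) < n:
        c, a2 = generative_model()
        if a2 == observed_a2:                      # keep only consistent histories
            samples.append((c, a2))
    return samples

random.seed(5)
hist = infer_histories(observed_a2=1)
frac_c1 = sum(c for c, _ in hist) / len(hist)      # posterior weight on c = 1
```

With a uniform prior and 0.8 fidelity, about 80% of the consistent histories have the hidden bit set, matching the exact posterior.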
@sirbayes
Kevin Patrick Murphy
2 months
We use a standard code synthesis method based on LLM-powered code refinement combined with Thompson sampling over a tree of partial programs. For perfect-information games, the LLM must synthesize a deterministic state transition function T, and a deterministic observation
1
3
8
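The Thompson-sampling mechanic in that refinement loop can be shown in its flat Beta-Bernoulli form. The paper runs it over a tree of partial programs scored by test outcomes; here the "branches" and their hidden success rates are invented stand-ins.

```python
import random

def thompson_refine(success_probs, rounds=2000):
    """Beta-Bernoulli Thompson sampling over candidate branches.
    success_probs: hidden prob. that refining branch i yields a passing program."""
    k = len(success_probs)
    s, f = [0] * k, [0] * k                      # per-branch success/failure counts
    picks = [0] * k
    for _ in range(rounds):
        # sample a plausible success rate for each branch from its Beta posterior
        draws = [random.betavariate(s[i] + 1, f[i] + 1) for i in range(k)]
        i = max(range(k), key=lambda j: draws[j])
        picks[i] += 1
        if random.random() < success_probs[i]:   # "run the tests" on the refinement
            s[i] += 1
        else:
            f[i] += 1
    return picks

random.seed(6)
picks = thompson_refine([0.1, 0.3, 0.6])
```

Sampling from the posterior, rather than picking the current best mean, keeps some budget on under-explored branches while concentrating effort on the most promising one.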
@sirbayes
Kevin Patrick Murphy
2 months
We consider two scenarios for learning the world model: open-deck, where the offline training trajectories consist of (state, observation, action) triples generated from an initial random exploratory policy, and closed-deck, where the training trajectories do not include the
1
2
8
@sirbayes
Kevin Patrick Murphy
2 months
The world model can be interpreted as a Partially Observed Stochastic Game, which is represented below as a causal graphical model. All transitions are deterministic, but stochasticity can be introduced via hidden random actions chosen by the chance player. Each player sees its
1
3
14
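The factorization described above, where transitions are pure functions and all randomness enters through the chance player, can be made concrete with a throwaway example (the game itself is invented, only the structure matters):

```python
import random

def transition(state, joint_action):
    """Deterministic T: same (state, joint action) in, same next state out."""
    chance, p1, p2 = joint_action        # player 0 is the chance player
    return (state + p1 + p2 + chance) % 10

def step(state, p1, p2):
    """All stochasticity comes from sampling the chance player's hidden action."""
    chance = random.randint(0, 1)
    return transition(state, (chance, p1, p2))

s1 = transition(3, (1, 2, 4))            # replaying a joint action is reproducible
s2 = transition(3, (1, 2, 4))
random.seed(7)
outcomes = {step(3, 2, 4) for _ in range(100)}   # spread induced by chance alone
```

Keeping T deterministic makes the synthesized world-model code easy to test: any apparent randomness must be traceable to an explicit chance action.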
@sirbayes
Kevin Patrick Murphy
2 months
We apply our method to various two-player games (with both perfect and imperfect information), and show that it works much better than prompting the LLM to directly generate actions, especially for novel games. In particular, we beat Gemini 2.5 Pro in 7 out of 10 games, and tie
1
2
17
@sirbayes
Kevin Patrick Murphy
2 months
I am pleased to announce our new paper, which provides an extremely sample-efficient way to create an agent that can perform well in multi-agent, partially-observed, symbolic environments. The key idea is to use LLM-powered code synthesis to learn a code world model (in the form
17
103
825
@sirbayes
Kevin Patrick Murphy
2 months
Bob Carpenter, who is the main dev for @mcmc_stan, finally gets "Jax-pilled" :) https://t.co/xrDpvUyQNO With cheap (even free) GPUs on Google Colab and Lightning AI studio, it is now easy to get 10x-100x speedups for HMC, and more if you switch to SVI.
7
17
144
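The speedups in that tweet come from JIT-compiling the HMC inner loop. The sketch below is not Bob Carpenter's code and uses no JAX at all: it is a dependency-free HMC sampler for a 1-D standard normal, where `jax.grad` would replace the hand-written gradient and `jax.jit`/`vmap` would accelerate and batch the loop on a GPU.

```python
import math, random

def logp(x):             # target: standard normal, log p(x) = -x^2/2 + const
    return -0.5 * x * x

def grad_logp(x):        # what jax.grad(logp) would compute automatically
    return -x

def hmc(n_samples=2000, eps=0.2, n_leap=10):
    """Plain-Python Hamiltonian Monte Carlo with leapfrog integration."""
    x, samples = 0.0, []
    for _ in range(n_samples):
        p0 = random.gauss(0.0, 1.0)            # resample momentum
        xq, pq = x, p0
        pq += 0.5 * eps * grad_logp(xq)        # leapfrog: half momentum step
        for _ in range(n_leap - 1):
            xq += eps * pq
            pq += eps * grad_logp(xq)
        xq += eps * pq
        pq += 0.5 * eps * grad_logp(xq)
        h_old = -logp(x) + 0.5 * p0 * p0       # Hamiltonians for accept/reject
        h_new = -logp(xq) + 0.5 * pq * pq
        if random.random() < math.exp(min(0.0, h_old - h_new)):
            x = xq                             # Metropolis accept
        samples.append(x)
    return samples

random.seed(8)
s = hmc()
mean = sum(s) / len(s)
var = sum((v - mean) ** 2 for v in s) / len(s)
```

Since the leapfrog loop is just repeated gradient evaluations, it is exactly the kind of computation that compiles well, which is the "Jax-pilled" point.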