Kevin Patrick Murphy Profile
Kevin Patrick Murphy

@sirbayes

Followers
65K
Following
4K
Media
103
Statuses
1K

Research Scientist at Google DeepMind. Interested in Bayesian Machine Learning.

Joined October 2016
@sirbayes
Kevin Patrick Murphy
11 days
I am pleased to announce another update to my RL tutorial ( https://t.co/SjMdabksJo). This time I have added code for RLFT for multi-turn LLM agents, using the awesome Tinker library from @thinkymachines, and the simple ReBN training loop from GEM by @zzlccc et al. With ~100
15
151
1K
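The tweet above is cut off, but the training-loop idea it mentions can be illustrated generically. Below is a minimal, hypothetical sketch of REINFORCE with batch-normalized returns (one common reading of "ReBN"), on a toy two-armed bandit rather than an LLM agent; it is not the tutorial's actual code and does not use Tinker or GEM.

```python
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def collect_batch(theta, n=64):
    """Sample a batch of (action, return) pairs from a Bernoulli policy."""
    batch = []
    for _ in range(n):
        p = sigmoid(theta)
        a = 1 if random.random() < p else 0
        r = 1.0 if a == 1 else 0.0   # arm 1 always pays off in this toy
        batch.append((a, r))
    return batch

def rebn_step(theta, batch, lr=0.5):
    """REINFORCE update with returns normalized across the batch."""
    returns = [r for _, r in batch]
    mu = sum(returns) / len(returns)
    sd = (sum((r - mu) ** 2 for r in returns) / len(returns)) ** 0.5
    p = sigmoid(theta)
    grad = 0.0
    for a, r in batch:
        adv = (r - mu) / (sd + 1e-8)   # batch-normalized return as advantage
        grad += adv * (a - p)          # d/dtheta log pi(a) for a Bernoulli logit
    return theta + lr * grad / len(batch)

random.seed(0)
theta = 0.0
for _ in range(100):
    theta = rebn_step(theta, collect_batch(theta))
final_p = sigmoid(theta)   # policy should strongly prefer the paying arm
```

Normalizing returns within the batch plays the role of a baseline, so no separate value network is needed.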
@sirbayes
Kevin Patrick Murphy
12 days
This is a cool Julia version of my Jax library for Bayesian Structural Time Series modeling ( https://t.co/T5gQOQPN9o) from the folks at @ReactiveBayes. It can easily handle non-linear and non-conjugate likelihoods (eg Poisson distribution for integer count observations). For
@ReactiveBayes
ReactiveBayes
19 days
New Bayesian Structured Time Series example: predicting taco demand during #NeurIPS 2025 ⚡️ Learnable Dynamics 🔢 Non-Conjugacies 🏎️ Blazing Speed https://t.co/t23bDp7Lab #JuliaLang #BayesianInference #DataScience
1
21
217
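The non-conjugate case mentioned above (Poisson counts over a latent level) is easy to handle with sampling-based inference. This is not the ReactiveBayes/Julia code, which uses message passing; it is just a generic bootstrap particle filter for a local-level model with a Poisson likelihood, in plain Python, to show why non-conjugacy is no obstacle.

```python
import math, random

def rpois(lam):
    """Knuth's Poisson sampler (fine for small rates)."""
    L, k, p = math.exp(-lam), 0, 1.0
    while p > L:
        k += 1
        p *= random.random()
    return k - 1

def particle_filter(ys, n=1000, q=0.1):
    """Bootstrap filter for x_t = x_{t-1} + N(0, q^2), y_t ~ Poisson(exp(x_t))."""
    parts = [random.gauss(0.0, 0.5) for _ in range(n)]
    means = []
    for y in ys:
        parts = [x + random.gauss(0.0, q) for x in parts]   # propagate
        logw = [y * x - math.exp(x) for x in parts]         # Poisson log-lik (up to a const)
        m = max(logw)
        w = [math.exp(l - m) for l in logw]                 # stabilized weights
        tot = sum(w)
        means.append(sum(wi * xi for wi, xi in zip(w, parts)) / tot)
        parts = random.choices(parts, weights=w, k=n)       # resample
    return means

random.seed(1)
xs, x = [], 0.0
for _ in range(50):                     # simulate a ground-truth random walk
    x += random.gauss(0.0, 0.1)
    xs.append(x)
ys = [rpois(math.exp(v)) for v in xs]
est = particle_filter(ys)
mae = sum(abs(a - b) for a, b in zip(est, xs)) / len(xs)
```

Swapping the Poisson log-likelihood for any other computable density changes nothing else in the filter.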
@sirbayes
Kevin Patrick Murphy
27 days
This is a much more intuitive version of the prisoner's dilemma from Jon Kleinberg’s excellent book, https://t.co/EsoYlFhGws, that covers the holy trifecta that Michael Jordan keeps discussing, namely Econ + CS + stats. (Very relevant for a future with human and AI agents …)
3
46
349
@sirbayes
Kevin Patrick Murphy
2 months
Maybe Thinking Machines should be renamed LoRAS - LoRA as a service? ;) This seems to be their business model, at least for now. Since training is cheap, a whole ecosystem of specialized models will bloom, and they can charge for serving them all in parallel. Smart.
5
6
117
@sirbayes
Kevin Patrick Murphy
2 months
I am pleased to share our new NeurIPS paper for online Bayesian inference for neural networks. Instead of focusing on updating the parameter posterior, we work with the predictive posterior (which makes much more sense for non-identifiable models, and gives us more algorithmic
@grrddm
Gerardo Duran-Martin
2 months
Our paper “Martingale Posterior Neural Networks for Fast Sequential Decision Making” has been accepted at #neurips2025! Joint work with @l_sbetancourt, @AlvaroCartea and @sirbayes Blog: https://t.co/yVzIkXvnNZ Paper: https://t.co/mSYkxygIKb Code: https://t.co/N3fwdykxy1
8
42
533
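The "work with the predictive posterior" idea can be seen already in the classical conjugate special case that martingale-posterior methods generalize. The sketch below is not the paper's algorithm; it is just scalar online Bayesian linear regression (a Kalman-style recursion) where each step reports the one-step-ahead predictive mean and variance rather than dwelling on the parameter posterior.

```python
import random

def online_blr(data, m=0.0, P=10.0, noise_var=0.01):
    """Scalar Bayesian linear regression y = w*x + eps, updated one point at a time.
    Returns one-step-ahead predictive (mean, variance) pairs plus the final w belief."""
    preds = []
    for x, y in data:
        pred_mean = m * x
        pred_var = P * x * x + noise_var   # predictive, not parameter, uncertainty
        preds.append((pred_mean, pred_var))
        k = P * x / pred_var               # Kalman gain
        m = m + k * (y - pred_mean)
        P = (1.0 - k * x) * P
    return preds, m, P

random.seed(2)
data = []
for _ in range(200):
    x = random.uniform(-1, 1)
    data.append((x, 2.0 * x + random.gauss(0, 0.1)))
preds, m, P = online_blr(data)
```

The predictive quantities stay well defined even when different parameter settings give the same predictions, which is the non-identifiability point the tweet makes about neural networks.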
@sirbayes
Kevin Patrick Murphy
2 months
Hi @karpathy . I loved your interview! However, you said there is no work on LLM self-play. Not true. See eg "Spiral" from @natashajaques et al (agent-v-agent) and "Absolute Zero Reasoner" from @_AndrewZhao et al. (agent-v-env). Probably others.
@karpathy
Andrej Karpathy
2 months
My pleasure to come on Dwarkesh last week, I thought the questions and conversation were really good. I re-watched the pod just now too. First of all, yes I know, and I'm sorry that I speak so fast :). It's to my detriment because sometimes my speaking thread out-executes my
15
14
449
@sirbayes
Kevin Patrick Murphy
2 months
Video summary of our paper from @AIResearchRoundup https://t.co/ilESpUaytQ
0
1
19
@sirbayes
Kevin Patrick Murphy
2 months
Good article on AI / data center boom / bubble. The costs will outweigh returns for many years, until we make models much more efficient to train and serve (thinking tokens, I’m looking at you), and we reduce energy costs, and the tech becomes reliable enough to be integrated
noahpinion.blog
If the economy's single pillar goes down, Trump's presidency will be seen as a disaster.
9
17
215
@sirbayes
Kevin Patrick Murphy
2 months
beautiful work on (self) distilling diffusion / flow models!
@msalbergo
Michael Albergo
2 months
We've cleaned up the story big time on flow maps. Check out @nmboffi's slick repo implementing all the many ways to go about them, and stay tuned for a bigger release 🤠 https://t.co/7WygKSpbZP https://t.co/Juucy5l844
2
3
56
@sirbayes
Kevin Patrick Murphy
2 months
For more details (including how to train an RNN-based policy using PPO inside the "imagination" of the learned CWM), please see our paper: https://t.co/L9N7CeAB9s Joint work with @WLehrach, Daniel Hennes, @lazarox8, @heroesneverdie, Carter Wendelken, Zun Li, @antoine_dedieu,
0
8
43
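The paper trains an RNN policy with PPO inside the learned code world model; the sketch below deliberately substitutes something much simpler (tabular Q-learning on a hand-written chain model) to show the "train entirely on imagined rollouts, never touch the real environment" pattern. The environment and hyperparameters here are illustrative inventions, not from the paper.

```python
import random

# Stand-in "code world model": a deterministic chain of 6 states, reward at the end.
def model_step(s, a):                 # a in {-1, +1}
    s2 = max(0, min(5, s + a))
    return s2, (1.0 if s2 == 5 else 0.0), s2 == 5

def train_in_imagination(episodes=2000, eps=0.2, lr=0.5, gamma=0.9):
    Q = {(s, a): 0.0 for s in range(6) for a in (-1, 1)}
    for _ in range(episodes):
        s = random.randint(0, 4)      # imagined rollouts can start anywhere
        for _ in range(20):           # rollout happens inside the model only
            if random.random() < eps:
                a = random.choice((-1, 1))
            else:
                a = max((-1, 1), key=lambda b: Q[(s, b)])
            s2, r, done = model_step(s, a)
            best = max(Q[(s2, -1)], Q[(s2, 1)])
            Q[(s, a)] += lr * (r + (0.0 if done else gamma * best) - Q[(s, a)])
            s = s2
            if done:
                break
    return Q

random.seed(3)
Q = train_in_imagination()
policy = {s: max((-1, 1), key=lambda a: Q[(s, a)]) for s in range(5)}
```

Because the model is cheap to query, the agent can afford far more (imagined) experience than real play would allow, which is the sample-efficiency argument of the thread.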
@sirbayes
Kevin Patrick Murphy
2 months
Below we evaluate our method (in the perfect information setting) when playing against 3 different kinds of opponents: Gemini 2.5 Pro as a policy (using the same data as our method), MCTS with the ground truth world model (an upper bound), and a random policy (a lower bound). We
1
3
7
@sirbayes
Kevin Patrick Murphy
2 months
We apply our method to 10 different two-player games, 5 of which have perfect information (full observability of the world state), and 5 of which have imperfect information (partially observed state). 6 of the games are well-known (e.g., Tic-tac-toe, Backgammon, Connect Four;
1
2
7
@sirbayes
Kevin Patrick Murphy
2 months
In addition to learning the world model M=(T,O) and inference function I, we can also learn a value function V, which can be used to estimate reward-to-go at the leaf nodes explored by the MCTS planner. In the case of imperfect information games, we use Information Set MCTS,
1
2
8
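The role of a value function at MCTS leaf nodes can be shown on a toy game. This sketch omits Information Set MCTS and uses a hand-coded V (for this take-away game the exact win/loss value is known in closed form) in place of the paper's learned one; everything else is plain UCT.

```python
import math

# Take-away game: remove 1 or 2 stones; whoever takes the last stone wins.
def actions(n):
    return [a for a in (1, 2) if a <= n]

def leaf_value(n):
    # stand-in for a learned value function V: reward-to-go for the player to move
    return 1.0 if n % 3 != 0 else -1.0

def mcts_best(root, sims=500, depth=3, c=1.4):
    N, W = {}, {}                           # visit counts, total backed-up values
    def simulate(n, d):
        if n == 0:
            return -1.0                     # player to move has already lost
        if d == 0:
            return leaf_value(n)            # V replaces a random rollout here
        untried = [a for a in actions(n) if (n, a) not in N]
        if untried:
            a = untried[0]
            v = -simulate(n - a, 0)         # evaluate the fresh child at its leaf
        else:
            total = sum(N[(n, b)] for b in actions(n))
            a = max(actions(n), key=lambda b: W[(n, b)] / N[(n, b)]
                    + c * math.sqrt(math.log(total) / N[(n, b)]))
            v = -simulate(n - a, d - 1)     # negamax sign flip between players
        N[(n, a)] = N.get((n, a), 0) + 1
        W[(n, a)] = W.get((n, a), 0.0) + v
        return v
    for _ in range(sims):
        simulate(root, depth)
    return max(actions(root), key=lambda a: N[(root, a)])
```

For example, `mcts_best(4)` should take 1 stone, leaving the opponent the losing position 3. Truncating search at a value estimate is what lets a shallow tree stand in for full rollouts.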
@sirbayes
Kevin Patrick Murphy
2 months
For imperfect-information games, the LLM must also synthesize an inference function I that maps the observation history for each player i, o(1:t, i), to a plausible sequence of actions taken by all the players, a(1:t,0:N), including the chance player (number 0). Since the
1
2
8
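In the paper the inference function I is synthesized as code by the LLM; the sketch below only illustrates the *contract* of I with a rejection sampler on an invented two-bit game: map one player's observations back to full action histories, including the chance player's hidden action, that are consistent with what was seen.

```python
import random

def generative_model():
    """Toy game: chance player 0 draws a hidden bit; player 2 acts on it noisily."""
    c = random.randint(0, 1)                       # hidden chance action
    a2 = c if random.random() < 0.8 else 1 - c     # player 2's observable action
    return (c, a2)

def infer_histories(observed_a2, n=1000):
    """Rejection-sampling stand-in for the inference function I:
    player 1's observation -> plausible full action histories."""
    samples = []
    while len(samples) < n:
        c, a2 = generative_model()
        if a2 == observed_a2:                      # keep only consistent histories
            samples.append((c, a2))
    return samples

random.seed(5)
hist = infer_histories(observed_a2=1)
frac_c1 = sum(c for c, _ in hist) / len(hist)      # posterior weight on c = 1
```

With a uniform prior and 0.8 fidelity, about 80% of the consistent histories have the hidden bit set, matching the exact posterior.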
@sirbayes
Kevin Patrick Murphy
2 months
We use a standard code synthesis method based on LLM-powered code refinement combined with Thompson sampling over a tree of partial programs. For perfect-information games, the LLM must synthesize a deterministic state transition function T, and a deterministic observation
1
3
8
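The Thompson-sampling mechanic in that refinement loop can be shown in its flat Beta-Bernoulli form. The paper runs it over a tree of partial programs scored by test outcomes; here the "branches" and their hidden success rates are invented stand-ins.

```python
import random

def thompson_refine(success_probs, rounds=2000):
    """Beta-Bernoulli Thompson sampling over candidate branches.
    success_probs: hidden prob. that refining branch i yields a passing program."""
    k = len(success_probs)
    s, f = [0] * k, [0] * k                      # per-branch success/failure counts
    picks = [0] * k
    for _ in range(rounds):
        # sample a plausible success rate for each branch from its Beta posterior
        draws = [random.betavariate(s[i] + 1, f[i] + 1) for i in range(k)]
        i = max(range(k), key=lambda j: draws[j])
        picks[i] += 1
        if random.random() < success_probs[i]:   # "run the tests" on the refinement
            s[i] += 1
        else:
            f[i] += 1
    return picks

random.seed(6)
picks = thompson_refine([0.1, 0.3, 0.6])
```

Sampling from the posterior, rather than picking the current best mean, keeps some budget on under-explored branches while concentrating effort on the most promising one.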
@sirbayes
Kevin Patrick Murphy
2 months
We consider two scenarios for learning the world model: open-deck, where the offline training trajectories consist of (state, observation, action) triples generated from an initial random exploratory policy, and closed-deck, where the training trajectories do not include the
1
2
8
@sirbayes
Kevin Patrick Murphy
2 months
The world model can be interpreted as a Partially Observed Stochastic Game, which is represented below as a causal graphical model. All transitions are deterministic, but stochasticity can be introduced via hidden random actions chosen by the chance player. Each player sees its
1
3
14
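The factorization described above, where transitions are pure functions and all randomness enters through the chance player, can be made concrete with a throwaway example (the game itself is invented, only the structure matters):

```python
import random

def transition(state, joint_action):
    """Deterministic T: same (state, joint action) in, same next state out."""
    chance, p1, p2 = joint_action        # player 0 is the chance player
    return (state + p1 + p2 + chance) % 10

def step(state, p1, p2):
    """All stochasticity comes from sampling the chance player's hidden action."""
    chance = random.randint(0, 1)
    return transition(state, (chance, p1, p2))

s1 = transition(3, (1, 2, 4))            # replaying a joint action is reproducible
s2 = transition(3, (1, 2, 4))
random.seed(7)
outcomes = {step(3, 2, 4) for _ in range(100)}   # spread induced by chance alone
```

Keeping T deterministic makes the synthesized world-model code easy to test: any apparent randomness must be traceable to an explicit chance action.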
@sirbayes
Kevin Patrick Murphy
2 months
We apply our method to various two-player games (with both perfect and imperfect information), and show that it works much better than prompting the LLM to directly generate actions, especially for novel games. In particular, we beat Gemini 2.5 Pro in 7 out of 10 games, and tie
1
2
17
@sirbayes
Kevin Patrick Murphy
2 months
I am pleased to announce our new paper, which provides an extremely sample-efficient way to create an agent that can perform well in multi-agent, partially-observed, symbolic environments. The key idea is to use LLM-powered code synthesis to learn a code world model (in the form
17
103
825
@sirbayes
Kevin Patrick Murphy
2 months
Bob Carpenter, who is the main dev for @mcmc_stan, finally gets "Jax-pilled" :) https://t.co/xrDpvUyQNO With cheap (even free) GPUs on Google Colab and Lightning AI studio, it is now easy to get 10x-100x speedups for HMC, and more if you switch to SVI.
7
17
144
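The speedups in that tweet come from JIT-compiling the HMC inner loop. The sketch below is not Bob Carpenter's code and uses no JAX at all: it is a dependency-free HMC sampler for a 1-D standard normal, where `jax.grad` would replace the hand-written gradient and `jax.jit`/`vmap` would accelerate and batch the loop on a GPU.

```python
import math, random

def logp(x):             # target: standard normal, log p(x) = -x^2/2 + const
    return -0.5 * x * x

def grad_logp(x):        # what jax.grad(logp) would compute automatically
    return -x

def hmc(n_samples=2000, eps=0.2, n_leap=10):
    """Plain-Python Hamiltonian Monte Carlo with leapfrog integration."""
    x, samples = 0.0, []
    for _ in range(n_samples):
        p0 = random.gauss(0.0, 1.0)            # resample momentum
        xq, pq = x, p0
        pq += 0.5 * eps * grad_logp(xq)        # leapfrog: half momentum step
        for _ in range(n_leap - 1):
            xq += eps * pq
            pq += eps * grad_logp(xq)
        xq += eps * pq
        pq += 0.5 * eps * grad_logp(xq)
        h_old = -logp(x) + 0.5 * p0 * p0       # Hamiltonians for accept/reject
        h_new = -logp(xq) + 0.5 * pq * pq
        if random.random() < math.exp(min(0.0, h_old - h_new)):
            x = xq                             # Metropolis accept
        samples.append(x)
    return samples

random.seed(8)
s = hmc()
mean = sum(s) / len(s)
var = sum((v - mean) ** 2 for v in s) / len(s)
```

Since the leapfrog loop is just repeated gradient evaluations, it is exactly the kind of computation that compiles well, which is the "Jax-pilled" point.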