Samuel Müller

@SamuelMullr

Followers: 1K · Following: 2K · Media: 43 · Statuses: 450

Working on (Tab)PFNs at Meta. Ex-DeepL, Ex-Amazon. ETH BSc, Cambridge MPhil, PhD from @FrankRHutter's lab. Opinions are my own. (he/him)

Berlin
Joined February 2020
@SamuelMullr
Samuel Müller
8 months
This might be the first time in 10 years that boosted trees are not the best default choice for working with data in tables. Instead, a pre-trained neural network is: the new TabPFN, which we just published in Nature 🎉
@SamuelMullr
Samuel Müller
2 months
I'm at ICML this week. Happy to meet up and talk about anything tabular data, in-context learning, or deep learning in general :)
@SamuelMullr
Samuel Müller
2 months
There are already early examples of this, which we discuss, in areas as diverse as biology, Bayesian optimization, time-series forecasting, and tabular data. The most prominent is TabPFN (Nature '25). 5/n
@SamuelMullr
Samuel Müller
2 months
We go into detailed comparisons with other Bayesian methods and the trade-offs that lead us to conclude that PFNs will become dominant for Bayesian prediction, and further that Bayesian prediction will become more important overall as priors improve. 4/n
@SamuelMullr
Samuel Müller
2 months
What's nice is that, after training on this random data, the model will start to make sense of real-world data too. It will approximate the posterior belonging to the prior of choice, e.g., a BNN, a GP, or, in the most interesting cases, a Bayesian model that doesn't exist yet. 3/n
@SamuelMullr
Samuel Müller
2 months
Prior-data fitted networks (PFNs) do just that! The PFN idea is to use a prior, e.g. a Bayesian neural network (BNN) prior, sample datasets from that prior, and then train to predict the hold-out labels of these datasets (no training on real-world data). 2/n
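The recipe above can be sketched in a few lines. This is a hypothetical simplification: the linear-threshold prior and all names here are toy stand-ins (real PFNs use e.g. BNN or GP priors and train a transformer across millions of such sampled datasets):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_dataset_from_prior(n_points=64, n_features=4):
    """Toy stand-in for a PFN prior: draw a random linear task,
    then sample a dataset labeled by it."""
    w = rng.normal(size=n_features)           # latent "task" drawn from the prior
    X = rng.normal(size=(n_points, n_features))
    y = (X @ w > 0).astype(int)               # binary labels induced by the task
    return X, y

def make_training_example(n_context=48):
    """One PFN training example: a labeled context set plus hold-out
    points whose labels the network must predict in-context."""
    X, y = sample_dataset_from_prior()
    return (X[:n_context], y[:n_context]), (X[n_context:], y[n_context:])

(ctx_X, ctx_y), (hold_X, hold_y) = make_training_example()
# A transformer would consume (ctx_X, ctx_y, hold_X) and be trained
# with cross-entropy against hold_y -- no real-world data involved.
print(ctx_X.shape, hold_X.shape)
```

Repeating this sampling endlessly gives an effectively infinite training stream, which is what makes the "no real-world training data" claim possible.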
@SamuelMullr
Samuel Müller
2 months
Compute is increasing much faster than data. How can we improve classical supervised learning, the underlying tech of most of GenAI, in the long term? Our ICML position paper's answer: simply train on a bunch of artificial data (noise) and only do inference on real-world data! 1/n
@SamuelMullr
Samuel Müller
2 months
Sounds like a pretty cool type of benchmark.
@HarshaNori
Harsha Nori
2 months
As some of you may know, I recently moved to London to help lead a new Health AI team! Excited for our first research paper, which demonstrates that AI can tackle medicine's toughest diagnostic challenges -- at 4x higher accuracy and 20% lower cost than a group of physicians 🧵
@SamuelMullr
Samuel Müller
3 months
RT @kchonyc: finally, wind is changing its direction: causal inference becomes easier if we give up on designing a new estimation algorithm….
@SamuelMullr
Samuel Müller
3 months
RT @JakeMRobertson: We present a new approach to causal inference. Pre-trained on synthetic data, Do-PFN opens the door to a new domain: PF….
@SamuelMullr
Samuel Müller
3 months
RT @bschoelkopf: In 2015, we ran a workshop on "Drawing causal inference from Big Data" at the NAS. Back then, “Big Data” felt like a buzzw….
@SamuelMullr
Samuel Müller
4 months
Deadline coming up! Consider double-submitting your NeurIPS submissions to our workshop for high-quality reviews and discussion at the workshop.
@innixma
Nick Erickson @ ICML
4 months
🚨 Reminder: Paper submissions for the 1st Workshop on Foundation Models for Structured Data (#FMSD) at #ICML2025 are due May 19! Working on tabular/time-series foundation models (TabPFN/Chronos/etc.)? This is the workshop for you! 📅 Deadline: May 19. 🔗 CFP:
@SamuelMullr
Samuel Müller
4 months
I am so proud to co-organize the Workshop on Foundation Models for Structured Data at ICML. At this workshop, we will discuss how to extend the GenAI revolution to tabular data, time-series forecasting, etc. If you work on this, consider submitting your work by May 19!
@SamuelMullr
Samuel Müller
4 months
RT @egrefen: I wrote this in part jokingly a few months ago but have now met US profs who've had research cancelled because of this very is….
@SamuelMullr
Samuel Müller
4 months
RT @DeepLearningAI: Researchers introduced TabPFN, a transformer model trained on 100 million synthetic datasets to predict unclassified or….
@SamuelMullr
Samuel Müller
4 months
RT @paulg: It's a very exciting time in tech right now. If you're a first-rate programmer, there are a huge number of other places you can….
@SamuelMullr
Samuel Müller
6 months
Find my full write-up (including scenarios with bad actors, as well as the prompts used) plus the game here: If you think a single-person experiment is not to be trusted, you are right: try it yourself!
github.com/SamuelGabriel/LMARENA-GAMING
@SamuelMullr
Samuel Müller
6 months
Combined with the large employee numbers at top AI labs and the small number of votes on lmarena, this leads me to the conclusion that lmarena scores are probably dominated by biased votes.
@SamuelMullr
Samuel Müller
6 months
In hard mode I attributed 13/20 completely correctly, much higher than the 3.3 expected from random guessing. That is, I could identify all 3 models correctly in 13 of 20 cases after practicing with 20 questions. That means attributing responses to LLMs is super easy for humans.
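The 3.3 baseline follows from random guessing: a uniformly random assignment of three responses to three model names is fully correct with probability 1/3! = 1/6, so over 20 questions one expects 20/6 ≈ 3.33 fully correct rounds. A minimal sanity check (model names here are placeholders, not the actual LLMs used):

```python
import random
from math import factorial

MODELS = ["A", "B", "C"]   # hypothetical stand-ins for the three LLMs
ROUNDS = 20
TRIALS = 50_000

# Analytic expectation: a random permutation of 3 items is fully
# correct with probability 1/3! = 1/6, so 20 rounds give 20/6 ~ 3.33.
expected = ROUNDS / factorial(len(MODELS))

# Monte-Carlo check of the same quantity.
rng = random.Random(0)
total_correct_rounds = 0
for _ in range(TRIALS):
    for _ in range(ROUNDS):
        guess = MODELS[:]
        rng.shuffle(guess)          # guess a random attribution
        if guess == MODELS:         # all three matched correctly
            total_correct_rounds += 1
simulated = total_correct_rounds / TRIALS

print(round(expected, 2), round(simulated, 2))
```

So 13/20 is roughly four times the chance baseline, which is what makes the result notable.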
@SamuelMullr
Samuel Müller
6 months
I first played easy mode (see below), where I got two answers from each model and needed to match them. I used 20 interactions in easy mode to learn the models' behaviors. In hard mode (see previous post), you need to match three responses to the LLM names.