Tim Xiao
@TimZXiao
Followers 253 · Following 1K · Media 36 · Statuses 307
PhD student in Machine Learning @ University of Tübingen · IMPRS-IS scholar
Joined June 2012
✨ New paper: Flipping Against All Odds We found that large language models (LLMs) can describe probabilities—but fail to sample from them faithfully. Yes, even flipping a fair coin is hard. 🪙 🧵 Here’s what we learned—and how we fixed it. 🔗 https://t.co/Auw7agOws3 1/
4 replies · 8 reposts · 16 likes
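To make the claim above concrete, here is a minimal sketch of the kind of test the thread describes, assuming a hypothetical `query_llm()` wrapper around any chat API (the paper's actual protocol and prompts may differ). A faithful sampler should return heads about half the time, even though the model can verbalize "0.5" when asked directly.

```python
from collections import Counter

def query_llm(prompt: str) -> str:
    """Hypothetical wrapper around any chat-completion API."""
    raise NotImplementedError("plug in a real chat API here")

def coin_flip_rate(n_trials: int = 200) -> float:
    """Empirical P(heads) when the model is asked to *sample* a flip."""
    prompt = "Flip a fair coin. Reply with exactly one word: Heads or Tails."
    counts = Counter(query_llm(prompt).strip().lower() for _ in range(n_trials))
    return counts["heads"] / n_trials  # should be ~0.5 for a faithful sampler
```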
🤯 Merging many finetuned LLMs into one model, effectively? Introducing Functional Dual Anchor (FDA), a new framework for model merging. 🚀 Current merging works poorly because of underlying parameter conflicts. FDA shifts knowledge integration to the input-representation space…
10 replies · 96 reposts · 613 likes
The physics prior matters for molecular structures. We model potential energy between molecules for drug design. This happens to have a coincidental yet interesting connection to my past work, hyperspherical energy ( https://t.co/aRJSgn3gaE), which considers potential energy between…
0 replies · 2 reposts · 17 likes
@kenneth0stanley @ai_bread This prompt baking reminds me of verbalized machine learning. Though VML doesn't modify the weights; it updates the (natural-language) parameters instead. https://t.co/xqHDuWkY7f
Verbalized Machine Learning (VML) moves machine learning into natural language space, where one learns a model parameterized by natural language using LLMs. How does VML connect LLMs with: Universal function approximator? von Neumann architecture? Interpretable learning? How…
0 replies · 1 repost · 2 likes
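A minimal sketch of the VML loop as the quoted tweet describes it, assuming a hypothetical `llm()` wrapper: the model's "parameters" are natural-language text, and both prediction and the update are LLM calls.

```python
def llm(prompt: str) -> str:
    raise NotImplementedError  # hypothetical chat-API wrapper

def vml_step(theta_text: str, x, y) -> str:
    """One 'training step': predict with the verbal model, then revise it."""
    y_hat = llm(f"Model description: {theta_text}\nInput: {x}\nPredict the output.")
    return llm(
        f"Model description: {theta_text}\n"
        f"On input {x} it predicted {y_hat}, but the target was {y}.\n"
        "Rewrite the model description to reduce this error."
    )
```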
Polymer simulations, but make them Vivace ⚡ It was a pleasure to work on the Vivace architecture during my time at @MSFTResearch together with Lixin Sun and @gncsimm.
MLFFs 🤝 Polymers — SimPoly works! Our team at @MSFTResearch AI for Science is proud to present SimPoly (SIM-puh-lee) — a deep learning solution for polymer simulation. Polymeric materials are foundational to modern life—found in everything from the clothes we wear and the food…
0 replies · 1 repost · 11 likes
This is an almost year-long project led by @ItsTheZhen. My biggest takeaway is that physical simulation is very effective as a reward signal, and this efficient verification is crucial for unlocking LLMs' design novelty. This conclusion actually aligns with our previous…
Can LLMs design real machines — from 🚗 cars to 🏹 catapults? Can they engineer through both 🧠 agentic workflows and 🌀 reinforcement learning (RL) — learning from physical simulation instead of text alone? We treat machine design as “machine code writing”, where LLMs assemble…
0 replies · 5 reposts · 33 likes
Sharing a fascinating work: BesiegeField. It explores how LLMs can think and design directly in the space of natural language — a meaningful and fitting challenge for LLMs. A great example of verbalized computing, where design goals are defined in words rather than formal specs.
Can LLMs design real machines — from 🚗 cars to 🏹 catapults? Can they engineer through both 🧠 agentic workflows and 🌀 reinforcement learning (RL) — learning from physical simulation instead of text alone? We treat machine design as “machine code writing”, where LLMs assemble…
0 replies · 0 reposts · 4 likes
🤖 Can LLMs learn to create? Introducing "Agentic Design of Compositional Machines" — a new frontier where AI builds functional machines from standardized parts. We present BesiegeField, a simulation testbed to benchmark LLMs on tasks like building cars & catapults. Key…
1 reply · 3 reposts · 14 likes
TL;DR: Meet BesiegeField—a playground where LLMs build, test, and refine machines from standard parts in real time. We tested agentic workflows and RLVR with top LLMs: even the strongest still show limits in compositional machine design. 🔗 https://t.co/V4GQx2KB8q 🧵 below
Human history is marked by the machines we created: from the Antikythera mechanism of ancient Greece, to the imaginations of the Renaissance, to the engines of the steam era. We wonder: can LLMs, like humans, build sophisticated machines to achieve purposeful functionality?
0 replies · 3 reposts · 7 likes
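Schematically, the build-test-refine loop the TL;DR describes might look like the sketch below; the real BesiegeField interface differs, and `llm()` and `simulate()` are hypothetical placeholders for a chat API and the physics-simulation verifier.

```python
def llm(prompt: str) -> str:
    raise NotImplementedError  # hypothetical chat-API wrapper

def simulate(spec: str) -> tuple[float, str]:
    raise NotImplementedError  # hypothetical physics-simulation verifier

def design_machine(task: str, n_rounds: int = 5) -> str:
    """Propose a design, verify it in simulation, and refine from feedback."""
    spec = llm(f"Task: {task}\nAssemble a machine from standard parts. Output a part list.")
    for _ in range(n_rounds):
        reward, log = simulate(spec)  # physics simulation as the reward signal
        if reward >= 1.0:             # assumed convention: 1.0 means task solved
            break
        spec = llm(
            f"Task: {task}\nCurrent design: {spec}\n"
            f"Simulator feedback: {log}\nRevise the design."
        )
    return spec
```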
Human history is marked by the machines we created: from the Antikythera mechanism of ancient Greece, to the imaginations of the Renaissance, to the engines of the steam era. We wonder: can LLMs, like humans, build sophisticated machines to achieve purposeful functionality?
2 replies · 3 reposts · 12 likes
This is a wonderful collaboration with @ItsTheZhen and Wenqian. I’ve long been curious whether large language models truly possess creativity: the ability to build something genuinely novel. This project represents our first step toward answering that question. It also aligns…
Human history is marked by the machines we created: from the Antikythera mechanism of ancient Greece, to the imaginations of the Renaissance, to the engines of the steam era. We wonder: can LLMs, like humans, build sophisticated machines to achieve purposeful functionality?
0 replies · 2 reposts · 9 likes
The #EurIPS Salon des Refusés poster session is now accepting submissions 📜 We welcome self-nomination of rejected submissions from all @NeurIPSConf tracks (Main, D&B, Position). The call for self-nominations will stay open until the session is filled. https://t.co/w8cu8EAaGs
2 replies · 5 reposts · 8 likes
Excited to release new repo: nanochat! (it's among the most unhinged I've written). Unlike my earlier similar repo nanoGPT which only covered pretraining, nanochat is a minimal, from scratch, full-stack training/inference pipeline of a simple ChatGPT clone in a single…
665 replies · 3K reposts · 24K likes
I enjoy reading this blog. This is exactly what I am trying to pursue throughout my research career: using weight geometry to characterize and improve neural network training. Really excited that it has finally got people's attention!! In 2017, we studied the weight…
arxiv.org · Convolution as inner product has been the founding basis of convolutional neural networks (CNNs) and the key to end-to-end visual representation learning. Benefiting from deeper architectures, …
Efficient training of neural networks is difficult. Our second Connectionism post introduces Modular Manifolds, a theoretical step toward more stable and performant training by co-designing neural net optimizers with manifold constraints on weight matrices.
0 replies · 8 reposts · 45 likes
VUD’s gonna be at #NeurIPS2025 🎉🥳 Special thanks to my labmates who made this collaboration especially enjoyable!
Wanna understand the sources of uncertainty in LLMs when performing in-context learning 🤔? 🚀 We introduce a variational uncertainty decomposition framework for in-context learning without explicitly sampling from the latent parameter posterior. 📄 Paper:
0 replies · 5 reposts · 32 likes
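For context, the standard Bayesian decomposition such a framework targets (my reading; the paper's exact estimator may differ) splits total predictive uncertainty into aleatoric and epistemic terms via the latent parameter $\theta$:

```latex
% Standard entropy decomposition; the paper approximates this without
% explicitly sampling from the latent parameter posterior.
\underbrace{\mathcal{H}\!\left[p(y \mid x, \mathcal{D})\right]}_{\text{total}}
= \underbrace{\mathbb{E}_{p(\theta \mid \mathcal{D})}\!\left[\mathcal{H}\!\left[p(y \mid x, \theta)\right]\right]}_{\text{aleatoric}}
+ \underbrace{\mathcal{I}\!\left(y; \theta \mid x, \mathcal{D}\right)}_{\text{epistemic}}
```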
🥳POET is accepted to #NeurIPS2025!
📢Glad to introduce our paper: Reparameterized LLM Training via Orthogonal Equivalence Transformation (POET)! POET is a new algorithm for efficiently pretraining / finetuning large language models. Its training consists of three geometric phases. https://t.co/g5TzYGOhSE 1/6
0 replies · 3 reposts · 20 likes
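As I understand the reparameterization from the thread (hedged; see the linked paper for the three geometric phases), each weight matrix is written as a fixed random matrix sandwiched between two learnable orthogonal factors, which preserves the singular-value spectrum of the initialization:

```latex
% My reading of the POET reparameterization: W_0 is fixed at random
% initialization; only the orthogonal factors R and P are trained.
W \;=\; R \, W_0 \, P,
\qquad R^\top R = I, \quad P^\top P = I
```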
The amazing blog post from @gordic_aleksa is now live on the vLLM blog https://t.co/dI8NSyd3t3 (after more proofreading and clarifications)! Looking forward to a future series of tech deep-dive blog posts 😍
blog.vllm.ai · Originally posted on Aleksa Gordic’s website.
New in-depth blog post - "Inside vLLM: Anatomy of a High-Throughput LLM Inference System". Probably the most in depth explanation of how LLM inference engines and vLLM in particular work! Took me a while to get this level of understanding of the codebase and then to write up…
11 replies · 80 reposts · 635 likes
We have been working on enabling LLMs to generate symbolic graphics programs since IG-LLM ( https://t.co/LjD8TdkpOz) and SGP-Bench ( https://t.co/O8deijiCen), but SFT didn't really work at the time. We now find that, with a properly designed cross-modal reward (e.g., CLIP), RLVR can…
ig-llm.is.tue.mpg.de
0 replies · 6 reposts · 16 likes
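A sketch of what such a cross-modal reward could look like, with `render()` and the CLIP embedders as hypothetical placeholders for a real renderer and a real CLIP checkpoint: render the generated program, then score image-text similarity against the target caption.

```python
import numpy as np

def render(program: str) -> np.ndarray: ...            # program -> RGB image (placeholder)
def clip_embed_image(img: np.ndarray) -> np.ndarray: ...  # placeholder CLIP image tower
def clip_embed_text(text: str) -> np.ndarray: ...         # placeholder CLIP text tower

def cross_modal_reward(program: str, caption: str) -> float:
    """Cosine similarity between the rendered program and the target caption."""
    v = clip_embed_image(render(program))
    t = clip_embed_text(caption)
    return float(v @ t / (np.linalg.norm(v) * np.linalg.norm(t)))
```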
We show how to make LLM in-context learning approximately Bayesian & decompose its uncertainty. IMO this is proper approximate inference 🥰 applied to LLMs. Led by awesome students @shavindra_j @jacobyhsi88, Filippo & Wenlong 👍 Example 👇 via prompting; bandit & NLP examples in the paper.
7 replies · 19 reposts · 165 likes
For tactile sensing, why shear forces are harder to detect accurately than normal forces: an isoline-based theory for tactile sensing explains shear-force detection at super-resolution. (Open-access link: https://t.co/BA9vNgFj6n)
0 replies · 1 repost · 5 likes