Stephen Xie
@stephenx_
Followers 168 · Following 294 · Media 10 · Statuses 59
@berkeleynlp @berkeley_ai | eecs & business @ucberkeleymet
Berkeley & SF
Joined August 2020
interesting things I noticed:
1. the model can optimize to produce gibberish (!)
2. GEPA can produce long chains of domain-specific few-shot examples
3. diverse data are needed to generalize OOD
4. reward shaping incentivizes faster convergence at the cost of an early plateau
(3/3) 🧵
Replies 0 · Reposts 3 · Likes 10
something cool I found while experimenting w/ GEPA: when you let a model optimize a system prompt to bypass AI detectors, it learns to explicitly avoid common patterns. (2/3) 🧵
Replies 1 · Reposts 2 · Likes 8
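For a sense of what the experiment above looks like mechanically, here is a minimal sketch of a GEPA-style reflective prompt-optimization loop against a detector reward. The OpenAI client, the model name, and the crude `detector_score` heuristic are my assumptions for illustration, not the actual setup from the thread.

```python
# Minimal sketch (assumptions noted above): evolve a system prompt so that generated
# text stops triggering an AI detector, by letting the model reflect on its own failures.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"  # placeholder model, not the one used in the thread

def generate(system_prompt: str, task: str) -> str:
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "system", "content": system_prompt},
                  {"role": "user", "content": task}],
    )
    return resp.choices[0].message.content

def detector_score(text: str) -> float:
    """Stand-in for a real AI detector: count telltale patterns from the thread."""
    tells = text.count("\u2014") + text.count("isn't just about")  # em dashes + the framing
    return min(1.0, tells / 3)

def reflect(system_prompt: str, text: str, score: float) -> str:
    """Ask the model to rewrite its own system prompt given detector feedback."""
    feedback = (
        f"This output scored {score:.2f} on an AI detector:\n{text}\n\n"
        f"Current system prompt:\n{system_prompt}\n\n"
        "Rewrite the system prompt so future outputs avoid detector-triggering patterns. "
        "Return only the new prompt."
    )
    resp = client.chat.completions.create(
        model=MODEL, messages=[{"role": "user", "content": feedback}]
    )
    return resp.choices[0].message.content

prompt = "You are a helpful writing assistant."
tasks = ["Write a short blog intro about prompt optimization."]
for _ in range(10):
    outputs = [generate(prompt, t) for t in tasks]
    scores = [detector_score(o) for o in outputs]
    if max(scores) == 0:        # every sample passes the detector
        break
    prompt = reflect(prompt, outputs[0], scores[0])
```

(As I understand it, GEPA itself, short for Genetic-Pareto, keeps a population of candidate prompts and selects along a Pareto front of per-example scores rather than greedily overwriting a single prompt as this toy loop does.)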
I got models to stop writing "[X] isn't just about [Y], it's about [Z]" (and em dashes) (1/3) 🧵
Replies 1 · Reposts 7 · Likes 16
What if scaling the context windows of frontier LLMs is much easier than it sounds? We’re excited to share our work on Recursive Language Models (RLMs): a new inference strategy where LLMs can decompose and recursively interact with input prompts of seemingly unbounded length …
Replies 122 · Reposts 347 · Likes 3K
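The tweet above is cut off, but the core idea it names (decompose the prompt, recurse, combine) can be sketched in a few lines. This is my toy reading of it, not the authors' implementation; `llm` is a placeholder for a single bounded-context call, and the real RLM mechanism is richer than this split-and-merge.

```python
# Toy sketch of recursive decomposition over an oversized prompt (my reading of the
# tweet, not the released code). `llm` is a stub standing in for one model call.
CONTEXT_LIMIT = 8_000  # assumed budget, measured in characters for simplicity

def llm(prompt: str) -> str:
    """Placeholder for a single model call that fits in the context window."""
    return f"[model answer over {len(prompt)} chars]"

def rlm(document: str, question: str) -> str:
    # Base case: the document fits, so answer directly.
    if len(document) + len(question) <= CONTEXT_LIMIT:
        return llm(f"{document}\n\nQuestion: {question}")
    # Recursive case: split, answer each half, then combine the partial answers.
    mid = len(document) // 2
    left = rlm(document[:mid], question)
    right = rlm(document[mid:], question)
    return llm(
        f"Partial answer from first half:\n{left}\n\n"
        f"Partial answer from second half:\n{right}\n\n"
        f"Combine these into one answer to: {question}"
    )
```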
🚨 What do reasoning models actually learn during training? Our new paper shows base models already contain reasoning mechanisms; thinking models learn when to use them! By invoking those skills at the right time in the base model, we recover up to 91% of the performance gap 🧵
Replies 16 · Reposts 72 · Likes 583
New paper 📜: Tiny Recursion Model (TRM) is a recursive reasoning approach with a tiny 7M-parameter neural network that obtains 45% on ARC-AGI-1 and 8% on ARC-AGI-2, beating most LLMs. Blog: https://t.co/w5ZDsHDDPE Code: https://t.co/7UgKuD9Yll Paper:
arxiv.org: Hierarchical Reasoning Model (HRM) is a novel approach using two small neural networks recursing at different frequencies. This biologically inspired method beats Large Language Models (LLMs) on...
Replies 136 · Reposts 644 · Likes 4K
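Neither the tweet nor the link card spells out the architecture, but the shared idea (a small network applied recursively, with a fast inner loop refining a latent state and a slower outer loop updating the answer, as in HRM's two frequencies) can be sketched. This is a toy paraphrase under my assumptions, not the released TRM or HRM code.

```python
# Toy sketch of tiny recursive reasoning: one small recurrent core, run many times,
# instead of a large deep model. Dimensions and step counts are arbitrary choices.
import torch
import torch.nn as nn

class TinyRecursiveNet(nn.Module):
    def __init__(self, dim: int = 128, inner_steps: int = 6, outer_steps: int = 3):
        super().__init__()
        self.core = nn.GRUCell(dim, dim)     # the single small recurrent core
        self.readout = nn.Linear(dim, dim)   # maps latent state to an answer update
        self.inner_steps, self.outer_steps = inner_steps, outer_steps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = torch.zeros_like(x)              # latent reasoning state
        y = torch.zeros_like(x)              # current answer estimate
        for _ in range(self.outer_steps):    # slow loop: revise the answer
            for _ in range(self.inner_steps):  # fast loop: refine the latent state
                z = self.core(x + y, z)
            y = y + self.readout(z)
        return y

net = TinyRecursiveNet()
out = net(torch.randn(4, 128))               # (batch, dim)
```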
Finally had a chance to listen through this pod with Sutton, which was interesting and amusing. As background, Sutton's "The Bitter Lesson" has become a bit of a biblical text in frontier LLM circles. Researchers routinely talk about and ask whether this or that approach or idea …
.@RichardSSutton, father of reinforcement learning, doesn’t think LLMs are bitter-lesson-pilled. My steel man of Richard’s position: we need some new architecture to enable continual (on-the-job) learning. And if we have continual learning, we don't need a special training …
Replies 428 · Reposts 1K · Likes 10K
Introducing Tinker: a flexible API for fine-tuning language models. Write training loops in Python on your laptop; we'll run them on distributed GPUs. Private beta starts today. We can't wait to see what researchers and developers build with cutting-edge open models!
Replies 225 · Reposts 780 · Likes 6K
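The announcement above describes the shape of the workflow (a training loop written locally in Python, executed on hosted distributed GPUs). Below is a hypothetical sketch of what such a loop could look like; the client object, method names, and model id are my assumptions for illustration, not Tinker's documented API.

```python
# Hypothetical sketch of a "local loop, remote GPUs" fine-tuning workflow, in the
# spirit of the announcement. Every name below is an assumption, not Tinker's API.
def finetune(client, dataset, steps: int = 100, lr: float = 1e-5):
    run = client.create_training_run(base_model="some-open-model", lora_rank=16)
    for step, batch in zip(range(steps), dataset):
        loss = run.forward_backward(batch)   # executes remotely on distributed GPUs
        run.optim_step(lr=lr)                # optimizer update, also server-side
        if step % 10 == 0:
            print(f"step {step}: loss {loss:.4f}")
    return run.save_checkpoint()
```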
.@RichardSSutton, father of reinforcement learning, doesn’t think LLMs are bitter-lesson-pilled. My steel man of Richard’s position: we need some new architecture to enable continual (on-the-job) learning. And if we have continual learning, we don't need a special training …
Replies 258 · Reposts 636 · Likes 5K
Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference.” We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to …
Replies 237 · Reposts 1K · Likes 8K
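The post title points at nondeterminism in LLM inference; one ingredient that is easy to demonstrate (though I am not claiming it is the post's full argument) is that floating-point addition is not associative, so the same values reduced in a different kernel or batching order come out slightly different.

```python
# Floating-point addition is not associative: summing identical values in a different
# order (as different kernel schedules or batch shapes effectively do) shifts the
# low-order bits, which is one route to nondeterministic inference outputs.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000).astype(np.float32)

in_order = x.sum()                              # one reduction order
shuffled = x[rng.permutation(x.size)].sum()     # same values, different order

print(in_order == shuffled)                     # typically False
print(abs(float(in_order) - float(shuffled)))   # small but nonzero difference
```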
Yup :) here are the reasoning traces from running gpt-oss-20b on the s1k dataset, hosted on huggingface.co: https://t.co/qJpYvh4ado
Replies 1 · Reposts 0 · Likes 1
Did GPT-5 make this graph💀💀💀 @sama
Replies 2 · Reposts 1 · Likes 7
New paper & surprising result. LLMs transmit traits to other models via hidden signals in data. Datasets consisting only of 3-digit numbers can transmit a love for owls, or evil tendencies. 🧵
Replies 291 · Reposts 1K · Likes 8K
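As I read the tweet above, the setup is: a teacher model that carries some trait emits data that superficially contains only 3-digit numbers, a student is fine-tuned on that data, and the trait is then probed in the student. Here is a hedged sketch of the data-generation side only; the stubbed teacher call and filter are my assumptions, not the paper's released code.

```python
# Sketch of the "traits via number sequences" setup as I read the tweet: sample
# number-only text from a (stubbed) teacher, keep strictly numeric samples, and
# use them as a fine-tuning set for a student that is later probed for the trait.
import random
import re

def teacher_generate(prompt: str) -> str:
    """Placeholder for sampling from a teacher model that carries the trait."""
    return " ".join(str(random.randint(100, 999)) for _ in range(32))

def is_numbers_only(text: str) -> bool:
    """Filter: keep only samples that are literally just 3-digit numbers."""
    return all(re.fullmatch(r"\d{3}", tok) for tok in text.split())

dataset = []
while len(dataset) < 1_000:
    sample = teacher_generate("Continue this sequence of numbers:")
    if is_numbers_only(sample):
        dataset.append(sample)

# A student model would then be fine-tuned on `dataset` and probed (e.g. "What's your
# favorite animal?"), which is where the surprising trait transfer is reported.
```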
🚀 Hello, Kimi K2! Open-Source Agentic Model!
🔹 1T total / 32B active MoE model
🔹 SOTA on SWE-bench Verified, Tau2 & AceBench among open models
🔹 Strong in coding and agentic tasks
🐤 Multimodal & thought-mode not supported for now
With Kimi K2, advanced agentic intelligence …
Replies 279 · Reposts 1K · Likes 7K