Stephen Xie

@stephenx_

Followers 168 · Following 294 · Media 10 · Statuses 59

@berkeleynlp @berkeley_ai | eecs & business @ucberkeleymet

Berkeley & SF
Joined August 2020
@stephenx_
Stephen Xie
3 days
interesting things I noticed:
1. the model can optimize to produce gibberish (!)
2. GEPA can produce long chains of domain-specific few-shot examples
3. diverse data are needed to generalize OOD
4. reward shaping incentivizes faster convergence at the cost of an early plateau
(3/3) 🧵
0
3
10
@stephenx_
Stephen Xie
3 days
something cool I found while experimenting w/ GEPA: when you let a model optimize a system prompt to bypass AI detectors, it learns to explicitly avoid common patterns. (2/3) 🧵
1
2
8
@stephenx_
Stephen Xie
3 days
I got models to stop writing "[X] isn't just about [Y], it's about [Z]" (and em dashes) (1/3) 🧵
1
7
16
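The GEPA-style search described in this thread can be sketched in a toy form. This is not the actual GEPA implementation — real GEPA mutates prompts via LLM reflection and keeps a Pareto front over per-example scores — but a minimal stand-in where "reflection" just appends an explicit avoid-rule, scoring checks which banned patterns the prompt covers, and parent selection is greedy. The `BANNED` list and all function names are illustrative assumptions:

```python
import random

# Patterns the optimized system prompt should teach the model to avoid.
# These two come from the thread; a real run would use a longer list.
BANNED = ["isn't just about", "—"]

def score(prompt, patterns):
    # per-pattern score: 1 if the prompt explicitly bans that pattern
    return [1.0 if pat in prompt else 0.0 for pat in patterns]

def mutate(prompt, patterns):
    # stand-in for GEPA's LLM reflection step: add an explicit rule
    # for one pattern the prompt does not yet cover
    uncovered = [p for p in patterns if p not in prompt]
    if not uncovered:
        return prompt
    return prompt + "\nNever write the pattern: " + random.choice(uncovered)

def prompt_search(seed_prompt, patterns, budget=20):
    random.seed(0)  # reproducible toy run
    pool = [seed_prompt]
    for _ in range(budget):
        # greedy parent selection (real GEPA instead keeps a Pareto
        # front over per-example scores to preserve diversity)
        parent = max(pool, key=lambda p: sum(score(p, patterns)))
        child = mutate(parent, patterns)
        if child not in pool:
            pool.append(child)
    return max(pool, key=lambda p: sum(score(p, patterns)))

best = prompt_search("You are a helpful writer.", BANNED)
```

Even this greedy toy converges to a prompt covering every banned pattern, which mirrors observation 1 above: whatever the scorer rewards, the loop will optimize toward, gibberish included.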
@a1zhang
Alex L Zhang
11 days
What if scaling the context windows of frontier LLMs is much easier than it sounds? We’re excited to share our work on Recursive Language Models (RLMs): a new inference strategy where LLMs can decompose and recursively interact with input prompts of seemingly unbounded length,
122
347
3K
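The decompose-recurse-combine idea behind RLMs can be illustrated with a toy. This is only a sketch of the recursion pattern, not the RLM method itself (which has the model programmatically interact with its own prompt); here a bounded-context "model call" is simulated by word counting, and the window size is an arbitrary assumption:

```python
# Toy context window: the stand-in "model" only ever sees 16 words at once.
WINDOW = 16

def model_call(words):
    # stand-in for a bounded-context LLM call on a prompt that fits
    # the window; the "task" is counting occurrences of "error"
    return sum(w == "error" for w in words)

def recursive_call(words):
    # base case: the sub-prompt fits in the window, answer it directly
    if len(words) <= WINDOW:
        return model_call(words)
    # recursive case: decompose the prompt, recurse on each half,
    # then combine the sub-answers
    mid = len(words) // 2
    return recursive_call(words[:mid]) + recursive_call(words[mid:])

log = (("ok " * 30) + "error " + ("ok " * 30)) * 5
answer = recursive_call(log.split())  # → 5
```

The input here is ~20× the window, yet every individual model call stays within budget — which is the sense in which the effective prompt length becomes "seemingly unbounded".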
@cvenhoff00
Constantin Venhoff
16 days
🚨 What do reasoning models actually learn during training? Our new paper shows base models already contain reasoning mechanisms; thinking models learn when to use them! By invoking those skills at the right time in the base model, we recover up to 91% of the performance gap 🧵
16
72
583
@jm_alexia
Alexia Jolicoeur-Martineau
19 days
New paper 📜: Tiny Recursion Model (TRM) is a recursive reasoning approach with a tiny 7M-parameter neural network that obtains 45% on ARC-AGI-1 and 8% on ARC-AGI-2, beating most LLMs. Blog: https://t.co/w5ZDsHDDPE Code: https://t.co/7UgKuD9Yll Paper:
arxiv.org
Hierarchical Reasoning Model (HRM) is a novel approach using two small neural networks recursing at different frequencies. This biologically inspired method beats Large Language Models (LLMs) on...
136
644
4K
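TRM's core bet — recursing one tiny update function many times instead of stacking many distinct layers — can be made concrete with a toy. The "network" below is not learned at all: it is a hand-written Newton update for square roots, standing in for TRM's small learned step purely to show the shape of the recursion:

```python
def refine(x, z):
    # one tiny shared update step, reused at every recursion depth.
    # TRM learns this step as a ~7M-parameter network; here it is a
    # hand-written Newton update for sqrt(x).
    return 0.5 * (z + x / z)

def recursive_solve(x, depth=16):
    z = max(x, 1.0)  # crude initial latent state
    for _ in range(depth):
        z = refine(x, z)  # the same tiny function, applied again and again
    return z
```

Depth substitutes for parameter count: a single cheap step, iterated, refines a rough initial state into a precise answer.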
@karpathy
Andrej Karpathy
25 days
Finally had a chance to listen through this pod with Sutton, which was interesting and amusing. As background, Sutton's "The Bitter Lesson" has become a bit of a biblical text in frontier LLM circles. Researchers routinely talk about and ask whether this or that approach or idea
@dwarkesh_sp
Dwarkesh Patel
1 month
.@RichardSSutton, father of reinforcement learning, doesn’t think LLMs are bitter-lesson-pilled. My steel man of Richard’s position: we need some new architecture to enable continual (on-the-job) learning. And if we have continual learning, we don't need a special training
428
1K
10K
@thinkymachines
Thinking Machines
25 days
Introducing Tinker: a flexible API for fine-tuning language models. Write training loops in Python on your laptop; we'll run them on distributed GPUs. Private beta starts today. We can't wait to see what researchers and developers build with cutting-edge open models!
225
780
6K
@stephenx_
Stephen Xie
1 month
@interaction @beeper also connects to iMessage!
0
0
1
@stephenx_
Stephen Xie
1 month
0
0
1
@stephenx_
Stephen Xie
1 month
Good work @interaction
2
0
4
@thinkymachines
Thinking Machines
2 months
Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference” We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to
237
1K
8K
@arithmoquine
henry
3 months
new post. there's a lot in it. i suggest you check it out
71
184
3K
@stephenx_
Stephen Xie
3 months
Yup :) here are the reasoning traces from running gpt-oss 20b on the s1k dataset https://t.co/qJpYvh4ado
huggingface.co
@oliveraochongli
Oliver Li
3 months
probably still early to ask, but has anyone distilled reasoning traces from gpt-oss?
1
0
1
@stephenx_
Stephen Xie
3 months
😭😭😭
0
0
0
@stephenx_
Stephen Xie
3 months
Did GPT-5 make this graph💀💀💀 @sama
2
1
7
@stephenx_
Stephen Xie
3 months
0
0
4
@OwainEvans_UK
Owain Evans
3 months
New paper & surprising result. LLMs transmit traits to other models via hidden signals in data. Datasets consisting only of 3-digit numbers can transmit a love for owls, or evil tendencies. 🧵
291
1K
8K
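The surprising part of this result is the data-cleaning constraint: the student only ever sees number sequences, so any transmitted trait must ride on *which* numbers the teacher emits. A minimal sketch of that filtering step follows — the exact filter used in the paper is not given in the tweet, so the comma-separated 3-digit format here is an assumption for illustration:

```python
import re

# Keep only samples that are pure comma-separated 3-digit numbers,
# so no overt semantic content (words, symbols) reaches the student.
NUMBERS_ONLY = re.compile(r"\d{3}(, \d{3})*")

def is_numbers_only(sample):
    return NUMBERS_ONLY.fullmatch(sample) is not None

raw = ["231, 649, 905", "owls! 123", "404, 017"]
clean = [s for s in raw if is_numbers_only(s)]  # drops the sample with words
```

Even with this filter in place, the paper reports that traits still transfer — which is what makes the hidden-signal claim striking.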
@Kimi_Moonshot
Kimi.ai
4 months
🚀 Hello, Kimi K2! Open-Source Agentic Model!
🔹 1T total / 32B active MoE model
🔹 SOTA on SWE-bench Verified, Tau2 & AceBench among open models
🔹 Strong in coding and agentic tasks
🐤 Multimodal & thought-mode not supported for now
With Kimi K2, advanced agentic intelligence
279
1K
7K