Ludwig Schmidt Profile
Ludwig Schmidt

@lschmidt3

6K Followers · 2K Following · 3 Media · 233 Statuses

Assistant professor at @Stanford and member of the technical staff at @AnthropicAI.

Palo Alto, CA
Joined August 2009
@anas_awadalla
Anas Awadalla
8 hours
We're releasing 🍨Gelato-30B-A3B, a state-of-the-art computer grounding model that delivers immediate performance gains for computer-use agents! Trained on our open-source 🖱️Click-100k dataset, Gelato achieves 63.8% on ScreenSpot-Pro and 69.1% on OS-World-G. It outperforms…
5 replies · 19 reposts · 75 likes
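For context, a grounding model's job is to turn a screenshot plus a natural-language target into a click location that a computer-use agent can act on. The sketch below shows that interface in miniature; `StubGroundingModel`, `ground`, and the coordinate convention are illustrative stand-ins, not the actual Gelato API.

```python
from dataclasses import dataclass

@dataclass
class Click:
    x: int  # pixel column in the screenshot
    y: int  # pixel row in the screenshot

class StubGroundingModel:
    """Stand-in for a grounding model such as Gelato (illustrative only)."""
    def predict(self, image, text: str) -> tuple[int, int]:
        # A real model scores screen locations conditioned on the instruction;
        # this stub returns a fixed point so the sketch runs end to end.
        return (640, 360)

def ground(model, screenshot, instruction: str) -> Click:
    # Map a natural-language target ("the Save button") to pixel coordinates.
    x, y = model.predict(image=screenshot, text=instruction)
    return Click(x, y)

click = ground(StubGroundingModel(), screenshot=None, instruction="the Save button")
print(f"agent clicks at ({click.x}, {click.y})")
```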
@alexgshaw
Alex Shaw
3 days
Today, we’re announcing the next chapter of Terminal-Bench with two releases:
1. Harbor, a new package for running sandboxed agent rollouts at scale
2. Terminal-Bench 2.0, a harder version of Terminal-Bench with increased verification
21 replies · 64 reposts · 319 likes
@jyangballin
John Yang
5 days
New eval! Code duels for LMs ⚔️
Current evals test LMs on *tasks*: "fix this bug," "write a test"
But we code to achieve *goals*: maximize revenue, cut costs, win users
Meet CodeClash: LMs compete via their codebases across multi-round tournaments to achieve high-level goals
26 replies · 89 reposts · 364 likes
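A rough sketch of the tournament structure the tweet describes, under two assumptions it doesn't spell out: each round an LM revises its own codebase (`edit`), and an arena scores the codebases against the high-level goal (`compete`). Both are stubbed here; this is not the CodeClash harness itself.

```python
import random

def edit(codebase: str, goal: str) -> str:
    # Stand-in for an LM revising its own codebase between rounds.
    return codebase + f"# revised toward: {goal}\n"

def compete(code_a: str, code_b: str) -> tuple[float, float]:
    # Stand-in for the arena: run both codebases and measure the goal metric
    # (revenue, costs, users). Here: random scores.
    return random.random(), random.random()

def tournament(goal: str, rounds: int = 3) -> dict[str, int]:
    codebases = {"lm_a": "# codebase A\n", "lm_b": "# codebase B\n"}
    wins = {name: 0 for name in codebases}
    for _ in range(rounds):
        # Each round every LM revises its code, then the codebases face off.
        codebases = {name: edit(code, goal) for name, code in codebases.items()}
        score_a, score_b = compete(codebases["lm_a"], codebases["lm_b"])
        winner = "lm_a" if score_a >= score_b else "lm_b"
        wins[winner] += 1
    return wins

print(tournament("maximize revenue"))  # e.g. {'lm_a': 2, 'lm_b': 1}
```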
@alexgshaw
Alex Shaw
4 months
Evaluating agents on benchmarks is a pain. Each benchmark comes with its own harness, scoring scripts, and environments, and integrating one can take days. We're introducing the Terminal-Bench dataset registry to solve this problem. Think of it as the npm of agent benchmarks. Now…
1 reply · 24 reposts · 104 likes
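One way to picture the "npm of agent benchmarks" idea: a single index maps benchmark names to one uniform run interface, so each new benchmark stops needing a bespoke harness. All names below (`REGISTRY`, `register`, `run`) are hypothetical sketches, not the actual Terminal-Bench registry API.

```python
from typing import Callable

Benchmark = Callable[[str], float]  # agent name -> score

REGISTRY: dict[str, Benchmark] = {}

def register(name: str):
    # Adding a benchmark to the index is one decorator, npm-style.
    def wrap(bench: Benchmark) -> Benchmark:
        REGISTRY[name] = bench
        return bench
    return wrap

@register("toy-terminal-tasks")
def toy_terminal_tasks(agent: str) -> float:
    # A real entry would launch the benchmark's sandboxed environment and
    # scoring scripts; the registry hides all of that behind one call.
    return 0.5

def run(benchmark: str, agent: str) -> float:
    return REGISTRY[benchmark](agent)  # same one-liner for every benchmark

print(run("toy-terminal-tasks", agent="my-agent"))
```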
@lschmidt3
Ludwig Schmidt
5 months
I'm a big fan of the approach to research funding @andykonwinski and the Laude team are taking! Working with them on terminal-bench has been fantastic (thanks @alexgshaw!) and I'm excited that they're going to support more open, impact-oriented research.
@andykonwinski
Andy Konwinski
5 months
Today, I’m launching a deeply personal project. I’m betting $100M that we can help computer scientists create more upside impact for humanity. Built for and by researchers, including @JeffDean & @jpineau1 on the board, @LaudeInstitute catalyzes research with real-world impact.
2 replies · 5 reposts · 91 likes
@thao_nguyen26
Thao Nguyen
5 months
Web data, the “fossil fuel of AI”, is being exhausted. What’s next?🤔 We propose Recycling the Web to break the data wall of pretraining via grounded synthetic data. It is more effective than standard data filtering methods, even with multi-epoch repeats! https://t.co/eBX0P4Sj7a
14 replies · 65 reposts · 226 likes
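A minimal sketch of the recycling idea as the tweet describes it: documents that fail a quality filter are not discarded but rewritten by an LM grounded in their content. The classifier, threshold, and rewrite step below are stand-ins, not the paper's actual pipeline code.

```python
def quality_score(doc: str) -> float:
    # Stand-in for the learned quality classifier used in standard filtering.
    return min(len(doc) / 100.0, 1.0)

def rewrite(doc: str) -> str:
    # Stand-in for an LM rewriting the document while staying grounded in its
    # content, so the synthetic text adds signal without drifting from facts.
    return "Rewritten: " + doc.strip()

def recycle(corpus: list[str], keep_threshold: float = 0.8) -> list[str]:
    kept, recycled = [], []
    for doc in corpus:
        if quality_score(doc) >= keep_threshold:
            kept.append(doc)  # high-quality docs pass through unchanged
        else:
            recycled.append(rewrite(doc))  # low-quality docs are recycled
    return kept + recycled

print(recycle(["short doc", "x" * 200]))
```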
@lschmidt3
Ludwig Schmidt
5 months
More details on https://t.co/r7CnIpGTNl, Ryan’s thread below, and the paper itself https://t.co/6BjPBCpXbv https://t.co/lnLL9mUZak
openthoughts.ai
Pushing the boundaries of open reasoning datasets through rigorous experimentation.
@ryanmart3n
Ryan Marten
5 months
Announcing OpenThinker3-7B, the new SOTA open-data 7B reasoning model: improving over DeepSeek-R1-Distill-Qwen-7B by 33% on average over code, science, and math evals. We also release our dataset, OpenThoughts3-1.2M, which is the best open reasoning dataset across all data…
0 replies · 3 reposts · 31 likes
@lschmidt3
Ludwig Schmidt
5 months
Together with the paper we also release our new dataset OpenThoughts3-1.2M and the corresponding model OpenThinker3-7B, which is currently the best open-data 7B reasoning model.
1 reply · 0 reposts · 25 likes
@lschmidt3
Ludwig Schmidt
5 months
Similar to previous DataComp projects, we systematically experiment with every step of the data generation pipeline to build a state-of-the-art training set. Overall we conducted more than 1,000 individual experiments.
1 reply · 0 reposts · 32 likes
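What "systematically experiment with every step of the pipeline" can look like in code: a grid over the options at each pipeline stage, with one run per combination. The stage names and options below are illustrative, not the actual OpenThoughts3 search space.

```python
from itertools import product

# Illustrative stages and options; the real search space differs.
STAGES = {
    "question_source": ["web", "existing_datasets", "llm_generated"],
    "filtering":       ["none", "difficulty", "llm_judge"],
    "teacher":         ["model_a", "model_b"],
}

def run_experiment(config: dict[str, str]) -> float:
    # Stand-in for: build the dataset under this config, finetune, evaluate.
    return hash(tuple(sorted(config.items()))) % 100 / 100.0

results = {}
for combo in product(*STAGES.values()):
    config = dict(zip(STAGES.keys(), combo))
    results[combo] = run_experiment(config)

best = max(results, key=results.get)
print(f"{len(results)} runs; best config: {best}")  # 3 * 3 * 2 = 18 runs
```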
@lschmidt3
Ludwig Schmidt
5 months
Very excited to finally release our paper for OpenThoughts! After DataComp and DCLM, this is the third large open dataset my group has been building in collaboration with the DataComp community. This time, the focus is on post-training, specifically reasoning data.
22 replies · 207 reposts · 1K likes
@lschmidt3
Ludwig Schmidt
5 months
Cool to see more work on data for AI agents!
@ajratner
Alex Ratner
6 months
Agentic AI will transform every enterprise–but only if agents are trusted experts. The key: Evaluation & tuning on specialized, expert data. I’m excited to announce two new products to support this–@SnorkelAI Evaluate & Expert Data-as-a-Service–along w/ our $100M Series D!
0 replies · 0 reposts · 13 likes
@percyliang
Percy Liang
6 months
What would truly open-source AI look like? Not just open weights, open code/data, but *open development*, where the entire research and development process is public *and* anyone can contribute. We built Marin, an open lab, to fulfill this vision:
55 replies · 218 reposts · 1K likes
@lschmidt3
Ludwig Schmidt
6 months
Very excited about our new agent benchmark! I think it's a nice way of evaluating how well agents can do complex tasks in terminal (command-line) environments.
@Mike_A_Merrill
Mike A. Merrill
6 months
Many agents (Claude Code, Codex CLI) interact with the terminal to do valuable tasks, but do they currently work well enough to deploy en masse? We’re excited to introduce Terminal-Bench: An evaluation environment and benchmark for AI agents on real-world terminal tasks. Tl;dr…
2 replies · 5 reposts · 79 likes
@jyangballin
John Yang
6 months
40% with just 1 try per task: SWE-agent-LM-32B is the new #1 open source model on SWE-bench Verified. We built it by synthesizing a ton of agentic training data from 100+ Python repos. Today we’re open-sourcing the toolkit that made it happen: SWE-smith.
24 replies · 142 reposts · 657 likes
@thao_nguyen26
Thao Nguyen
6 months
📢 Announcing our data-centric workshop at ICML 2025 on unifying data curation frameworks across domains! 📅 Deadline: May 24, AoE 🔗 Website: https://t.co/K3U540rqoe We have an amazing lineup of speakers + panelists from various institutions and application areas.
2 replies · 26 reposts · 135 likes
@StanfordAILab
Stanford AI Lab
7 months
SAIL is still accepting applications for the SAIL Postdoctoral Fellowships! This is an opportunity to work with our wonderful professors and community. Applications submitted by the end of April 30 will receive full consideration:
1 reply · 21 reposts · 13 likes
@stanfordnlp
Stanford NLP Group
7 months
Want to learn the engineering details of building state-of-the-art Large Language Models (LLMs)? Not finding much info in @OpenAI’s non-technical reports? @percyliang and @tatsu_hashimoto are here to help with CS336: Language Modeling from Scratch, now rolling out to YouTube.
10 replies · 163 reposts · 1K likes