
Alex Wettig
@_awettig
1K Followers · 2K Following · 21 Media · 180 Statuses
PhD @princeton trying to make sense of language models and their training data
Joined July 2022
RT @AnthropicAI: Anthropic staff realized they could ask Claude to buy things that weren’t just food & drink. After someone randomly deci…
New paper cutting through the thicket of KV cache eviction methods!
There are many KV cache-reduction methods, but a fair comparison is challenging. We propose a new unified metric called “critical KV footprint”. We compare existing methods and propose a new one - PruLong, which “prunes” certain attn heads to only look at local tokens. 1/7
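The thread's core idea — restricting some attention heads to local tokens so their distant KV entries can be evicted — can be sketched with a toy footprint counter. This is an illustrative reconstruction of the concept only; the function name, window size, and head split are assumptions, not PruLong's actual method or API.

```python
# Toy sketch (assumption, not the paper's code): "local" heads keep only a
# sliding window of recent keys/values, "global" heads keep the full cache.
def kv_footprint(seq_len: int, num_heads: int, local_heads: set, window: int) -> int:
    """Count KV entries that must be kept at the final decoding step."""
    total = 0
    for h in range(num_heads):
        if h in local_heads:
            total += min(window, seq_len)  # local head: sliding-window cache
        else:
            total += seq_len               # global head: full cache
    return total

# Example: 32 heads, 16 made local with a 128-token window, 4096-token context.
full = kv_footprint(4096, 32, set(), 128)            # 131072 entries
pruned = kv_footprint(4096, 32, set(range(16)), 128)  # 67584 entries
print(pruned / full)                                  # roughly half the cache
```

Converting half the heads to a 128-token window cuts the cache almost in half here, which is the kind of saving the "critical KV footprint" metric would measure.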
RT @amanrsanger: Claude Sonnet 4 is much better at codebase understanding. Paired with recent improvements in Cursor, it's SOTA on large c…
RT @OfirPress: Great results from the Claude team- the 80% result is pass@1!! They ran the model in parallel multiple times and had an LM j…
Big arrow time! We can make huge progress on open-source SWE agents by scaling up the creation of virtual coding environments 🚀
40% with just 1 try per task: SWE-agent-LM-32B is the new #1 open source model on SWE-bench Verified. We built it by synthesizing a ton of agentic training data from 100+ Python repos. Today we’re open-sourcing the toolkit that made it happen: SWE-smith.
RT @cindy_x_wu: Introducing COMPACT: COMPositional Atomic-to-complex Visual Capability Tuning, a data-efficient approach to improve multimo…
RT @jyangballin: @ weekend warriors - DM me a GitHub repo that you like / maintain, and I'll train you a 7B coding agent that's an expert f…
RT @jacspringer: Training with more data = better LLMs, right? 🚨 False! Scaling language models by adding more pre-training data can decre…
RT @alisawuffles: We created SuperBPE🚀, a *superword* tokenizer that includes tokens spanning multiple words. When pretraining at 8B scale…
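The distinguishing feature described in this tweet — tokens that span multiple words — can be illustrated with a toy BPE merge step that, unlike standard BPE's whitespace pretokenization, is allowed to fuse across word boundaries. This is a hedged toy reconstruction of the idea only, not the actual SuperBPE algorithm or its training schedule.

```python
# Toy sketch (assumption): a single BPE-style merge where pairs may cross
# word boundaries, producing a multi-word "superword" token.
from collections import Counter

def most_frequent_pair(tokens):
    """Return the most common adjacent token pair."""
    return Counter(zip(tokens, tokens[1:])).most_common(1)[0][0]

def merge(tokens, pair):
    """Replace every occurrence of `pair` with a single fused token."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

# Word-level tokens with explicit leading spaces; the frequent pair
# (" by", " the") fuses into one token covering two words.
tokens = [" by", " the", " way", " by", " the", " sea"]
tokens = merge(tokens, most_frequent_pair(tokens))
print(tokens)  # [' by the', ' way', ' by the', ' sea']
```

A standard BPE vocabulary could never contain " by the" because pretokenization splits on whitespace first; dropping that restriction is what makes the tokenizer "superword".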
RT @logan_engstrom: Want state-of-the-art data curation, data poisoning & more? Just do gradient descent! w/ @andrew_ilyas Ben Chen @axel_…
RT @ZhiyuanZeng_: Is a single accuracy number all we can get from model evals? 🤔 🚨 Does NOT tell where the model fails. 🚨 Does NOT tell how to…
RT @Thom_Wolf: I shared a controversial take the other day at an event and I decided to write it down in a longer format: I’m afraid AI won…
RT @orionweller: Ever wonder how test-time compute would do in retrieval? 🤔 Introducing ✨rank1✨. rank1 is distilled from R1 & designed for…
I really rate Anthropic's laser focus on things that matter. Unironically, even playing Pokémon is more important than frontier math evals for robust general intelligence (open-endedness, exploration, tool use, …).
Claude 3.7 Sonnet is a state-of-the-art model for both coding and agentic tool use. In developing it, we’ve optimized somewhat less for math and computer science competition problems, and instead shifted focus towards real-world tasks that better reflect the needs of our users.