Alex Wettig

@_awettig

Followers 1K · Following 2K · Media 21 · Statuses 180

PhD@princeton trying to make sense of language models and their training data

Joined July 2022
@_awettig
Alex Wettig
5 months
🤔 Ever wondered how prevalent some type of web content is during LM pre-training? In our new paper, we propose WebOrganizer, which *constructs domains* based on the topic and format of CommonCrawl web pages 🌐. Key takeaway: domains help us curate better pre-training data! 🧵/N
5
49
196
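The tweet above describes constructing domains from the cross-product of a page's topic and format, then using domain prevalence to curate data. As a rough illustration of that idea only (the label sets and keyword rules below are invented stand-ins for WebOrganizer's actual taxonomy and learned classifiers):

```python
from collections import Counter

# Assumed example labels, not the paper's taxonomy.
TOPICS = ["science", "sports"]
FORMATS = ["tutorial", "article"]

def classify_topic(page: str) -> str:
    # Stand-in for a learned topic classifier.
    return "science" if "experiment" in page else "sports"

def classify_format(page: str) -> str:
    # Stand-in for a learned format classifier.
    return "tutorial" if "step 1" in page else "article"

def assign_domain(page: str) -> str:
    # A domain is a (topic, format) pair.
    return f"{classify_topic(page)}/{classify_format(page)}"

# Counting domains over a corpus gives the prevalence of each
# kind of web content in the pre-training mix.
pages = ["an experiment on step 1 of training", "match report"]
domain_counts = Counter(assign_domain(p) for p in pages)
```

Curation in this picture amounts to reweighting or filtering the corpus per domain rather than per page.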
@_awettig
Alex Wettig
7 days
RT @AnthropicAI: Anthropic staff realized they could ask Claude to buy things that weren’t just food & drink. After someone randomly deci…
0
206
0
@_awettig
Alex Wettig
11 days
New paper cutting through the thicket of KV cache eviction methods!
@AdithyaNLP
Adithya Bhaskar
11 days
There are many KV cache-reduction methods, but a fair comparison is challenging. We propose a new unified metric called “critical KV footprint”. We compare existing methods and propose a new one - PruLong, which “prunes” certain attn heads to only look at local tokens. 1/7
0
0
15
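The quoted tweet contrasts full-attention heads with PruLong-style "pruned" heads that only look at local tokens. A minimal sketch of why that shrinks the KV cache, using a toy footprint measure (the paper's "critical KV footprint" metric is defined more carefully; this is an assumption for illustration):

```python
def attention_mask(seq_len: int, local: bool, window: int = 4) -> list[list[bool]]:
    # mask[i][j] is True if query position i may attend to key position j.
    # Every head is causal; a "local" head is further restricted to a
    # sliding window of the most recent `window` tokens.
    return [
        [j <= i and (not local or i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]

def kv_footprint(mask: list[list[bool]]) -> int:
    # Toy proxy for a head's KV cache size: how many past key/value
    # positions the final query position still needs to have cached.
    return sum(mask[-1])
```

A full-attention head must keep all `seq_len` entries cached, while a local head's cache stays bounded at `window` entries regardless of sequence length, which is where the memory savings come from.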
@_awettig
Alex Wettig
1 month
RT @a1zhang: Can GPT, Claude, and Gemini play video games like Zelda, Civ, and Doom II? 𝗩𝗶𝗱𝗲𝗼𝗚𝗮𝗺𝗲𝗕𝗲𝗻𝗰𝗵 evaluates VLMs on Game Boy & MS-DOS…
0
72
0
@_awettig
Alex Wettig
1 month
RT @amanrsanger: Claude Sonnet 4 is much better at codebase understanding. Paired with recent improvements in Cursor, it's SOTA on large c…
0
44
0
@_awettig
Alex Wettig
1 month
RT @KLieret: Massive gains with Sonnet 4 on SWE-agent: Single-attempt pass@1 rises to 69% on SWE-bench Verified! Sonnet 4 iterates longer (…
0
12
0
@_awettig
Alex Wettig
1 month
RT @OfirPress: Great results from the Claude team- the 80% result is pass@1!! They ran the model in parallel multiple times and had an LM j…
0
6
0
@_awettig
Alex Wettig
2 months
Big arrow time! We can make huge progress on open-source SWE agents by scaling up the creation of virtual coding environments 🚀
@jyangballin
John Yang
2 months
40% with just 1 try per task: SWE-agent-LM-32B is the new #1 open source model on SWE-bench Verified. We built it by synthesizing a ton of agentic training data from 100+ Python repos. Today we’re open-sourcing the toolkit that made it happen: SWE-smith.
0
3
16
@_awettig
Alex Wettig
2 months
RT @cursor_ai: Cursor is now free for students. Enjoy!
0
4K
0
@_awettig
Alex Wettig
2 months
RT @cindy_x_wu: Introducing COMPACT: COMPositional Atomic-to-complex Visual Capability Tuning, a data-efficient approach to improve multimo…
0
44
0
@_awettig
Alex Wettig
2 months
RT @jyangballin: @ weekend warriors - DM me a GitHub repo that you like / maintain, and I'll train you a 7B coding agent that's an expert f…
0
8
0
@_awettig
Alex Wettig
3 months
RT @jacspringer: Training with more data = better LLMs, right? 🚨 False! Scaling language models by adding more pre-training data can decre…
0
174
0
@_awettig
Alex Wettig
4 months
RT @alisawuffles: We created SuperBPE🚀, a *superword* tokenizer that includes tokens spanning multiple words. When pretraining at 8B scale…
0
320
0
@_awettig
Alex Wettig
4 months
RT @logan_engstrom: Want state-of-the-art data curation, data poisoning & more? Just do gradient descent! w/ @andrew_ilyas Ben Chen @axel_…
0
31
0
@_awettig
Alex Wettig
4 months
RT @ZhiyuanZeng_: Is a single accuracy number all we can get from model evals? 🤔 🚨 Does NOT tell where the model fails. 🚨 Does NOT tell how to…
0
87
0
@_awettig
Alex Wettig
4 months
RT @jxbz: I just wrote my first blog post in four years! It is called "Deriving Muon". It covers the theory that led to Muon and how, for m…
0
128
0
@_awettig
Alex Wettig
4 months
RT @Thom_Wolf: I shared a controversial take the other day at an event and I decided to write it down in a longer format: I’m afraid AI won…
0
514
0
@_awettig
Alex Wettig
4 months
RT @pfau: FWIW, my take was never "the scaling laws will break down" but "the scaling laws holding means you'd hit a point of diminishing r…
0
4
0
@_awettig
Alex Wettig
4 months
RT @orionweller: Ever wonder how test-time compute would do in retrieval? 🤔 Introducing ✨rank1✨. rank1 is distilled from R1 & designed for…
0
37
0
@_awettig
Alex Wettig
4 months
RT @JanePan_: When benchmarks talk, do LLMs listen? Our new paper shows that evaluating code LLMs with interactive feedback significa…
0
13
0
@_awettig
Alex Wettig
4 months
I really rate Anthropic's laser focus on things that matter. Unironically, even playing Pokémon is more important than frontier math evals for robust general intelligence (open-endedness, exploration, tool use, …).
@AnthropicAI
Anthropic
4 months
Claude 3.7 Sonnet is a state-of-the-art model for both coding and agentic tool use. In developing it, we’ve optimized somewhat less for math and computer science competition problems, and instead shifted focus towards real-world tasks that better reflect the needs of our users.
0
0
12