Sachin Gururangan
@ssgrn
7K Followers · 2K Following · 71 Media · 971 Statuses
Researcher @AnthropicAI. Prev: 🦙 @AIatMeta, @allen_ai. PhD @uwcse + @uwnlp
SF x LA
Joined November 2011
Life update: I’m thrilled to be joining the pretraining team at @AnthropicAI next week! Grateful to everyone at @Meta GenAI for an incredible journey building Llama. Excited for the next chapter 🚀
39 replies · 8 reposts · 971 likes
Introducing Claude Opus 4.5: the best model in the world for coding, agents, and computer use. Opus 4.5 is a step forward in what AI systems can do, and a preview of larger changes to how work gets done.
902 replies · 2K reposts · 18K likes
We’ve formed a partnership with NVIDIA and Microsoft. Claude is now on Azure—making ours the only frontier models available on all three major cloud services. NVIDIA and Microsoft will invest up to $10bn and $5bn respectively in Anthropic. https://t.co/3RA82NEIJ3
anthropic.com
Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.
184 replies · 337 reposts · 4K likes
We’re opening offices in Paris and Munich. EMEA has become our fastest-growing region, with a run-rate revenue that has grown more than ninefold in the past year. We’ll be hiring local teams to support this expansion. Read more here:
anthropic.com
122 replies · 104 reposts · 2K likes
Today, we announced that we plan to expand our use of Google TPUs, securing approximately one million TPUs and more than a gigawatt of capacity in 2026.
235 replies · 458 reposts · 6K likes
Introducing Claude Haiku 4.5: our latest small model. Five months ago, Claude Sonnet 4 was state-of-the-art. Today, Haiku 4.5 matches its coding performance at one-third the cost and more than twice the speed.
323 replies · 1K reposts · 7K likes
Today we're releasing new @AnthropicAI research on how educators use AI, analyzing ~74,000 conversations from professors using @claudeai, in collaboration with Northeastern University. 4 initial findings… #1 Educators are builders, not just users, of AI. Faculty are creating…
17 replies · 72 reposts · 550 likes
We're launching an "AI psychiatry" team as part of interpretability efforts at Anthropic! We'll be researching phenomena like model personas, motivations, and situational awareness, and how they lead to spooky/unhinged behaviors. We're hiring - join us!
job-boards.greenhouse.io
191 replies · 207 reposts · 2K likes
Excited to share that @AnthropicAI has launched its Economic Futures Program! As a member of their Economic Advisory Council, I’m thrilled about this initiative supporting research and policy development on AI’s economic impacts. Research grants up to $50K available!
2 replies · 69 reposts · 399 likes
📣 Anthropic Zurich is hiring again 🇨🇭 The team has been shaping up fantastically over the last few months, and I have re-opened applications for pre-training. We welcome applications from anywhere along the "scientist/engineer spectrum". If building the future of AI for the…
job-boards.greenhouse.io
Zürich, CH
12 replies · 37 reposts · 654 likes
I will be attending ICML next week. Reach out (by email) if you'd like to chat! About Anthropic / research / life. I'm especially interested in meeting grad students who can teach me new research ideas.
8 replies · 9 reposts · 282 likes
Our team is very excited to release Llama 4! Open reasoning model drops are incoming too 🙂
Today marks the start of a new era of natively multimodal AI innovation. We're introducing the first Llama 4 models: Llama 4 Scout and Llama 4 Maverick — our most advanced models yet and the best in their class for multimodality. Llama 4 Scout • 17B-active-parameter model…
1 reply · 5 reposts · 70 likes
Check out the newest member of the "Branch-Train" family -- BTS (or, you know, your favorite k-pop boy band)! We introduce "stitch layers", a new architecture to combine expert LLMs with a small amount of training. Amazing work led by our intern @IreneZhang30 !!
✨New Preprint✨ We introduce Branch-Train-Stitch (BTS), an efficient & flexible method for stitching together independently pretrained LLM experts (e.g. code, math) into a single, capable generalist model. Key Takeaways: ✅ BTS achieves the best average…
0 replies · 2 reposts · 15 likes
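As a rough illustration of the stitch-layer idea (my own hypothetical sketch based on the abstract, not the paper's code): frozen expert models feed a small trainable projection that merges their hidden states into one stream.

```python
import torch
import torch.nn as nn

class StitchLayer(nn.Module):
    """Hypothetical 'stitch' layer: a small trainable projection that merges
    hidden states from independently pretrained experts into one stream."""
    def __init__(self, dim_a: int, dim_b: int, dim_out: int):
        super().__init__()
        self.proj = nn.Linear(dim_a + dim_b, dim_out)

    def forward(self, h_a: torch.Tensor, h_b: torch.Tensor) -> torch.Tensor:
        # Concatenate the two experts' hidden states, project to shared width.
        return self.proj(torch.cat([h_a, h_b], dim=-1))

# Toy stand-ins for frozen expert LLM blocks (e.g. a code and a math expert).
expert_code = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
expert_math = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
for p in list(expert_code.parameters()) + list(expert_math.parameters()):
    p.requires_grad = False  # experts stay frozen; only the stitch trains

stitch = StitchLayer(64, 64, 64)
x = torch.randn(2, 16, 64)                       # (batch, seq, hidden)
merged = stitch(expert_code(x), expert_math(x))  # small trainable bridge
print(merged.shape)                              # torch.Size([2, 16, 64])
```

The appeal of this shape is that the experts need no further training: only the cheap stitch layers learn how to route and combine what each expert already knows.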
Our team is excited to release Llama 3.3 70B which is comparable in performance to 405B/GPT4o! Post-training go brrrr
Introducing Llama 3.3 – a new 70B model that delivers the performance of our 405B model but is easier & more cost-efficient to run. By leveraging the latest advancements in post-training techniques, including online preference optimization, this model improves core performance at…
2 replies · 2 reposts · 31 likes
New paper by our intern @yue__yu! We use synthetic data to teach reward models to generate rationales for their scalar outputs. Our technique makes RMs less of a black box, more powerful, and more data efficient. Check it out!
🔍 Reward modeling is a reasoning task—can self-generated CoT-style critiques help? 🚀 Check out my intern work at Llama Team @AIatMeta, 3.7-7.3% gains on RewardBench vs. RM & LLM judge baselines, with better generalization & data efficiency! https://t.co/Mcv3NvS4lf
#rlhf #LLM
1 reply · 4 reposts · 30 likes
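To make the critique-then-score pattern concrete, here is a minimal hypothetical sketch (the prompt format and parser are my stand-ins, not the actual method): the reward model writes its rationale first, then commits to a scalar that can be parsed out.

```python
import re

# Hypothetical prompt format: the reward model writes its critique
# (the rationale) before committing to a scalar score.
RM_PROMPT = """Review the assistant's response to the prompt below.
First write a short critique, then end with 'Score: <1-10>'.

Prompt: {prompt}
Response: {response}
"""

def parse_rm_output(text: str) -> tuple[str, float]:
    """Split a critique-then-score completion into (rationale, scalar)."""
    match = re.search(r"Score:\s*(\d+(?:\.\d+)?)", text)
    if match is None:
        raise ValueError("no score found in RM output")
    return text[: match.start()].strip(), float(match.group(1))

# Toy completion standing in for the reward model's generation.
completion = "The answer is correct but skips a derivation step.\nScore: 7"
rationale, score = parse_rm_output(completion)
print(rationale, "->", score)  # ... -> 7.0
```

The rationale makes the judgment inspectable, and per the tweet, training on synthetic rationales is what buys the extra accuracy and data efficiency.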
2025 internship opps on the Llama team are now live! Feel free to reach out, especially if you're excited about working on problems in the post-training world (e.g. ranking/judges, reasoning, or all things synthetic data)! Lots of fun things to explore :) https://t.co/sayR82WMVQ
5 replies · 30 reposts · 315 likes
The Llama 3 paper is a must-read for anyone in AI and CS. It's an absolutely accurate and authoritative take on what it takes to build a leading LLM, the tech behind ChatGPT, Gemini, Copilot, and others. The AI part might seem small in comparison to the gargantuan work on *data*…
Why do 16k GPU jobs fail? The Llama 3 paper has many cool details -- but notably, it has a huge infrastructure section that covers how we parallelize, keep things reliable, etc. We hit an overall 90% effective training time. https://t.co/hsSIW4bayK
12 replies · 289 reposts · 2K likes
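For a back-of-the-envelope read of that 90% figure, here is one plausible way to compute the metric (a sketch with made-up numbers, not the paper's exact accounting):

```python
# Hypothetical numbers for a long pretraining run; the real accounting
# in the paper is more detailed (hardware faults, restarts, checkpoints).
total_hours = 1000.0
lost_to_failures = 70.0     # assumed: crashes and job restarts
lost_to_checkpoints = 30.0  # assumed: save/restore overhead

effective = (total_hours - lost_to_failures - lost_to_checkpoints) / total_hours
print(f"effective training time: {effective:.0%}")  # 90%
```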
as my other amazing teammates have already shared, check out our llama 3.1 paper here! lots of fun tidbits about the highs, lows, sweat, and tears that go into training LLMs lol ... onto llama 4!!!
1 reply · 9 reposts · 80 likes
Oh, one more thing! Our new Llama license allows the outputs of the Llama 3.1 models to improve any other model. So, go nuts :)
Llama 3.1 405B is here! It has 128K context and is a really strong model (MMLU 5-shot 87.3, HumanEval 89.0, MATH 73.8) Model: https://t.co/XYN12ngt4h Paper: https://t.co/zl6ifl9ARm As a member of the post-training team, here are a few takeaways from post-training Llama 3 🧵
1 reply · 0 reposts · 39 likes