Sachin Gururangan

@ssgrn

Followers 7K · Following 2K · Media 71 · Statuses 971

Researcher @AnthropicAI. Prev: 🦙 @aiatmeta, @allen_ai. PhD @uwcse + @uwnlp

SF x LA
Joined November 2011
@ssgrn
Sachin Gururangan
6 months
Life update: I’m thrilled to be joining the pretraining team at @AnthropicAI next week! Grateful to everyone at @Meta GenAI for an incredible journey building Llama. Excited for the next chapter 🚀
39
8
971
@claudeai
Claude
2 days
Introducing Claude Opus 4.5: the best model in the world for coding, agents, and computer use. Opus 4.5 is a step forward in what AI systems can do, and a preview of larger changes to how work gets done.
902
2K
18K
@AnthropicAI
Anthropic
8 days
We’ve formed a partnership with NVIDIA and Microsoft. Claude is now on Azure—making ours the only frontier models available on all three major cloud services. NVIDIA and Microsoft will invest up to $10bn and $5bn respectively in Anthropic. https://t.co/3RA82NEIJ3
anthropic.com
184
337
4K
@AnthropicAI
Anthropic
19 days
We’re opening offices in Paris and Munich. EMEA has become our fastest-growing region, with a run-rate revenue that has grown more than ninefold in the past year. We’ll be hiring local teams to support this expansion. Read more here:
anthropic.com
122
104
2K
@AnthropicAI
Anthropic
1 month
Today, we announced that we plan to expand our use of Google TPUs, securing approximately one million TPUs and more than a gigawatt of capacity in 2026.
235
458
6K
@claudeai
Claude
1 month
Introducing Claude Haiku 4.5: our latest small model. Five months ago, Claude Sonnet 4 was state-of-the-art. Today, Haiku 4.5 matches its coding performance at one-third the cost and more than twice the speed.
323
1K
7K
@drew_bent
Drew Bent
3 months
Today we're releasing new @AnthropicAI research on how educators use AI, analyzing ~74,000 conversations from professors using @claudeai in collaboration with Northeastern University. 4 initial findings… #1 Educators are builders, not just users of AI. Faculty are creating…
17
72
550
@Jack_W_Lindsey
Jack Lindsey
4 months
We're launching an "AI psychiatry" team as part of interpretability efforts at Anthropic! We'll be researching phenomena like model personas, motivations, and situational awareness, and how they lead to spooky/unhinged behaviors. We're hiring - join us!
job-boards.greenhouse.io
191
207
2K
@mioana
Ioana Marinescu
5 months
Excited to share that @AnthropicAI has launched its Economic Futures Program! As a member of their Economic Advisory Council, I’m thrilled about this initiative supporting research and policy development on AI’s economic impacts. Research grants up to $50K available!
2
69
399
@neilhoulsby
Neil Houlsby
5 months
📣 Anthropic Zurich is hiring again 🇨🇭 The team has been shaping up fantastically over the last months, and I have re-opened applications for pre-training. We welcome applications from anywhere along the "scientist/engineer spectrum". If building the future of AI for the…
job-boards.greenhouse.io
Zürich, CH
12
37
654
@jaschasd
Jascha Sohl-Dickstein
5 months
I will be attending ICML next week. Reach out (by email) if you'd like to chat! About Anthropic / research / life. I'm especially interested in meeting grad students who can teach me new research ideas.
8
9
282
@ssgrn
Sachin Gururangan
8 months
Our team is very excited to release Llama 4! Open reasoning model drops are incoming too 🙂
@AIatMeta
AI at Meta
8 months
Today is the start of a new era of natively multimodal AI innovation. Today, we’re introducing the first Llama 4 models: Llama 4 Scout and Llama 4 Maverick — our most advanced models yet and the best in their class for multimodality. Llama 4 Scout • 17B-active-parameter model…
1
5
70
@ssgrn
Sachin Gururangan
10 months
Check out the newest member of the "Branch-Train" family -- BTS (or, you know, your favorite k-pop boy band)! We introduce "stitch layers", a new architecture to combine expert LLMs with a small amount of training. Amazing work led by our intern @IreneZhang30!!
@IreneZhang30
Qizhen (Irene) Zhang
10 months
✨New Preprint✨We introduce 𝐁𝐫𝐚𝐧𝐜𝐡-𝐓𝐫𝐚𝐢𝐧-𝐒𝐭𝐢𝐭𝐜𝐡 (𝐁𝐓𝐒), an efficient & flexible method for stitching together independently pretrained LLM experts (e.g. code, math) into a single, capable generalist model. Key Takeaways: ✅BTS achieves the best average…
0
2
15
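The tweets above only name the idea, so here is a minimal, hypothetical sketch of what a stitch layer could look like: two independently pretrained experts stay frozen, and a small trainable projection bridges hidden states between them. All class names, shapes, and the placement of the stitch are illustrative assumptions; the BTS preprint defines the actual architecture.

```python
import torch
import torch.nn as nn

class StitchLayer(nn.Module):
    """Small trainable bridge mapping one expert's hidden states into another's space."""
    def __init__(self, src_dim: int, tgt_dim: int):
        super().__init__()
        self.proj = nn.Linear(src_dim, tgt_dim)
        self.norm = nn.LayerNorm(tgt_dim)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return self.norm(self.proj(h))

class StitchedExperts(nn.Module):
    """Run the bottom blocks of one frozen expert (say, code), stitch, then run
    the top blocks of another frozen expert (say, math). Only the stitch layer
    receives gradients, which is why so little extra training is needed."""
    def __init__(self, bottom: nn.ModuleList, top: nn.ModuleList, dim_a: int, dim_b: int):
        super().__init__()
        self.bottom, self.top = bottom, top
        for p in self.parameters():               # freeze both experts first...
            p.requires_grad = False
        self.stitch = StitchLayer(dim_a, dim_b)   # ...then add the only trainable piece

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        for block in self.bottom:
            h = block(h)
        h = self.stitch(h)
        for block in self.top:
            h = block(h)
        return h
```

In this toy version the only gradients flow through `StitchLayer`, so "combining expert LLMs with a small amount of training" amounts to fitting one small projection per stitch point.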
@ssgrn
Sachin Gururangan
1 year
Our team is excited to release Llama 3.3 70B which is comparable in performance to 405B/GPT4o! Post-training go brrrr
@Ahmad_Al_Dahle
Ahmad Al-Dahle
1 year
Introducing Llama 3.3 – a new 70B model that delivers the performance of our 405B model but is easier & more cost-efficient to run. By leveraging the latest advancements in post-training techniques including online preference optimization, this model improves core performance at…
2
2
31
@ssgrn
Sachin Gururangan
1 year
New paper by our intern @yue__yu! We use synthetic data to teach reward models to generate rationales for their scalar outputs. Our technique makes RMs less of a black box, more powerful, and more data efficient. Check it out!
@yue___yu
Yue Yu
1 year
🔍 Reward modeling is a reasoning task—can self-generated CoT-style critiques help? 🚀 Check out my intern work at Llama Team @AIatMeta, 3.7-7.3% gains on RewardBench vs. RM & LLM judge baselines, with better generalization & data efficiency! https://t.co/Mcv3NvS4lf #rlhf #LLM
1
4
30
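As a rough illustration of the idea in the tweets above (a reward model that explains its scalar output), here is a hypothetical inference-side sketch: the RM generates a critique first, and the score is parsed from its text. The prompt template, score format, and `generate` interface are all assumptions, not the paper's actual setup; per the tweets, the training-time critiques come from synthetic data.

```python
import re

# Hypothetical prompt format: ask the RM to critique before scoring.
RM_TEMPLATE = (
    "Prompt:\n{prompt}\n\n"
    "Response:\n{response}\n\n"
    "Critique the response step by step, then end with 'Score: <1-10>'.\n"
)

def score_with_rationale(generate, prompt: str, response: str):
    """`generate` is any text-completion callable (an assumed interface).
    Returns the model's critique text and the scalar parsed from it."""
    critique = generate(RM_TEMPLATE.format(prompt=prompt, response=response))
    match = re.search(r"Score:\s*(\d+(?:\.\d+)?)", critique)
    score = float(match.group(1)) if match else None  # None if the RM broke format
    return critique, score
```

Training such an RM would then mean fine-tuning on (prompt, response, synthetic critique + score) examples, so the rationale is supervised rather than free-form, and the scalar is no longer a black-box output.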
@ssgrn
Sachin Gururangan
1 year
2025 internship opps on the Llama team are now live! Feel free to reach out, especially if you’re excited about working on problems in the post-training world (e.g. ranking/judges, reasoning, or all things synthetic data)! Lots of fun things to explore :) https://t.co/sayR82WMVQ
5
30
315
@NandoDF
Nando de Freitas
1 year
The Llama 3 paper is a must-read for anyone in AI and CS. It’s an absolutely accurate and authoritative take on what it takes to build a leading LLM, the tech behind ChatGPT, Gemini, Copilot, and others. The AI part might seem small in comparison to the gargantuan work on *data*…
@soumithchintala
Soumith Chintala
1 year
Why do 16k GPU jobs fail? The Llama3 paper has many cool details -- but notably, has a huge infrastructure section that covers how we parallelize, keep things reliable, etc. We hit an overall 90% effective-training-time. https://t.co/hsSIW4bayK
12
289
2K
@ssgrn
Sachin Gururangan
1 year
Excited to give a talk at this workshop! I’ll discuss continual learning in llama 3 posttraining, and directions we’re excited about for llama 4 and beyond.
@arslan_mac
Arslan Chaudhry
1 year
[1/4] Happy to announce that we are organizing a workshop on continuous development of foundation models at NeurIPS’24. Website:
0
1
24
@em_dinan
Emily Dinan
1 year
as my other amazing teammates have already shared, check out our llama 3.1 paper here! lots of fun tidbits about the highs, lows, sweat, and tears that go into training LLMs lol ... onto llama 4!!!
1
9
80
@ssgrn
Sachin Gururangan
1 year
Oh, one more thing! Our new Llama license allows the outputs of the Llama 3.1 models to improve any other model. So, go nuts :)
@ssgrn
Sachin Gururangan
1 year
Llama 3.1 405B is here! It has 128K context, and is a really strong model (MMLU 5-shot 87.3, HumanEval 89.0, MATH 73.8) Model: https://t.co/XYN12ngt4h Paper: https://t.co/zl6ifl9ARm As a member of the posttraining team, here are a few takeaways from posttraining Llama 3 🧵
1
0
39