Sachin Gururangan

@ssgrn

Followers 7K · Following 2K · Media 71 · Statuses 971

Researcher @AnthropicAI. Prev: 🦙 @aiatmeta, @allen_ai. PhD @uwcse + @uwnlp

SF x LA
Joined November 2011
@ssgrn
Sachin Gururangan
6 months
Life update: I’m thrilled to be joining the pretraining team at @AnthropicAI next week! Grateful to everyone at @Meta GenAI for an incredible journey building Llama. Excited for the next chapter 🚀
39
8
971
@claudeai
Claude
2 days
Introducing Claude Opus 4.5: the best model in the world for coding, agents, and computer use. Opus 4.5 is a step forward in what AI systems can do, and a preview of larger changes to how work gets done.
902
2K
18K
@AnthropicAI
Anthropic
8 days
We’ve formed a partnership with NVIDIA and Microsoft. Claude is now on Azure—making ours the only frontier models available on all three major cloud services. NVIDIA and Microsoft will invest up to $10bn and $5bn respectively in Anthropic. https://t.co/3RA82NEIJ3
anthropic.com
184
337
4K
@AnthropicAI
Anthropic
19 days
We’re opening offices in Paris and Munich. EMEA has become our fastest-growing region, with a run-rate revenue that has grown more than ninefold in the past year. We’ll be hiring local teams to support this expansion. Read more here:
anthropic.com
122
104
2K
@AnthropicAI
Anthropic
1 month
Today, we announced that we plan to expand our use of Google TPUs, securing approximately one million TPUs and more than a gigawatt of capacity in 2026.
235
458
6K
@claudeai
Claude
1 month
Introducing Claude Haiku 4.5: our latest small model. Five months ago, Claude Sonnet 4 was state-of-the-art. Today, Haiku 4.5 matches its coding performance at one-third the cost and more than twice the speed.
323
1K
7K
@drew_bent
Drew Bent
3 months
Today we're releasing new @AnthropicAI research on how educators use AI, analyzing ~74,000 conversations from professors using @claudeai in collaboration with Northeastern University. 4 initial findings… #1 Educators are builders, not just users of AI. Faculty are creating…
17
72
550
@Jack_W_Lindsey
Jack Lindsey
4 months
We're launching an "AI psychiatry" team as part of interpretability efforts at Anthropic! We'll be researching phenomena like model personas, motivations, and situational awareness, and how they lead to spooky/unhinged behaviors. We're hiring - join us!
job-boards.greenhouse.io
191
207
2K
@mioana
Ioana Marinescu
5 months
Excited to share that @AnthropicAI has launched its Economic Futures Program! As a member of their Economic Advisory Council, I’m thrilled about this initiative supporting research and policy development on AI’s economic impacts. Research grants up to $50K available!
2
69
399
@neilhoulsby
Neil Houlsby
5 months
📣 Anthropic Zurich is hiring again 🇨🇭 The team has been shaping up fantastically over the last months, and I have re-opened applications for pre-training. We welcome applications from anywhere along the "scientist/engineer spectrum". If building the future of AI for the…
job-boards.greenhouse.io
Zürich, CH
12
37
654
@jaschasd
Jascha Sohl-Dickstein
5 months
I will be attending ICML next week. Reach out (by email) if you'd like to chat! About Anthropic / research / life. I'm especially interested in meeting grad students who can teach me new research ideas.
8
9
282
@ssgrn
Sachin Gururangan
8 months
Our team is very excited to release Llama 4! Open reasoning model drops are incoming too 🙂
@AIatMeta
AI at Meta
8 months
Today is the start of a new era of natively multimodal AI innovation. Today, we’re introducing the first Llama 4 models: Llama 4 Scout and Llama 4 Maverick — our most advanced models yet and the best in their class for multimodality. Llama 4 Scout • 17B-active-parameter model…
1
5
70
@ssgrn
Sachin Gururangan
10 months
Check out the newest member of the "Branch-Train" family -- BTS (or, you know, your favorite k-pop boy band)! We introduce "stitch layers", a new architecture to combine expert LLMs with a small amount of training. Amazing work led by our intern @IreneZhang30!!
@IreneZhang30
Qizhen (Irene) Zhang
10 months
✨New Preprint✨We introduce 𝐁𝐫𝐚𝐧𝐜𝐡-𝐓𝐫𝐚𝐢𝐧-𝐒𝐭𝐢𝐭𝐜𝐡 (𝐁𝐓𝐒), an efficient & flexible method for stitching together independently pretrained LLM experts (e.g. code, math) into a single, capable generalist model. Key Takeaways: ✅BTS achieves the best average…
0
2
15
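The tweets above only name the idea, so here is a minimal, hypothetical sketch of what a stitch layer could look like: two independently pretrained experts stay frozen, and a small trainable projection bridges hidden states between them. All class names, shapes, and the placement of the stitch are illustrative assumptions; the BTS preprint defines the actual architecture.

```python
import torch
import torch.nn as nn

class StitchLayer(nn.Module):
    """Small trainable bridge mapping one expert's hidden states into another's space."""
    def __init__(self, src_dim: int, tgt_dim: int):
        super().__init__()
        self.proj = nn.Linear(src_dim, tgt_dim)
        self.norm = nn.LayerNorm(tgt_dim)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return self.norm(self.proj(h))

class StitchedExperts(nn.Module):
    """Run the bottom blocks of one frozen expert (say, code), stitch, then run
    the top blocks of another frozen expert (say, math). Only the stitch layer
    receives gradients, which is why so little extra training is needed."""
    def __init__(self, bottom: nn.ModuleList, top: nn.ModuleList, dim_a: int, dim_b: int):
        super().__init__()
        self.bottom, self.top = bottom, top
        for p in self.parameters():               # freeze both experts first...
            p.requires_grad = False
        self.stitch = StitchLayer(dim_a, dim_b)   # ...then add the only trainable piece

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        for block in self.bottom:
            h = block(h)
        h = self.stitch(h)
        for block in self.top:
            h = block(h)
        return h
```

In this toy version the only gradients flow through `StitchLayer`, so "combining expert LLMs with a small amount of training" amounts to fitting one small projection per stitch point.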
@ssgrn
Sachin Gururangan
1 year
Our team is excited to release Llama 3.3 70B which is comparable in performance to 405B/GPT4o! Post-training go brrrr
@Ahmad_Al_Dahle
Ahmad Al-Dahle
1 year
Introducing Llama 3.3 – a new 70B model that delivers the performance of our 405B model but is easier & more cost-efficient to run. By leveraging the latest advancements in post-training techniques including online preference optimization, this model improves core performance at…
2
2
31
@ssgrn
Sachin Gururangan
1 year
New paper by our intern @yue__yu! We use synthetic data to teach reward models to generate rationales for their scalar outputs. Our technique makes RMs less of a black box, more powerful, and more data efficient. Check it out!
@yue___yu
Yue Yu
1 year
🔍 Reward modeling is a reasoning task—can self-generated CoT-style critiques help? 🚀 Check out my intern work at Llama Team @AIatMeta, 3.7-7.3% gains on RewardBench vs. RM & LLM judge baselines, with better generalization & data efficiency! https://t.co/Mcv3NvS4lf #rlhf #LLM
1
4
30
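As a rough illustration of the idea in the tweets above (a reward model that explains its scalar output), here is a hypothetical inference-side sketch: the RM generates a critique first, and the score is parsed from its text. The prompt template, score format, and `generate` interface are all assumptions, not the paper's actual setup; per the tweets, the training-time critiques come from synthetic data.

```python
import re

# Hypothetical prompt format: ask the RM to critique before scoring.
RM_TEMPLATE = (
    "Prompt:\n{prompt}\n\n"
    "Response:\n{response}\n\n"
    "Critique the response step by step, then end with 'Score: <1-10>'.\n"
)

def score_with_rationale(generate, prompt: str, response: str):
    """`generate` is any text-completion callable (an assumed interface).
    Returns the model's critique text and the scalar parsed from it."""
    critique = generate(RM_TEMPLATE.format(prompt=prompt, response=response))
    match = re.search(r"Score:\s*(\d+(?:\.\d+)?)", critique)
    score = float(match.group(1)) if match else None  # None if the RM broke format
    return critique, score
```

Training such an RM would then mean fine-tuning on (prompt, response, synthetic critique + score) examples, so the rationale is supervised rather than free-form, and the scalar is no longer a black-box output.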
@ssgrn
Sachin Gururangan
1 year
2025 internship opps on the Llama team are now live! Feel free to reach out, especially if you’re excited about working on problems in the post-training world (e.g. ranking/judges, reasoning, or all things synthetic data)! Lots of fun things to explore :) https://t.co/sayR82WMVQ
5
30
315
@NandoDF
Nando de Freitas
1 year
The Llama 3 paper is a must-read for anyone in AI and CS. It’s an absolutely accurate and authoritative take on what it takes to build a leading LLM, the tech behind ChatGPT, Gemini, Copilot, and others. The AI part might seem small in comparison to the gargantuan work on *data*…
@soumithchintala
Soumith Chintala
1 year
Why do 16k GPU jobs fail? The Llama3 paper has many cool details -- but notably, has a huge infrastructure section that covers how we parallelize, keep things reliable, etc. We hit an overall 90% effective-training-time. https://t.co/hsSIW4bayK
12
289
2K
@ssgrn
Sachin Gururangan
1 year
Excited to give a talk at this workshop! I’ll discuss continual learning in llama 3 posttraining, and directions we’re excited about for llama 4 and beyond.
@arslan_mac
Arslan Chaudhry
1 year
[1/4] Happy to announce that we are organizing a workshop on continuous development of foundation models at NeurIPS’24. Website:
0
1
24
@em_dinan
Emily Dinan
1 year
as my other amazing teammates have already shared, check out our llama 3.1 paper here! lots of fun tidbits about the highs, lows, sweat, and tears that go into training LLMs lol ... onto llama 4!!!
1
9
80
@ssgrn
Sachin Gururangan
1 year
Oh, one more thing! Our new Llama license allows the outputs of the Llama 3.1 models to improve any other model. So, go nuts :)
@ssgrn
Sachin Gururangan
1 year
Llama 3.1 405B is here! It has 128K context, and is a really strong model (MMLU 5-shot 87.3, HumanEval 89.0, MATH 73.8) Model: https://t.co/XYN12ngt4h Paper: https://t.co/zl6ifl9ARm As a member of the posttraining team, here are a few takeaways from posttraining Llama 3 🧵
1
0
39