
Cem Anil
@cem__anil
Followers: 3K
Following: 2K
Media: 15
Statuses: 520
Machine learning / AI Safety at @AnthropicAI and University of Toronto / Vector Institute. Prev. @google (Blueshift Team) and @nvidia.
Toronto, Ontario
Joined November 2018
AIs of tomorrow will spend much more of their compute on adapting and learning during deployment. Our first foray into quantitatively studying and forecasting risks from this trend looks at new jailbreaks arising from long contexts. Link:
New Anthropic research paper: Many-shot jailbreaking. We study a long-context jailbreaking technique that is effective on most large language models, including those developed by Anthropic and many of our peers. Read our blog post and the paper here:
RT @AnthropicAI: New Anthropic Research: Agentic Misalignment. In stress-testing experiments designed to identify risks before they cause….
RT @AnthropicAI: Introducing the next generation: Claude Opus 4 and Claude Sonnet 4. Claude Opus 4 is our most powerful model yet, and the….
RT @cursor_ai: Sonnet 4 is available in Cursor! We've been very impressed by its coding ability. It is much easier to control than 3.7….
RT @DarioAmodei: The Urgency of Interpretability: Why it's crucial that we understand how AI models work
RT @AnthropicAI: Introducing a new Max plan for Claude. It’s flexible, with options for 5x or 20x more usage compared to our Pro plan. Plu….
RT @kayembruno: How do you identify training data responsible for an image generated by your diffusion model? How could you quantify how mu….
RT @AnthropicAI: New Anthropic research: Do reasoning models accurately verbalize their reasoning? Our new paper shows they don't. This c….
RT @AnthropicAI: New Anthropic research: Tracing the thoughts of a large language model. We built a "microscope" to inspect what happens i….
RT @gasteigerjo: New Anthropic blog post: Subtle sabotage in automated researchers. As AI systems increasingly assist with AI research, ho….
RT @TransluceAI: To interpret AI benchmarks, we need to look at the data. Top-level numbers don't mean what you think: there may be broken….
RT @_catwu: It’s been a big week for Claude Code. We launched 8 exciting new features to help devs build faster and smarter. Here's a rou….
RT @alirezamh_: With infinite compute, would it make a difference to use Transformers, RNNs, or even vanilla Feedforward nets? They’re all….
RT @janleike: Could we spot a misaligned model in the wild? To find out, we trained a model with hidden misalignments and asked other rese….
RT @saprmarks: New paper with @j_treutlein, @EvanHub, and many other coauthors! We train a model with a hidden misaligned objective and….
RT @scychan_brains: New work led by @Aaditya6284: "Strategy coopetition explains the emergence and transience of in-context learning in tra….
RT @Aaditya6284: Transformers employ different strategies through training to minimize loss, but how do these trade off and why? Excited to….
RT @StuartJRitchie: What are you doing this weekend? Maybe you’ll consider applying to work with me at @AnthropicAI! I’m looking for a bri….
RT @DavidDuvenaud: LLMs have complex joint beliefs about all sorts of quantities. And my postdoc @jamesrequeima visualized them! In this t….
RT @AnthropicAI: Claude will help power Amazon's next-generation AI assistant, Alexa+. Amazon and Anthropic have worked closely together o….
RT @AnthropicAI: New Anthropic research: Forecasting rare language model behaviors. We forecast whether risks will occur after a model is….