Cem Anil

@cem__anil

Followers: 3K
Following: 2K
Media: 15
Statuses: 520

Machine learning / AI Safety at @AnthropicAI and University of Toronto / Vector Institute. Prev. @google (Blueshift Team) and @nvidia.

Toronto, Ontario
Joined November 2018
@cem__anil
Cem Anil
1 year
AIs of tomorrow will spend much more of their compute on adapting and learning during deployment. Our first foray into quantitatively studying and forecasting risks from this trend looks at new jailbreaks arising from long contexts. Link:
@AnthropicAI
Anthropic
1 year
New Anthropic research paper: Many-shot jailbreaking. We study a long-context jailbreaking technique that is effective on most large language models, including those developed by Anthropic and many of our peers. Read our blog post and the paper here:
6
9
61
@cem__anil
Cem Anil
16 days
RT @AnthropicAI: New Anthropic Research: Agentic Misalignment. In stress-testing experiments designed to identify risks before they cause…
0
601
0
@cem__anil
Cem Anil
1 month
RT @AnthropicAI: Introducing the next generation: Claude Opus 4 and Claude Sonnet 4. Claude Opus 4 is our most powerful model yet, and the…
0
3K
0
@cem__anil
Cem Anil
1 month
RT @cursor_ai: Sonnet 4 is available in Cursor! We've been very impressed by its coding ability. It is much easier to control than 3.7…
0
819
0
@cem__anil
Cem Anil
2 months
RT @DarioAmodei: The Urgency of Interpretability: Why it's crucial that we understand how AI models work
0
542
0
@cem__anil
Cem Anil
3 months
RT @AnthropicAI: Introducing a new Max plan for Claude. It’s flexible, with options for 5x or 20x more usage compared to our Pro plan. Plu…
0
213
0
@cem__anil
Cem Anil
3 months
RT @kayembruno: How do you identify training data responsible for an image generated by your diffusion model? How could you quantify how mu…
0
22
0
@cem__anil
Cem Anil
3 months
RT @AnthropicAI: New Anthropic research: Do reasoning models accurately verbalize their reasoning? Our new paper shows they don't. This c…
0
622
0
@cem__anil
Cem Anil
3 months
RT @AnthropicAI: New Anthropic research: Tracing the thoughts of a large language model. We built a "microscope" to inspect what happens i…
0
1K
0
@cem__anil
Cem Anil
3 months
RT @gasteigerjo: New Anthropic blog post: Subtle sabotage in automated researchers. As AI systems increasingly assist with AI research, ho…
0
55
0
@cem__anil
Cem Anil
3 months
RT @TransluceAI: To interpret AI benchmarks, we need to look at the data. Top-level numbers don't mean what you think: there may be broken…
0
65
0
@cem__anil
Cem Anil
4 months
RT @_catwu: It’s been a big week for Claude Code. We launched 8 exciting new features to help devs build faster and smarter. Here's a rou…
0
501
0
@cem__anil
Cem Anil
4 months
RT @alirezamh_: With infinite compute, would it make a difference to use Transformers, RNNs, or even vanilla Feedforward nets? They’re all…
0
77
0
@cem__anil
Cem Anil
4 months
RT @janleike: Could we spot a misaligned model in the wild? To find out, we trained a model with hidden misalignments and asked other rese…
0
40
0
@cem__anil
Cem Anil
4 months
RT @saprmarks: New paper with @j_treutlein, @EvanHub, and many other coauthors! We train a model with a hidden misaligned objective and…
0
15
0
@cem__anil
Cem Anil
4 months
RT @scychan_brains: New work led by @Aaditya6284: "Strategy coopetition explains the emergence and transience of in-context learning in tra…
0
8
0
@cem__anil
Cem Anil
4 months
RT @Aaditya6284: Transformers employ different strategies through training to minimize loss, but how do these tradeoff and why? Excited to…
0
23
0
@cem__anil
Cem Anil
4 months
RT @StuartJRitchie: What are you doing this weekend? Maybe you’ll consider applying to work with me at @AnthropicAI! I’m looking for a bri…
0
46
0
@cem__anil
Cem Anil
4 months
RT @DavidDuvenaud: LLMs have complex joint beliefs about all sorts of quantities. And my postdoc @jamesrequeima visualized them! In this t…
0
203
0
@cem__anil
Cem Anil
4 months
RT @AnthropicAI: Claude will help power Amazon's next-generation AI assistant, Alexa+. Amazon and Anthropic have worked closely together o…
0
525
0
@cem__anil
Cem Anil
4 months
RT @AnthropicAI: New Anthropic research: Forecasting rare language model behaviors. We forecast whether risks will occur after a model is…
0
148
0