Roger Grosse Profile
Roger Grosse

@RogerGrosse

Followers
11K
Following
2K
Media
23
Statuses
1K

Joined July 2015
@RogerGrosse
Roger Grosse
7 months
RT @OwainEvans_UK: New paper: We train LLMs on a particular behavior, e.g. always choosing risky options in economic decisions. They can *d…
0
153
0
@RogerGrosse
Roger Grosse
8 months
RT @RyanPGreenblatt: New Redwood Research (@redwood_ai) paper in collaboration with @AnthropicAI: We demonstrate cases where Claude fakes a…
0
44
0
@RogerGrosse
Roger Grosse
8 months
RT @MariusHobbhahn: Oh man :( We tried really hard to neither over- nor underclaim the results in our communication, but, predictably, some…
0
93
0
@RogerGrosse
Roger Grosse
8 months
RT @janleike: Apply to join the Anthropic Fellows Program! This is an exceptional opportunity to join AI safety research, collaborating wi…
0
84
0
@RogerGrosse
Roger Grosse
9 months
RT @UofT_IHPST: 📢 The AI safety conversation continues! Check insights from last month's expert panel @karinavold @RogerGrosse @SheilaMcIlr…
0
3
0
@RogerGrosse
Roger Grosse
9 months
RT @MariusHobbhahn: xAI is hiring for AI safety engineers: Their safety agenda isn't public, so I can't judge it. …
job-boards.greenhouse.io
0
22
0
@RogerGrosse
Roger Grosse
9 months
We're hiring. 🇨🇦.
1
7
145
@RogerGrosse
Roger Grosse
9 months
We’re sharing these sketches because we’d love to get feedback which can influence our research directions over the coming years. Check out the safety case sketches here:
2
3
35
@RogerGrosse
Roger Grosse
9 months
None of these cases fully succeeds at ruling out sabotage on its own. For each sketch, we highlight limitations and loose ends, which suggest priorities for AI alignment research aimed at safety assurance for powerful AIs of the future.
1
1
14
@RogerGrosse
Roger Grosse
9 months
We present three sketches of safety cases that could be used to rule out sabotage: these are based on interpretability, AI control, and analysis of environmental incentives, respectively.
1
0
12
@RogerGrosse
Roger Grosse
9 months
Making evaluations and monitoring robust against strategic sabotage by an AI model presents a recursive challenge, because the sorts of protocols one would normally use to rule out bad behaviors would themselves be subject to sabotage.
1
1
12
@RogerGrosse
Roger Grosse
9 months
How can you assure the safety of AIs that might be capable enough to strategically undermine evaluations and monitoring if they had a reason to? In our new Anthropic alignment science research blog, we present three sketches of candidate safety cases aimed at such a scenario.
5
19
155
@RogerGrosse
Roger Grosse
10 months
RT @TransluceAI: Announcing Transluce, a nonprofit research lab building open source, scalable technology for understanding AI systems and…
0
147
0
@RogerGrosse
Roger Grosse
10 months
RT @AnthropicAI: New Anthropic research: Sabotage evaluations for frontier models. How well could AI models mislead us, or secretly sabotag…
0
156
0
@RogerGrosse
Roger Grosse
11 months
AI benchmarks are most useful for judging the aggregate progress of a subfield, and hence for evaluating its methodologies. And if there's one methodological lesson they've taught us, it's that the progress is made by people who aren't hill-climbing on the benchmarks!
4
1
58
@RogerGrosse
Roger Grosse
11 months
Is there a good history of how academic AI research came to be so focused on hill-climbing on benchmarks? Any classic essays championing this as a good way to make progress?
21
11
141
@RogerGrosse
Roger Grosse
11 months
These days, the number of citations your papers get is a nonmonotonic function of their usefulness.
1
0
25
@RogerGrosse
Roger Grosse
11 months
RT @TorontoSRI: What is "safe" AI? Why is it difficult to achieve? Can LLMs be hacked? Are the existential risks of advanced AI exaggerated…
0
4
0
@RogerGrosse
Roger Grosse
11 months
Amortized variational inference is neither amortized nor variational nor inference.
@jpillowtime
Jonathan Pillow
11 months
Amortize means "to pay off a debt with regular payments". But in amortized inference you pay a big up-front cost to train an inference network, then inference is cheap per datapoint. Isn't that the opposite of amortization? Shouldn't we call it Big Down Payment inference?
6
3
65
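A minimal NumPy sketch of the pattern Pillow describes (a large up-front training cost, then cheap per-datapoint inference), using a conjugate Gaussian model where the exact posterior is known for comparison; this is an illustration only, and all names are hypothetical rather than taken from the thread:

```python
# Illustrative sketch of amortized inference (hypothetical setup, NumPy only).
# Model: z ~ N(0, 1), x ~ N(z, sigma^2). The exact posterior mean of z given x
# is x / (1 + sigma^2); the "inference network" learns this map once, so
# inference for any new x is a single cheap forward pass.
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 0.5

# Big up-front cost: fit the inference network on simulated (x, z) pairs.
# Here the "network" is just a scalar weight w minimizing E[(w * x - z)^2].
z_train = rng.normal(size=10_000)
x_train = z_train + rng.normal(scale=np.sqrt(sigma2), size=10_000)
w = np.dot(x_train, z_train) / np.dot(x_train, x_train)

# Cheap per-datapoint inference: one multiply per new observation.
x_new = np.array([2.0, -0.3, 1.1])
posterior_mean_amortized = w * x_new
posterior_mean_exact = x_new / (1.0 + sigma2)

print(posterior_mean_amortized)  # close to the exact posterior means
print(posterior_mean_exact)
```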