
Danny Halawi
@dannyhalawi15
Followers: 1K
Following: 194
Media: 8
Statuses: 55
RT @mobav0: I believe in AGI, but also believe that for most use cases, model quality won't be the bottleneck. Lots of folks will have grea….
0
2
0
RT @PTetlock: Lots of competition to develop LLMs that beat top human forecasters—& lots of temptations to make exaggerated claims. So a ne….
0
15
0
The results in "LLMs Are Superhuman Forecasters" don't hold when given another set of forecasting questions. I used their codebase (models, prompts, retrieval, etc.) to evaluate a new set of 324 questions, all opened after November 2023. Findings: Their Brier score: .195. Crowd…
We've created a demo of an AI that can predict the future at a superhuman level (on par with groups of human forecasters working together). Consequently I think AI forecasters will soon automate most prediction markets. demo: blog:
11
56
528
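The thread above compares Brier scores between the AI system and the crowd. As a quick reference, here is a minimal sketch of how a Brier score is computed for binary forecasting questions; the example forecasts and outcomes below are made up for illustration and are not from the paper:

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes.

    Lower is better: 0.0 is perfect, and always guessing 0.5 scores 0.25.
    """
    assert len(probs) == len(outcomes)
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# Hypothetical example: three resolved questions with forecast probabilities
score = brier_score([0.8, 0.3, 0.9], [1, 0, 1])
print(round(score, 4))  # → 0.0467
```

A score of .195 (as reported in the tweet) therefore sits between a perfect forecaster (0.0) and an uninformed one that always answers 0.5 (0.25).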
Love seeing further work on automated AI forecasting! The authors assume a knowledge cutoff of October 2023, but I prompted gpt-4o (as I saw in the GitHub repo) for events after that date and it knew about them. I plan to reproduce the results in this writeup on a new set of…
0
0
19
RT @lisabdunlap: Turns out you can just book a meeting room and announce an "invited talk" about whatever you want. Here is my talk and tas….
0
5
0
RT @stanislavfort: I have written up my argument for understanding adversarial attacks in computer vision as a baby version of general AI a….
0
16
0
RT @NeelNanda5: I'm a long time fan of @3blue1brown. It was really awesome to see my and @SenR's work on how LLMs store facts discussed in….
0
55
0
At ICML, presenting on this work today (w/ @aweisawei). Reach out if you wanna chat or hang out~.
New paper! We introduce Covert Malicious Finetuning (CMFT), a method for jailbreaking language models via fine-tuning that avoids detection. We use our method to covertly jailbreak GPT-4 via the OpenAI finetuning API.
0
2
14
RT @amaarora: I have primarily switched to Claude 3.5 Sonnet and hardly use GPT-4. Anybody else?
0
57
0
RT @EthanJPerez: One of the most important and well-executed papers I've read in months. They explored ~all attacks+defenses I was most ke….
0
9
0
RT @ADarmouni: Smart finetuning to break safety defenses. 🧵📖 Read of the day, day 97: Covert Malicious Finetuning: Challenges in Safeguardi….
0
3
0
RT @janleike: Interested in working at Anthropic? We're hosting a happy hour at ICML on July 23. Register here:
0
27
0
RT @JerryWeiAI: One thing that I've come to deeply appreciate at Anthropic is how useful quick iteration times can be. In the current era….
0
21
0
Link to the paper: The authors: @dannyhalawi15, @aweisawei, @eric_wallace_, @TonyWangIV, @nhaghtal, @JacobSteinhardt. Thanks to @EthanJPerez, @farairesearch, @FabienDRoger, and @BerkeleyNLP for compute support, helpful discussions, and feedback.
arxiv.org
Black-box finetuning is an emerging interface for adapting state-of-the-art language models to user needs. However, such access may also let malicious actors undermine model safety. To demonstrate...
1
1
15