
Danny Halawi
@dannyhalawi15
Followers: 1K
Following: 194
Media: 8
Statuses: 55
RT @mobav0: I believe in AGI, but also believe that for most use cases, model quality won't be the bottleneck. Lots of folks will have grea….
0
2
0
RT @PTetlock: Lots of competition to develop LLMs that beat top human forecasters—& lots of temptations to make exaggerated claims. So a ne….
0
15
0
The results in "LLMs Are Superhuman Forecasters" don't hold when given another set of forecasting questions. I used their codebase (models, prompts, retrieval, etc.) to evaluate a new set of 324 questions, all opened after November 2023. Findings: Their Brier score: .195. Crowd…
We've created a demo of an AI that can predict the future at a superhuman level (on par with groups of human forecasters working together). Consequently I think AI forecasters will soon automate most prediction markets. demo: blog:
11
56
528
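The thread above compares Brier scores between the AI system and the crowd. As a quick reference, here is a minimal sketch of how a Brier score is computed for binary forecasting questions; the example forecasts and outcomes below are made up for illustration and are not from the paper:

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes.

    Lower is better: 0.0 is perfect, and always guessing 0.5 scores 0.25.
    """
    assert len(probs) == len(outcomes)
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# Hypothetical example: three resolved questions with forecast probabilities
score = brier_score([0.8, 0.3, 0.9], [1, 0, 1])
print(round(score, 4))  # → 0.0467
```

A score of .195 (as reported in the tweet) therefore sits between a perfect forecaster (0.0) and an uninformed one that always answers 0.5 (0.25).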
Love seeing further work on automated AI forecasting! The authors assume a knowledge cutoff of October 2023, but I prompted gpt-4o (as I saw in the GitHub repo) for events after that date and it knew about them. I plan to reproduce the results in this writeup on a new set of…
0
0
19
RT @lisabdunlap: Turns out you can just book a meeting room and announce an "invited talk" about whatever you want. Here is my talk and tas….
0
5
0
RT @stanislavfort: I have written up my argument for understanding adversarial attacks in computer vision as a baby version of general AI a….
0
16
0
RT @NeelNanda5: I'm a long time fan of @3blue1brown. It was really awesome to see my and @SenR's work on how LLMs store facts discussed in….
0
55
0
At ICML, presenting on this work today (w/ @aweisawei). Reach out if you wanna chat or hang out~.
New paper! We introduce Covert Malicious Finetuning (CMFT), a method for jailbreaking language models via fine-tuning that avoids detection. We use our method to covertly jailbreak GPT-4 via the OpenAI finetuning API.
0
2
14
RT @amaarora: I have primarily switched to Claude 3.5 Sonnet and hardly use GPT-4. Anybody else?
0
57
0
RT @EthanJPerez: One of the most important and well-executed papers I've read in months. They explored ~all attacks+defenses I was most ke….
0
9
0
RT @ADarmouni: Smart finetuning to break safety defenses. 🧵📖 Read of the day, day 97: Covert Malicious Finetuning: Challenges in Safeguardi….
0
3
0
RT @janleike: Interested in working at Anthropic? We're hosting a happy hour at ICML on July 23. Register here:
0
27
0
RT @JerryWeiAI: One thing that I've come to deeply appreciate at Anthropic is how useful quick iteration times can be. In the current era….
0
21
0
Link to the paper: The authors: @dannyhalawi15, @aweisawei, @eric_wallace_, @TonyWangIV, @nhaghtal, @JacobSteinhardt. Thanks to @EthanJPerez, @farairesearch, @FabienDRoger, and @BerkeleyNLP for compute support, helpful discussions, and feedback.
arxiv.org
Black-box finetuning is an emerging interface for adapting state-of-the-art language models to user needs. However, such access may also let malicious actors undermine model safety. To demonstrate...
1
1
15