David Huang
@davidhuang33176
18 Followers · 80 Following · 0 Media · 12 Statuses
Paper: https://t.co/smN834meg1 Main Post: https://t.co/D8x6x8zoTr Repo: github.com/vivek3141/gg-bench (Measuring General Intelligence With Generated Games, preprint)
🎮 Excited to announce gg-bench, a fully synthetic benchmark for LLMs consisting of games generated entirely by LLMs! The benchmark centers on the fact that LLMs can generate complex tasks that they themselves cannot even solve. 📄: https://t.co/kddoCgDkvd
If better models mean better solvers and even more complex games, then we might be bootstrapping our way towards a self-sustaining benchmark loop... till AGI, of course. Excited to see how GPT-5 plays and how it might do as the game generator!
In practice, we find that the choice of LLM for generating games matters a great deal. With o1, we obtained 126 viable games that fit our criteria out of 1,000 generations. With GPT-4o, that number drops to 10.
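The yield numbers above suggest a simple generate-then-filter pipeline. As a rough illustration of that shape (the prompt, JSON schema, and check names below are hypothetical stand-ins, not gg-bench's actual criteria):

```python
import json

def passes_viability_checks(game: dict) -> bool:
    # Illustrative criteria only: the paper's actual checks (e.g. that
    # self-play games terminate and that both players can win) are stricter.
    required = {"name", "rules", "win_condition"}
    return required.issubset(game)

def generate_viable_games(query_llm, n_generations: int = 1000) -> list:
    viable = []
    for _ in range(n_generations):
        spec = query_llm(
            "Invent a novel two-player turn-based strategy game. "
            "Return its name, rules, and win condition as JSON."
        )
        try:
            game = json.loads(spec)
        except json.JSONDecodeError:
            continue  # discard malformed generations outright
        if passes_viability_checks(game):
            viable.append(game)
    return viable
```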
3) The best reasoning model, o1, achieved a win-rate under 36%. For reference, a random policy achieved just under 6%. Perhaps more interesting than the raw win-rates is the framework itself.
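For concreteness, the win-rate quoted here can be read as the fraction of episodes a model's policy wins against the trained RL opponent, averaged over games. A minimal sketch under that assumption (a standard Gymnasium-style env with a +1 win / -1 loss convention is assumed; `llm_move` is a placeholder for the model's action choice):

```python
def win_rate(env, llm_move, n_episodes: int = 100) -> float:
    # Play full episodes; terminal reward > 0 marks a win for the model
    # under the assumed +1 win / -1 loss reward convention.
    wins = 0
    for _ in range(n_episodes):
        obs, _ = env.reset()
        terminated = truncated = False
        while not (terminated or truncated):
            obs, reward, terminated, truncated, _ = env.step(llm_move(obs))
        wins += reward > 0
    return wins / n_episodes
```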
1) For every (game, agent) pair, there exists at least one other agent that wins convincingly; taken together, these agents achieve >90% on the benchmark. 2) The best non-reasoning model we evaluated, Claude 3.7 Sonnet, has a win-rate under 10%.
Our approach is simple: we query LLMs to create novel two-player strategy games, implement them as Gym environments, and have LLMs compete against PPO-optimized self-play agents. In this setting, LLMs fail to identify winning policies even when such policies exist.
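One common way to expose a two-player game to off-the-shelf PPO is to fold a frozen opponent policy into step(), so the trainer sees a standard single-agent Gymnasium interface. A hedged sketch of that pattern with a toy Nim game standing in for a generated one (not an actual gg-bench environment):

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class ToyTwoPlayerEnv(gym.Env):
    """Toy Nim: players alternately remove 1-3 sticks; whoever takes the
    last stick wins. A frozen opponent policy replies inside step()."""

    def __init__(self, opponent_policy, n_sticks: int = 15):
        self.n_sticks = n_sticks
        self.opponent = opponent_policy
        self.observation_space = spaces.Box(0, n_sticks, shape=(1,), dtype=np.float32)
        self.action_space = spaces.Discrete(3)  # remove (action + 1) sticks

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.sticks = self.n_sticks
        return np.array([self.sticks], dtype=np.float32), {}

    def step(self, action):
        self.sticks -= int(action) + 1
        if self.sticks <= 0:  # agent took the last stick: win
            return np.zeros(1, dtype=np.float32), 1.0, True, False, {}
        self.sticks -= int(self.opponent(self.sticks))  # frozen opponent replies
        if self.sticks <= 0:  # opponent took the last stick: loss
            return np.zeros(1, dtype=np.float32), -1.0, True, False, {}
        return np.array([self.sticks], dtype=np.float32), 0.0, False, False, {}

# Self-play then alternates: train PPO against the current frozen opponent,
# freeze the result, and repeat (stable-baselines3 shown for illustration):
# from stable_baselines3 import PPO
# model = PPO("MlpPolicy", ToyTwoPlayerEnv(lambda s: 1)).learn(50_000)
```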
Benchmarking model intelligence, particularly models' ability to generalize robustly across diverse, stateful, and long-horizon tasks, is the focus of our new paper: Measuring General Intelligence with Generated Games.
🔑 IRIS uses the refusal direction (https://t.co/tqim6LBREu) as part of its optimization objective. IRIS jailbreak rates on AdvBench/HarmBench (one universal suffix, transferred from Llama-3): GPT-4o 76%/56%, o1-mini 54%/43%, Llama-3-RR 74%/25% (vs. 2.5% for white-box GCG). (2/7)
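As a reading aid for the objective mentioned here: in the linked refusal-direction work, the direction is the difference in mean residual-stream activations between harmful and harmless prompts, and a suffix optimizer can penalize the projection onto it alongside the usual target-completion loss. How IRIS actually combines and weights these terms is in the paper, so the loss below is only a hedged sketch:

```python
import torch

def refusal_direction(h_harmful: torch.Tensor, h_harmless: torch.Tensor) -> torch.Tensor:
    # h_*: (n_prompts, d_model) activations at a chosen layer/token position.
    d = h_harmful.mean(dim=0) - h_harmless.mean(dim=0)
    return d / d.norm()

def iris_style_loss(nll_target: torch.Tensor, h_prompt: torch.Tensor,
                    r_hat: torch.Tensor, lam: float = 1.0) -> torch.Tensor:
    # Minimize NLL of the target completion while driving the prompt+suffix
    # activations' projection onto the refusal direction toward zero.
    # The combination and weighting here are assumptions, not IRIS's exact loss.
    refusal_proj = (h_prompt @ r_hat) ** 2
    return nll_target + lam * refusal_proj.mean()
```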
📃 Workshop paper: https://t.co/nUkWti9DNJ (full paper soon!) 👥 Co-authors: @davidhuang33176, Avi Shah, @alexarauj_, David Wagner. (7/7)
openreview.net
Making large language models (LLMs) safe for mass deployment is a complex and ongoing challenge. Efforts have focused on aligning models to human preferences (RLHF) in order to prevent malicious...
Most importantly, this project is led by two amazing Berkeley undergrads (David Huang - https://t.co/5MpHMsqroj & Avi Shah - https://t.co/HtrZbCybEX). They are undoubtedly promising researchers and are applying to PhD programs this year! Please reach out to them! (6/7)
📢 Excited to share our new result on LLM jailbreak! ⚔️ We propose IRIS, a simple automated 𝘂𝗻𝗶𝘃𝗲𝗿𝘀𝗮𝗹 𝗮𝗻𝗱 𝘁𝗿𝗮𝗻𝘀𝗳𝗲𝗿𝗿𝗮𝗯𝗹𝗲 𝗷𝗮𝗶𝗹𝗯𝗿𝗲𝗮𝗸 𝘀𝘂𝗳𝗳𝗶𝘅 that works on GPTs, o1, and Circuit Breaker defense! To appear at NeurIPS Safe GenAI Workshop! (1/7)