David Huang

@davidhuang33176

Followers: 18
Following: 80
Media: 0
Statuses: 12

Joined November 2024
@davidhuang33176
David Huang
3 months
github.com
Measuring General Intelligence With Generated Games (Preprint) - vivek3141/gg-bench
@vcubingx
Vivek Verma
6 months
🎮 Excited to announce gg-bench, a fully synthetic benchmark for LLMs consisting of games generated entirely by LLMs!! This benchmark centers around the fact that LLMs are capable of generating complex tasks that they themselves cannot even solve. 📄: https://t.co/kddoCgDkvd
0
0
3
@davidhuang33176
David Huang
3 months
If better models mean better solvers and even more complex games, then we might be bootstrapping our way towards a self-sustaining benchmark loop... till AGI, of course. Excited to see how GPT-5 plays and how it might do as the game generator!
1
0
3
@davidhuang33176
David Huang
3 months
In practice, we find that the choice of LLM for generating games matters a lot. With o1 as the generator, we obtained 126 viable games that fit our criteria out of 1,000 generations; with GPT-4o, that number drops to 10.
1
0
2
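A rough sketch of the generate-and-filter step described in the tweet above. The actual viability criteria are defined in the paper and repo; `generate_game` and `is_viable` here are hypothetical placeholders standing in for the generator prompt and those checks.

```python
def harvest_viable_games(generate_game, is_viable, n_generations=1000):
    """Query a generator LLM repeatedly and keep only games passing the filter."""
    viable = []
    for _ in range(n_generations):
        game = generate_game()   # e.g. rules text plus a Gym environment implementation
        if is_viable(game):      # e.g. code runs, playouts terminate, no trivial winning strategy
            viable.append(game)
    return viable
```

Under this framing, the yield quoted above is `len(viable) / n_generations`: roughly 126/1000 with o1 as the generator versus 10/1000 with GPT-4o.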
@davidhuang33176
David Huang
3 months
3) The best reasoning model, o1, achieved a win-rate below 36%. For reference, a random policy achieved just under 6%. Perhaps more interesting than the raw win-rates is the framework itself.
1
0
2
@davidhuang33176
David Huang
3 months
1) For every (game, agent) pair, there is at least one other agent that wins convincingly; taken together, these agents achieve >90% on the benchmark. 2) The best non-reasoning model we evaluated, Claude 3.7 Sonnet, achieves a win-rate below 10%.
1
0
2
@davidhuang33176
David Huang
3 months
Our approach is simple: we query LLMs to create novel two-player strategy games, implement them as Gym environments, and then have LLMs play against agents trained with PPO self-play. Here, LLMs fail to identify winning policies even when they exist.
1
0
2
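A minimal sketch of the evaluation loop the tweet above describes, not the actual gg-bench code: `env` is assumed to be a generated two-player Gym-style environment, `llm_move_fn` wraps the LLM being evaluated, and `ppo_policy` is a frozen agent trained via PPO self-play. All names and the reward convention are assumptions.

```python
def llm_win_rate(env, llm_move_fn, ppo_policy, n_games=100):
    """Pit an LLM (player 1) against a frozen PPO self-play agent (player 2)."""
    llm_wins = 0
    for _ in range(n_games):
        obs, info = env.reset()
        done = False
        player = 1  # the LLM is assumed to move first here
        while not done:
            mover = player
            if mover == 1:
                action = llm_move_fn(obs, info.get("legal_moves"))
            else:
                action = ppo_policy(obs)
            obs, reward, terminated, truncated, info = env.step(action)
            done = terminated or truncated
            player = 2 if mover == 1 else 1  # alternate turns
        # assumed convention: a positive final reward means the last mover won
        if reward > 0 and mover == 1:
            llm_wins += 1
    return llm_wins / n_games
```

Swapping which side the PPO agent plays gives the agent's win-rate over the LLM in the same way.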
@davidhuang33176
David Huang
3 months
Benchmarking model intelligence, particularly the ability to generalize robustly across diverse, stateful, long-horizon tasks, is the focus of our new paper: Measuring General Intelligence with Generated Games.
2
1
6
@csitawarin
Chawin Sitawarin
11 months
🔑 IRIS uses the refusal direction ( https://t.co/tqim6LBREu) as part of its optimization objective. IRIS jailbreak rates on AdvBench/HarmBench (1 universal suffix, transferred from Llama-3): GPT-4o 76/56%, o1-mini 54/43%, Llama-3-RR 74/25% (vs. 2.5% for white-box GCG). (2/7)
1
1
5
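For context on the "refusal direction" referenced above: it is typically computed as the difference of mean hidden activations on harmful versus harmless prompts, then normalized. The sketch below shows only that computation and one way a projection onto it could enter a loss; it is not the IRIS objective itself, and all names are illustrative.

```python
import torch

def refusal_direction(harmful_acts: torch.Tensor, harmless_acts: torch.Tensor) -> torch.Tensor:
    # harmful_acts, harmless_acts: [n_prompts, d_model] residual-stream activations
    # taken at a chosen layer and token position.
    direction = harmful_acts.mean(dim=0) - harmless_acts.mean(dim=0)
    return direction / direction.norm()

def refusal_term(act: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    # Projection of a prompt's activation onto the refusal direction; a suffix
    # optimizer could include a term like this (to be minimized) in its objective.
    return act @ direction
```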
@csitawarin
Chawin Sitawarin
11 months
Most importantly, this project is led by 2 amazing Berkeley undergrads (David Huang - https://t.co/5MpHMsqroj & Avi Shah - https://t.co/HtrZbCybEX). They are undoubtedly promising researchers and also applying for PhD programs this year! Please reach out to them! (6/7)
1
1
1
@csitawarin
Chawin Sitawarin
11 months
📢 Excited to share our new result on LLM jailbreaks! ⚔️ We propose IRIS, a simple automated universal and transferable jailbreak suffix that works on GPTs, o1, and the Circuit Breaker defense! To appear at the NeurIPS Safe GenAI Workshop! (1/7)
2
2
29