Ben Harack
@benharack
Followers: 277 · Following: 4K · Media: 13 · Statuses: 1K
International Relations & AI. @aigioxford Within DPhil @Politics_Oxford. Former @hdx. Not here often. Find me at https://t.co/XOTH3Zrv4C
Oxford, UK
Joined March 2009
🚨New AI Safety Course @aims_oxford! I’m thrilled to launch a new course called AI Safety & Alignment (AISAA) on the foundations & frontier research of making advanced AI systems safe and aligned at @UniofOxford. What to expect 👇 https://t.co/r9YHS3XJhR
7
21
110
I started this work as a verification skeptic. But being able to signal benignness (as @Miles_Brundage puts it) will likely be important in both national and foreign policy contexts. Happy to have been a small part of this massive undertaking by @BenHarack.
The future of AI governance may hinge on our ability to develop trusted and effective ways to make credible claims about AI systems. This new report expands our understanding of the verification challenge and maps out compelling areas for further work. ⬇️
1
1
21
The future of AI governance may hinge on our ability to develop trusted and effective ways to make credible claims about AI systems. This new report expands our understanding of the verification challenge and maps out compelling areas for further work. ⬇️
Governing AI requires international agreements, but cooperation can be risky if there’s no basis for trust. Our new report looks at how to verify compliance with AI agreements without sacrificing national security. This is neither impossible nor trivial.🧵 1/
12
21
118
16/ @janet_e_egan @ben_s_bucknall @rosen_br @araujonrenan @BoulaninSIPRI Ranjit Lall @FazlBarez Sanaa Alvira @Corin_Katzke Ahmad Atamli Amro Awad /end🧵
0
0
9
15/ Thanks to @aigioxford for backing this project and all my coauthors: @RobertTrager @AnkaReuel @davidmanheim @Miles_Brundage @onni_aarne @aaronscher Yanliang Pan @jennywxiao @kristy_loke @SumayaNur_ @gbasg_ @nickacaputo @JuliaCMorse @jn_ahuja @IsabellaDuan
1
0
9
13/ Those who lived through or studied the Cold War may remember President Reagan reiterating the Russian proverb “Trust, but verify.” Just as with 1980s nuclear arms control, our ability to build new verification systems may be crucial for preserving peace today.
1
0
7
12/ If we build these more serious verification systems, we will be laying the foundation for international agreements over AI, which might end up being the most important international deals in the history of humanity.
1
0
6
11/ It seems possible to create similar verification exchanges that preserve security to an extreme degree, but we’ll need political action to get there. Our report goes into this in some detail. These setups might take about 1-3 years of intense effort to research and build.
1
0
7
10/ However, even if we scale this up, the most important secrets (think national security info, military AI models, or the Coca-Cola formula) are probably too sensitive to govern via confidential computing alone. Further work is needed to safeguard these.
1
0
6
9/ Groups that use AI (including corporations and countries) will likewise place more trust in AI services that they can be sure are secure and appropriately governed. They may also request—or demand—this kind of thing in the future.
1
0
6
8/ This setup allows 1) users to feel safe and confident about services they pay for, 2) companies to expand their offerings to more sensitive domains, and 3) governments to check that rules are followed.
1
0
6
7/ An AI provider can prove that they abide by rules by having a set of third parties (e.g., AI testing companies and AI Safety / Security Institutes) securely test their models and systems. A user can trust a group of third parties a *lot* more than they trust the AI provider.
1
0
6
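(A minimal sketch of the trust model in 7/, under assumptions not taken from the report: the user’s software accepts a claim about a model only if at least K of N independent third-party evaluators have signed the same claim digest. The evaluator names, the Ed25519 keys, the claim contents, and the threshold K are all illustrative; requires the `cryptography` package.)

```python
# Illustrative k-of-n endorsement check (hypothetical policy, not the report's design):
# trust a provider's claim only if enough independent third parties signed it.
import hashlib
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

K = 2  # endorsements required before the user trusts the claim (assumed policy)

# Hypothetical third-party evaluators, each holding their own signing key.
evaluators = {name: Ed25519PrivateKey.generate()
              for name in ["lab_A", "institute_B", "auditor_C"]}

# The claim being endorsed: a digest binding a model artifact to an evaluation result.
claim = hashlib.sha256(b"model-weights-v1.2" + b"|passed-safety-eval-suite").digest()

# Each evaluator that actually ran the tests signs the claim.
signatures = {name: key.sign(claim) for name, key in evaluators.items()}
public_keys = {name: key.public_key() for name, key in evaluators.items()}

def count_valid_endorsements(claim: bytes, signatures: dict, public_keys: dict) -> int:
    """Count how many known third parties produced a valid signature over the claim."""
    valid = 0
    for name, sig in signatures.items():
        try:
            public_keys[name].verify(sig, claim)
            valid += 1
        except (KeyError, InvalidSignature):
            pass  # unknown evaluator or bad signature: does not count
    return valid

endorsements = count_valid_endorsements(claim, signatures, public_keys)
print(f"{endorsements} valid endorsements; trusted: {endorsements >= K}")
```

The point of the threshold is the one made in 7/: no single party, including the provider, has to be trusted on its own.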
6/ Confidential computing might be reliable enough for a company to make pretty strong claims about what they are *doing* (e.g., serving you inference with a given model and compute budget) and what they are *not doing* (e.g., copying your data).
1
0
6
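(To make 6/ concrete: a rough sketch in plain Python, not any real TEE or confidential-computing API, of the checks a user’s client could run on a provider’s attestation-style report before sending data. The field names, the “measurement” allow-list, and the HMAC stand-in for the hardware vendor’s signing key are all assumptions for illustration.)

```python
# Illustrative sketch only: the rough shape of a client-side attestation check.
import hashlib
import hmac
import os

# Measurements (hashes) of inference-server builds that auditors have reviewed and
# that are believed not to log or copy user data. Hypothetical values.
AUDITED_BUILDS = {hashlib.sha256(b"inference-server-build-2025.06").hexdigest()}

VENDOR_KEY = os.urandom(32)  # stand-in for the hardware vendor's attestation key

def sign_report(report: dict) -> bytes:
    """Stand-in for the signature a confidential-computing chip puts on its report."""
    blob = repr(sorted(report.items())).encode()
    return hmac.new(VENDOR_KEY, blob, hashlib.sha256).digest()

def verify_report(report: dict, signature: bytes, expected_nonce: str) -> bool:
    """Checks a user's client could run before sending sensitive data."""
    fresh = report.get("nonce") == expected_nonce              # not a replayed report
    audited = report.get("measurement") in AUDITED_BUILDS      # running a reviewed build
    genuine = hmac.compare_digest(sign_report(report), signature)  # vendor-signed
    return fresh and audited and genuine

# Example round trip: the provider's hardware produces a report; the client checks it.
nonce = os.urandom(8).hex()
report = {
    "measurement": hashlib.sha256(b"inference-server-build-2025.06").hexdigest(),
    "nonce": nonce,
}
sig = sign_report(report)
print("safe to send data:", verify_report(report, sig, nonce))
```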
5/ Some of these technologies can be deployed *today*, such as confidential computing, which is available in recent hardware such as NVIDIA’s Hopper or Blackwell chips. These are good enough to get us started.
1
0
7
4/ Luckily, decades of work have gone into privacy-preserving computational methods. Basically, they are tricks with hardware and cryptography that allow one actor (the Prover) to prove something to another actor (the Verifier) without revealing all the underlying data.
1
0
9
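(A toy instance of the Prover/Verifier idea in 4/, not the report’s actual proposal: a Merkle-tree commitment lets the Prover publish one hash over many records and later prove that a single record is included while keeping every other record hidden. The record contents below are made up.)

```python
# Toy privacy-preserving proof: prove one record belongs to a committed dataset by
# revealing only O(log n) sibling hashes; the other records stay hidden.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Hash leaves pairwise until one root remains (duplicating a lone last node)."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def inclusion_proof(leaves, index):
    """Sibling hashes on the path from leaf `index` to the root."""
    level = [h(leaf) for leaf in leaves]
    path = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1
        path.append((level[sibling], index % 2 == 0))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return path

def verify_inclusion(root, leaf, path):
    """Verifier recomputes the root from one revealed record plus the sibling hashes."""
    node = h(leaf)
    for sibling, leaf_is_left in path:
        node = h(node + sibling) if leaf_is_left else h(sibling + node)
    return node == root

records = [b"training run 001", b"training run 002",
           b"training run 003", b"training run 004"]
root = merkle_root(records)                       # Prover publishes only this commitment
proof = inclusion_proof(records, 1)               # ...and later proves record 1 is in it
print(verify_inclusion(root, records[1], proof))  # Verifier checks, sees nothing else
```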
3/ But countries care about their security, so we can’t expect them to simply hand over all the information needed to prove that they’re following governance rules.
1
0
6
2/ International AI governance is desirable (for peace, security, and good lives), but it faces verification challenges because there’s no easy way to understand what someone else is doing on their computer without violating their security.
1
0
6
Governing AI requires international agreements, but cooperation can be risky if there’s no basis for trust. Our new report looks at how to verify compliance with AI agreements without sacrificing national security. This is neither impossible nor trivial.🧵 1/
3
33
94
Excited to share our paper: “Chain-of-Thought Is Not Explainability”! We unpack a critical misconception in AI: models explaining their Chain-of-Thought (CoT) steps aren’t necessarily revealing their true reasoning. Spoiler: transparency of CoT can be an illusion. (1/9) 🧵
28
137
658