Quinn Dougherty (UK)

@qd_forall

Followers
105
Following
2K
Media
62
Statuses
1K

Bridging the cultures between formal verification and AI. My p(doom) is 50% cuz it either happens or it doesn't. https://t.co/NlknpJL8D6

Berkeley, CA
Joined August 2023
@qd_forall
Quinn Dougherty (UK)
5 months
Today, we're releasing Proving the Coding Interview on arxiv and huggingface. We believe FVAPPS is currently the largest formal verification benchmark, consisting of leetcode-style problems in @leanprover.
1
7
26
@qd_forall
Quinn Dougherty (UK)
20 hours
RT @AISecurityInst: White box methods can be a useful complement to black box monitoring, especially if: ❌ Chains of thought do not reflect…
0
1
0
@qd_forall
Quinn Dougherty (UK)
22 hours
one of the coolest things my math prof did was print out wikipedia pages and make sure we knew how to read them. His justification was like "learn this language and you can join the community".
0
0
1
@qd_forall
Quinn Dougherty (UK)
22 hours
i love it when
me: i'm worried this initiative is a bad idea.
coworkers: thanks for your interest in this initiative! i'll loop you into discussions going forward.
it's like peak "complaining is volunteering to fix it". seems like a valuable thing in an organization culture.
0
0
2
@qd_forall
Quinn Dougherty (UK)
23 hours
one project i'd like to do if i was funemployed and not worried about AI takeover is an agentdev library in Lean or Haskell. I already have a bunch of ideas from an agent codebase I did in fp-ts a few months ago.
0
1
1
@qd_forall
Quinn Dougherty (UK)
2 days
This happened to me last week metaprogramming in lean.
@NeelNanda5
Neel Nanda
2 days
My MATS scholars are teaching me such valuable things about Claude Code as a research tool! Pro: Much faster research results - productivity is off the charts. Con: Often the most interesting results are hard-coded. (Credit to @edturner42 for seeing through Claude's lies).
0
0
1
@qd_forall
Quinn Dougherty (UK)
2 days
but LLMs give us the opportunity to study this with actual hard data!
0
0
1
@qd_forall
Quinn Dougherty (UK)
2 days
envisioning a large scale study or eval that gets really good data on verification burdens from the LLM's perspective. There's been a TON of folklore about the verification burden in humans, where we have massive confidence intervals about what verification does to labor cost.
1
0
1
@qd_forall
Quinn Dougherty (UK)
2 days
In my Iliad/Odyssey submission, my metric _verification burden_ measures the ratio of how many tokens it costs to prove a completion correct against how many tokens it cost to produce the completion. I submitted a couple weeks ago, but I'm just now.
1
0
1
@qd_forall
Quinn Dougherty (UK)
2 days
doing its part to solve fertility collapse
Tweet media one
0
0
1
@qd_forall
Quinn Dougherty (UK)
2 days
RT @qd_forall: @NathanpmYoung @DKokotajlo So idk when people say "thats all vibes" or "so n so was wrong about the events", in my brain im…
0
1
0
@qd_forall
Quinn Dougherty (UK)
2 days
RT @qd_forall: @NathanpmYoung @DKokotajlo I literally spent intimate grind moments with the quantification of surprise for several hours, a…
0
1
0
@qd_forall
Quinn Dougherty (UK)
2 days
@NathanpmYoung @DKokotajlo Im obviously not doing the calculation all the time, but I can kinda see and feel shapes of certain sizes in my head when I picture accuracy, calibration, information contribution, etc.
0
0
0
@qd_forall
Quinn Dougherty (UK)
2 days
@NathanpmYoung @DKokotajlo So idk when people say "thats all vibes" or "so n so was wrong about the events", in my brain im thinking about a distribution's accurate information contribution once you plug in the ground truth, reminiscing about the formalism and numerical properties of KL divergence.
1
1
0
@qd_forall
Quinn Dougherty (UK)
2 days
@NathanpmYoung @DKokotajlo I literally spent intimate grind moments with the quantification of surprise for several hours, and I rely on that inside view when I reason about how many bayes points to award various hot takes as the world changes. I really dont think id be able to do that if I hadnt!!
1
1
0
@qd_forall
Quinn Dougherty (UK)
2 days
@NathanpmYoung @DKokotajlo And im kinda worried that it's a niche, sophisticated skill? I only have it cuz i wrote out the integrals for the KLDivergence numerical tests in the squiggle codebase back when I worked there
1
0
0
@qd_forall
Quinn Dougherty (UK)
2 days
@NathanpmYoung @DKokotajlo What im interested in is what mental tools are required for members of the audience to say "the point estimate was off by a bounded amount, but it was still reasonably somewhere in the distribution, therefore was valuable info/estimate in expectation".
1
0
0
@qd_forall
Quinn Dougherty (UK)
2 days
@NathanpmYoung @DKokotajlo I noticed watching the video that we expect the essay to be wrong in specifics like the success or failure of an exfiltration attack, etc. Daniel himself has said 2028 makes more sense, since the thing was published.
1
0
1
@qd_forall
Quinn Dougherty (UK)
2 days
Some people say AI2027 is all vibes and not research. There's some sense in which this is true, as @NathanpmYoung says, "forecasting is just vibes plus track record", but it's still a misleading statement, because @DKokotajlo has that track record and cuz of the rigor in the estimates.
1
0
1
@qd_forall
Quinn Dougherty (UK)
2 days
Exciting web content! I have a random AI2027/forecasting thought, thread:
@ChanaMessinger
Chana
2 days
If I may say so myself, it’s immersive, beautiful and compelling, with interviews to put the whole thing in context and a banger discussion of what a sane world would be doing. Huge props to our host of AI in Context, @AricFloyd!
1
0
0
@qd_forall
Quinn Dougherty (UK)
2 days
What manner of beast
Tweet media one
0
1
2