
Luke Bailey
@LukeBailey181
Followers: 351 · Following: 612 · Media: 9 · Statuses: 78
CS PhD student @Stanford. Former CS and Math undergraduate @Harvard.
Joined July 2023
RT @emmons_scott: Is CoT monitoring a lost cause due to unfaithfulness? 🤔 We say no. The key is the complexity of the bad behavior. When w…
0
37
0
RT @ChengleiSi: Are AI scientists already better than human researchers? We recruited 43 PhD students to spend 3 months executing research…
0
164
0
RT @perryadong: Robotic models are advancing rapidly—but how do we scale their improvement? 🤖 We propose a recipe for batch online RL (tra…
0
15
0
RT @zhs05232838: We just released DeepSeek-Prover V2. - Solves nearly 90% of miniF2F problems. - Significantly improves the SoTA performance…
0
323
0
This is a lot of fun and really well put together. I recommend checking out the attention variant notebooks.
trained a nanoGPT? feeling behind before o4-mini? 🚨🚨 i'm open-sourcing beyond-nanoGPT, an internal codebase to help people go from LLM basics to research-level understanding. 🚨🚨 it contains thousands of lines of from-scratch, annotated pytorch implementing advanced…
0
0
3
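For context on what the attention-variant notebooks build up from, here is a minimal from-scratch scaled dot-product attention in PyTorch. This is a generic illustrative sketch, not code taken from beyond-nanoGPT, and the function name is my own.

```python
# Minimal single-head causal scaled dot-product attention, from scratch.
# Illustrative sketch only; not code from the beyond-nanoGPT repo.
import math
import torch

def scaled_dot_product_attention(q, k, v, causal=True):
    # q, k, v: (batch, seq_len, d_model)
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d)       # (batch, seq, seq)
    if causal:
        seq = q.size(-2)
        mask = torch.triu(torch.ones(seq, seq, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))  # block future tokens
    weights = torch.softmax(scores, dim=-1)
    return weights @ v                                     # (batch, seq, d_model)

x = torch.randn(2, 8, 64)
print(scaled_dot_product_attention(x, x, x).shape)  # torch.Size([2, 8, 64])
```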
RT @cassidy_laidlaw: We built an AI assistant that plays Minecraft with you. Start building a house—it figures out what you’re doing and ju…
0
217
0
RT @karansdalal: Today, we're releasing a new paper – One-Minute Video Generation with Test-Time Training. We add TTT layers to a pre-trai…
0
941
0
RT @YangjunR: New paper on synthetic pretraining! We show LMs can synthesize their own thoughts for more data-efficient pretraining, boots…
0
103
0
RT @tatsu_hashimoto: @YangjunR's vision here is cool: Can we use the reasoning capabilities of a model to "fill in the missing context and…
0
8
0
This, in spirit, reminds me of Obfuscated Adversarial Training (OAT) - we don’t explicitly train models not to do harmful things, but instead to have activations that are easy to probe when they do harmful things. We want the model to be misaligned in “the right way” (easy to…
obvious applications of interpretability are steering and monitoring (if you can get those to work that is). another application area i haven't seen much in is evals — we could eval whether models produce correct answers for the right internal reasons?
0
0
3
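To make the probing idea concrete, here is a minimal sketch of training a linear probe on cached hidden activations labeled harmful vs. benign, in the spirit of both the OAT comparison and the "right internal reasons" eval question above. The activation tensor, labels, dimensions, and layer choice are placeholder assumptions, not anyone's published setup.

```python
# Sketch: linear probe on hidden activations for harmfulness / internal-reason evals.
# `acts` and `labels` are random placeholders standing in for cached residual-stream
# activations and harmful(1)/benign(0) annotations; nothing here is real data.
import torch
import torch.nn as nn

d_model = 4096
acts = torch.randn(2048, d_model)
labels = torch.randint(0, 2, (2048,)).float()

probe = nn.Linear(d_model, 1)
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for _ in range(200):
    opt.zero_grad()
    loss = loss_fn(probe(acts).squeeze(-1), labels)
    loss.backward()
    opt.step()

# Eval-style use: a "correct" answer whose activations nonetheless score as harmful
# would be flagged as right-for-the-wrong-internal-reasons.
with torch.no_grad():
    print(torch.sigmoid(probe(acts[:5]).squeeze(-1)))
```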
RT @hla_michael: I taught an LLM to optimize proteins. It proposed a better carbon capture enzyme. Introducing Pro-1, an 8b param reasonin…
0
340
0
Creating AI regulations with cost and compute thresholds can be made easier by following simple principles. Big thanks to coauthors @StephenLCasper and @schreier_tim.
🚨 New paper: Some AI regulations make requirements contingent on cost & compute thresholds. But there's no standardized accounting procedure. We tackle this problem with 7 practical principles. ***Spoiler alert: DeepSeek did not actually spend only $6M to train V3.***
0
0
6
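To see why a standardized accounting procedure matters, here is a back-of-the-envelope training-cost estimate using the common ~6 · params · tokens FLOPs approximation. Every number below (parameter count, token count, throughput, GPU price) is an illustrative assumption, not a figure from the paper or from DeepSeek.

```python
# Back-of-the-envelope training cost via the ~6 * params * tokens FLOPs rule.
# All inputs are illustrative assumptions, not reported figures.
params = 70e9                     # trained parameters (assumed)
tokens = 15e12                    # training tokens (assumed)
flops = 6 * params * tokens

gpu_flops_per_s = 400e12 * 0.4    # assumed 40% utilization of a 400 TFLOP/s GPU
gpu_hours = flops / gpu_flops_per_s / 3600
price_per_gpu_hour = 2.0          # assumed rental price, USD

print(f"total FLOPs:        {flops:.2e}")
print(f"GPU-hours:          {gpu_hours:,.0f}")
print(f"compute-only cost:  ${gpu_hours * price_per_gpu_hour:,.0f}")
# Whether to count only this final run, or also experiments, failed runs, data,
# and staff time, is exactly the accounting ambiguity the principles aim to pin down.
```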
This paper is very interesting. I wonder if latent-space harmfulness probes could detect these kinds of attacks at inference time.
Defending against adversarial prompts is hard; defending against fine-tuning API attacks is much harder. In our new @AISecurityInst pre-print, we break alignment and extract harmful info using entirely benign and natural interactions during fine-tuning & inference. 😮 🧵 1/10
1
0
9
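A sketch of how such inference-time detection might look, reusing a linear probe like the one sketched earlier and a HuggingFace-style model that exposes hidden states. The layer index, threshold, and the choice to score every token position are assumptions for illustration, not a method from the linked paper.

```python
# Sketch: flag a generation if a harmfulness probe fires on any hidden state.
# `model`/`tokenizer` are assumed to be HuggingFace-style; `probe` is a trained
# nn.Linear(d_model, 1) like the sketch above; layer and threshold are guesses.
import torch

@torch.no_grad()
def flag_harmful(model, tokenizer, text, probe, layer=-1, threshold=0.5):
    inputs = tokenizer(text, return_tensors="pt")
    out = model(**inputs, output_hidden_states=True)
    acts = out.hidden_states[layer][0]                 # (seq_len, d_model)
    scores = torch.sigmoid(probe(acts).squeeze(-1))    # per-token harm score
    return bool(scores.max() > threshold), scores.max().item()
```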