Cade Gordon
@CadeGordonML
Followers
2K
Following
3K
Media
35
Statuses
221
Helping models grow wise @Anthropic | Hertz Fellow | Prev: LAION-5B & OpenCLIP @UCBerkeley
Berkeley, CA
Joined December 2020
Excited to announce our new work! 🧬 Some highlights are: - sequence likelihoods predict zero-shot fitness prediction capabilities - a new method to calculate pLM likelihood in O(1) instead of O(L) forward passes - providing a causal link between training data and outputs - suggesting a new
2
33
204
Introducing the next generation: Claude Opus 4 and Claude Sonnet 4. Claude Opus 4 is our most powerful model yet, and the world’s best coding model. Claude Sonnet 4 is a significant upgrade from its predecessor, delivering superior coding and reasoning.
951
3K
21K
Excited to share that I'll be joining @Anthropic to work on pretraining science! I've chosen to defer my Stanford PhD, where I'm honored to be supported by the Hertz Fellowship. There's something special about the science, this place, and these people. Looking forward to joining
42
10
770
Many agents (Claude Code, Codex CLI) interact with the terminal to do valuable tasks, but do they currently work well enough to deploy en masse? We’re excited to introduce Terminal-Bench: An evaluation environment and benchmark for AI agents on real-world terminal tasks. Tl;dr
16
66
244
👏 Meet the 2025 Hertz Fellows—19 rising leaders in science and tech advancing breakthroughs in robotics, energy, medicine & more. 🔗Learn more: https://t.co/RH9zCoCzoR
0
10
33
🎓🤖 We’re thrilled to welcome @CadeGordonML to the 2025 class of Hertz Fellows! Cade’s AI research is advancing biomedical discovery. A future PhD student at @Stanford, he joins a growing community shaping the future of #science and #tech! 🔗 https://t.co/6hKzo1t4Dd
2
2
29
Documenting and sharing research in real-time is underrated in discussions about open science. @jainhiya_ and I think software can help change problem selection, collaboration, and funding. We write about how and why we should create real-time, open lab notebooks.
5
8
60
Chinese policy on clinical trial approvals liberalized massively in the mid 2010s. A decade later the effects of this move are perceptible in where our drugs come from.
1
3
17
Arrived in Singapore for ICLR—excited to see old & new friends! I’ll also be at the: - Thursday 3:30-5pm main conference poster session, presenting work led by @CadeGordonML on the subtleties of using protein LM likelihoods for fitness prediction (see 🔗👇) - GEM workshop
5
4
108
A simple idea to build the @UCBerkeley startup alumni network has grown beyond my wildest dreams into #AccelScholars, a tight-knit community of the most ambitious, talented, kind-hearted people, whose individual stories we’ve been fortunate to support for the past eight years
6
39
166
the IGI wrote a bit about our (in progress) work on building statistical tools for genome mining and discovery! check it out below ⬇️ 🔍
A new IGI article delves into the story behind our method for statistically guaranteed genome mining and discovery of genes of unknown function. The piece offers insights into the journey and motivation driving our work! Read more here: https://t.co/rDmrw1lIY4
0
7
54
🧪 Introducing POPPER: an AI agent that automates hypothesis validation by sequentially designing and executing falsification experiments with statistical rigor. 🔥POPPER matched PhD-level scientists on complex bio hypothesis validation - while reducing time by 10-fold! 🧵👇
25
227
1K
Spent a good few hours and $50 wrangling with a few different implementations, not yet finding success. Initial issues were in compilation, and now OOMs that would need me to reduce batch size or attempt some other tricks. I hope someone can give my code a whirl and have better luck!
0
0
0
We have a similar trick available for writing out the log of harmax, which motivates rewriting the equation in terms of logits.
1
0
0
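The tweet doesn't spell out the harmax form, so here is a minimal sketch under the assumption (from the harmonic-loss setup) that harmax(d)_i = d_i^(-n) / sum_j d_j^(-n) for distance-style logits d; the function name log_harmax and the exponent n are illustrative, not from the shared code. Under that assumption, log harmax(d)_i = -n*log d_i - logsumexp_j(-n*log d_j), i.e. log_softmax applied to the pseudo-logits -n*log d.

```python
import torch
import torch.nn.functional as F

def log_harmax(dist, n=1.0, eps=1e-12):
    """Numerically stable log of a harmax-style distribution.

    Assumes harmax(d)_i = d_i**(-n) / sum_j d_j**(-n), so
    log harmax(d)_i = -n*log(d_i) - logsumexp_j(-n*log(d_j)),
    i.e. log_softmax applied to the pseudo-logits -n*log(d).
    """
    pseudo_logits = -n * torch.log(dist.clamp_min(eps))
    return F.log_softmax(pseudo_logits, dim=-1)

# Example: distances from one input to each class prototype.
d = torch.tensor([[0.5, 2.0, 4.0]])
print(log_harmax(d, n=2.0).exp())  # matches d**-2 normalized across the row
```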
Recall that we can compute the log of softmax with better precision by representing it as the logit minus the logsumexp of all logits. JAX has a great implementation of this https://t.co/GRhwsebL0c
1
0
0
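For reference, a minimal sketch of that identity, log softmax(x)_i = x_i - logsumexp(x). The linked JAX routine is jax.nn.log_softmax; the version below is written with torch only to stay consistent with the NanoGPT setting.

```python
import torch

def log_softmax_stable(logits, dim=-1):
    # log softmax(x)_i = x_i - logsumexp(x): no exp -> log round trip,
    # so tiny probabilities don't underflow to log(0).
    return logits - torch.logsumexp(logits, dim=dim, keepdim=True)

x = torch.tensor([[1000.0, 999.0, 0.0]])
print(log_softmax_stable(x))               # finite log-probabilities
print(torch.log(torch.softmax(x, dim=-1))) # naive chain: the smallest entry underflows to log(0) = -inf
```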
1. We'll keep everything in the training framework fixed except for the LM head. 2. As we're still using CE loss over some probability distribution at the end of the day, we can compare to the normal NanoGPT loss curves. 3. At the end we can try to eke out more performance in the
1
0
0
Now we can start writing up an implementation. I'll list out my plans as this is where engineering choices might start leading to differences in performance. I invite the community to build from my fork and let me know any mistakes!
1
0
0
Getting our hands a bit more dirty, we have the loss calculation. I think this is where the MLEs in the audience might have their eyes light up! We're generating our probabilities, taking the log of them, and sending the result off to the cross_entropy loss. Chaining together
1
0
0
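A minimal sketch of the kind of chain described above, based on my reading of the tweet rather than the authors' exact code; the shapes and the normalized probs tensor are made up for illustration.

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes for illustration: batch of 4 positions, vocab of 8.
probs = torch.rand(4, 8)
probs = probs / probs.sum(dim=-1, keepdim=True)   # a normalized distribution
targets = torch.randint(0, 8, (4,))

# The chain described in the tweet: probabilities -> log -> cross_entropy.
# F.cross_entropy applies log_softmax to its input; on log-probabilities that
# extra log_softmax is mathematically a no-op (the probs already sum to 1),
# but the prob -> log round trip is where small values can underflow,
# which is what the logit rewrite above is meant to avoid.
loss_chained = F.cross_entropy(torch.log(probs), targets)

# Equivalent, and one fewer hop: negative log-likelihood on the log-probs.
loss_nll = F.nll_loss(torch.log(probs), targets)
print(loss_chained, loss_nll)  # agree up to floating-point error
```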
The authors share their original code for the experiment here: https://t.co/ILcMptkH9n Let's start with the computation of the logits. The logits can be written as a modified version of a linear layer with no bias. In particular, the original implementation uses the fact that the
1
0
1
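The tweet is cut off before the detail, but the usual way a distance-style head reduces to a bias-free linear layer is the expansion ||w_i - x||^2 = ||w_i||^2 - 2 w_i.x + ||x||^2, where the w_i.x term is an ordinary matmul with no bias. A sketch under that assumption; the helper distance_logits is mine, not from the shared code.

```python
import torch

def distance_logits(x, weight, eps=1e-12):
    """Euclidean distance from each input to each row of `weight`,
    computed with one bias-free matmul via
    ||w_i - x||^2 = ||w_i||^2 - 2 w_i . x + ||x||^2.
    Shapes: x (B, D), weight (V, D) -> (B, V)."""
    w_sq = (weight * weight).sum(dim=-1)           # (V,)  ||w_i||^2
    x_sq = (x * x).sum(dim=-1, keepdim=True)       # (B,1) ||x||^2
    sq_dist = x_sq - 2.0 * x @ weight.t() + w_sq   # (B,V) squared distances
    return sq_dist.clamp_min(eps).sqrt()

x = torch.randn(3, 16)
w = torch.randn(10, 16)
print(torch.allclose(distance_logits(x, w), torch.cdist(x, w), atol=1e-4))
```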
The original paper shows some strong early results: for a GPT-2-small model based on NanoGPT, they get an improved training curve, albeit using a larger learning rate.
1
0
1