Cade Gordon

@CadeGordonML

Followers
2K
Following
3K
Media
35
Statuses
221

Helping models grow wise @Anthropic | Hertz Fellow | Prev: LAION-5B & OpenCLIP @UCBerkeley

Berkeley, CA
Joined December 2020
@CadeGordonML
Cade Gordon
1 year
Excited to announce our new work! 🧬 Some highlights are:
- sequence likelihoods predict zero-shot fitness capabilities
- a new method to calculate pLM likelihood in O(1) instead of O(L) forward passes
- providing a causal link between training data and outputs
- suggesting a new
2
33
204
@CadeGordonML
Cade Gordon
20 days
Patient work, careful hands. See what grew.
@claudeai
Claude
20 days
Introducing Claude Opus 4.5: the best model in the world for coding, agents, and computer use. Opus 4.5 is a step forward in what AI systems can do, and a preview of larger changes to how work gets done.
5
1
86
@AnthropicAI
Anthropic
7 months
Introducing the next generation: Claude Opus 4 and Claude Sonnet 4. Claude Opus 4 is our most powerful model yet, and the world’s best coding model. Claude Sonnet 4 is a significant upgrade from its predecessor, delivering superior coding and reasoning.
951
3K
21K
@CadeGordonML
Cade Gordon
7 months
Excited to share that I'll be joining @Anthropic to work on pretraining science! I've chosen to defer my Stanford PhD, where I'm honored to be supported by the Hertz Fellowship. There's something special about the science, this place, and these people. Looking forward to joining
42
10
770
@Mike_A_Merrill
Mike A. Merrill
7 months
Many agents (Claude Code, Codex CLI) interact with the terminal to do valuable tasks, but do they currently work well enough to deploy en masse? We’re excited to introduce Terminal-Bench: An evaluation environment and benchmark for AI agents on real-world terminal tasks. Tl;dr
16
66
244
@HertzFoundation
Hertz Foundation
7 months
👏 Meet the 2025 Hertz Fellows—19 rising leaders in science and tech advancing breakthroughs in robotics, energy, medicine & more. 🔗Learn more: https://t.co/RH9zCoCzoR
0
10
33
@HertzFoundation
Hertz Foundation
7 months
🎓🤖 We’re thrilled to welcome @CadeGordonML to the 2025 class of Hertz Fellows! Cade’s AI research is advancing biomedical discovery. A future PhD student at @Stanford, he joins a growing community shaping the future of #science and #tech! 🔗 https://t.co/6hKzo1t4Dd
2
2
29
@jajoosam
Samarth Jajoo
8 months
Documenting and sharing research in real-time is underrated in discussions about open science. @jainhiya_ and I think software can help change problem selection, collaboration, and funding. We write about how and why we should create real-time, open lab notebooks.
5
8
60
@jainhiya_
Hiya Jain
8 months
Chinese policy on clinical trial approvals liberalized massively in the mid 2010s. A decade later the effects of this move are perceptible in where our drugs come from.
1
3
17
@amyxlu
Amy Lu
8 months
Arrived in Singapore for ICLR—excited to see old & new friends! I’ll also be at the: - Thursday 3:30-5pm main conference poster session, presenting work led by @CadeGordonML on the subtleties of using protein LM likelihoods for fitness prediction (see 🔗👇) - GEM workshop
5
4
108
@amitku
Amit Kumar
9 months
A simple idea to build the @UCBerkeley startup alumni network has grown beyond my wildest dreams into #AccelScholars, a tight-knit community of the most ambitious, talented, kind-hearted people, whose individual stories we’ve been fortunate to support for the past eight years
6
39
166
@SeyoneC
Seyone Chithrananda
10 months
the IGI wrote a bit about our (in progress) work on building statistical tools for genome mining and discovery! check it out below ⬇️ 🔍
@ronboger
ron boger
10 months
A new IGI article delves into the story behind our method for statistically guaranteed genome mining and discovery of genes of unknown function. The piece offers insights into the journey and motivation driving our work! Read more here: https://t.co/rDmrw1lIY4
0
7
54
@KexinHuang5
Kexin Huang
10 months
🧪 Introducing POPPER: an AI agent that automates hypothesis validation by sequentially designing and executing falsification experiments with statistical rigor. 🔥POPPER matched PhD-level scientists on complex bio hypothesis validation - while reducing time by 10-fold! 🧵👇
25
227
1K
@CadeGordonML
Cade Gordon
10 months
Spent a good few hours and $50 wrangling with a few different implementations, not yet finding success. Initial issues were in compilation, and now OOMs that would need me to reduce batch size or attempt some other tricks. I hope someone can give my code a whirl and have better luck!
0
0
0
@CadeGordonML
Cade Gordon
10 months
We have a similar trick available for writing out the log of harmax, which motivates a way to rewrite the equation in terms of logits.
1
0
0
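The tweet above doesn't show the derivation. Assuming the harmax definition from the harmonic loss paper (probabilities proportional to 1/d^n over the per-class distances d), the same logsumexp trick applies if we treat -n·log d_i as the "logits". A rough JAX sketch under that assumption:

```python
import jax.numpy as jnp
from jax.scipy.special import logsumexp

def log_harmax(dists, n):
    """Log of harmax(d)_i = d_i**(-n) / sum_j d_j**(-n), computed stably.

    Treating -n * log(d_i) as a pseudo-logit, log harmax reduces to the
    same "logit minus logsumexp" form as log softmax.
    """
    pseudo_logits = -n * jnp.log(dists)
    return pseudo_logits - logsumexp(pseudo_logits)

dists = jnp.array([0.1, 2.0, 5.0])        # distances to each class center
print(jnp.exp(log_harmax(dists, n=2.0)))  # recovers the harmax probabilities
```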
@CadeGordonML
Cade Gordon
10 months
Recall that we can compute the log of softmax with better precision by representing it as the logit minus the logsumexp of all logits. JAX has a great implementation of this: https://t.co/GRhwsebL0c
1
0
0
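A minimal JAX sketch of the identity the tweet above refers to; jax.nn.log_softmax and jax.scipy.special.logsumexp are the library functions assumed here (the shortened link isn't expandable):

```python
import jax.numpy as jnp
from jax.nn import log_softmax
from jax.scipy.special import logsumexp

logits = jnp.array([10.0, 0.5, -3.0])

# Naive: exponentiate, normalize, then take the log.
# Loses precision for large-magnitude logits.
naive = jnp.log(jnp.exp(logits) / jnp.sum(jnp.exp(logits)))

# Stable: log softmax_i = logit_i - logsumexp(logits).
stable = logits - logsumexp(logits)

# JAX's built-in performs the same stable computation.
builtin = log_softmax(logits)

print(naive, stable, builtin)
```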
@CadeGordonML
Cade Gordon
10 months
1. We'll keep everything in the training framework fixed except for the LM head.
2. As we're still using CE loss of some probability distribution at the end of the day, we can compare to the normal NanoGPT loss curves.
3. At the end we can try to eke out more performance in the
1
0
0
@CadeGordonML
Cade Gordon
10 months
Now we can start writing up an implementation. I'll list out my plans as this is where engineering choices might start leading to differences in performance. I invite the community to build from my fork and let me know any mistakes!
1
0
0
@CadeGordonML
Cade Gordon
10 months
Getting our hands a bit dirtier, we have the loss calculation. I think this is where the MLEs in the audience might have their eyes light up! We're generating our probabilities, taking the log of them, and sending the result off to the cross_entropy loss. Chaining together
1
0
0
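The tweet above describes the original code's chain (probabilities → log → cross-entropy loss) without showing it. A rough JAX sketch of that pattern, with hypothetical tensor names; note that taking the log of already-formed probabilities is exactly the precision concern the logsumexp rewrite elsewhere in this thread addresses:

```python
import jax.numpy as jnp

def nll_from_probs(probs, targets):
    """Cross-entropy computed as the tweet describes: form the probabilities
    first, take their log, then average the negative log-likelihood of the
    target tokens.

    probs:   (batch, vocab) probabilities (e.g. harmax outputs)
    targets: (batch,) integer class indices
    """
    log_probs = jnp.log(probs)  # the precision-sensitive step
    picked = jnp.take_along_axis(log_probs, targets[:, None], axis=-1)
    return -jnp.mean(picked)

# Tiny example with made-up values.
probs = jnp.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1]])
targets = jnp.array([0, 1])
print(nll_from_probs(probs, targets))
```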
@CadeGordonML
Cade Gordon
10 months
The authors share their original code for the experiment here: https://t.co/ILcMptkH9n Let's start with the computation of the logits. The logits can be written as a modified version of a linear layer with no bias. In particular, the original implementation uses the fact that the
1
0
1
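The tweet above is cut off before the fact it references, but one identity a distance-based logit layer commonly relies on is the expansion ||x − w||² = ||x||² − 2·x·w + ||w||², which makes the dominant term exactly the matmul of a bias-free linear layer. A hedged JAX sketch, with all names hypothetical:

```python
import jax
import jax.numpy as jnp

def class_distances(x, W, eps=1e-12):
    """Euclidean distances from each row of x to each row of W, computed via
    the expansion ||x - w||^2 = ||x||^2 - 2 x.w + ||w||^2.

    The x @ W.T term is exactly a bias-free linear layer, which is presumably
    why the logit computation can reuse that code path. In harmonic loss,
    these distances then feed the harmax instead of a softmax.

    x: (batch, dim) activations
    W: (vocab, dim) per-class weight vectors
    """
    x_sq = jnp.sum(x * x, axis=-1, keepdims=True)  # (batch, 1)
    w_sq = jnp.sum(W * W, axis=-1)                 # (vocab,)
    sq_dists = x_sq - 2.0 * (x @ W.T) + w_sq       # (batch, vocab)
    return jnp.sqrt(jnp.maximum(sq_dists, eps))    # clamp for numerical safety

x = jax.random.normal(jax.random.PRNGKey(0), (4, 8))
W = jax.random.normal(jax.random.PRNGKey(1), (10, 8))
print(class_distances(x, W).shape)  # (4, 10)
```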
@CadeGordonML
Cade Gordon
10 months
The original paper shows some strong early results: for a GPT-2 Small based on NanoGPT, they get an improved training curve, albeit using a larger learning rate.
1
0
1