Cade Gordon
@CadeGordonML
Followers
2K
Following
3K
Media
35
Statuses
221
Helping models grow wise @Anthropic | Hertz Fellow | Prev: LAION-5B & OpenCLIP @UCBerkeley
Berkeley, CA
Joined December 2020
Excited to announce our new work! 🧬 Some highlights are: - sequence likelihoods predict zero-shot fitness prediction capabilities - a new method to calculate pLM likelihood in O(1) instead of O(L) forward passes - providing a causal link between training data and outputs - suggesting a new
2
33
204
Introducing the next generation: Claude Opus 4 and Claude Sonnet 4. Claude Opus 4 is our most powerful model yet, and the world’s best coding model. Claude Sonnet 4 is a significant upgrade from its predecessor, delivering superior coding and reasoning.
951
3K
21K
Excited to share that I'll be joining @Anthropic to work on pretraining science! I've chosen to defer my Stanford PhD, where I'm honored to be supported by the Hertz Fellowship. There's something special about the science, this place, and these people. Looking forward to joining
42
10
770
Many agents (Claude Code, Codex CLI) interact with the terminal to do valuable tasks, but do they currently work well enough to deploy en masse? We’re excited to introduce Terminal-Bench: An evaluation environment and benchmark for AI agents on real-world terminal tasks. Tl;dr
16
66
244
👏 Meet the 2025 Hertz Fellows—19 rising leaders in science and tech advancing breakthroughs in robotics, energy, medicine & more. 🔗Learn more: https://t.co/RH9zCoCzoR
0
10
33
🎓🤖 We’re thrilled to welcome @CadeGordonML to the 2025 class of Hertz Fellows! Cade’s AI research is advancing biomedical discovery. A future PhD student at @Stanford, he joins a growing community shaping the future of #science and #tech! 🔗 https://t.co/6hKzo1t4Dd
2
2
29
Documenting and sharing research in real-time is underrated in discussions about open science. @jainhiya_ and I think software can help change problem selection, collaboration, and funding. We write about how and why we should create real-time, open lab notebooks.
5
8
60
Chinese policy on clinical trial approvals liberalized massively in the mid 2010s. A decade later the effects of this move are perceptible in where our drugs come from.
1
3
17
Arrived in Singapore for ICLR—excited to see old & new friends! I’ll also be at the: - Thursday 3:30-5pm main conference poster session, presenting work led by @CadeGordonML on the subtleties of using protein LM likelihoods for fitness prediction (see 🔗👇) - GEM workshop
5
4
108
A simple idea to build the @UCBerkeley startup alumni network has grown beyond my wildest dreams into #AccelScholars, a tight-knit community of the most ambitious, talented, kind-hearted people, whose individual stories we’ve been fortunate to support for the past eight years
6
39
166
the IGI wrote a bit about our (in progress) work on building statistical tools for genome mining and discovery! check it out below ⬇️ 🔍
A new IGI article delves into the story behind our method for statistically guaranteed genome mining and discovery of genes of unknown function. The piece offers insights into the journey and motivation driving our work! Read more here: https://t.co/rDmrw1lIY4
0
7
54
🧪 Introducing POPPER: an AI agent that automates hypothesis validation by sequentially designing and executing falsification experiments with statistical rigor. 🔥POPPER matched PhD-level scientists on complex bio hypothesis validation - while reducing time by 10-fold! 🧵👇
25
227
1K
Spent a good few hours and $50 wrangling with a few different implementations, not yet finding success. Initial issues were in compilation, and now OOMs that would need me to reduce batch size or attempt some other tricks. I hope someone can give my code a whirl and have better luck!
0
0
0
We have a similar trick available for writing out the log of harmax, which motivates rewriting the equation in terms of logits.
1
0
0
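The tweet doesn't spell out the harmax form, so here is a minimal sketch under the assumption (from the harmonic-loss setup) that harmax(d)_i = d_i^(-n) / sum_j d_j^(-n) for distance-style logits d; the function name log_harmax and the exponent n are illustrative, not from the shared code. Under that assumption, log harmax(d)_i = -n*log d_i - logsumexp_j(-n*log d_j), i.e. log_softmax applied to the pseudo-logits -n*log d.

```python
import torch
import torch.nn.functional as F

def log_harmax(dist, n=1.0, eps=1e-12):
    """Numerically stable log of a harmax-style distribution.

    Assumes harmax(d)_i = d_i**(-n) / sum_j d_j**(-n), so
    log harmax(d)_i = -n*log(d_i) - logsumexp_j(-n*log(d_j)),
    i.e. log_softmax applied to the pseudo-logits -n*log(d).
    """
    pseudo_logits = -n * torch.log(dist.clamp_min(eps))
    return F.log_softmax(pseudo_logits, dim=-1)

# Example: distances from one input to each class prototype.
d = torch.tensor([[0.5, 2.0, 4.0]])
print(log_harmax(d, n=2.0).exp())  # matches d**-2 normalized across the row
```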
Recall that we can compute the log of softmax with better precision by representing it as the logit minus the logsumexp of all logits. JAX has a great implementation of this https://t.co/GRhwsebL0c
1
0
0
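For reference, a minimal sketch of that identity, log softmax(x)_i = x_i - logsumexp(x). The linked JAX routine is jax.nn.log_softmax; the version below is written with torch only to stay consistent with the NanoGPT setting.

```python
import torch

def log_softmax_stable(logits, dim=-1):
    # log softmax(x)_i = x_i - logsumexp(x): no exp -> log round trip,
    # so tiny probabilities don't underflow to log(0).
    return logits - torch.logsumexp(logits, dim=dim, keepdim=True)

x = torch.tensor([[1000.0, 999.0, 0.0]])
print(log_softmax_stable(x))               # finite log-probabilities
print(torch.log(torch.softmax(x, dim=-1))) # naive chain: the smallest entry underflows to log(0) = -inf
```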
1. We'll keep everything in the training framework fixed except for the LM head. 2. As we're still using CE loss over some probability distribution at the end of the day, we can compare to the normal NanoGPT loss curves. 3. At the end we can try to eke out more performance in the
1
0
0
Now we can start writing up an implementation. I'll list out my plans as this is where engineering choices might start leading to differences in performance. I invite the community to build from my fork and let me know any mistakes!
1
0
0
Getting our hands a bit more dirty, we have the loss calculation. I think this is where the MLEs in the audience might have their eyes light up! We're generating our probabilities, taking the log of them, and sending the result off to the cross_entropy loss. Chaining together
1
0
0
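A minimal sketch of the kind of chain described above, based on my reading of the tweet rather than the authors' exact code; the shapes and the normalized probs tensor are made up for illustration.

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes for illustration: batch of 4 positions, vocab of 8.
probs = torch.rand(4, 8)
probs = probs / probs.sum(dim=-1, keepdim=True)   # a normalized distribution
targets = torch.randint(0, 8, (4,))

# The chain described in the tweet: probabilities -> log -> cross_entropy.
# F.cross_entropy applies log_softmax to its input; on log-probabilities that
# extra log_softmax is mathematically a no-op (the probs already sum to 1),
# but the prob -> log round trip is where small values can underflow,
# which is what the logit rewrite above is meant to avoid.
loss_chained = F.cross_entropy(torch.log(probs), targets)

# Equivalent, and one fewer hop: negative log-likelihood on the log-probs.
loss_nll = F.nll_loss(torch.log(probs), targets)
print(loss_chained, loss_nll)  # agree up to floating-point error
```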
The authors share their original code for the experiment here: https://t.co/ILcMptkH9n Let's start with the computation of the logits. The logits can be written as a modified version of a linear layer with no bias. In particular, the original implementation uses the fact that the
1
0
1
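The tweet is cut off before the detail, but the usual way a distance-style head reduces to a bias-free linear layer is the expansion ||w_i - x||^2 = ||w_i||^2 - 2 w_i.x + ||x||^2, where the w_i.x term is an ordinary matmul with no bias. A sketch under that assumption; the helper distance_logits is mine, not from the shared code.

```python
import torch

def distance_logits(x, weight, eps=1e-12):
    """Euclidean distance from each input to each row of `weight`,
    computed with one bias-free matmul via
    ||w_i - x||^2 = ||w_i||^2 - 2 w_i . x + ||x||^2.
    Shapes: x (B, D), weight (V, D) -> (B, V)."""
    w_sq = (weight * weight).sum(dim=-1)           # (V,)  ||w_i||^2
    x_sq = (x * x).sum(dim=-1, keepdim=True)       # (B,1) ||x||^2
    sq_dist = x_sq - 2.0 * x @ weight.t() + w_sq   # (B,V) squared distances
    return sq_dist.clamp_min(eps).sqrt()

x = torch.randn(3, 16)
w = torch.randn(10, 16)
print(torch.allclose(distance_logits(x, w), torch.cdist(x, w), atol=1e-4))
```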
The original paper shows some strong early results: for a GPT-2-small model based on NanoGPT, they get an improved training curve, albeit using a larger learning rate.
1
0
1