Jeff Ruffolo

@jeffruffolo

Followers: 3K · Following: 248 · Media: 44 · Statuses: 119

Protein Design / ML @ProfluentBio | Molecular Biophysics PhD @JohnsHopkins

Berkeley, CA
Joined November 2014
@jeffruffolo
Jeff Ruffolo
7 months
What does pushing the boundaries of model capacity and data scale do for generative protein language models? I’m super excited to share our latest work @ProfluentBio where we begin to explore and test some of our hypotheses!
@nathanbenaich
Nathan Benaich
1 month
Biology gets its scaling laws too. @ProfluentBio's ProGen3, trained on 1.5T tokens, establishes a compute frontier for protein language models. This unlocks generalisation in novel protein space and a path to new therapeutics such as custom gene editors.
@ProfluentBio
Profluent
1 month
We’re excited to announce a multi-year partnership between Profluent Bio and @Corteva Agriscience to accelerate sustainable, AI-powered crop innovation. 🌱🤝 Together, we aim to unlock new possibilities for developing resilient crops, improving resource efficiency, and advancing…
@ProfluentBio
Profluent
2 months
We are excited to announce that our work on ProGen3 has been accepted as a Spotlight paper at NeurIPS 2025 – the premier AI research conference. This year NeurIPS received 21,575 valid submissions, of which only 3.2% earned a Spotlight distinction. We’re thrilled that ProGen3 is…
@RevvityLifeSci
Revvity for Life Sciences
3 months
Democratized access to CRISPR just got a major boost! @ProfluentBio’s OpenCRISPR-1, an AI-created, open-access alternative to Cas9, is now peer-reviewed in Nature. And we’ve already shown it works with our licensable Pin-point™ base editing system. Read our blog:
@ProfluentBio
Profluent
3 months
We’re excited to share new data published in @Nature detailing the impressive activity, specificity, and low immunogenicity of our AI-designed CRISPR-Cas proteins, including OpenCRISPR-1. The future of gene editing is here and we’re scaling our capabilities to tackle the…
@thisismadani
Ali Madani
7 months
What could scaling unlock for biology? Introducing ProGen3: our next AI foundation models for protein generation. We develop compute-optimal scaling laws up to 46B parameters on 1.5T tokens, with real evidence in the wet lab. Plus, we solve a new set of challenges for drug discovery.
@jeffruffolo
Jeff Ruffolo
7 months
We’re incredibly optimistic about the opportunities to solve important, hard problems in protein design by scaling up our models and data. We’ve already scaled our data ~10x since training ProGen3, so this really is just the beginning.
@jeffruffolo
Jeff Ruffolo
7 months
Not only do we see compelling benchmark performance, but these aligned capabilities also extend to generative settings, which is what really matters for design. In other words, with just a bit of data we can steer the models to generate the high-fitness sequences we want.
@jeffruffolo
Jeff Ruffolo
7 months
Coming back to fitness prediction, we wanted to see if this greater understanding of protein sequence space translated to stronger predictive power. We turned to alignment, where we use a bit of experimental data to tilt the model towards properties we care about, like stability.
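[Editor's note: a minimal Python sketch of what tilting a model with a bit of experimental data could look like, namely fitness-weighted fine-tuning of a small autoregressive protein LM so that higher-fitness sequences become more likely. The tiny model, the sequences, and the weighting scheme are toy stand-ins for illustration, not Profluent's actual alignment recipe.]

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Toy vocabulary: 20 amino acids plus a BOS/pad token at index 0.
    AA = "ACDEFGHIKLMNPQRSTVWY"
    stoi = {a: i + 1 for i, a in enumerate(AA)}

    def encode(seq):
        return torch.tensor([0] + [stoi[a] for a in seq])  # prepend BOS

    class TinyProteinLM(nn.Module):
        """Minimal causal LM over amino acids; a stand-in for a large model like ProGen3."""
        def __init__(self, vocab=21, d=64):
            super().__init__()
            self.emb = nn.Embedding(vocab, d)
            self.rnn = nn.GRU(d, d, batch_first=True)
            self.head = nn.Linear(d, vocab)

        def forward(self, x):
            h, _ = self.rnn(self.emb(x))
            return self.head(h)

    def sequence_nll(model, seq):
        x = encode(seq).unsqueeze(0)
        logits = model(x[:, :-1])                      # predict each next residue
        return F.cross_entropy(logits.squeeze(0), x[0, 1:])

    # Hypothetical labeled variants: (sequence, measured fitness such as stability).
    labeled = [("MKTAYIAKQR", 1.8), ("MKTAYIAKQK", 0.9), ("MKTAYVAKQR", 2.4)]
    weights = torch.softmax(torch.tensor([y for _, y in labeled]), dim=0)

    model = TinyProteinLM()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for step in range(200):
        # Up-weight the likelihood of higher-fitness sequences, tilting the model
        # toward the measured property.
        loss = sum(w * sequence_nll(model, s) for (s, _), w in zip(labeled, weights))
        opt.zero_grad()
        loss.backward()
        opt.step()

    # The tuned model's log-likelihood can then rank (or generate) new candidates.
    print(-sequence_nll(model, "MKTAYIAKQR").item())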
@jeffruffolo
Jeff Ruffolo
7 months
We think this is the beginning of a new, more meaningful way of understanding what it means to scale protein language models, going beyond ranking of mutations or predicting structural contacts. This will be incredibly useful in shaping how we apply models like ProGen3.
@jeffruffolo
Jeff Ruffolo
7 months
This extends even to proteins with low (or no) homology to anything in the models’ training data, where we still see comparable rates of protein expression, including proteins with very low AlphaFold2 pLDDT.
@jeffruffolo
Jeff Ruffolo
7 months
To put this to the test, we measured the viability (expression) of hundreds of proteins in the lab and found that this added diversity is real. Generated proteins are as viable as natural proteins, and larger models come up with more and more of them.
@jeffruffolo
Jeff Ruffolo
7 months
So what should we be evaluating? Generative models like ProGen3 are fundamentally trained to generate proteins. So we just let the models generate! We found that as models scale, not only do they generate higher-quality sequences, but they also produce considerably more diversity.
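[Editor's note: one simple, common way to quantify diversity among generated sequences is mean pairwise sequence identity (lower identity = more diverse). The sketch below uses placeholder sequences and a crude alignment-free identity; it is not the evaluation pipeline from the paper.]

    from itertools import combinations

    def pairwise_identity(a: str, b: str) -> float:
        """Fraction of matching positions over the shorter length (crude, alignment-free)."""
        n = min(len(a), len(b))
        return sum(x == y for x, y in zip(a, b)) / n

    # Placeholder sequences standing in for model generations.
    generated = [
        "MKTAYIAKQRQISFVKSHFSRQ",
        "MKTLYVAKQRQLSFVKGHFSRE",
        "MADEEKLPPGWEKRMSRSSGRV",
    ]
    pairs = list(combinations(generated, 2))
    mean_identity = sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)
    print(f"mean pairwise identity: {mean_identity:.2f}")  # lower = more diverse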
@jeffruffolo
Jeff Ruffolo
7 months
But why do all of this? What does scaling get us? ProteinGym is a nice benchmark for measuring zero-shot fitness prediction, but even three years ago (ProGen2) we found that this wasn’t the best proxy for evaluating scaling, and we still find that to be the case.
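[Editor's note: for context, zero-shot fitness prediction on benchmarks like ProteinGym is typically done by comparing a variant's sequence log-likelihood to the wild type's under the language model. A minimal sketch follows, with a toy placeholder standing in for a real model's likelihood.]

    def log_likelihood(seq: str) -> float:
        # Placeholder: a real implementation would sum a trained model's per-residue
        # log-probabilities (e.g. from a protein LM like ProGen3) over the sequence.
        return -0.05 * len(seq) - 0.3 * seq.count("P")

    def zero_shot_fitness(wild_type: str, variant: str) -> float:
        """Score a variant by its log-likelihood relative to wild type; higher = preferred."""
        return log_likelihood(variant) - log_likelihood(wild_type)

    wt = "MKTAYIAKQRQISFVK"
    print(zero_shot_fitness(wt, "MKTAYIAKQRQISFVP"))  # single substitution K -> P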
@jeffruffolo
Jeff Ruffolo
7 months
We developed compute-optimal scaling laws that allowed us to scale up to 46B parameters, where we continue to see signs of generalization on diverse proteins far from the training data.
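[Editor's note: a sketch of the kind of parametric fit behind compute-optimal scaling laws, using the Chinchilla-style form L(N, D) = E + A/N^alpha + B/D^beta. The functional form, synthetic data, and coefficients here are illustrative assumptions, not the fits reported for ProGen3.]

    import numpy as np
    from scipy.optimize import curve_fit

    def parametric_loss(X, E, A, B, alpha, beta):
        # Irreducible loss plus power-law terms in parameters N and training tokens D.
        N, D = X
        return E + A / N**alpha + B / D**beta

    # Synthetic (N, D, loss) observations standing in for a sweep of training runs.
    rng = np.random.default_rng(0)
    N = rng.uniform(1e8, 5e10, 40)       # model parameters
    D = rng.uniform(1e10, 1.5e12, 40)    # training tokens
    y = parametric_loss((N, D), 1.7, 4e2, 6e3, 0.35, 0.30) + rng.normal(0, 0.01, 40)

    popt, _ = curve_fit(parametric_loss, (N, D), y,
                        p0=[1.5, 1e2, 1e3, 0.3, 0.3], maxfev=20000)
    E, A, B, alpha, beta = popt
    print(f"fitted exponents: alpha={alpha:.2f}, beta={beta:.2f}")

    # For a compute budget C ~ 6*N*D, such a fit gives a compute-optimal split:
    # the optimal N* grows roughly as C**(beta / (alpha + beta)).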
@jeffruffolo
Jeff Ruffolo
7 months
ProGen3 is a family of mixture-of-experts (MoE) models ranging from 112M to 46B parameters, capable of full-sequence generation as well as infilling. These capabilities open up a lot of new possibilities for practical protein design problems.
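[Editor's note: one generic way an autoregressive model can support infilling is a fill-in-the-middle style rearrangement, where the masked span is moved to the end behind sentinel tokens, as sketched below. The sentinel names and the exact formulation are illustrative, not necessarily how ProGen3 implements infilling.]

    def to_fim_example(seq: str, start: int, end: int) -> str:
        """Rearrange a sequence so the masked middle segment is generated last, autoregressively."""
        prefix, middle, suffix = seq[:start], seq[start:end], seq[end:]
        return f"<PRE>{prefix}<SUF>{suffix}<MID>{middle}<EOS>"

    wt = "MKTAYIAKQRQISFVKSHFSRQ"
    print(to_fim_example(wt, 5, 12))
    # At inference time, the model would be prompted with "<PRE>...<SUF>...<MID>" and
    # asked to sample the middle segment: a redesigned region conditioned on both its
    # N- and C-terminal context.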
@ProfluentBio
Profluent
7 months
What if the same AI advancements that power ChatGPT could be replicated in biology? Enter ProGen3, our latest foundation model suite for protein generation.