Jeff Ruffolo @jeffruffolo X Profile

Jeff Ruffolo

@jeffruffolo

Followers

2K

Following

239

Media

44

Statuses

114

Protein Design / ML @ProfluentBio | Molecular Biophysics PhD @JohnsHopkins

Berkeley, CA

Joined November 2014

Don't wanna be here? Send us removal request.

Jeff Ruffolo

@jeffruffolo

3 months

What does pushing the boundaries of model capacity and data scale do for generative protein language models? I’m super excited to share our latest work @ProfluentBio where we begin to explore and test some of our hypotheses!

2

24

136

Jeff Ruffolo

@jeffruffolo

3 months

RT @thisismadani: What could scaling unlock for biology?. Introducing ProGen3- our next AI foundation models for protein generation. We dev….

0

62

0

Jeff Ruffolo

@jeffruffolo

3 months

Learn more, use our models, or work directly with us!. Blog: Github: Platform access:

docs.google.com

Profluent's foundation models enable us to design all of life's large molecules. We're providing a limited early access program to our best models. These partners will get access to our frontier...

0

4

Jeff Ruffolo

@jeffruffolo

3 months

We’re incredibly optimistic about the opportunities to solve important, hard problems in protein design by scaling up our models and data. We’ve already ~10x our data scale since training ProGen3, so this really is just the beginning.

1

0

3

Jeff Ruffolo

@jeffruffolo

3 months

Not only do we see compelling benchmark performance, but also that these aligned capabilities extend to generative settings, which is what really matters for design. Meaning, with just a bit of data we can steer the models to generate the high-fitness sequences we want.

1

0

2

Jeff Ruffolo

@jeffruffolo

3 months

Coming back to fitness prediction, we wanted to see if this greater understanding of protein sequence space translated to stronger predictive power. We turned to alignment, where we use a bit of experimental data to tilt the model towards properties we care about, like stability.

1

0

2

Jeff Ruffolo

@jeffruffolo

3 months

We think this is the beginning of a new, more meaningful way of understanding what it means to scale protein language models, going beyond ranking of mutations or predicting structural contacts. This will be incredibly useful in shaping how we apply models like ProGen3.

2

0

4

Jeff Ruffolo

@jeffruffolo

3 months

This extends even to proteins that had low (or no) homology to anything in the models’ training data, where we still see comparable rates of protein expression, including for proteins with very low AlphaFold2 pLDDT.

1

5

Jeff Ruffolo

@jeffruffolo

3 months

To put this to the test, we experimentally tested the viability (expression) of hundreds of proteins in the lab, and found that this added diversity is real. Generated proteins are as viable as natural proteins, and larger models can come up with more and more of them.

1

0

3

Jeff Ruffolo

@jeffruffolo

3 months

So what should we be evaluating? Generative models like ProGen3 are fundamentally trained to generate proteins. So we just let the models generate! We found that as models scale, not only do they generate higher quality sequences, but also produce considerably more diversity.

1

0

3

Jeff Ruffolo

@jeffruffolo

3 months

But why do all of this? What does scaling get us? ProteinGym is a nice benchmark for measuring zero-shot fitness prediction, but even three years ago (ProGen2) we found that this wasn’t the best proxy for evaluating scaling, and we still find that to be the case.

2

0

3

Jeff Ruffolo

@jeffruffolo

3 months

We developed optimal scaling laws that allowed us to scale up to 46B parameters, where we continue to see signs of generalization on diverse proteins far from the training data.

1

0

4

Jeff Ruffolo

@jeffruffolo

3 months

ProGen3 is a family of MoE models ranging from 112M to 46B parameters, capable of full sequence generation, as well as infilling. For practical protein design problems, having these new capabilities opens up a lot of new possibilities.

1

0

2

Jeff Ruffolo

@jeffruffolo

3 months

All the details are in our preprint, but I’ll summarize the main findings below.

biorxiv.org

Generative protein language models (PLMs) are powerful tools for designing proteins purpose-built to solve problems in medicine, agriculture, and industrial processes. Recent work has trained ever...

1

2

4

Jeff Ruffolo

@jeffruffolo

3 months

RT @ProfluentBio: What if the same AI advancements that have transformed ChatGPT could be replicated in biology?. Enter ProGen3, our latest….

0

12

0

Jeff Ruffolo

@jeffruffolo

7 months

RT @ProfluentBio: 1/ Today we announced new research in the ability of AI models to precisely modulate protein-DNA interactions without ite….

0

14

0

Jeff Ruffolo

@jeffruffolo

8 months

I’ll be in Vancouver for NeurIPS December 13-16, reach out if you’re interested in protein language models, genome editor / antibody design, or any of the other cool stuff we’re doing @ProfluentBio!.

0

1

20

Jeff Ruffolo

@jeffruffolo

10 months

RT @KevinKaichuang: Finetuned protein language models for conditional generation of enzymes with desired functions and taxonomies. @jsunn….

0

6

0

Jeff Ruffolo

@jeffruffolo

10 months

RT @jsunn_y: Excited to share my summer internship project @ProfluentBio! We used adapters to finetune protein language models for conditio….

0

14

0

Jeff Ruffolo

@jeffruffolo

1 year

RT @xiaofei_lin: The "ChatGPT moment" for biology proceeds to unfold as @ProfluentBio announces proseLM, a new method which incorporates st….

genengnews.com

Profluent’s latest protein language model incorporates structural and functional context to decode the language of biology for fine-tuned protein design.

0

42

0