Jude Wells

@_judewells

Followers: 965 · Following: 2K · Media: 72 · Statuses: 277

PhD student at University College London: machine learning methods for structure-based drug discovery.

London
Joined March 2009
@_judewells
Jude Wells
2 days
I'll be at the @genbio_workshop at #ICML2025 today, presenting my poster on Plixer, a generative model for drug-like molecules: given a target protein pocket, Plixer generates a voxel (3D pixel) hypothesis for the binding ligand before decoding it into a valid SMILES string. 1/n
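Not the actual Plixer code (the module names and shapes below are made up for illustration), but a minimal PyTorch sketch of the two-stage idea described above: a 3D CNN maps a voxelised protein pocket to a predicted ligand-density grid, and a small autoregressive decoder turns that voxel hypothesis into SMILES tokens.

```python
import torch
import torch.nn as nn

class Pocket2LigandVoxel(nn.Module):
    """3D CNN: pocket occupancy grid -> ligand density grid of the same size."""
    def __init__(self, in_ch=8, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(in_ch, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv3d(hidden, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv3d(hidden, 1, 3, padding=1),      # 1 output channel: ligand density
        )

    def forward(self, pocket):                       # pocket: (B, in_ch, D, H, W)
        return self.net(pocket)

class VoxelToSmiles(nn.Module):
    """Autoregressive SMILES decoder conditioned on pooled ligand-voxel features."""
    def __init__(self, vocab_size=64, d_model=128):
        super().__init__()
        self.cond = nn.Linear(1, d_model)            # pooled voxel density -> context vector
        self.embed = nn.Embedding(vocab_size, d_model)
        self.gru = nn.GRU(d_model, d_model, batch_first=True)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, ligand_voxel, smiles_tokens):  # (B,1,D,H,W), (B,T)
        ctx = self.cond(ligand_voxel.mean(dim=(2, 3, 4)))      # (B, d_model)
        h = self.embed(smiles_tokens) + ctx.unsqueeze(1)       # add context at every step
        out, _ = self.gru(h)
        return self.head(out)                                  # next-token logits (B, T, vocab)

# Toy forward pass on random tensors, just to show the shapes flowing through.
pocket = torch.randn(2, 8, 24, 24, 24)
tokens = torch.randint(0, 64, (2, 40))
ligand_voxel = Pocket2LigandVoxel()(pocket)
logits = VoxelToSmiles()(ligand_voxel, tokens)
print(ligand_voxel.shape, logits.shape)              # (2,1,24,24,24) (2,40,64)
```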
@_judewells
Jude Wells
2 days
Check out the code and preprint here:
@_judewells
Jude Wells
2 days
because I believe there is an inductive bias in the convolution operation of a CNN, stemming from the fact that molecular recognition is a phenomenon driven by local interactions.
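A back-of-the-envelope check of this locality argument, using assumed numbers (0.5 Å voxels, 3x3x3 kernels, four stride-1 layers; the real grid spacing and depth may differ): a shallow stack of convolutions only ever mixes information over a few ångströms, which is roughly the length scale of hydrogen bonds and van der Waals contacts.

```python
# Receptive field of a stack of stride-1 3D convolutions, in physical units.
voxel_size = 0.5                                   # Å per voxel (assumption)
kernel = 3                                         # 3x3x3 convolutions
layers = 4
rf_voxels = 1 + layers * (kernel - 1)              # stride-1: grows by (k-1) per layer
rf_angstrom = rf_voxels * voxel_size
print(rf_voxels, rf_angstrom)                      # 9 voxels ≈ 4.5 Å
```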
@_judewells
Jude Wells
2 days
Much of the recent work on protein-conditioned generative models for small molecules has focused on diffusion approaches that move discrete atoms with 3D coordinates. I have a hunch that convolutions in 3D pixel space might be an easier supervised learning task.
@_judewells
Jude Wells
2 days
Using these databases, we can cheaply generate training pairs of 3D conformers and their corresponding SMILES strings. We use these pairs to pretrain a voxel-to-SMILES decoder, which improves our ability to sample valid and diverse drug-like molecules.
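A rough sketch of how such (3D conformer, SMILES) pairs could be generated with RDKit from a ligand-only library; the binary occupancy grid below is a deliberately simple voxeliser for illustration, not Plixer's actual featuriser.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem

def make_pair(smiles, grid_dim=24, voxel_size=0.5):
    """Return (voxel grid, canonical SMILES) for one molecule, or None on failure."""
    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
    if AllChem.EmbedMolecule(mol, randomSeed=0) != 0:    # generate a 3D conformer
        return None
    AllChem.MMFFOptimizeMolecule(mol)                    # quick force-field relaxation
    coords = mol.GetConformer().GetPositions()           # (n_atoms, 3) in Å
    coords = coords - coords.mean(axis=0)                # centre the molecule in the box
    grid = np.zeros((grid_dim,) * 3, dtype=np.float32)
    idx = np.round(coords / voxel_size).astype(int) + grid_dim // 2
    idx = idx[(idx >= 0).all(axis=1) & (idx < grid_dim).all(axis=1)]
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0          # binary atom occupancy
    return grid, Chem.MolToSmiles(Chem.RemoveHs(mol))    # canonical SMILES target

pair = make_pair("CC(=O)Oc1ccccc1C(=O)O")                # aspirin as a toy example
```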
@_judewells
Jude Wells
2 days
The small sample of protein-ligand complexes in the PDB is not enough to learn the chemical grammar and valid conformational space of molecules as 3D objects. But we do have access to hundreds of millions of ligand-only samples in databases such as ZINC.
@_judewells
Jude Wells
2 days
Generative models for protein-conditioned small-molecule generation suffer from a lack of training data: with only a few thousand non-redundant experimentally resolved structures in the PDB, there is not enough coverage of the diversity of drug-like molecule space. 2/n
@_judewells
Jude Wells
3 days
RT @dmmiller597: Come and join me, @_judewells, @_barneyhill, @jakublala and others in London on July 31st to hear some great flash talks o….
@_judewells
Jude Wells
1 month
Nice.
@SciTechgovuk
Department for Science, Innovation and Technology
1 month
We're backing the 'OpenBind' project, a new AI initiative announced at #LondonTechWeek that will build the world's biggest protein dataset right here in the UK. That means faster drug discoveries, new treatments for diseases like cancer, and cutting the cost of getting
@_judewells
Jude Wells
2 months
a lotta yall still dont get it. if you take a Lorentzian metric as a section of a bundle of point-wise Lorentz metrics and pull back the positive Weyl spinors with a suitable passage to the maximal compact subgroup you get one generation of Pati–Salam grand unified fermions.
@_judewells
Jude Wells
3 months
The ProGen authors do a nice job of showing the scaling laws for PLMs. But contrary to trends in NLP, where scaling and lower perplexity usually yield better performance on downstream tasks, we don't see this in PLMs. So what's different about proteins?
@_judewells
Jude Wells
3 months
Lack of benefit from scaling PLMs, particularly for fitness prediction (e.g. ProteinGym), is well known; the authors cite this paper:
Similarly, it seems that smaller ESM models often do better on a wide variety of prediction tasks.
@_judewells
Jude Wells
3 months
They also find that expression levels for both novel proteins (<30% seq id with naturals) and natural-like proteins (>30% seq id) are similar across model sizes, although larger models have slightly better expression on a specific sequence in-filling task.
@_judewells
Jude Wells
3 months
In addition to zero-shot fitness prediction (blue), the authors also do fine-tuning (orange), where some of the measured assay results are used to tune model preferences. Here there seems to be some evidence in favour of scaling, but it doesn't look conclusive.
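The thread doesn't spell out the fine-tuning recipe, so the snippet below is only one plausible reading of "tune model preferences" with assay labels: a pairwise logistic ranking loss on sequence log-likelihoods (the log_likelihood argument is a placeholder interface, not a real API).

```python
import torch.nn.functional as F

def pairwise_preference_loss(model, seq_better, seq_worse, log_likelihood):
    """Push the model to assign the higher log-likelihood to the higher-fitness variant.
    `log_likelihood(model, seq)` is assumed to return a scalar tensor with gradients."""
    margin = log_likelihood(model, seq_better) - log_likelihood(model, seq_worse)
    return F.softplus(-margin)        # ~0 once the model ranks the pair correctly
```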
@_judewells
Jude Wells
3 months
But what I find more interesting are the findings where scale doesn't seem to help. For the ProteinGym benchmark, sequence likelihood is used to predict the fitness score across various assays and proteins, and here there seems to be no benefit to scaling beyond ~300m params.
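A sketch of the zero-shot protocol being described, assuming an autoregressive PLM wrapped so that plm(tokens) returns next-token logits (a hypothetical interface, not the actual ProGen3 API): the fitness proxy for each variant is its mean per-residue log-likelihood, which is then rank-correlated with the measured assay values.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def sequence_log_likelihood(plm, tokens):
    """tokens: (1, L) integer-encoded amino acids, with a BOS token at position 0."""
    logits = plm(tokens)                                   # (1, L, vocab)
    logp = F.log_softmax(logits[:, :-1], dim=-1)           # predict residue t+1 from the prefix
    target = tokens[:, 1:]
    token_logp = logp.gather(-1, target.unsqueeze(-1)).squeeze(-1)
    return token_logp.mean().item()                        # higher = more "natural" under the PLM

# ProteinGym-style zero-shot scoring: score every variant this way, then report
# the Spearman correlation between the scores and the measured assay values.
```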
@_judewells
Jude Wells
3 months
They also show that the larger models generate sequences that pass filtering criteria (such as not having too many repeated amino acids).
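A toy version of one such filter (the cutoff of 6 is illustrative, not the paper's actual threshold): reject generated sequences whose longest single-residue run is too long.

```python
def longest_homopolymer(seq: str) -> int:
    best = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best

def passes_repeat_filter(seq: str, max_run: int = 6) -> bool:
    return longest_homopolymer(seq) <= max_run

print(passes_repeat_filter("MKTAYIAKQR"), passes_repeat_filter("MKAAAAAAAAAQR"))  # True False
```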
@_judewells
Jude Wells
3 months
The paper shows that if you generate sequences from the model and align the generations with natural proteins, the generations from a model that is 135x larger (339m → 46bn) cover around 50% more sequence clusters.
@_judewells
Jude Wells
3 months
The ProGen models are a series of auto-regressive protein language models; they train a range of model sizes all the way up to 46bn params. Paper is here:
First, we have to acknowledge that they do find some benefits to scaling.
@_judewells
Jude Wells
3 months
I really like this ProGen3 paper because, contrary to the title, I think it actually shows there is relatively little to be gained from massively scaling protein language models. 1/n