Jude Wells

@_judewells

Followers: 965 · Following: 2K · Media: 72 · Statuses: 277

PhD student at University College London: machine learning methods for structure-based drug discovery.

London
Joined March 2009
@_judewells
Jude Wells
2 days
I'll be at the @genbio_workshop at #ICML2025 today, presenting my poster on Plixer, a generative model for drug-like molecules: given a target protein pocket, Plixer generates a voxel (3D pixel) hypothesis for the binding ligand before decoding it into a valid SMILES string. 1/n
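Not the actual Plixer code (the module names and shapes below are made up for illustration), but a minimal PyTorch sketch of the two-stage idea described above: a 3D CNN maps a voxelised protein pocket to a predicted ligand-density grid, and a small autoregressive decoder turns that voxel hypothesis into SMILES tokens.

```python
import torch
import torch.nn as nn

class Pocket2LigandVoxel(nn.Module):
    """3D CNN: pocket occupancy grid -> ligand density grid of the same size."""
    def __init__(self, in_ch=8, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(in_ch, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv3d(hidden, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv3d(hidden, 1, 3, padding=1),      # 1 output channel: ligand density
        )

    def forward(self, pocket):                       # pocket: (B, in_ch, D, H, W)
        return self.net(pocket)

class VoxelToSmiles(nn.Module):
    """Autoregressive SMILES decoder conditioned on pooled ligand-voxel features."""
    def __init__(self, vocab_size=64, d_model=128):
        super().__init__()
        self.cond = nn.Linear(1, d_model)            # pooled voxel density -> context vector
        self.embed = nn.Embedding(vocab_size, d_model)
        self.gru = nn.GRU(d_model, d_model, batch_first=True)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, ligand_voxel, smiles_tokens):  # (B,1,D,H,W), (B,T)
        ctx = self.cond(ligand_voxel.mean(dim=(2, 3, 4)))      # (B, d_model)
        h = self.embed(smiles_tokens) + ctx.unsqueeze(1)       # add context at every step
        out, _ = self.gru(h)
        return self.head(out)                                  # next-token logits (B, T, vocab)

# Toy forward pass on random tensors, just to show the shapes flowing through.
pocket = torch.randn(2, 8, 24, 24, 24)
tokens = torch.randint(0, 64, (2, 40))
ligand_voxel = Pocket2LigandVoxel()(pocket)
logits = VoxelToSmiles()(ligand_voxel, tokens)
print(ligand_voxel.shape, logits.shape)              # (2,1,24,24,24) (2,40,64)
```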
@_judewells
Jude Wells
2 days
Check out the code and preprint here:
@_judewells
Jude Wells
2 days
because I believe there is an inductive bias in the convolution operation of a CNN, stemming from the fact that molecular recognition is a phenomenon driven by local interactions.
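A back-of-the-envelope check of this locality argument, using assumed numbers (0.5 Å voxels, 3x3x3 kernels, four stride-1 layers; the real grid spacing and depth may differ): a shallow stack of convolutions only ever mixes information over a few ångströms, which is roughly the length scale of hydrogen bonds and van der Waals contacts.

```python
# Receptive field of a stack of stride-1 3D convolutions, in physical units.
voxel_size = 0.5                                   # Å per voxel (assumption)
kernel = 3                                         # 3x3x3 convolutions
layers = 4
rf_voxels = 1 + layers * (kernel - 1)              # stride-1: grows by (k-1) per layer
rf_angstrom = rf_voxels * voxel_size
print(rf_voxels, rf_angstrom)                      # 9 voxels ≈ 4.5 Å
```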
@_judewells
Jude Wells
2 days
Much of the recent work on protein-conditioned generative models for small molecules has focused on diffusion approaches that move discrete atoms with 3D coordinates. I have a hunch that convolutions in 3D pixel space might be an easier supervised learning task.
@_judewells
Jude Wells
2 days
Using these databases, we can cheaply generate training pairs of 3D conformers and their corresponding SMILES strings. We use these pairs to pretrain a voxel-to-SMILES decoder, which improves our ability to sample valid and diverse drug-like molecules.
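A rough sketch of how such (3D conformer, SMILES) pairs could be generated with RDKit from a ligand-only library; the binary occupancy grid below is a deliberately simple voxeliser for illustration, not Plixer's actual featuriser.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem

def make_pair(smiles, grid_dim=24, voxel_size=0.5):
    """Return (voxel grid, canonical SMILES) for one molecule, or None on failure."""
    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
    if AllChem.EmbedMolecule(mol, randomSeed=0) != 0:    # generate a 3D conformer
        return None
    AllChem.MMFFOptimizeMolecule(mol)                    # quick force-field relaxation
    coords = mol.GetConformer().GetPositions()           # (n_atoms, 3) in Å
    coords = coords - coords.mean(axis=0)                # centre the molecule in the box
    grid = np.zeros((grid_dim,) * 3, dtype=np.float32)
    idx = np.round(coords / voxel_size).astype(int) + grid_dim // 2
    idx = idx[(idx >= 0).all(axis=1) & (idx < grid_dim).all(axis=1)]
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0          # binary atom occupancy
    return grid, Chem.MolToSmiles(Chem.RemoveHs(mol))    # canonical SMILES target

pair = make_pair("CC(=O)Oc1ccccc1C(=O)O")                # aspirin as a toy example
```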
@_judewells
Jude Wells
2 days
The small sample of protein-ligand complexes in the PDB is not enough to learn the chemical grammar and valid conformational space of molecules as 3D objects. But we do have access to hundreds of millions of ligand-only samples in databases such as ZINC.
@_judewells
Jude Wells
2 days
Generative models for protein-conditioned small-molecule generation suffer from a lack of training data: with only a few thousand non-redundant experimentally resolved structures in the PDB, there is not enough coverage of the diversity of drug-like molecule space. 2/n
@_judewells
Jude Wells
3 days
RT @dmmiller597: Come and join me, @_judewells, @_barneyhill, @jakublala and others in London on July 31st to hear some great flash talks o….
@_judewells
Jude Wells
1 month
Nice.
@SciTechgovuk
Department for Science, Innovation and Technology
1 month
We're backing the 'OpenBind' project, a new AI initiative announced at #LondonTechWeek that will build the world's biggest protein dataset right here in the UK. That means faster drug discoveries, new treatments for diseases like cancer, and cutting the cost of getting
@_judewells
Jude Wells
2 months
a lotta yall still dont get it. if you take a Lorentzian metric as a section of a bundle of point-wise Lorentz metrics and pull back the positive Weyl spinors with a suitable passage to the maximal compact subgroup you get one generation of Pati–Salam grand unified fermions.
@_judewells
Jude Wells
3 months
The ProGen authors do a nice job of showing the scaling laws for PLMs. But contrary to trends in NLP, where scaling and lower perplexity usually yield better performance on downstream tasks, we don't see this in PLMs. So what's different about proteins?
@_judewells
Jude Wells
3 months
Lack of benefit from scaling PLMs, particularly for fitness prediction (e.g. ProteinGym), is well known; the authors cite this paper:
Similarly, it seems that smaller ESM models often do better on a wide variety of prediction tasks.
@_judewells
Jude Wells
3 months
They also find that expression levels for both novel proteins (<30% seq id with naturals) and natural-like proteins (>30% seq id) are similar across model sizes, although larger models have slightly better expression on a specific sequence in-filling task.
@_judewells
Jude Wells
3 months
In addition to zero-shot fitness prediction (blue), the authors also do fine-tuning (orange), where some of the measured assay results are used to tune model preferences. Here there seems to be some evidence in favour of scaling, but it doesn't look conclusive.
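The thread doesn't spell out the fine-tuning recipe, so the snippet below is only one plausible reading of "tune model preferences" with assay labels: a pairwise logistic ranking loss on sequence log-likelihoods (the log_likelihood argument is a placeholder interface, not a real API).

```python
import torch.nn.functional as F

def pairwise_preference_loss(model, seq_better, seq_worse, log_likelihood):
    """Push the model to assign the higher log-likelihood to the higher-fitness variant.
    `log_likelihood(model, seq)` is assumed to return a scalar tensor with gradients."""
    margin = log_likelihood(model, seq_better) - log_likelihood(model, seq_worse)
    return F.softplus(-margin)        # ~0 once the model ranks the pair correctly
```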
@_judewells
Jude Wells
3 months
But what I find more interesting are the findings where scale doesn't seem to help. For the ProteinGym benchmark, sequence likelihood is used to predict the fitness score across various assays and proteins, and here there seems to be no benefit to scaling beyond ~300m params.
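A sketch of the zero-shot protocol being described, assuming an autoregressive PLM wrapped so that plm(tokens) returns next-token logits (a hypothetical interface, not the actual ProGen3 API): the fitness proxy for each variant is its mean per-residue log-likelihood, which is then rank-correlated with the measured assay values.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def sequence_log_likelihood(plm, tokens):
    """tokens: (1, L) integer-encoded amino acids, with a BOS token at position 0."""
    logits = plm(tokens)                                   # (1, L, vocab)
    logp = F.log_softmax(logits[:, :-1], dim=-1)           # predict residue t+1 from the prefix
    target = tokens[:, 1:]
    token_logp = logp.gather(-1, target.unsqueeze(-1)).squeeze(-1)
    return token_logp.mean().item()                        # higher = more "natural" under the PLM

# ProteinGym-style zero-shot scoring: score every variant this way, then report
# the Spearman correlation between the scores and the measured assay values.
```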
@_judewells
Jude Wells
3 months
They also show that the larger models generate sequences that pass filtering criteria (such as not having too many repeated amino acids).
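A toy version of one such filter (the cutoff of 6 is illustrative, not the paper's actual threshold): reject generated sequences whose longest single-residue run is too long.

```python
def longest_homopolymer(seq: str) -> int:
    best = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best

def passes_repeat_filter(seq: str, max_run: int = 6) -> bool:
    return longest_homopolymer(seq) <= max_run

print(passes_repeat_filter("MKTAYIAKQR"), passes_repeat_filter("MKAAAAAAAAAQR"))  # True False
```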
@_judewells
Jude Wells
3 months
The paper shows that if you generate sequences from the model and align the generations with natural proteins, the generations from a model that is 135x larger (339m → 46bn) cover around 50% more sequence clusters.
@_judewells
Jude Wells
3 months
The ProGen models are a series of auto-regressive protein language models; they train a range of model sizes all the way up to 46bn params. Paper is here:
First, we have to acknowledge that they do find some benefits to scaling.
@_judewells
Jude Wells
3 months
I really like this ProGen3 paper because, contrary to the title, I think it actually shows there is relatively little to be gained from massively scaling protein language models. 1/n