@GabriCorso
Gabriele Corso
5 months
Dear academic, biotech & drug discovery Twitter colleagues, I need your help! I'm collecting a list of benchmarks & evaluation datasets for protein-small molecule affinity and virtual screening capacity (e.g. published hit discovery campaign results). Which ones do you recommend?

Replies

@GabriCorso
Gabriele Corso
5 months
Examples include CSAR-HiQ, Merck FEP benchmark, CACHE #1 challenge... the more the merrier!
@GabriCorso
Gabriele Corso
5 months
Tagging a few people whose opinion I would love to have: @david_koes @olexandr @CGorgulla @akshat_ai @jchodera @alshedivat 🙏
@alshedivat
Maruan Al-Shedivat
5 months
@GabriCorso The OpenFF protein-ligand benchmark has a pretty diverse set of targets (although the number of ligands per target is small). Another one that comes to mind is Merck FEP, but you already mentioned it.
@GabriCorso
Gabriele Corso
5 months
@alshedivat Thanks Maruan! I actually was not aware of this one! Let me know if similar ones come to mind!
@_judewells
Jude Wells
5 months
@GabriCorso If you need solved structures: PDBbind and DUD-E. If you don't care about having the structure:
@GabriCorso
Gabriele Corso
5 months
@_judewells Yeah, though in my experience these (without particular filtering) are all of somewhat poor quality and not very representative of what is actually useful in research/industry.
@GabriCorso
Gabriele Corso
5 months
Related to this: what is the right way to fairly evaluate methods on these (after blind prospective studies, of course)? E.g. ensuring test proteins/pockets/ligands are never seen during training...
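
A minimal sketch of the "never seen during training" idea, assuming each record carries a protein identifier and a ligand SMILES string. The field names, the helper names `group_split` and `leaked_values`, and the toy records are all hypothetical, and exact-match identity is a much weaker criterion than filtering by sequence, pocket, or scaffold similarity:

```python
import random
from collections import defaultdict


def group_split(records, key, test_fraction=0.2, seed=0):
    """Split records so that no value of `key` lands in both train and test."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)

    # Shuffle whole groups (e.g. proteins), not individual records.
    group_ids = sorted(groups)
    random.Random(seed).shuffle(group_ids)
    n_test = max(1, round(len(group_ids) * test_fraction))
    test_ids = group_ids[:n_test]
    train_ids = group_ids[n_test:]

    train = [rec for g in train_ids for rec in groups[g]]
    test = [rec for g in test_ids for rec in groups[g]]
    return train, test


def leaked_values(train, test, key):
    """Return the values of `key` that occur in both splits."""
    return {rec[key] for rec in train} & {rec[key] for rec in test}


# Toy records for illustration only (hypothetical IDs, not real measurements).
records = [
    {"protein": "P1", "ligand": "CCO"},
    {"protein": "P1", "ligand": "CCN"},
    {"protein": "P2", "ligand": "CCO"},
    {"protein": "P3", "ligand": "c1ccccc1"},
    {"protein": "P4", "ligand": "CC(=O)O"},
]

train, test = group_split(records, key="protein")
print("shared proteins:", leaked_values(train, test, "protein"))
print("shared ligands:", leaked_values(train, test, "ligand"))
```

Splitting by protein guarantees no shared proteins, but the same ligand can still appear on both sides, which is why the second check is reported separately.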
@GM_Randazzo
Giuseppe Marco (zeld) Randazzo
5 months
@GabriCorso Long story short: it depends on what you want to prove and achieve. Why not PoseBusters?
@GabriCorso
Gabriele Corso
5 months
@GM_Randazzo PoseBusters is only structural as far as I know; I'm mostly interested in affinity here!
@clemensisert
Clemens Isert
5 months
@GabriCorso Roche’s PDE10A dataset might be useful
@GabriCorso
Gabriele Corso
5 months
@clemensisert Thank you, yes this is very interesting (although I guess performance for a single target might be somewhat biased)!
@josejimlun
José Jiménez-Luna
5 months
@GabriCorso BindingDB protein-ligand validation sets. Old but contains lots of docked congeneric series data and some crystals.
@GabriCorso
Gabriele Corso
5 months
@josejimlun Thanks, I'll investigate!
@a_sarig_
Ahmet Sarıgün
5 months
@GabriCorso I've tried a dataset with molecular mechanics features derived from a subset of the PDBbind database. However, I would advise examining some of these features more closely. Paper: Dataset:
@GabriCorso
Gabriele Corso
5 months
@a_sarig_ Thanks!
@4ndr3aR
Andrea 🤌🏾 Ranieri
5 months
@GabriCorso Hey there, just matching keywords here (I know almost nothing about the field), but could this be of some interest to you? It's from a few colleagues of mine; let me know if it may be relevant.
@heyitsbasu
Sreejana Basu
5 months
@GabriCorso I may be able to collaborate and help you with a database. Let's DM!
@dom_beaini
Dominique Beaini @ ICLR 2024
5 months
@GabriCorso I recommend getting in touch with @cas_wognum; he is at the center of a bio/pharma consortium for better benchmarks.