Dear academic, biotech & drug discovery twitter colleagues, I need your help!
I'm collecting a list of benchmarks & evaluation datasets for protein-small molecule affinity and virtual screening capacity (e.g. published hit discovery campaign results). Which ones do you recommend?
@GabriCorso
The OpenFF protein-ligand benchmark has a pretty diverse set of targets (although the number of ligands per target is small). Another one that comes to mind is Merck FEP, but you already mentioned it.
@_judewells
Yeah, though in my experience these (without specific filtering) are all of fairly poor quality and not very representative of what is actually useful in research/industry.
And related to that, what is the right way to fairly evaluate methods on these (of course after blind prospective studies)? E.g. ensuring test proteins/pockets/ligands are never seen during training...
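To make the "never seen during training" idea concrete, here is a minimal sketch of a group-based split, assuming each complex has already been assigned to a protein cluster (e.g. by sequence or pocket similarity). The `group_split` helper, the record fields, and the cluster IDs are all hypothetical and only for illustration.

```python
import random
from collections import defaultdict


def group_split(records, group_key, test_fraction=0.2, seed=0):
    """Split records so that no group appears in both train and test."""
    # Bucket records by their group label (e.g. protein cluster ID).
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec)

    # Shuffle group IDs deterministically, then hold out whole groups.
    group_ids = sorted(groups)
    random.Random(seed).shuffle(group_ids)
    n_test = max(1, round(test_fraction * len(group_ids)))

    test = [r for g in group_ids[:n_test] for r in groups[g]]
    train = [r for g in group_ids[n_test:] for r in groups[g]]
    return train, test


# Toy records; the cluster labels are made up for illustration.
records = [
    {"complex": "c1", "protein_cluster": "A"},
    {"complex": "c2", "protein_cluster": "A"},
    {"complex": "c3", "protein_cluster": "B"},
    {"complex": "c4", "protein_cluster": "C"},
    {"complex": "c5", "protein_cluster": "C"},
    {"complex": "c6", "protein_cluster": "D"},
]

train, test = group_split(records, "protein_cluster", test_fraction=0.25)
```

This only keeps protein clusters disjoint. Ligands (or scaffolds) can still overlap between the two sets, so a stricter protocol would need a second grouping on the ligand side as well.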
@GabriCorso
I've tried a dataset with molecular mechanics features derived from a subset of the PDBBind database. However, I would advise examining some of these features more closely.
Paper:
Dataset:
@GabriCorso
Hey there, just matching keywords here (I know almost nothing of the field), but could this be of some interest to you?
It's from a few colleagues of mine, let me know if it may be relevant