Lood van Niekerk Profile
Lood van Niekerk

@lood_ml

Followers
175
Following
2K
Media
5
Statuses
134

ML scientist at Ginkgo

Joined August 2019
@lood_ml
Lood van Niekerk
12 days
Other sizes: GDPx1: DRUG-seq of 1264 compounds in the A549 human lung carcinoma epithelial cell line; GDPx2: DRUG-seq of 85 compounds in 4 primary cell types; GDPa1: 246 antibodies over 10 biophysical assays. More info in the blogpost: (3/3)
0
0
1
@lood_ml
Lood van Niekerk
12 days
As an example, the GDPx3 cell painting dataset is just over 1 TB of images for 40 compounds in 3 primary cell types, plus metadata - excited for people to start digging into this. (2/3)
1
0
1
@lood_ml
Lood van Niekerk
19 days
Ever wondered why masked diffusion outperforms other types of discrete diffusion (e.g. for EvoDiff)? Alan figured it out and then fixed it!
@AlanNawzadAmin
Alan Amin
20 days
There are many domain-specific noise processes for discrete diffusion, but masking dominates! Why? We show masking exploits a key property of discrete diffusion, which we use to unlock the potential of those structured processes and beat masking! @gruver_nate @andrewgwils 1/7
0
0
4
@lood_ml
Lood van Niekerk
19 days
RT @kulesatony: University budgets everywhere are getting slashed, and we hear many PhD students with accepted ICML papers can no longer af…
0
34
0
@lood_ml
Lood van Niekerk
24 days
Definitely chat to Ruben and the team at their ICML poster session next month! And check out the attached thread for more info. (@PeterM_rchGroth, @yaringal, @deboramarks, @NotinPascal) (5/5) Paper:
0
0
1
@lood_ml
Lood van Niekerk
24 days
Really looking forward to this line of research being explored further! Credits to @ruben_weitzman and others for getting a challenging codebase up-and-running (incl GPU embedding vector databases, joint training of reader-retriever) (4/5).
1
0
1
@lood_ml
Lood van Niekerk
24 days
This work specifically looks at the question: "Can we replace an MSA with an embedding search?", i.e. let the model learn which sequence set to retrieve. (3/5)
1
0
1
@lood_ml
Lood van Niekerk
24 days
📚 There's a rich history of (differentiable) retrieval methods in NLP (REALM/RAG etc), and sequence database searches have helped boost performance of pLMs (MSA Transformer, Tranception, PoET and more mentioned in the paper) (2/5).
1
0
1
@lood_ml
Lood van Niekerk
24 days
As sequence databases get bigger and more diverse, retrieval-based methods provide an interesting alternative to scaling successively bigger protein language models. (1/5).
@ruben_weitzman
Ruben Weitzman
24 days
🚨ICML Paper Alert🚨 What if finding the right protein homologs wasn't a slow search, but a learned part of the model itself? We introduce 𝐏𝐫𝐨𝐭𝐫𝐢𝐞𝐯𝐞𝐫, an end-to-end framework that learns to retrieve the most useful homologs for self-supervised reconstruction! (1/12)
1
2
17
@lood_ml
Lood van Niekerk
24 days
RT @KevinKaichuang: End-to-end differentiable homology search for protein fitness prediction. @ruben_weitzman @lood_ml @yaringal @debora…
0
14
0
@lood_ml
Lood van Niekerk
29 days
RT @GabriCorso: Excited to unveil Boltz-2, our new model capable not only of predicting structures but also binding affinities! Boltz-2 is…
0
418
0
@lood_ml
Lood van Niekerk
2 months
Excited to expand this further with bigger datasets! They're presenting at the following times: Thursday May 15, 10:30-11:15AM, 12:25-1:55PM, poster session C, poster hashtag #C044. Feel free to reach out to datapoints@ginkgobioworks.com if you want to know more! (4/4)
0
1
3
@lood_ml
Lood van Niekerk
2 months
We successfully guide towards these properties while maintaining high naturalness (likelihood) scores under other models such as AbLang2. We've found SVDD to work better than other discrete diffusion guidance approaches. (3/4)
1
1
3
@lood_ml
Lood van Niekerk
2 months
We guide antibodies towards low hydrophobicity and polyreactivity using simple oracles trained on 250 antibodies from our latest preprint. (2/4)
1
1
2
@lood_ml
Lood van Niekerk
2 months
My colleagues Joshua Moller & Porfirio Quintero @Ginkgo are presenting our poster on guided discrete diffusion for antibody developability tomorrow at PEGS Boston 🧵 (1/4).
1
4
13
@lood_ml
Lood van Niekerk
2 months
New antibody developability dataset by the Datapoints team at Ginkgo - I hope this becomes a standard benchmark in the future. Congrats to the team who onboarded these assays and are running them consistently at ~thousands of variants per week. Preprint link in 🧵.
@jrkelly
Jason Kelly
2 months
Great new resource for anyone working on antibody CMC, developability, or discovery. New preprint out today from Ginkgo that benchmarks 246 therapeutic IgGs across 10 developability assays. The resulting dataset is ML-ready and publicly available. If you're thinking about early
0
0
5
@lood_ml
Lood van Niekerk
5 months
Right now we’re using computational proxies for naturalness/diversity/property optimization, compared to our genetic algorithm baseline and stability oracle, which was experimentally validated. I’m excited to see where future projects go based on this!
0
0
0
@lood_ml
Lood van Niekerk
5 months
It’s a nice demonstration of what we’ve been playing around with as a team (guided discrete diffusion), and it seems to preserve diversity/naturalness compared to a genetic algorithm baseline we’ve used previously for 3’UTR design.
1
0
0
@lood_ml
Lood van Niekerk
5 months
Congrats to @AlyssaKMorrow and others on this latest work: a model for designing full-length mRNA sequences for a given protein sequence 🧬
@Ginkgo
Ginkgo Bioworks
5 months
It's launch day for mDD-0, the latest model from Ginkgo AI! Read the white paper about generative AI for full-length mRNA sequences including custom therapeutic payloads. We're excited to see what you'll build!
1
1
10
@lood_ml
Lood van Niekerk
5 months
RT @etowah0: Can we learn protein biology from a language model? In new work led by @liambai21 and me, we explore how sparse autoencoders…
0
101
0