Lood van Niekerk @lood_ml X Profile

Lood van Niekerk

@lood_ml

Followers

175

Following

2K

Media

5

Statuses

134

ML scientist at Ginkgo

Joined August 2019


Lood van Niekerk

@lood_ml

12 days

Other sizes:
GDPx1: DRUG-seq of 1264 compounds in the A549 human lung carcinoma epithelial cell line.
GDPx2: DRUG-seq of 85 compounds in 4 primary cell types.
GDPa1: 246 antibodies over 10 biophysical assays.
More info in the blogpost: (3/3)

0

1

Lood van Niekerk

@lood_ml

12 days

As an example, the GDPx3 cell painting dataset is just over 1 TB of images for 40 compounds in 3 primary cell types, plus metadata - excited for people to start digging into this. (2/3)

1

0

1

Lood van Niekerk

@lood_ml

19 days

Ever wondered why masked diffusion outperforms other types of discrete diffusion (e.g. for EvoDiff)? Alan figured it out and then fixed it!

Alan Amin

@AlanNawzadAmin

20 days

There are many domain-specific noise processes for discrete diffusion, but masking dominates! Why? We show masking exploits a key property of discrete diffusion, which we use to unlock the potential of those structured processes and beat masking! @gruver_nate @andrewgwils 1/7

0

4

Lood van Niekerk

@lood_ml

19 days

RT @kulesatony: University budgets everywhere are getting slashed, and we hear many PhD students with accepted ICML papers can no longer af…

0

34

0

Lood van Niekerk

@lood_ml

24 days

Definitely chat to Ruben and the team at their ICML poster session next month! And check out the attached thread for more info. (@PeterM_rchGroth, @yaringal, @deboramarks, @NotinPascal) (5/5) Paper:

0

1

Lood van Niekerk

@lood_ml

24 days

Really looking forward to this line of research being explored further! Credit to @ruben_weitzman and others for getting a challenging codebase up and running (incl. GPU embedding vector databases and joint training of the reader-retriever). (4/5)

1

0

1

Lood van Niekerk

@lood_ml

24 days

This work specifically looks at the question: "Can we replace an MSA with an embedding search?" I.e., let the model learn which sequence set to retrieve. (3/5)

1

0

1

Lood van Niekerk

@lood_ml

24 days

📚 There's a rich history of (differentiable) retrieval methods in NLP (REALM, RAG, etc.), and sequence database searches have helped boost the performance of pLMs (MSA Transformer, Tranception, PoET and more mentioned in the paper). (2/5)

1

0

1

Lood van Niekerk

@lood_ml

24 days

As sequence databases get bigger and more diverse, retrieval-based methods provide an interesting alternative to scaling successively bigger protein language models. (1/5)

Ruben Weitzman

@ruben_weitzman

24 days

🚨ICML Paper Alert🚨 What if finding the right protein homologs wasn't a slow search, but a learned part of the model itself? We introduce 𝐏𝐫𝐨𝐭𝐫𝐢𝐞𝐯𝐞𝐫, an end-to-end framework that learns to retrieve the most useful homologs for self-supervised reconstruction! (1/12)

1

2

17

Lood van Niekerk

@lood_ml

24 days

RT @KevinKaichuang: End-to-end differentiable homology search for protein fitness prediction. @ruben_weitzman @lood_ml @yaringal @debora…

0

14

0

Lood van Niekerk

@lood_ml

29 days

RT @GabriCorso: Excited to unveil Boltz-2, our new model capable not only of predicting structures but also binding affinities! Boltz-2 is…

0

418

0

Lood van Niekerk

@lood_ml

2 months

Excited to expand this further with bigger datasets! They're presenting at the following times:
Thursday May 15
10:30-11:15AM
12:25-1:55PM
Poster session C
Poster hashtag #C044
Feel free to reach out to datapoints@ginkgobioworks.com if you want to know more! (4/4)

0

1

3

Lood van Niekerk

@lood_ml

2 months

We successfully guide towards these properties while maintaining high naturalness (likelihood) scores under other models such as AbLang2. We've found SVDD to work better than other discrete diffusion guidance approaches. (3/4)

1

3

Lood van Niekerk

@lood_ml

2 months

We guide antibodies towards low hydrophobicity and polyreactivity using simple oracles trained on 250 antibodies from our latest preprint. (2/4)

1

2

Lood van Niekerk

@lood_ml

2 months

My colleagues Joshua Moller & Porfirio Quintero @Ginkgo are presenting our poster on guided discrete diffusion for antibody developability tomorrow at PEGS Boston 🧵 (1/4)

1

4

13

Lood van Niekerk

@lood_ml

2 months

New antibody developability dataset by the Datapoints team at Ginkgo - I hope this becomes a standard benchmark in the future. Congrats to the team who onboarded these assays and are running them consistently at ~thousands of variants per week. Preprint link in 🧵.

Jason Kelly

@jrkelly

2 months

Great new resource for anyone working on antibody CMC, developability, or discovery. New preprint out today from Ginkgo that benchmarks 246 therapeutic IgGs across 10 developability assays. The resulting dataset is ML-ready and publicly available. If you're thinking about early

0

5

Lood van Niekerk

@lood_ml

5 months

Right now we’re using computational proxies for naturalness/diversity/property optimization, compared to our genetic algorithm baseline and stability oracle, which was experimentally validated. I’m excited to see where future projects go based on this!

0

Lood van Niekerk

@lood_ml

5 months

It’s a nice demonstration of what we’ve been playing around with as a team (guided discrete diffusion), and it seems to preserve diversity/naturalness compared to a genetic algorithm baseline we’ve used previously for 3’UTR design.

1

0

Lood van Niekerk

@lood_ml

5 months

Congrats to @AlyssaKMorrow and others on this latest work: a model for designing full-length mRNA sequences for a given protein sequence 🧬

Ginkgo Bioworks

@Ginkgo

5 months

It's launch day for mDD-0, the latest model from Ginkgo AI! Read the white paper about generative AI for full-length mRNA sequences including custom therapeutic payloads. We're excited to see what you'll build!

1

10

Lood van Niekerk

@lood_ml

5 months

RT @etowah0: Can we learn protein biology from a language model? In new work led by @liambai21 and me, we explore how sparse autoencoders…

0

101

0