Ruben Weitzman Profile
Ruben Weitzman

@ruben_weitzman

Followers
177
Following
111
Media
5
Statuses
20

PhD researcher in CS @OATML_Oxford/@DeboraMarksLab | ML for computational biology

Oxford, UK
Joined September 2022
@ruben_weitzman
Ruben Weitzman
27 days
🚨ICML Paper Alert🚨 What if finding the right protein homologs wasn't a slow search, but a learned part of the model itself? We introduce 𝐏𝐫𝐨𝐭𝐫𝐢𝐞𝐯𝐞𝐫, an end-to-end framework that learns to retrieve the most useful homologs for self-supervised reconstruction! (1/12)
5
21
99
@ruben_weitzman
Ruben Weitzman
27 days
Check out the full paper and @NotinPascal's intro blogpost, and follow for the future code release at: paper: code: blogpost: (12/12)
0
0
10
@ruben_weitzman
Ruben Weitzman
27 days
Huge thanks to the team for this great collaboration between @DeboraMarksLab and @OATML_Oxford with @NotinPascal @PeterM_rchGroth @lood_ml @deboramarks @yaringal. (11/12)
1
1
7
@ruben_weitzman
Ruben Weitzman
27 days
We used two powerful architectures: PoET from Bepler and Truong as the Reader, and ESM-2 from @alexrives @ebetica as the Retriever. Thanks for providing such great, easy-to-use open models! Thanks to the team at @meta, @gizacard for the Atlas model inspiration. (10/12)
1
0
6
@ruben_weitzman
Ruben Weitzman
27 days
Our vector framework is also uniquely flexible, leveraging FAISS indexing. The entire UniRef50 database becomes a single ~13GB index file. You can easily add new or proprietary sequences to the index just by embedding them, with no retraining of the full model required. (9/12)
1
0
4
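The "add by embedding" idea in (9/12) can be sketched with a toy flat inner-product index. This is a NumPy stand-in, not FAISS, and `FlatIPIndex` and `toy_embed` are hypothetical names; the real Retriever is a learned model, whereas the embedder here is just normalised amino-acid composition, used only to keep the example self-contained.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_embed(sequence):
    """Hypothetical embedder: unit-normalised amino-acid composition."""
    counts = np.array([sequence.count(a) for a in AMINO_ACIDS], dtype=np.float32)
    return counts / (np.linalg.norm(counts) + 1e-8)


class FlatIPIndex:
    """Toy flat inner-product index (assumption: stands in for a vector index)."""

    def __init__(self, dim):
        self.dim = dim
        self.vectors = np.empty((0, dim), dtype=np.float32)

    def add(self, embeddings):
        # Adding sequences only appends rows; no model weights are touched.
        embeddings = np.asarray(embeddings, dtype=np.float32).reshape(-1, self.dim)
        self.vectors = np.vstack([self.vectors, embeddings])

    def search(self, queries, k):
        # Exhaustive inner-product search, returning top-k scores and row ids.
        queries = np.asarray(queries, dtype=np.float32).reshape(-1, self.dim)
        scores = queries @ self.vectors.T
        top = np.argsort(-scores, axis=1)[:, :k]
        return np.take_along_axis(scores, top, axis=1), top


index = FlatIPIndex(dim=len(AMINO_ACIDS))
index.add([toy_embed(s) for s in ["MKTAYIAKQR", "GGGGSGGGGS", "MKTAYLAKQR"]])

# A new (e.g. proprietary) sequence is embedded and appended, nothing retrained.
index.add([toy_embed("WWWWCCCCHH")])

scores, ids = index.search([toy_embed("WWWCCCCHHH")], k=2)
print(ids[0][0])  # row id of the nearest stored sequence
```

The query shares its composition only with the newly added sequence, so that row comes back first.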
@ruben_weitzman
Ruben Weitzman
27 days
A crucial detail: the starting point for this training matters. ESM-2's default embeddings are a poor start for retrieval. We found initializing with a pretrained dense passage retriever creates a much better foundation for our final end-to-end learning. (8/12)
1
0
3
@ruben_weitzman
Ruben Weitzman
27 days
The speedup is dramatic. By replacing slow alignment with learned vector search, Protriever is two orders of magnitude faster than MMseqs2-GPU. This efficiency comes with superior predictive accuracy too in our end-to-end framework. (7/12)
1
0
5
@ruben_weitzman
Ruben Weitzman
27 days
This joint training leads to new state-of-the-art results. On the ProteinGym benchmark, Protriever is the top-performing sequence-based model, achieving a Spearman correlation of 0.479 and outperforming baselines across all metrics. (6/12)
1
0
4
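For readers unfamiliar with the metric in (6/12): Spearman correlation measures how well a model's scores rank variants in the same order as their measured fitness. A minimal sketch follows, assuming no tied values; the numbers are made up for illustration and are not ProteinGym data.

```python
import numpy as np


def spearman(x, y):
    """Spearman correlation for inputs without ties: Pearson correlation of ranks."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return float(np.corrcoef(rank_x, rank_y)[0, 1])


# Hypothetical values for five variants of one protein.
measured_fitness = [0.10, 0.80, 0.35, 0.60, 0.05]
model_scores = [-2.1, 1.5, -0.2, -0.5, -3.0]

rho = spearman(model_scores, measured_fitness)
print(round(rho, 3))  # 0.9: only one adjacent pair of variants is ranked in the wrong order
```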
@ruben_weitzman
Ruben Weitzman
27 days
The key is our 𝐞𝐧𝐝-𝐭𝐨-𝐞𝐧𝐝 𝐭𝐫𝐚𝐢𝐧𝐢𝐧𝐠. The Reader's performance on its self-supervised task - reconstructing the query protein from its homologs - provides a direct learning signal to the Retriever. This teaches the Retriever which homologs are truly relevant. (5/12)
1
0
6
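One way a Reader's loss can supervise a Retriever, borrowed from retrieval-augmented NLP (in the spirit of the perplexity-distillation objective described for Atlas), is to push the Retriever's distribution over candidates toward a target that favours homologs giving low reconstruction loss. The thread does not say which objective Protriever uses, so the sketch below is an assumption, and the function names and numbers are hypothetical.

```python
import numpy as np


def softmax(x):
    x = np.asarray(x, dtype=np.float64)
    z = np.exp(x - x.max())
    return z / z.sum()


def retriever_distillation_loss(retriever_scores, reader_losses, temperature=1.0):
    """KL(target || retriever), where the target favours low Reader loss.

    Assumption: reader_losses[i] is the Reader's reconstruction loss on the
    query when conditioned on retrieved homolog i.
    """
    p_retriever = softmax(retriever_scores)
    p_target = softmax(-np.asarray(reader_losses, dtype=np.float64) / temperature)
    return float(np.sum(p_target * (np.log(p_target) - np.log(p_retriever))))


# Hypothetical numbers: the Retriever prefers homolog 0,
# but the Reader reconstructs the query best with homolog 2.
reader_losses = [1.2, 3.0, 0.4]
loss_before = retriever_distillation_loss([2.0, 0.5, 0.1], reader_losses)

# Scores that rank homologs exactly as the Reader does drive the loss to zero.
loss_after = retriever_distillation_loss([-1.2, -3.0, -0.4], reader_losses)
print(loss_before > loss_after)
```

In a real training loop the gradient of this loss would flow into the Retriever's encoder through `retriever_scores`; NumPy is used here only to show the quantity being minimised.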
@ruben_weitzman
Ruben Weitzman
27 days
Protriever contains two core modules: The 𝐑𝐞𝐭𝐫𝐢𝐞𝐯𝐞𝐫 embeds proteins into a vector space for fast similarity search. The 𝐑𝐞𝐚𝐝𝐞𝐫 then uses these retrieved homologs as context. An 𝐈𝐧𝐝𝐞𝐱 represents all UniRef50 sequences the Reader pulls from. (4/12)
1
0
5
@ruben_weitzman
Ruben Weitzman
27 days
But finding the 𝐫𝐢𝐠𝐡𝐭 homologs is a major bottleneck. Traditional methods are slow & operate independently of downstream models. Inspired by retrieval-augmented NLP models, we propose a unified end-to-end differentiable process. (3/12)
1
0
6
@ruben_weitzman
Ruben Weitzman
27 days
Accurately predicting a protein's fitness landscape is critical for pathogenicity prediction and designing new drugs. The most powerful models rely on homologs - related sequences - to reveal which mutations are permissible, as shown in the ProteinGym benchmark (2/12)
1
0
8
@ruben_weitzman
Ruben Weitzman
1 year
RT @H__Spinner: Marks Lab is HIRING!!! If you're a software engineer interested in biology, proteins, RNAs, viruses, genomes, etc etc and M….
0
13
0
@ruben_weitzman
Ruben Weitzman
1 year
RT @NatureBiotech: The February issue, with a focus on protein engineering, is live Our cover shows the three data….
0
94
0
@ruben_weitzman
Ruben Weitzman
2 years
RT @NotinPascal: 🚨The new version of ProteinGym is out!🚨 #NeurIPS2023 #GenBio #Proteins #Benchmark #MAVE https://t.….
0
39
0
@ruben_weitzman
Ruben Weitzman
2 years
RT @NotinPascal: 📢Very pleased to be presenting our ProteinNPT paper at NeurIPS on Tuesday!📢 We introduce a novel semi-supervised conditiona….
0
26
0
@ruben_weitzman
Ruben Weitzman
2 years
RT @roseorenbuch: Announcing popEVE - a deep generative model of the human proteome that reveals over a hundred novel genes involved in rare….
0
22
0