Jonathan Pilault
@J_Pilault
Followers 327 · Following 11 · Media 4 · Statuses 106
• ML Research Scientist at Silicon Valley startup @ZyphraAI • Former researcher @GoogleDeepMind @nvidia • PhD @Mila_Quebec
Montreal
Joined February 2010
1) RAG often struggles on complex multi-hop queries. In this blog post, we at @ZyphraAI discuss and build a graph-based RAG system that tops the leaderboard on a multi-hop QA benchmark and outperforms frontier long-context models at 60x lower cost. https://t.co/QDXUdiWzh5
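A toy sketch of why a graph helps with multi-hop queries (all document text, entity lists, and function names here are my own illustrative assumptions, not Zyphra's system): documents become nodes, shared entities become edges, and retrieval expands from seed documents by following edges, reaching evidence a flat top-k retriever would miss.

```python
from collections import defaultdict

# Tiny hypothetical corpus: answering a question about d1 may require d3,
# which shares no entities with d1 directly -- only via d2 (a "hop").
docs = {
    "d1": "Marie Curie won the Nobel Prize in Physics in 1903.",
    "d2": "The 1903 Physics prize was shared with Pierre Curie and Henri Becquerel.",
    "d3": "Henri Becquerel discovered radioactivity in 1896.",
}
entities = {
    "d1": {"Marie Curie", "Nobel Prize"},
    "d2": {"Pierre Curie", "Henri Becquerel", "Nobel Prize"},
    "d3": {"Henri Becquerel", "radioactivity"},
}

# Build doc-doc edges through shared entities.
ent_to_docs = defaultdict(set)
for d, ents in entities.items():
    for e in ents:
        ent_to_docs[e].add(d)
graph = defaultdict(set)
for ds in ent_to_docs.values():
    for a in ds:
        graph[a] |= ds - {a}

def multi_hop_retrieve(seed_docs, hops=2):
    """Expand the seed set by following entity edges for `hops` steps."""
    frontier, seen = set(seed_docs), set(seed_docs)
    for _ in range(hops):
        frontier = {n for d in frontier for n in graph[d]} - seen
        seen |= frontier
    return seen

# Seeding on d1 and hopping twice reaches d3 via d2.
print(sorted(multi_hop_retrieve({"d1"}, hops=2)))  # ['d1', 'd2', 'd3']
```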
@ylecun Thanks for sharing! Another little trick that might amuse you is that we identified a function which, upon minimization, produces the forward pass of the attention block:
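For reference, one way such an energy function can be written (a sketch of the standard cumulant-generating-function construction, not necessarily the exact form in the paper): introduce a source variable $x$ coupled to the values, and the gradient at $x = 0$ recovers softmax attention, since $\nabla \,\mathrm{logsumexp} = \mathrm{softmax}$.

$$
F(x) \;=\; -\log \sum_{i=1}^{n} \exp\!\big(q^{\top} k_i + x^{\top} v_i\big),
\qquad
-\nabla_x F\big|_{x=0} \;=\; \sum_{i=1}^{n} \frac{e^{\,q^{\top} k_i}}{\sum_{j} e^{\,q^{\top} k_j}}\, v_i \;=\; \mathrm{Attn}(q, K, V).
$$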
By exploiting the two-level interconnect topology of GPU clusters, Tree Attention achieves asymptotically faster decoding as output sequence length and the number of GPUs in a cluster grow, along with lower peak-memory requirements:
Unlike Ring Attention's P2P communication, which scales with sequence length, Tree Attention uses an Allreduce that: • does not scale communication volume with sequence length • reduces internode communication requirements • overlaps better with single-device attention computation
Tree Attention was derived from the scalar energy-function interpretation of self-attention, which reveals that a tree reduction can be performed along the sequence axis thanks to the associativity of the logsumexp and max operations.
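The associativity claim is easy to check numerically. A minimal sketch (my own illustration, not the paper's kernels): summarize each sequence chunk as a (running max, max-shifted exp-sum) pair, merge pairs with an associative combine, and reduce the chunks pairwise in a tree, exactly the shape of an Allreduce. The tree result matches the direct logsumexp over the flat sequence.

```python
import math

def chunk_stats(xs):
    """Summarize a chunk as (max, sum of exp shifted by that max)."""
    m = max(xs)
    return m, sum(math.exp(x - m) for x in xs)

def combine(a, b):
    """Associative merge of two (max, scaled-sum) summaries."""
    (ma, sa), (mb, sb) = a, b
    m = max(ma, mb)
    return m, sa * math.exp(ma - m) + sb * math.exp(mb - m)

def tree_logsumexp(chunks):
    """Pairwise tree reduction over chunk summaries (allreduce-shaped)."""
    stats = [chunk_stats(c) for c in chunks]
    while len(stats) > 1:
        stats = [combine(stats[i], stats[i + 1]) if i + 1 < len(stats) else stats[i]
                 for i in range(0, len(stats), 2)]
    m, s = stats[0]
    return m + math.log(s)

flat = [0.1, 2.0, -1.5, 3.3, 0.7, -0.2, 1.1, 2.4]
chunks = [flat[i:i + 2] for i in range(0, len(flat), 2)]
direct = math.log(sum(math.exp(x) for x in flat))
assert abs(tree_logsumexp(chunks) - direct) < 1e-9
```

Because `combine` is associative, the reduction order (ring, tree, or flat) does not change the result, which is what lets the tree exploit the cluster topology.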
Zyphra is proud to release Tree Attention, a fast inference method for extremely long sequences • 8x faster inference speed vs. Ring Attention • 2x less peak memory • low data communication volumes Paper: https://t.co/yf5VNRze6W Code: https://t.co/Th6Fg8eFEr A 🧵
Zyphra is ecstatic to release Zamba2-small: - 2.7B Mamba2/Attention hybrid - Pre-trained on 3T tokens + annealed on 100B high-quality tokens - Released on HuggingFace and as standalone PyTorch - SOTA evaluation performance and superior inference efficiency.
Hyped to share JaxPruner: a concise library for sparsity research. JaxPruner includes 10+ easy-to-modify baseline algorithms and provides integration with popular libraries like t5x, scenic, dopamine and fedjax. 1/7 Code: https://t.co/tPwCL03xnE Paper: https://t.co/eedLJj5EVW
Zyphra is pleased to announce Zamba-7B: - 7B Mamba/Attention hybrid - Competitive with Mistral-7B and Gemma-7B on only 1T fully open training tokens - Outperforms Llama-2 7B and OLMo-7B - All checkpoints across training to be released (Apache 2.0) - Achieved by 7 people, on 128
Last week, I gave a talk at @Mila_Quebec. The talk should be of interest to anyone working on predictive models, particularly in latent space. In collab. with @MahanFathi @ClementGehring @J_Pilault @davidkanaa @pierrelux. See you at @iclr_conf in 🇦🇹! https://t.co/vFBtHDzNju
Course Correcting Koopman Representations Accepted at #ICLR2024! We identify problems with unrolling in imagination and propose an unconventional, simple, yet effective solution: periodically "𝒓𝒆𝒆𝒏𝒄𝒐𝒅𝒊𝒏𝒈" the latent. 📄 https://t.co/ULNzqAV3bB
@GoogleDeepMind 1/🧵
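A toy sketch of the reencoding idea (the dynamics, numbers, and function names below are my own illustration, not the paper's model): unrolling a slightly mis-learned linear latent model in imagination lets the latent drift off the manifold of valid encodings; periodically running decode → encode snaps it back.

```python
def encode(x):                 # Koopman-style lift: observables [x, x^2]
    return [x, x * x]

def decode(z):                 # read the observation back out of the latent
    return z[0]

A = [[0.5, 0.0], [0.0, 0.26]]  # slightly mis-learned latent dynamics
                               # (the exact model would use 0.25 for the x^2 channel)

def step(z):
    return [A[0][0] * z[0] + A[0][1] * z[1],
            A[1][0] * z[0] + A[1][1] * z[1]]

def rollout(x0, n, reencode_every=None):
    """Unroll n steps in latent space, optionally reencoding periodically."""
    z = encode(x0)
    for t in range(1, n + 1):
        z = step(z)
        if reencode_every and t % reencode_every == 0:
            z = encode(decode(z))   # snap back onto the z[1] == z[0]^2 manifold
    return z

def manifold_error(z):          # how far the latent drifted off the lift
    return abs(z[1] - z[0] ** 2)

plain = rollout(1.0, 10)
snapped = rollout(1.0, 10, reencode_every=5)
assert manifold_error(snapped) < manifold_error(plain)
```

In this toy, the error compounds multiplicatively without reencoding, while periodic reencoding restores the constraint between the lifted coordinates at each reencode step.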
My research group @kasl_ai is looking for interns! Applications are due in 2 weeks ***January 29***. The long-awaited form: https://t.co/hLOjuxSfnK Please share widely!!
@notnotrishi I like the SSM/hyena/Block State Transformers https://t.co/HrQIWgtTIj
https://t.co/mveReauq1S They remind me of Q-RNNs https://t.co/mwRsydj5dA and play around with different parallelization ideas. I don't think transformers are that special and there are many equivalent
Why not get the best of both worlds by combining SSMs and Transformers? Excited to share our work at #NeurIPS2023: "Block-State Transformers." BST hits new highs in long-range language modeling and LRA tasks. paper: https://t.co/nHt6OGyez1 1/
Tips for non-technical entrepreneurs http://t.co/9lmKKTLx
It's 2012, and Canadian eCommerce is stuck in the '90s: http://t.co/OXpWKT6n
#Montreal should be the #innovation and #startup gatekeeper between Europe and the US. Wouldn't entering the respective markets be easier?
33% of online shoppers in Canada abandon their cart before checkout due to high shipping costs (eMarketer). Free shipping is a must in Canada.