
Adam Ibrahim
@ai_phd
Followers: 532 · Following: 160 · Media: 3 · Statuses: 74
Our tech report for Zamba-7B-v1 is out. We manage to come close to the performance of Llama 3 8B, Mistral 7B, and others with only 1T training tokens, with faster inference and lower memory usage at a fixed context length. Read on to learn about our not-so-secret sauce!
Zyphra is dropping the tech report for Zamba-7B, along with:
- Model weights (phase 1 and final annealed)
- Inference/generation code (both pure PyTorch and HuggingFace)
- Tech report
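For readers who want to try the HuggingFace path mentioned above, here is a minimal, hypothetical generation sketch. It assumes the weights are published under a Hugging Face repo id such as "Zyphra/Zamba-7B-v1" and that your installed transformers version supports the Zamba architecture; the pure-PyTorch path and the actual repo links from the tweet are not reproduced here.

```python
# Hedged sketch: load Zamba-7B from the Hugging Face Hub and generate text.
# Assumptions (not confirmed by the tweet): the repo id "Zyphra/Zamba-7B-v1"
# and a transformers build that includes Zamba support.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Zyphra/Zamba-7B-v1"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # half precision to keep memory modest
    device_map="auto",           # requires `accelerate` for multi-device placement
)

inputs = tokenizer("The not-so-secret sauce behind Zamba is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```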
RT @RylanSchaeffer: Another #ICML2025 paper! Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?…
RT @RylanSchaeffer: Excited to announce our paper ⬇️ was selected as an **Outstanding** paper at @TiFA_ICML2024 🔥🔥🔥 What did the paper sh…
RT @RylanSchaeffer: ❤️🔥❤️🔥 Excited to share our new paper ❤️🔥❤️🔥 **Why Has Predicting Downstream Capabilities of Frontier AI Models wit…
Worth noting that we're working with @huggingface to release the model over the next week. Stay tuned!
Here is the full paper of the continual pretraining project I worked on last year. I encourage you to check it out if you pretrain LLMs (in particular, I recommend starting with the takeaways in Section 2 and the Table of Contents at the start of the appendix).
Interested in seamlessly updating your #LLM on new datasets to avoid wasting previous efforts & compute, all while maintaining performance on past data? Excited to present Simple and Scalable Strategies to Continually Pre-train Large Language Models! 🧵1/N
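To make the idea concrete, here is a minimal, hypothetical PyTorch sketch of the two ingredients the paper emphasizes: re-warming then re-decaying the learning rate when a new dataset arrives, and replaying a small fraction of the previous data to limit forgetting. The tiny model, random token streams, and hyperparameters below are placeholders for illustration, not the paper's configuration.

```python
# Hedged sketch of continual pretraining on a new dataset:
# (1) LR re-warmup followed by cosine re-decay, (2) replay of old data.
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import LambdaLR

vocab, d_model, seq_len, steps, warmup = 1000, 64, 32, 200, 20
replay_fraction = 0.05  # illustrative value; the paper studies several replay ratios

# Stand-in language model: embedding + linear head (placeholder for a real LLM).
model = nn.Sequential(nn.Embedding(vocab, d_model), nn.Linear(d_model, vocab))
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

def rewarm_then_cosine(step):
    # Linear re-warmup, then cosine decay down to 10% of the max learning rate.
    if step < warmup:
        return (step + 1) / warmup
    progress = (step - warmup) / max(1, steps - warmup)
    return 0.1 + 0.9 * 0.5 * (1 + torch.cos(torch.tensor(progress * torch.pi))).item()

scheduler = LambdaLR(optimizer, rewarm_then_cosine)

def sample_batch(from_old_data: bool):
    # Placeholder for streaming tokens from either the old or the new corpus.
    return torch.randint(0, vocab, (8, seq_len))

for step in range(steps):
    use_replay = torch.rand(()).item() < replay_fraction  # mix in old data occasionally
    tokens = sample_batch(from_old_data=use_replay)
    logits = model(tokens[:, :-1])                         # next-token prediction
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, vocab), tokens[:, 1:].reshape(-1)
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```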
RT @arankomatsuzaki: Mila presents Simple and Scalable Strategies to Continually Pre-train Large Language Models. Shows efficient updates t…
RT @QuentinAnthon15: State-space models (SSMs) like Mamba and mixture-of-experts (MoE) models like Mixtral both seek to reduce the computat…
Looking forward to seeing you at the #NeurIPS2023 #NeurIPS23 ENLSP workshop (rooms 206-207), where we'll have a poster about this work at 16:15!
Ever wondered how to keep pretraining your LLM as new datasets continue to become available, instead of pretraining from scratch every time and wasting prior effort and compute? A thread 🧵
RT @irinarish: @PranshuRanjan1 @SarvamAI Hi-NOLIN Hindi model will be presented by our @NolanoOrg team (@imtejas13 @_AyushKaushal) and col…
RT @ReyhaneAskari: (1/8) The great success of diffusion models such as Stable Diffusion, DALLE & Emu has raised questions about the use o…
RT @M_L_Richter: Rarely been so excited about a paper. Our model has a quality level higher than Stable Diffusion 2.1 at a fraction (less t…