Raghu Ganti

@RaghuGanti

Followers: 297 · Following: 857 · Media: 6 · Statuses: 262

Researcher, A.I., dancer

White Plains, NY
Joined January 2012
@RaghuGanti
Raghu Ganti
3 days
RT @PyTorch: The Kubeflow Trainer project has been integrated into the PyTorch Ecosystem! This integration ensures that Kubeflow Trainer al…
0
34
0
@RaghuGanti
Raghu Ganti
1 month
RT @PyTorch: PyTorch and vLLM are both critical to the AI ecosystem and are increasingly being used together for cutting edge generative AI…
0
65
0
@RaghuGanti
Raghu Ganti
3 months
RT @LysandreJik: The Transformers library is undergoing its largest pivot to date 🙌. It now cements its role as the central model definiti…
0
59
0
@RaghuGanti
Raghu Ganti
3 months
RT @PyTorch: PyTorch Foundation has expanded into an umbrella foundation. @vllm_project and @DeepSpeedAI have been accepted as hosted proje…
0
47
0
@RaghuGanti
Raghu Ganti
3 months
🚀 Bamba v2 (9B) is here: faster, stronger, and smarter! A leaderboard model in just 3T tokens!! Bamba v1 + 1T tokens of training. Outperforms Llama 3.1 8B on L1 & L2 benchmark scores 📈. 2–2.5× faster inference with vLLM than standard transformer-based models 🏎️. Open weights + …
2
20
76
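A minimal sketch of the vLLM inference path behind the speedup claim above, using vLLM's offline API. The Hugging Face repo id in the snippet is an assumption; check the actual Bamba v2 release for the exact name.

```python
# Minimal sketch of running a Bamba checkpoint with vLLM's offline API.
# The model id below is an assumption (verify against the Bamba v2 release);
# LLM, SamplingParams, and generate() are the standard vLLM offline entry points.
from vllm import LLM, SamplingParams

llm = LLM(model="ibm-ai-platform/Bamba-9B-v2")  # assumed repo id
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain state-space models in one paragraph."], params)
for out in outputs:
    print(out.outputs[0].text)
```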
@RaghuGanti
Raghu Ganti
4 months
RT @tri_dao: Very strong 8B and 56B Mamba hybrid models trained to 20T tokens, on 6K H100s, with FP8! This answers many of the open questio…
0
57
0
@RaghuGanti
Raghu Ganti
5 months
RT @danielhanchen: Excited to share that @UnslothAI now supports: • Full fine-tuning + 8bit • Nearly any model like Mixtral, Cohere, Grani…
0
53
0
@RaghuGanti
Raghu Ganti
5 months
RT @PyTorch: Optimize your model training with smarter memory management! Check out our latest blog post to learn how PyTorch’s new activa…
0
22
0
@RaghuGanti
Raghu Ganti
5 months
@Thom_Wolf @eliebakouch Yup, we are working on that!
0
0
2
@RaghuGanti
Raghu Ganti
5 months
2
0
4
@RaghuGanti
Raghu Ganti
7 months
For the MoE experts, I am curious how many experts are triggered in a batch of sequences. If I have a batch of 16, would it be possible to trigger (16 x num active parameters) in one forward pass? Could not find any study either 😢 @deepseek_ai @MistralAI
0
0
0
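A toy way to make the question above concrete: route a random batch through a top-k router and count how many distinct experts actually fire. All sizes below are made up for illustration (not any particular model); the point is that routing happens per token, so the distinct-expert count is capped by the number of experts rather than growing as batch × active parameters.

```python
# Toy illustration: with a top-k router, how many *distinct* experts fire
# across a batch? Sizes are invented; real MoE layers route per token, so the
# count is bounded by num_experts, not batch * active parameters.
import torch

num_experts, top_k = 64, 2            # illustrative router configuration
batch, seq_len, hidden = 16, 128, 32  # batch of 16 sequences, as in the tweet

router = torch.nn.Linear(hidden, num_experts)
tokens = torch.randn(batch, seq_len, hidden)

logits = router(tokens)                      # (batch, seq_len, num_experts)
chosen = logits.topk(top_k, dim=-1).indices  # expert ids selected per token
print("distinct experts hit by this batch:", chosen.unique().numel())
print("hard upper bound:", num_experts)
```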
@RaghuGanti
Raghu Ganti
8 months
@hsu_byron Interestingly, Mamba2 has many Triton kernels :)
0
0
5
@RaghuGanti
Raghu Ganti
8 months
🚀 Exciting News! 🚀 In a joint effort between IBM Research, Princeton, CMU, and UIUC, we are thrilled to announce the release of our high-performing hybrid Mamba2 model! This model is trained entirely on open datasets, and we’re releasing intermediate and final checkpoints to…
7
28
165
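For reference, a hedged sketch of loading the released hybrid checkpoint through the standard transformers API. The repo id is an assumption and should be taken from the actual release.

```python
# Minimal sketch of loading the hybrid Mamba2 checkpoint with transformers.
# The repo id below is an assumption; AutoModelForCausalLM / AutoTokenizer are
# the standard transformers entry points.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-ai-platform/Bamba-9B"  # assumed repo id

tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tok("Hybrid Mamba2 models combine attention and state-space layers", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=32)
print(tok.decode(out[0], skip_special_tokens=True))
```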
@RaghuGanti
Raghu Ganti
8 months
RT @PyTorch: Presenting HadaCore: Tensor Core Accelerated Hadamard Transform Kernel 🔍 Take a look at how we achieve state-of-the-art perfor…
0
31
0
@RaghuGanti
Raghu Ganti
8 months
Super kicked by this work from the @PyTorch team at @AIatMeta and the @IBMResearch team. @TheZachMueller, now we just need to get this into @huggingface accelerate and enable the community! Let’s go and make training even faster!!! 🔥🔥🔥
@PyTorch
PyTorch
8 months
Supercharging Training using float8 and FSDP2 ⚡ Read our latest blog to find out how we achieve up to 50% throughput speedup while achieving loss & evaluation benchmark parity in training:
0
0
3
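A heavily hedged sketch of the float8 + FSDP2 recipe the quoted blog post describes, assuming torchao's float8 training helper (convert_to_float8_training) and FSDP2's fully_shard; import paths and availability vary across torch/torchao versions, and the snippet is meant to run under torchrun on float8-capable GPUs.

```python
# Hedged sketch of float8 training composed with FSDP2 (per the quoted post).
# Assumes torchao's convert_to_float8_training and FSDP2's fully_shard; the
# fully_shard import path differs across torch releases, and actual float8
# compute needs Hopper-class GPUs under a torchrun launch.
import torch
import torch.nn as nn
from torchao.float8 import convert_to_float8_training
from torch.distributed._composable.fsdp import fully_shard  # path may differ by torch version

model = nn.Sequential(
    nn.Linear(4096, 4096),
    nn.ReLU(),
    nn.Linear(4096, 4096),
).to(torch.bfloat16)

# Swap eligible nn.Linear modules for float8 training variants; the filter
# lets you skip layers that are too small or otherwise incompatible.
convert_to_float8_training(model, module_filter_fn=lambda mod, fqn: True)

# Shard with FSDP2; float8 all-gathers of the sharded weights are a large part
# of the reported throughput gain.
for layer in model:
    fully_shard(layer)
fully_shard(model)
```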
@RaghuGanti
Raghu Ganti
8 months
Great work from @IBM and @PyTorch! This will allow for native tensor parallel using the @huggingface transformers library! Long sequences are now being tamed ;)
@art_zucker
Arthur Zucker
9 months
Native tensor parallel has landed in transformers!!! Thanks a lot to the torch team for their support! Contributions are welcome to support more models! 🔥
0
0
3
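A hedged sketch of what the native tensor-parallel path looks like from the user side, assuming the tp_plan="auto" argument to from_pretrained that recent transformers releases expose, launched with torchrun across several GPUs; the checkpoint name is only an example.

```python
# Hedged sketch of transformers' native tensor parallelism (announced above).
# Assumes the tp_plan="auto" kwarg on from_pretrained and a launch such as:
#   torchrun --nproc-per-node 4 tp_demo.py
# The checkpoint name is only an example.
import os
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # example checkpoint

tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    tp_plan="auto",  # shard weights across the GPUs of this torchrun launch
)

# Each rank tokenizes the same prompt and places it on its local GPU.
device = torch.device(f"cuda:{int(os.environ.get('LOCAL_RANK', 0))}")
inputs = tok("Tensor parallelism tames long sequences by", return_tensors="pt").to(device)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.shape)
```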
@RaghuGanti
Raghu Ganti
9 months
RT @aashkaa_: I’m presenting our poster on INDUS: Effective & Efficient Language Models for Scientific Applications at #EMNLP2024 tomorrow…
0
2
0
@RaghuGanti
Raghu Ganti
9 months
RT @PyTorch: Learn the inner workings of Triton, the hardware agnostic language for GPU programming and powering TorchInductor: https://t.c…
0
35
0