Mohamed Abdelfattah

@mohsaied

Followers: 1K · Following: 465 · Media: 87 · Statuses: 615

Assistant Prof @CornellECE and @cornell_tech. At the intersection of machine learning and hardware. Father. Muslim.

NYC
Joined March 2009
@mohsaied
Mohamed Abdelfattah
2 days
Feature drop from @mako_dev_ai!! You can now define your own custom PyTorch problem to generate GPU kernels on our #makogenerate platform!
@wAIeedatallah
Waleed Atallah
2 days
MakoGenerate now supports custom problems, meaning you can generate #CUDA or #Triton kernels for any @PyTorch reference code you have! Let's walk through an example using @GPU_MODE's latest contest: the Triangle Multiplicative Update (TriMul) module.
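The "custom problem" workflow, loosely: you supply a reference implementation plus an input generator, and the platform checks candidate kernels against it. A minimal sketch in plain Python (the function names `reference`, `gen_inputs`, and `check_kernel` are hypothetical stand-ins, not the MakoGenerate API; real problems are PyTorch modules like the TriMul reference):

```python
import random

def reference(a, b):
    # Reference "kernel": an element-wise multiply-add, standing in for
    # a PyTorch module such as the GPU_MODE TriMul reference.
    return [x * y + x for x, y in zip(a, b)]

def gen_inputs(n, seed=0):
    # Deterministic input generator so every candidate kernel is
    # scored on the same data as the reference.
    rng = random.Random(seed)
    return ([rng.uniform(-1, 1) for _ in range(n)],
            [rng.uniform(-1, 1) for _ in range(n)])

def check_kernel(candidate, n=1024, tol=1e-6):
    # Numerical check a generated CUDA/Triton kernel must pass
    # before any performance timing matters.
    a, b = gen_inputs(n)
    want, got = reference(a, b), candidate(a, b)
    return all(abs(w - g) <= tol for w, g in zip(want, got))
```

A candidate that matches the reference algebraically, e.g. `lambda a, b: [x * (y + 1) for x, y in zip(a, b)]`, passes the check; anything numerically off gets rejected before benchmarking.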
Replies: 0 · Reposts: 1 · Likes: 5
@mohsaied
Mohamed Abdelfattah
9 days
We use large-scale text-to-text regression to predict specific parameters (e.g. utilization) of compute nodes in Google's datacenter, purely by training on a (very) large corpus of unstructured system logs! Paper: Code:
@XingyouSong
Richard Song
9 days
Seeing text-to-text regression work for Google's massive compute cluster (a billion-$$ problem!) was the final result that convinced us we can reward-model literally any world feedback. Paper: Code: Just train a simple encoder-decoder.
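The core trick in text-to-text regression is treating the numeric target itself as a short string, so a plain encoder-decoder can "read" raw logs and "write" the prediction. A toy sketch of just that encoding/decoding step (pure Python; the log line and field names below are made up for illustration, not Google's format):

```python
def target_to_text(value, digits=3):
    # Serialize a numeric regression target (e.g. utilization in [0, 1])
    # as a fixed-width string the decoder can emit token by token.
    return f"{value:.{digits}f}"

def text_to_target(text):
    # Parse the decoder's output back into a float; a malformed
    # generation falls back to None instead of crashing evaluation.
    try:
        return float(text)
    except ValueError:
        return None

# An unstructured log line paired with its textual target: the whole
# training example is just (input string, output string).
log_line = "job=J17 node=n042 alloc_cpus=48 phase=steady"
example = (log_line, target_to_text(0.734))
```

With targets serialized this way, "regression" reduces to ordinary sequence-to-sequence training on (log text, target text) pairs.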
Replies: 0 · Reposts: 0 · Likes: 6
@mohsaied
Mohamed Abdelfattah
10 days
RT @gene_ch0u: We've released all code and models for FlashDepth! It produces depth maps from 2K streaming video in real time. This was….
Replies: 0 · Reposts: 70 · Likes: 0
@mohsaied
Mohamed Abdelfattah
14 days
For now you can use your favorite LLM from @OpenAI, @AnthropicAI, or @Google. We're also fine-tuning our own LLM to generate the best GPU code. More on that soon.
Replies: 0 · Reposts: 0 · Likes: 0
@mohsaied
Mohamed Abdelfattah
14 days
Write a #GPU kernel in 60 seconds! Check out MakoGenerate (completely free) and watch our launch video below! :)
@mako_dev_ai
Mako
15 days
Introducing MakoGenerate, a CUDA-writing AI agent. We're excited to make the research preview of MakoGenerate available today, completely free.
Replies: 1 · Reposts: 0 · Likes: 5
@mohsaied
Mohamed Abdelfattah
4 months
Thank you @songhan_mit for an insightful and dense guest lecture on the latest quantization and model efficiency methods developed in your group for generative AI!
Replies: 0 · Reposts: 0 · Likes: 8
@mohsaied
Mohamed Abdelfattah
5 months
Palu is our latest work, recently accepted to @iclr_conf! What do @deepseek_ai and Palu have in common? We both perform low-rank projection of the KV-cache to reduce the memory footprint of LLMs during inference. Read more:
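The memory arithmetic behind that shared idea: instead of caching the full per-token key/value vectors, cache only their low-rank latents and keep the reconstruction matrix as a fixed weight. A back-of-envelope sketch in plain Python (the shapes and rank are illustrative, not Palu's or DeepSeek's actual configuration):

```python
def kv_cache_bytes(seq_len, d_head, n_heads, bytes_per_elem=2):
    # Full KV cache for one layer: keys + values (factor of 2),
    # each n_heads * d_head wide per token, stored in fp16.
    return 2 * seq_len * n_heads * d_head * bytes_per_elem

def low_rank_kv_cache_bytes(seq_len, d_head, n_heads, rank, bytes_per_elem=2):
    # Cache only the rank-r latent per token; the d_head-wide
    # up-projection is a weight matrix, not part of the cache.
    return 2 * seq_len * n_heads * rank * bytes_per_elem

full = kv_cache_bytes(seq_len=4096, d_head=128, n_heads=32)
low  = low_rank_kv_cache_bytes(seq_len=4096, d_head=128, n_heads=32, rank=64)
# rank 64 against head dim 128 halves the cache footprint
```

The saving scales directly with `rank / d_head`, which is why the rank choice is the central accuracy/memory knob in this family of methods.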
Replies: 0 · Reposts: 1 · Likes: 12
@mohsaied
Mohamed Abdelfattah
7 months
RT @yashakha: Why Many Token When Few Do Trick? Meet Attamba – Attamba replaces Key-Value projections with SSMs, unlocking multi-token com….
Replies: 0 · Reposts: 46 · Likes: 0
@mohsaied
Mohamed Abdelfattah
7 months
RT @yashakha: Could compressing tokens with SSMs be the key to more efficient and scalable transformers? Explore how Attamba works in our e….
Replies: 0 · Reposts: 1 · Likes: 0
@mohsaied
Mohamed Abdelfattah
8 months
All our code is here:
Replies: 0 · Reposts: 0 · Likes: 1
@mohsaied
Mohamed Abdelfattah
8 months
We turn to bit-serial computing to enable arbitrary mixed-precision computation: 3, 4, 5, or 6 bits are all possible with little overhead, making it much more efficient than general bit-parallel architectures.
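Why bit-serial hardware gives "free" mixed precision: a p-bit multiply becomes p single-bit AND-and-accumulate cycles, so 3-, 5-, or 6-bit operands cost exactly 3, 5, or 6 cycles on the same datapath. A behavioral sketch in plain Python (illustrative only, not the actual RTL):

```python
def bit_serial_dot(weights, acts, precision):
    # Process one weight bit-plane per "cycle": AND each weight bit
    # with the activation, accumulate, then scale the plane by 2^b.
    total = 0
    for b in range(precision):            # precision == cycle count
        plane = sum(((w >> b) & 1) * x for w, x in zip(weights, acts))
        total += plane << b
    return total

# Any precision runs on the same loop; only the cycle count changes.
w5 = [21, 7, 30]   # 5-bit unsigned weights
x  = [3, 1, 2]
assert bit_serial_dot(w5, x, precision=5) == sum(a * b for a, b in zip(w5, x))
```

A bit-parallel multiplier would need to be sized for the worst-case width; here, odd widths like 3 or 5 bits waste nothing.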
Replies: 1 · Reposts: 0 · Likes: 2
@mohsaied
Mohamed Abdelfattah
8 months
By repurposing the redundant zero in FP4/FP3, we can learn an *additional* quantization level, outperforming prior work. In fact, we can easily combine our learned datatypes with state-of-the-art quantization algorithms (AWQ, SmoothQuant, OmniQuant) to get even higher accuracy!
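The redundant-zero trick, concretely: sign-magnitude FP4 encodes both +0 and -0, so one of the 16 codes is wasted. Reassigning the -0 code to a learnable value yields 16 distinct levels instead of 15. A toy sketch (the baseline level values and the learned level are illustrative, not the paper's actual datatypes):

```python
def fp4_levels():
    # Toy sign-magnitude 4-bit datatype: 1 sign bit + 3 magnitude bits.
    mags = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
    return mags + [-m for m in mags]   # +0 and -0 collide: 15 distinct values

def fp4_with_learned_level(learned=0.25):
    # Repurpose the redundant -0 code as one extra, learnable level.
    levels = fp4_levels()
    levels[8] = learned                # index 8 encoded -0.0
    return levels

base = fp4_levels()
mod  = fp4_with_learned_level()
# 15 distinct representable values before, 16 after the swap
```

One extra well-placed level sounds small, but at 4 bits every representable value matters, which is where the accuracy gain comes from.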
Replies: 1 · Reposts: 0 · Likes: 0
@mohsaied
Mohamed Abdelfattah
8 months
BitMoD is our latest work, accepted at @HpcaArchCon! It is a mixed-precision bit-serial compute architecture for LLMs, capable of running per-group quantized LLMs efficiently, with a new trick to gain more accuracy than prior work. Paper:
Replies: 1 · Reposts: 0 · Likes: 8
@mohsaied
Mohamed Abdelfattah
8 months
Thanks to Dr. Masahiro Tanaka from Microsoft DeepSpeed for an impressive guest lecture in my Deep Learning Efficiency course.
Replies: 0 · Reposts: 2 · Likes: 15
@mohsaied
Mohamed Abdelfattah
8 months
Next week at @MicroArchConf, Yuzong will present our BBS paper. For the first time, we can take advantage of *bidirectional* bit-level sparsity, extracting even more performance out of quantized DNNs. Read more:
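One way to read "bidirectional" bit-level sparsity: a value dense in 1-bits has a complement dense in 0-bits, and w*x = (2^p - 1)*x - comp(w)*x, so a bit-serial datapath can process whichever side has fewer set bits. A plain-Python sketch of that bookkeeping (our illustrative reading of the general idea, not the BBS paper's actual encoding or hardware):

```python
def popcount(v):
    # Number of set bits; bit-serial work scales with this count.
    return bin(v).count("1")

def effective_bit_ops(w, precision):
    # Cycles needed when we may process either w or its complement.
    comp = (1 << precision) - 1 - w        # bitwise complement of w
    return min(popcount(w), popcount(comp))

def bbs_product(w, x, precision):
    # Multiply via whichever of w / comp(w) is sparser in bits,
    # using the identity w*x = (2^p - 1)*x - comp(w)*x.
    comp = (1 << precision) - 1 - w
    if popcount(comp) < popcount(w):
        return ((1 << precision) - 1) * x - comp * x
    return w * x

# 0b1110111 has six 1-bits, but its complement 0b0001000 has just one,
# so the complemented path does the same multiply in far fewer bit-ops.
assert bbs_product(0b1110111, 5, precision=7) == 0b1110111 * 5
```

Quantized DNN weights skew toward small magnitudes (many 0-bits) but also contain near-maximum values (many 1-bits); exploiting both directions is what a zero-bit-skipping design alone leaves on the table.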
Replies: 0 · Reposts: 0 · Likes: 4
@mohsaied
Mohamed Abdelfattah
9 months
Thanks to Amir Gholami for giving a super interesting guest lecture in my Deep Learning Efficiency class this week. I learned a lot about agentic systems and how to make them more automated and efficient!
Replies: 1 · Reposts: 0 · Likes: 7
@mohsaied
Mohamed Abdelfattah
10 months
RT @akhauriyash123: Excited to see @jaseweston highlighting the importance of contextual behavior in transformers! In our #EMNLP2024 paper….
Replies: 0 · Reposts: 5 · Likes: 0
@mohsaied
Mohamed Abdelfattah
10 months
Busy week! 1️⃣ Jordan Dotzel's FLIQS paper won the best paper award at the AutoML Conference in Paris! 🎉 2️⃣ Xilai Dai presented his first paper (Kratos) at FPL in Turin! 3️⃣ I was in SF kick-starting a new project! (details soon)
Replies: 0 · Reposts: 0 · Likes: 7
@mohsaied
Mohamed Abdelfattah
11 months
.@icmlconf was a blast! We got lots of interest in our 3 posters (photos of students hard at work below) and I caught up with so many friends and made new ones. Also, the biggest conference I’ve been to so far with ~10k researchers!
[Four photos]
Replies: 1 · Reposts: 0 · Likes: 20
@mohsaied
Mohamed Abdelfattah
1 year
RT @niclane7: #ICML2024 is almost over, but there is still time to take a look at our paper that advances architecture search (NAS). Our ap….
Replies: 0 · Reposts: 1 · Likes: 0