
Mohamed Abdelfattah
@mohsaied
1K Followers · 465 Following · 87 Media · 615 Statuses
Assistant Prof @CornellECE and @cornell_tech. At the intersection of machine learning and hardware. Father. Muslim.
NYC
Joined March 2009
Feature drop from @mako_dev_ai!! You can now define your own custom PyTorch problem to generate GPU kernels on our #makogenerate platform!
MakoGenerate now supports custom problems, meaning you can generate #CUDA or #Triton kernels for any @PyTorch reference code you have! Let's walk through an example using @GPU_MODE's latest contest: the Triangle Multiplicative Update (TriMul) module.
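For context, a "custom problem" is just PyTorch reference code that generated kernels get checked against. Below is a rough sketch of what a TriMul ("outgoing" triangle multiplicative update) reference might look like; the layer names, gating details, and normalization placement are assumptions for illustration, not the contest's official spec.

```python
import torch
import torch.nn as nn

class TriMulOutgoing(nn.Module):
    """Sketch of a triangle multiplicative update ("outgoing" variant).

    Operates on an AlphaFold-style pair representation of shape
    (batch, N, N, C). Projection/gating details are illustrative assumptions.
    """
    def __init__(self, c: int, c_hidden: int):
        super().__init__()
        self.norm_in = nn.LayerNorm(c)
        self.proj_a = nn.Linear(c, c_hidden)
        self.gate_a = nn.Linear(c, c_hidden)
        self.proj_b = nn.Linear(c, c_hidden)
        self.gate_b = nn.Linear(c, c_hidden)
        self.norm_out = nn.LayerNorm(c_hidden)
        self.proj_out = nn.Linear(c_hidden, c)
        self.gate_out = nn.Linear(c, c)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.norm_in(x)
        # Gated projections of the pair representation
        a = torch.sigmoid(self.gate_a(x)) * self.proj_a(x)
        b = torch.sigmoid(self.gate_b(x)) * self.proj_b(x)
        # "Outgoing" triangle update: contract over the shared third index k
        t = torch.einsum("bikc,bjkc->bijc", a, b)
        return torch.sigmoid(self.gate_out(x)) * self.proj_out(self.norm_out(t))
```

A generated CUDA/Triton kernel would then be validated for numerical parity against this forward pass.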
We use large-scale text-to-text regression to predict specific parameters (e.g., utilization) of compute nodes in Google's datacenters, trained purely on a (very) large corpus of unstructured system logs! Paper: Code:
Seeing text-to-text regression work for Google’s massive compute cluster (a billion-$$ problem!) was the final result that convinced us we can reward-model literally any world feedback. Paper: Code: Just train a simple encoder-decoder.
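The framing in these posts treats regression as sequence-to-sequence: serialize the raw log into an input string, render the numeric target as a string, and let a standard encoder-decoder learn to emit it. A minimal sketch of that serialization step (the log fields and number format are invented for illustration; no model is trained here):

```python
def log_to_prompt(log: dict) -> str:
    """Serialize an (invented) unstructured log record into encoder input text."""
    return " ".join(f"{k}={v}" for k, v in sorted(log.items()))

def target_to_text(value: float, digits: int = 3) -> str:
    """Render the numeric regression target as the decoder's target string."""
    return f"{value:.{digits}f}"

def text_to_target(text: str) -> float:
    """Parse the decoder's generated string back into a number."""
    return float(text)

# Example: a hypothetical node-utilization record
prompt = log_to_prompt({"node": "n42", "cpus": 96, "job_kind": "batch"})
target = target_to_text(0.7312)
```

The appeal of this framing is that any feedback signal that can be written down as text becomes a valid training target, with no feature engineering on the logs.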
RT @gene_ch0u: We've released all code and models for FlashDepth! It produces depth maps from a 2k, streaming video in real-time. This was….
For now you can use your favorite LLM from @OpenAI, @AnthropicAI, and @Google. We're also fine-tuning our own LLM to generate the best GPU code. More on that soon.
Write a #GPU kernel in 60 seconds! Check out MakoGenerate (completely free) and watch our launch video below! :)
Introducing MakoGenerate, a CUDA-writing AI agent. We're excited to make the research preview of MakoGenerate available today, completely free.
Thank you @songhan_mit for an insightful and dense guest lecture on the latest quantization and model efficiency methods developed in your group for generative AI!
Palu is our latest work, recently accepted to @iclr_conf! What do @deepseek_ai and Palu have in common? We both perform low-rank projection of the KV-Cache to reduce the memory footprint of LLMs during inference. Read more:
Bitmod is our latest work, accepted at @HpcaArchCon! It's a mixed-precision bit-serial compute architecture capable of running per-group quantized LLMs efficiently, with a new trick to gain more accuracy than prior work. Paper:
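For readers new to bit-serial compute: the datapath consumes one bit of the operand per cycle, so a multiply becomes a sequence of shift-and-adds, and narrower weights directly mean fewer cycles. A toy software model of that datapath (unsigned weights only, for simplicity; Bitmod's mixed-precision and per-group handling is far more involved):

```python
def bit_serial_mul(act: int, weight: int, bits: int = 8) -> int:
    """Multiply an activation by an unsigned quantized weight one bit at a
    time, the way a bit-serial datapath would: one shift-and-add per cycle."""
    acc = 0
    for i in range(bits):
        if (weight >> i) & 1:   # cycle i consumes bit i of the weight
            acc += act << i     # accumulate the activation shifted by the bit index
    return acc
```

Because cycles are only spent on the bits a weight actually has, bit-serial hardware naturally rewards low-precision and bit-sparse quantization.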
Next week at @MicroArchConf, Yuzong will present our BBS paper. For the first time, we can take advantage of *bidirectional* bit-level sparsity, extracting even more performance out of quantized DNNs. Read more:
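The intuition behind "bidirectional" bit-level sparsity: if digits can be -1 as well as +1, a long run of 1-bits collapses into just two nonzero digits (e.g., 126 = 128 - 2), so fewer bit-operations are needed. BBS's actual encoding and hardware are different; this sketch only illustrates the counting argument using the standard non-adjacent form (NAF) recoding.

```python
def naf(x: int):
    """Recode x into its non-adjacent form: digits in {-1, 0, +1} with no two
    adjacent nonzeros. Runs of 1-bits become a single +1 and a single -1."""
    digits = []
    while x != 0:
        if x & 1:
            d = 2 - (x & 3)  # +1 if x % 4 == 1, -1 if x % 4 == 3
            x -= d
        else:
            d = 0
        digits.append(d)
        x >>= 1
    return digits  # least-significant digit first
```

For example, 126 = 0b1111110 has six 1-bits in plain binary, but only two nonzero signed digits (+128, -2) after recoding, so a signed-digit datapath does a third of the work on that operand.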
RT @akhauriyash123: Excited to see @jaseweston highlighting the importance of contextual behavior in transformers! In our #EMNLP2024 paper….