Mako

@mako_dev_ai

Followers: 61 · Following: 15 · Media: 4 · Statuses: 18

AI-powered GPU kernel generation enabling continuous optimization and universal deployment

Joined January 2025
@mako_dev_ai
Mako
15 days
Introducing MakoGenerate, a CUDA-writing AI agent. We're excited to make the research preview of MakoGenerate available today, completely free.
@mako_dev_ai
Mako
3 days
Try it for free on
@mako_dev_ai
Mako
3 days
You can now generate GPU kernels in #CUDA and #Triton for any arbitrary PyTorch code you have. Give it a shot!
@wAIeedatallah
Waleed Atallah
3 days
MakoGenerate now supports custom problems, meaning you can generate #CUDA or #Triton kernels for any @PyTorch reference code you have! Let's walk through an example using @GPU_MODE's latest contest: the Triangle Multiplicative Update (TriMul) module.
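For a sense of what "any reference code" means in practice, here is a minimal, self-contained sketch of the underlying contract: a generated kernel is accepted only if it matches the user's reference computation on test inputs. Everything below (function names, the toy math, the tolerance) is illustrative, not MakoGenerate's actual harness; the real tool takes PyTorch modules rather than this pure-Python stand-in.

```python
def reference(a, b):
    """Ground-truth computation the generated kernel must match
    (stand-in for a PyTorch reference module)."""
    return [x * y + 1.0 for x, y in zip(a, b)]

def candidate(a, b):
    """Pretend 'generated kernel' to validate (here just another
    pure-Python implementation of the same math)."""
    out = []
    for i in range(len(a)):
        out.append(a[i] * b[i] + 1.0)
    return out

def matches_reference(ref, gen, inputs, tol=1e-6):
    """Accept the generated kernel only if its outputs agree with
    the reference on every test input, within tolerance."""
    for args in inputs:
        expected, got = ref(*args), gen(*args)
        if any(abs(e - g) > tol for e, g in zip(expected, got)):
            return False
    return True

inputs = [([1.0, 2.0], [3.0, 4.0]), ([0.5, -1.0], [2.0, 2.0])]
ok = matches_reference(reference, candidate, inputs)  # → True
```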
@mako_dev_ai
Mako
14 days
And this is just the beginning! There are so many new features to explore and evaluate. LLMs+Search+RL is proving to be a game changer in capability. If this kind of work excites you, apply at .
@mako_dev_ai
Mako
14 days
@METR_Evals Level 5 covers more complex, real-world kernels, including DeepSeek MLA, among others. MakoGenerate wins on 4/14 kernels.
@mako_dev_ai
Mako
14 days
KernelBench Level 2 includes slightly more complex operations with simple fusion patterns. MakoGenerate again wins on 68/100 problems.
@mako_dev_ai
Mako
14 days
@ScalingIntelLab KernelBench Level 1 includes simple PyTorch operations like matmul or linear layers. MakoGenerate matches or beats torch.compile on 68/100 problems.
@mako_dev_ai
Mako
14 days
MakoGenerate with Evolutionary Search is already creating production-quality #CUDA kernels that beat torch.compile and expert-written kernels on real-world use cases. We'll be posting examples with code throughout the week, but a few highlights are below 🧵. (ps we're hiring)
@mako_dev_ai
Mako
15 days
Iterative refinement is pretty neat and can yield some decent results, but the real innovation is in applying evolutionary search. Stay tuned for some cool results coming later this week.
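A rough, self-contained sketch of what an evolutionary-search loop over kernel candidates looks like. The toy "tile size" objective stands in for measured latency, and `evolve`/`mutate` are illustrative names under my own assumptions, not Mako's actual implementation.

```python
import random

def evolve(seed_candidate, mutate, fitness, generations=20, population=8, rng=None):
    """Toy evolutionary search: mutate a population of candidates and
    keep the fittest each generation. Placeholder for searching over
    kernel variants scored by measured latency (lower = better)."""
    rng = rng or random.Random(0)
    best = seed_candidate
    pop = [seed_candidate] * population
    for _ in range(generations):
        children = [mutate(p, rng) for p in pop]
        pop = sorted(pop + children, key=fitness)[:population]
        if fitness(pop[0]) < fitness(best):
            best = pop[0]
    return best

# Toy problem: tune a "tile size" whose pretend latency is minimized at 128.
latency = lambda tile: (tile - 128) ** 2
mutate = lambda tile, rng: max(1, tile + rng.choice([-32, -8, -1, 1, 8, 32]))

best = evolve(64, mutate, latency)
```

The contrast with iterative refinement is that a population explores many mutations in parallel and selection keeps only improvements, rather than patching a single candidate in place.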
@mako_dev_ai
Mako
15 days
RT @wAIeedatallah: The research preview of MakoGenerate is available today, completely free. Keep reading to see what it does, how to try i….
@mako_dev_ai
Mako
15 days
Free @nvidia Blackwell GPUs for code generation and testing???? 👀 how long can we keep this up???
@mako_dev_ai
Mako
15 days
Try it for free at . This research preview is a fun way to see how well different models do at GPU code generation.
@mako_dev_ai
Mako
3 months
We started benchmarking @Meta Llama 4 Scout on @AMD MI300X and @NVIDIA H100. Shoutout to the zero-day support making life easier. Using AMD's vLLM container on long-ish context lengths (5000/1000) we get the following:
2x MI300X - 526 tps
4x MI300X - 996 tps
8x MI300X - 1144 tps
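Taking the reported throughputs at face value, a quick scaling check (my arithmetic against the tweet's numbers, treating the 2x MI300X result as the baseline):

```python
# Reported tokens/sec from the benchmark above.
tps = {2: 526, 4: 996, 8: 1144}

# Scaling efficiency vs. perfect linear scale-up from the 2-GPU result:
# n GPUs should ideally deliver (n/2) times the 2-GPU throughput.
baseline = tps[2]
efficiency = {n: tps[n] / (baseline * (n / 2)) for n in tps}
# 4 GPUs retain roughly 95% of linear scaling; 8 GPUs drop to about 54%.
```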
@mako_dev_ai
Mako
4 months
And now with @AMD's latest AITER library, there's even more performance to be unlocked! Exciting times ahead.
@mako_dev_ai
Mako
4 months
The secret sauce? A combination of:
AMD's Composable Kernel for Flash Attention
GEMM tuning via PyTorch TunableOps
Liger Kernel's normalization layers (at high batch sizes)
torch.compile for everything else
@mako_dev_ai
Mako
4 months
There's no single "best" kernel library for operations like attention. The optimal choice depends on your specific workload, batch size, and hardware. Our Mako Compiler automates the process of finding the best combinations.
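The selection idea can be sketched in a few lines: time each candidate implementation on the actual workload and keep the fastest. The toy "backends" below are illustrative stand-ins under my own assumptions; the real compiler would choose among actual kernel libraries.

```python
import time

def autotune(candidates, args, repeats=5):
    """Time each candidate implementation on the real workload and
    return the fastest one -- the core of picking a kernel library
    per workload rather than globally."""
    def best_time(fn):
        times = []
        for _ in range(repeats):
            t0 = time.perf_counter()
            fn(*args)
            times.append(time.perf_counter() - t0)
        return min(times)  # min over repeats reduces timing noise
    return min(candidates, key=best_time)

# Two interchangeable "backends" computing the same reduction.
def backend_sum(xs):
    return sum(xs)

def backend_loop(xs):
    total = 0
    for x in xs:
        total += x
    return total

data = list(range(100_000))
chosen = autotune([backend_sum, backend_loop], (data,))
```

Taking the minimum over several repeats, and timing on the caller's own inputs, mirrors why the best choice shifts with workload, batch size, and hardware.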
@mako_dev_ai
Mako
4 months
Kernels Together Strong 🦧 Our last blog post showed how combining multiple kernel libraries can deliver state-of-the-art performance for AI models. We achieved up to 60% speedup on the FLUX.1-schnell model using #AMD MI300X GPUs! #AI #GPUOptimization