Waleed Atallah @wAIeedatallah X Profile

Waleed Atallah

@wAIeedatallah

Followers

210

Following

1K

Media

18

Statuses

376

Making AI go fast @mako_dev_ai

California

Joined May 2022

Don't wanna be here? Send us removal request.

Waleed Atallah

@wAIeedatallah

2 months

mako tune model

0

2

3

Waleed Atallah

@wAIeedatallah

5 days

And for what its worth, im 99% sure this is what facebook was doing with Llama. But instead of trying to create vulnerabilities, they want to sell ads (as preferences in LLMs) . Imagine if you ask Llama for a cold refreshing drink, will it recommend Coke or Pepsi? I'd bet my last.

0

Waleed Atallah

@wAIeedatallah

5 days

The craziest, not-impossible scenario is one where they want SOTA open-source coding models to vibe-code specific vulnerabilities into a ton of apps. something like this:.1. Infiltrate many standard open libraries .2. Train model to prefer infiltrated libs .3. A user vibe-codes.

Naveen Rao

@NaveenGRao

6 days

The open-weight QWEN models from Alibaba/China are largely agreed as being the best open models rn. In the West we are a bit reticent to use them out of fear of intentional bias by the model creators. Why make them open/free instead of behind paid APIs?. I'm not sure if this is.

3

0

3

Waleed Atallah

@wAIeedatallah

5 days

we added a leaderboard.

Mako

@mako_dev_ai

5 days

We added a leaderboard.

0

2

Waleed Atallah

@wAIeedatallah

7 days

What.

sophia

@cis_female

7 days

> fp8 is 100 tflops faster when the kernel name has "cutlass" in it.kms.

0

4

Waleed Atallah

@wAIeedatallah

9 days

Now the real question. did it work for TriMul? Not yet 😅 but this is a pretty complex kernel with several parts! I'm gonna keep trying with some prompts. Internally, we have an agent that uses evolutionary search that is finding some humble successes. That'll be released soon

0

1

Waleed Atallah

@wAIeedatallah

9 days

For the prompt, try adding seed kernels or examples. Try adding docs. Add anything you want! We've measured non-trivial performance differences that arise just from the additional prompt a user can add.

1

0

1

Waleed Atallah

@wAIeedatallah

9 days

Next, validate the input format. this will make sure our validation pipeline actually works. Once you get the check mark, configure your agent, add a prompt, and you're off to the races!

1

0

1

Waleed Atallah

@wAIeedatallah

9 days

Next, enter your problem in the appropriate format. We will use TriMul from @a1zhang's GPUMODE contest. You can find it here: To get it in the right format, i just copy/pasted the reference format and the trimul from gpumode into ChatGPT and told it to.

1

0

2

Waleed Atallah

@wAIeedatallah

9 days

First make an account at to generate GPU kernels for free. From there, you can select "Your custom problem" from the drop down menu. This will open up the template you need to adhere to for our validation pipeline to work.

1

0

1

Waleed Atallah

@wAIeedatallah

9 days

MakoGenerate now supports custom problems, meaning you can generate #CUDA or #Triton kernels for any @PyTorch reference code you have! . Lets walk through an example using @GPU_MODE's latest contest: Triangle Multiplicative Update (TriMul) module

2

4

18

Waleed Atallah

@wAIeedatallah

9 days

the behind the scenes that led to this post must have been wild lol. either The Information had bad info (less likely) or Nvidia went back to OpenAI and made them an offer they could not refuse.

NVIDIA

@nvidia

9 days

We’re proud to partner with @OpenAI and continue powering the foundation of their work.

0

1

Waleed Atallah

@wAIeedatallah

16 days

The implications of this result are much greater than meet the eye.

Mohamed Abdelfattah

@mohsaied

16 days

We use large-scale text-to-text regression to predict specific parameters (e.g. utilization) of compute nodes in Google's datacenter, purely based on training on a (very) large corpus of unstructured system logs!!. Paper: Code:

0

1

Waleed Atallah

@wAIeedatallah

20 days

Before literally every single model release, people claim “this is it” . When it really is “it”, will we even know?.

VraserX e/acc

@VraserX

21 days

I’m convinced they got early access to GPT-5. Aidan works at OpenAI. Yacine just got fired from xAI and now Sam’s following him? That’s not random. Whatever they saw must be absolutely wild. This might be the moment before the internet breaks.

0

1

Waleed Atallah

@wAIeedatallah

20 days

LLM+Search turns out to be pretty good at writing kernels.

Mako

@mako_dev_ai

20 days

MakoGenerate with Evolutionary Search is already creating production-quality #CUDA kernels that beat torch.compile and expert-written kernels on real world use cases. We'll be posting examples with code throughout the week, but a few highlights are below 🧵. (ps we're hiring)

0

2

Waleed Atallah

@wAIeedatallah

21 days

The splendor of AI's capability knows no bounds.@mako_dev_ai 💙 @modal_labs.

Charles 🎉 Frye

@charles_irl

21 days

write CUDA kernels the way Ian Buck always envisioned them: in flußig deutsch y español.

0

1

8

Waleed Atallah

@wAIeedatallah

21 days

@OpenAI @Anthropic @GoogleDeepMind Lastly, a shout out to the plethora of prior work that has shown up over the last 12 months. @GPU_MODE is an amazing community doing some great and open source work in this area, including releasing KernelLLM. @ScalingIntelLab and everyone involved in creating KernelBench have.

0

Waleed Atallah

@wAIeedatallah

21 days

@OpenAI @Anthropic @GoogleDeepMind Now watch as the code is generated, compiled, validated, and benchmarked! Depending on the compute budget you assigned while configuring the agent, MakoGenerate will iteratively refine and try to improve the kernel by applying various optimization techniques. (6/6).

1

0

Waleed Atallah

@wAIeedatallah

21 days

@OpenAI @Anthropic @GoogleDeepMind Additional prompting is one of the most interesting features in MakoGenerate. It begs the question, can we prompt engineer our way to superior kernels? Your prompts can be empty, one line long, or pages long. It can include in context examples of high quality kernels,.

1

0

Waleed Atallah

@wAIeedatallah

21 days

Next, configure your agent. 1️⃣ pick a model, including the best from @OpenAI, @Anthropic, and @GoogleDeepMind 2️⃣ choose either #CUDA or #Triton 3️⃣ select a hardware platform for execution 4️⃣ and importantly, select the number of kernels to generate. (4/6).

1

0

Waleed Atallah

@wAIeedatallah

21 days

Start with selecting a PyTorch reference. The first 200 problems are from @ScalingIntelLab KernelBench Level 1 and Level 2, which includes simple operators and simple fusion patterns. We also include 14x Level 5 problems, which cover more real world and complex use cases. (3/6).

1

0