emi

@technoabsurdist

Followers
799
Following
10K
Media
43
Statuses
2K

@herdora_com (yc s25)

Joined December 2015
@technoabsurdist
emi
11 days
we built herdora because writing cuda sucks and hiring gpu engineers is impossible. we turn slow pytorch into fast gpu code. automatically. please reach out emilio [at] herdora [dot] com if you want faster/cheaper inference.
@ycombinator
Y Combinator
11 days
Herdora (@herdora_ai) is the Cursor for CUDA. It automatically turns your PyTorch code into optimized GPU kernels so you don't have to write CUDA. Congrats on the launch, @technoabsurdist & @gpusteve!
3
3
25
@technoabsurdist
emi
5 days
RT @finbarrtimbers: Someone’s gonna release an actual “RL for kernel development” paper without measurement errors at some point and no one….
0
2
0
@technoabsurdist
emi
8 days
RT @gpusteve: so mi300x MOGS h100 with llama 4 scout in high concurrency 😮
0
4
0
@technoabsurdist
emi
9 days
RT @tryfondo: 🚀 @herdora_ai launched! Cursor for CUDA. "Herdora turns your slow PyTorch into fast GPU code, automatically.". 🌐 https://t.co….
tryfondo.com
👑 Herdora Launches: Cursor for CUDA
0
2
0
@technoabsurdist
emi
9 days
sometimes I accidentally run chat without agent mode and get scared by the horrible results. how do people live like that.
0
0
4
@technoabsurdist
emi
9 days
RT @gpusteve: 📜 ai doesn't run on just NVIDIA anymore - it’s running on many different chips, each with different quirks, tradeoffs, and sc….
0
2
0
@technoabsurdist
emi
11 days
RT @ycombinator: Herdora (@herdora_ai) is the Cursor for CUDA. It automatically turns your PyTorch code into optimized GPU kernels so you d….
0
15
0
@technoabsurdist
emi
11 days
RT @gpusteve: looking forward to exciting times.
0
2
0
@technoabsurdist
emi
13 days
RT @benchbytes: if your company doesn't buy creatine for you, you're ngmi.
0
1
0
@technoabsurdist
emi
15 days
RT @kenbwork: Reminds me a lot of the recent wave of (very successful) systems companies that rewrote popular frameworks like Kafka to take….
0
2
0
@technoabsurdist
emi
15 days
what's next at @herdora_ai: deeper kernel optimizations, advanced quantization techniques, and improved memory management. our ultimate goal: build the best hardware-agnostic tools for programming accelerators. break CUDA's software moat to lower industry costs, accelerate.
0
0
2
@technoabsurdist
emi
15 days
💰 mi300x delivers 60% better cost-efficiency! (5/6)
1
0
1
@technoabsurdist
emi
15 days
initial results:
• amd mi300x: 7,353 tokens/sec @ $1.99/hr → 3,695 tokens/dollar
• nvidia h100: 11,553 tokens/sec @ $4.99/hr → 2,315 tokens/dollar
1
0
2
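The per-dollar figures in that tweet follow directly from the quoted throughput and hourly prices. A minimal sketch of the arithmetic (the function name and the tokens/sec ÷ $/hr convention are inferred from how the thread computes its numbers, not from any Herdora code):

```python
# Reproduce the thread's cost-efficiency arithmetic.
# Convention (as in the tweet): "tokens/dollar" = tokens/sec divided by $/hr.
def tokens_per_dollar(tokens_per_sec: float, price_per_hr: float) -> float:
    return tokens_per_sec / price_per_hr

mi300x = tokens_per_dollar(7_353, 1.99)    # ≈ 3,695
h100 = tokens_per_dollar(11_553, 4.99)     # ≈ 2,315
advantage = mi300x / h100 - 1.0            # ≈ 0.60, the "60% better cost-efficiency"
print(round(mi300x), round(h100), round(advantage, 2))  # → 3695 2315 0.6
```

note the h100 is still ~57% faster in raw tokens/sec; the mi300x only wins once the hourly price is divided out.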
@technoabsurdist
emi
15 days
our first steps:
• custom mi300x-specific kernels for critical ops (attention, gemm)
• optimized gpu execution with hip graphs to slash latency and cpu bottlenecks
• introduced fp8 quantization, achieving high speed without significant accuracy loss (<2% drop on gsm8k dataset)
1
0
2
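The fp8 step can be illustrated with a toy e4m3 rounding simulation. This is my sketch, not Herdora's kernels: it handles normal values only (no subnormals, no NaN encoding), and relies on two e4m3 format facts — 3 mantissa bits and a maximum finite magnitude of 448:

```python
import math

E4M3_MAX = 448.0  # largest finite magnitude in fp8 e4m3

def round_to_e4m3(x: float) -> float:
    """Round x to the nearest e4m3 value (toy sketch: normals only,
    no subnormal or NaN handling)."""
    if x == 0.0:
        return 0.0
    sign = math.copysign(1.0, x)
    m, e = math.frexp(abs(x))   # abs(x) = m * 2**e, with 0.5 <= m < 1
    m_q = round(m * 16) / 16    # keep 1 + 3 = 4 significant bits
    return sign * min(E4M3_MAX, math.ldexp(m_q, e))

print(round_to_e4m3(1.3))    # → 1.25 (nearest e4m3 neighbor of 1.3)
print(round_to_e4m3(500.0))  # → 448.0 (clamped to the format's max)
```

per-tensor scaled quantization then maps a tensor into this range with scale = amax / 448, rounds each value, and multiplies the scale back on dequantization — the rounding error introduced here is the source of the <2% gsm8k accuracy drop the tweet mentions.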
@technoabsurdist
emi
15 days
🦙 we optimize llama3.1-8b on an mi300x and show that after fp8 + custom-kernel tuning, it cranks out ≈ 7.3 k tok/s at 3.7 k tokens per dollar, beating H100 fp8 on cost-efficiency by 60%.
1
0
2
@technoabsurdist
emi
15 days
the challenge: amd’s software ecosystem (rocm) lacks the maturity of nvidia’s cuda. great hardware stays underutilized without strong software support. we're tackling exactly this gap. (3/6).
1
0
2
@technoabsurdist
emi
15 days
large language models are powering everything from chatbots to fully agentic systems. yet, almost everyone still defaults to nvidia gpus. the mi300x quietly stands out with 192gb hbm3 memory (over double h100’s 80gb) and 5.3tb/s bandwidth. (2/6).
1
0
2
@technoabsurdist
emi
15 days
📜 new blog post: amd’s mi300x gpu has huge potential for affordable, high-throughput llm inference - but it's currently underused due to software limitations. our initial optimizations already make it ~60% more cost-effective than nvidia's h100! (1/6). (🔗 links in final post).
2
2
15
@technoabsurdist
emi
1 month
RT @tenderizzation: “let’s see what happens if I bump the project to the next major CUDA release”
0
7
0
@technoabsurdist
emi
1 month
(the pipeline breaks on first step)
0
1
3