Venkat Raman — inference/acc

@venkat_systems

Followers
347
Following
3K
Media
89
Statuses
1K

distributed systems, low latency, inference, cuda | 🦀 | hobbies: ⛷️ 🏊🏽‍♂️ 📷

µs, ns, 80% speed-of-light
Joined January 2013
@venkat_systems
Venkat Raman — inference/acc
16 hours
inspired by @uccl_proj n thunderkittens by @HazyResearch 🙏, my attempt at successfully challenging nvidia nccl perf:
- starting with single process p2p all_reduce sum on 2xA100.
- mpi p2p, h100, b200 n b300 is wip.
- will oss soon once apis n abstractions are stable
0
0
2
@yacineMTB
kache
2 days
the gap between opus 4.5 and every other model is insane
160
51
2K
@venkat_systems
Venkat Raman — inference/acc
21 hours
huggingface tgi & mistral dot rs engines are written in rust. however, if we look at intranode tp inference, they use uds. they also use protobuf over pickle. but vllm & sglang (which uses the same parts of vllm) use cpu shmem. the former are leaving so much perf gains on the table for
@venkat_systems
Venkat Raman — inference/acc
2 days
vllm n sglang cpu-side engine overhead can be ns instead of µs, if written in c++ / rust.. this in turn will improve gpu util 30-60% gain in sustained goodput / tco based on my local experiments but it comes at a cost of research to production speed n researcher friendly
0
0
1
@venkat_systems
Venkat Raman — inference/acc
22 hours
or maybe i lack abundance mindset 😅
0
0
0
@venkat_systems
Venkat Raman — inference/acc
22 hours
i might be late to this.... in last 15 years, there are different generations of support, lead gen, crm, marketing, sales tools... let's say 2-3 players per market segment and per geo region still there are so many startups in this space... in ai era, this is on speed feels
2
0
0
@venkat_systems
Venkat Raman — inference/acc
2 days
is anyone from openai still using codex cli internally ? i miss using it.. i want to, but every time it just shits the bed 😭😭
0
0
0
@venkat_systems
Venkat Raman — inference/acc
2 days
@thorstenball i was thinking along these lines too
talk is cheap -> code is cheap is so surreal.. enabled by VC & labs subsidizing vibe coding
all the oss agent frameworks are fungible.. infra on the other hand is not
basic app layer infra is starting to get fungible too… next 2-3 years
0
0
0
@venkat_systems
Venkat Raman — inference/acc
2 days
aged like wine ! (just a day old though 😜)
@AnthropicAI
Anthropic
2 days
Anthropic is acquiring @bunjavascript to further accelerate Claude Code’s growth. We're delighted that Bun—which has dramatically improved the JavaScript and TypeScript developer experience—is joining us to make Claude Code even better. Read more:
1
0
0
@venkat_systems
Venkat Raman — inference/acc
2 days
vllm n sglang cpu-side engine overhead can be ns instead of µs, if written in c++ / rust.. this in turn will improve gpu util 30-60% gain in sustained goodput / tco based on my local experiments but it comes at a cost of research to production speed n researcher friendly
2
0
3
@m_sirovatka
Matej Sirovatka
4 days
After 3 weeks, we have concluded our first problem of the @GPU_MODE x @nvidia competition, NVFP4 GEMV. Thanks to everyone who has participated, we have collected over 40k submissions from >200 users. Congrats to the winners and good luck with the next problem, NVFP4 GEMM 🔥
7
15
181
@venkat_systems
Venkat Raman — inference/acc
5 days
I agree @claudeai ASCII charts are bangers !
@ashvardanian
Ash Vardanian
6 days
With some Claude-generated ASCII charts, StringZilla now looks quite competitive for: 1. non-cryptographic hashing (vs xxHash, aHash?) 2. exact substring and byte-set search (vs memchr?) 3. UTF-8 tokenization (vs standard libs, regex, ICU?) Will update StringWars benchmark
0
0
2
@venkat_systems
Venkat Raman — inference/acc
7 days
update: found out that this optimization is not really valid as it doesn’t adhere to the challenge spirit.. there is a benchmarking bug… reverted it with @m_sirovatka's help
@GPU_MODE discord is amazing 🙏🏽 wish i started there sooner than just their youtube channel
@venkat_systems
Venkat Raman — inference/acc
8 days
finally cursor <> gemini 3 pro worked for me... helped to beat my personal best 55 µs --> 24.4 µs gpu mode's Blackwell NVFP4 Kernel Hackathon finally in top 12 now 😅🎉
0
0
3
@venkat_systems
Venkat Raman — inference/acc
7 days
gemini 3 pro is also suffering from infinite loops similar to 2.5 pro
0
0
1
@venkat_systems
Venkat Raman — inference/acc
8 days
CUDA moat also comes from well designed abstractions and APIs that are backwards compatible.. let's take NCCL for example.. it supports different execution mode, topologies, diverse hw interconnects (nvlink, pcie, etc.,) one can get started quickly and take things to
@venkat_systems
Venkat Raman — inference/acc
19 days
@zephyr_z9 CUDA moat is same as python moat.. ecosystem n researchers n gpu engineers love it.. actual programmability.. nvidia provides low level apis so that u can write better n more performant versions of their high order frameworks.. i agree with u automated, manual or ai assisted
0
0
0
@venkat_systems
Venkat Raman — inference/acc
8 days
finally cursor <> gemini 3 pro worked for me... helped to beat my personal best 55 µs --> 24.4 µs gpu mode's Blackwell NVFP4 Kernel Hackathon finally in top 12 now 😅🎉
@venkat_systems
Venkat Raman — inference/acc
9 days
5 hours with cursor <> 4.5 opus Kernel #1 - NVFP4 Batched GEMV beat my personal best 106 µs --> 55 µs gpu mode's Blackwell NVFP4 Kernel Hackathon
0
0
0
@venkat_systems
Venkat Raman — inference/acc
9 days
HFT, system software, gpu kernel dev will teach you that 10 µs is actually a really long time https://t.co/fSZikc6218
@isDineshHere
Dinesh
10 days
Database System development will make you realise that 1 millisecond is actually a really really really long time ~300 tx/ms TigerBeetle vs 1 tx/ms PostgreSQL https://t.co/m4WbFfEPzO
0
0
1
@venkat_systems
Venkat Raman — inference/acc
9 days
5 hours with cursor <> 4.5 opus Kernel #1 - NVFP4 Batched GEMV beat my personal best 106 µs --> 55 µs gpu mode's Blackwell NVFP4 Kernel Hackathon
0
0
1
@venkat_systems
Venkat Raman — inference/acc
9 days
gemini-3-pro is cool, but it is unusable in cursor, i'm already on cli waitlist any better way to use it productively ? i guess google is still scaling up infra for this model
0
0
0