Bas Büller
@BasBuller
Followers: 124
Following: 8K
Media: 14
Statuses: 1K
Building something new, imagine if your computer actually understood you. eu/acc
Amsterdam, NL
Joined November 2016
I am hiring highly skilled performance engineers for my team! You will be working on optimising pretraining for models >100B params on O(1000s) of GPUs, and hardware-aligned architecture design. We are cooking a lot of very exciting projects and I can safely say you will have a
14 replies · 46 reposts · 460 likes
(1/5) We’ve never enjoyed watching people chop Llamas into tiny pieces. So, we’re excited to be releasing our Low-Latency-Llama Megakernel! We run the whole forward pass in a single kernel. Megakernels are faster & more humane. Here’s how to treat your Llamas ethically: (Joint
34 replies · 144 reposts · 887 likes
@andersonbcdefg For some absurd reason 95% of programmers only go for frameworks and are scared of anything below. They somehow don't see that frameworks usually enshittify the whole thing!
7 replies · 3 reposts · 208 likes
I think AI is going to usher in a golden age of infra, not obviate it. It's just so clear that good CS fundamentals result in better AI-built systems. Vibe coding works better with type safety, languages where syntax maps closely to semantics, referential transparency, tight
57 replies · 45 reposts · 898 likes
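On the type-safety point, a minimal sketch of the idea: annotations give a type checker (and a code-generating model) something concrete to verify against. The Invoice example below is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Invoice:
    subtotal_cents: int   # integer cents, never a float of dollars
    tax_rate: float

def total_cents(invoice: Invoice) -> int:
    return round(invoice.subtotal_cents * (1 + invoice.tax_rate))

# A generated call like total_cents("42.50") is flagged by mypy/pyright before it
# ever runs; without annotations it only fails (or silently misbehaves) at runtime.
print(total_cents(Invoice(subtotal_cents=4200, tax_rate=0.25)))  # 5250
```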
Surely frontier labs are hillclimbing performance benchmarks on their forks of DL libraries by RL-training an LLM for that specific repo? With enough tests and benchmarking, this might work.
1 reply · 0 reposts · 1 like
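A minimal sketch of the reward such a setup might use, with entirely hypothetical names and numbers: correctness gated by the test suite, payoff proportional to the measured speedup.

```python
# Hypothetical reward for an LLM proposing patches to a DL-library fork:
# correctness is a hard gate, benchmark speedup is the payoff.
def reward(tests_passed: bool, baseline_s: float, candidate_s: float) -> float:
    if not tests_passed:
        return -1.0                        # never reward a patch that breaks the tests
    speedup = baseline_s / candidate_s
    return max(0.0, speedup - 1.0)         # only pay out for real improvements

print(reward(True, baseline_s=3.0, candidate_s=2.0))   # 0.5
```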
I value this more than all the benchmarks out there
my detailed personal benchmarks ran overnight.
- Scout is best at summarization and function calling. exactly what you want from a cheap long ctx model. this is going to be a workhorse in coding flows and RAG applications. the single shot ICL recall is very very good.
-
0 replies · 0 reposts · 1 like
Said by many, worth repeating: Google Gemini 2.5 Pro is really good at writing code.
0 replies · 0 reposts · 0 likes
Re: Programming. We've been moving to a post-language world for a while now with all the libraries, runtime systems, frameworks, etc. So syntax has been waning in importance, and AI will of course accelerate that. However, semantics remain as important as ever. And natural language
45 replies · 39 reposts · 408 likes
Out of the box, 0 code changes, tinygrad gets 89 tok/s on FP16 Llama-3-8B on a 5090. On torch nightly, gpt-fast gets <exception> with --compile, and 14 tok/s without. This is why tinygrad will win. It's not about the benchmark, it's about being decent everywhere out of the box.
13 replies · 33 reposts · 655 likes
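The "decent everywhere out of the box" point is easy to see in tinygrad itself. The snippet below is not the Llama benchmark, just a toy showing that tinygrad picks whatever backend the machine has with zero configuration:

```python
from tinygrad import Tensor, Device

# tinygrad selects the default backend (CUDA, Metal, CPU, ...) automatically.
print(f"running on: {Device.DEFAULT}")

a = Tensor.rand(1024, 1024)
b = Tensor.rand(1024, 1024)
c = (a @ b).relu()
print(c.numpy().shape)   # (1024, 1024)
```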
A very good Thursday to everyone. Let's continue to build the Europe of tomorrow together 🇪🇺🚀
259 replies · 473 reposts · 3K likes
Let’s go
AMD 💕 @__tinygrad__ we are looking forward to working closely with @__tinygrad__ to help commoditize the petaflop https://t.co/LEjsUaPWHV
0 replies · 0 reposts · 0 likes
If European countries fixed their broken employee stock structures & stopped taxing them like clowns, entrepreneurs wouldn’t have to complain about Europe. The talent is here, diversity is an edge, and quality of life creates opportunity. Just be a bit more pro-business. More
17 replies · 32 reposts · 329 likes
The programming language of tomorrow is pseudocode, not English. Pseudocode.
0 replies · 0 reposts · 0 likes
🤯 This is absolutely insane, the gift just keeps on giving!
🚀 Day 6 of #OpenSourceWeek: One More Thing – DeepSeek-V3/R1 Inference System Overview
Optimized throughput and latency via:
🔧 Cross-node EP-powered batch scaling
🔄 Computation-communication overlap
⚖️ Load balancing
Statistics of DeepSeek's Online Service:
⚡ 73.7k/14.8k
0 replies · 0 reposts · 0 likes
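The computation-communication overlap item is the one that generalizes most readily. A minimal sketch in plain torch.distributed, with an async all_reduce standing in for the expert-parallel dispatch/combine traffic and a single-process gloo group so it runs anywhere; the real system overlaps cross-node collectives with GEMMs:

```python
import os
import torch
import torch.distributed as dist

# Single-process group so the sketch runs anywhere; a real EP deployment would
# use NCCL with one rank per GPU and all-to-all collectives across nodes.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

x = torch.randn(2048, 2048)
w = torch.randn(2048, 2048)

handle = dist.all_reduce(x, async_op=True)   # start communication without blocking
y = w @ w                                    # useful compute while traffic is in flight
handle.wait()                                # block only when the result is needed
out = x @ y

dist.destroy_process_group()
```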
My posts about gpt-4.5 have some interesting comments, like "you don't know what you're talking about", "you aren't using it right", or "you're a slop enjoyer", etc. No, no, you don't get it: you don't train the *largest* model to be a model about “taste” - it needs to make me more
37 replies · 5 reposts · 279 likes
With DeepSeek open-sourcing this insane FS, I can only imagine how insane their internal infra is.
🚀 Day 5 of #OpenSourceWeek: 3FS, Thruster for All DeepSeek Data Access
Fire-Flyer File System (3FS) - a parallel file system that utilizes the full bandwidth of modern SSDs and RDMA networks.
⚡ 6.6 TiB/s aggregate read throughput in a 180-node cluster
⚡ 3.66 TiB/min
0 replies · 0 reposts · 0 likes
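A quick back-of-the-envelope check on the quoted read number, assuming an even split across nodes:

```python
# 6.6 TiB/s aggregate over 180 nodes, assuming an even per-node share.
aggregate_tib_s = 6.6
nodes = 180
per_node_gib_s = aggregate_tib_s * 1024 / nodes
print(f"~{per_node_gib_s:.1f} GiB/s per node")   # ~37.5 GiB/s
```

Roughly 37.5 GiB/s per node is in the range of a handful of modern NVMe SSDs plus a fast RDMA NIC, consistent with the claim of saturating both.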
Good chance the next major apps are essentially JIT compilers, but for the entire application and not just the underlying code runtime.
0 replies · 0 reposts · 0 likes
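A toy sketch of what "JIT for the whole application" could mean in the smallest possible form; everything here (the spec shape, jit_build, handler) is made up for illustration. The app specializes its own code at runtime from a description, the way a JIT specializes hot paths:

```python
# Toy sketch: generate and compile a handler at runtime from a spec dict.
# All names (jit_build, handler, the spec shape) are hypothetical.
def jit_build(spec: dict):
    src = (
        "def handler(record):\n"
        f"    return record[{spec['field']!r}] {spec['op']} {spec['value']!r}\n"
    )
    namespace: dict = {}
    exec(compile(src, "<generated>", "exec"), namespace)
    return namespace["handler"]

match_adults = jit_build({"field": "age", "op": ">=", "value": 18})
print(match_adults({"age": 21}))   # True
```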