Felix

@felix_red_panda

Followers
5K
Following
22K
Media
192
Statuses
6K

speech synthesis and LLM nerd, DMs open, working on LLM stuff

Berlin, Germany
Joined June 2020
@felix_red_panda
Felix
2 years
All evals of ML models suck - but some are useful 🙃
2
3
76
@SzymonOzog_
SzymonOzog
18 days
Part 2 of the Penny worklog out now! This time we're fixing our previous shortcomings on small buffer sizes and we're outperforming NCCL on all sizes that matter for LLM inference 🧵
2
6
71
@felix_red_panda
Felix
28 days
Athens I'm in you
1
0
15
@felix_red_panda
Felix
1 month
Developers! Developers! Developers!
1
0
10
@felix_red_panda
Felix
1 month
finally a short form video platform with 30% @sama content
0
0
4
@felix_red_panda
Felix
2 months
@tugot17
Piotr Mazurek
2 months
We break down the performance all the way to calculating the number of FLOPs required to produce an embedding.
0
0
7
@tugot17
Piotr Mazurek
2 months
Ever wondered why embedding models are offered so cheaply? It is because you can process literally billions of tokens a day even on a consumer-grade GPU like the 4090. Check out our new text investigating the economics of embedding model inference. Link in the next tweet
6
23
207
@felix_red_panda
Felix
2 months
Why does embedding the entire Wikipedia only cost a few dollars? Deep dive blog post, link below
5
17
253
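The claim that embedding all of Wikipedia costs only a few dollars can be sanity-checked with back-of-envelope arithmetic: FLOPs per token scale with model size, and a consumer GPU supplies a lot of FLOPs per rented hour. Every number below (model size, 4090 throughput, utilization, rental price, Wikipedia token count) is an illustrative assumption, not a figure from the linked post:

```python
# Back-of-envelope cost of embedding inference. All constants are
# illustrative assumptions, not numbers from the blog post.

PARAMS = 100e6                 # assumed embedding model size (~100M params)
FLOPS_PER_TOKEN = 2 * PARAMS   # ~2 FLOPs per parameter per token (matmul rule of thumb)

GPU_PEAK_FLOPS = 82e12         # RTX 4090 FP16 peak, roughly
UTILIZATION = 0.3              # assumed achieved fraction of peak
GPU_COST_PER_HOUR = 0.40       # assumed rental price, USD/hour

tokens_per_second = GPU_PEAK_FLOPS * UTILIZATION / FLOPS_PER_TOKEN
tokens_per_day = tokens_per_second * 86_400

WIKI_TOKENS = 5e9              # assumed ~5B tokens of Wikipedia text
hours = WIKI_TOKENS / tokens_per_second / 3600
cost = hours * GPU_COST_PER_HOUR

print(f"{tokens_per_day:.1e} tokens/day")    # billions of tokens per day
print(f"${cost:.2f} to embed ~5B tokens")    # single-digit dollars
```

Under these assumptions a single 4090 pushes on the order of 10 billion tokens a day, and the whole corpus costs a few dollars, consistent with the thread's headline numbers.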
@felix_red_panda
Felix
2 months
the best way to predict the future is to invent it
0
0
7
@felix_red_panda
Felix
2 months
chat has been the labubumatchaficiation of LLMs
1
1
16
@felix_red_panda
Felix
2 months
amazing total lunar eclipse yesterday (picture taken with a 600mm lens)
2
0
25
@tugot17
Piotr Mazurek
2 months
What are the profit margins of serving DeepSeek 🐳? @schreiberic and I discuss large-scale MoE inference in depth. Blog post link below
11
25
242
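The margin question above ultimately reduces to simple arithmetic: tokens served per GPU-hour times price charged, minus the GPU's hourly cost. A toy sketch with made-up numbers (none of these figures are from the blog post, and real deployments amortize over many GPUs):

```python
# Toy serving-margin calculation. Every constant is an assumption
# for illustration only.

PRICE_PER_M_TOKENS = 1.0        # assumed price charged, USD per million tokens
TOKENS_PER_SEC_PER_GPU = 1500   # assumed aggregate batched throughput per GPU
GPU_COST_PER_HOUR = 2.0         # assumed all-in cost of one datacenter GPU, USD/hour

revenue_per_hour = TOKENS_PER_SEC_PER_GPU * 3600 / 1e6 * PRICE_PER_M_TOKENS
margin = (revenue_per_hour - GPU_COST_PER_HOUR) / revenue_per_hour

print(f"revenue/GPU-hour: ${revenue_per_hour:.2f}, margin: {margin:.0%}")
```

With these placeholder numbers a GPU earns $5.40/hour against $2.00 of cost, a ~60% margin; the interesting part of the linked post is how batching and MoE expert parallelism move the throughput term.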
@tugot17
Piotr Mazurek
3 months
good things coming
11
23
446
@cloud11665
cloud
3 months
single-threaded vector masked bit group counting + mmap, 821x faster, 11.4GiB/s
@healeycodes
Andrew Healey
3 months
wrote the same word-counting program 5 times, each faster than the last. Best result: 494× faster than my first Python version, by using SIMD in C. All are O(n), but some squeeze far more out of the CPU and memory!
1
1
119
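The point that equal big-O complexity can hide huge constant-factor gaps is easy to demonstrate even without SIMD. A minimal sketch in Python (not the thread's C code): a character-by-character state machine and `str.split()` both scan the text exactly once, but the latter runs its loop in optimized C, so the constant factor differs by orders of magnitude.

```python
import time

def count_words_naive(text: str) -> int:
    # Character-by-character state machine: O(n), but with heavy
    # per-character interpreter overhead.
    count, in_word = 0, False
    for ch in text:
        if ch.isspace():
            in_word = False
        elif not in_word:
            in_word = True
            count += 1
    return count

def count_words_split(text: str) -> int:
    # Same O(n) scan, but the loop runs in C inside str.split().
    return len(text.split())

text = "the quick brown fox jumps over the lazy dog " * 10_000
assert count_words_naive(text) == count_words_split(text)

for fn in (count_words_naive, count_words_split):
    t0 = time.perf_counter()
    fn(text)
    print(fn.__name__, f"{time.perf_counter() - t0:.4f}s")
```

The timings vary by machine, but the split version is typically well over 10× faster on the same input, which is the thread's thesis in miniature.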
@felix_red_panda
Felix
3 months
Re: GPT-4o deprecation: LLMs are highly susceptible to Hyrum's Law
0
0
16
@felix_red_panda
Felix
3 months
Is there any place that hosts Kimi K2 base as an API?
0
0
3
@vtabbott_
Vincent Abbott
4 months
Adding multi-level performance models to diagrams. This will allow performance models of FlashAttention / matmul / distributed MoEs to be dynamically calculated. Colors indicate execution at different levels, and the hexagons indicate a partitioned axis.
1
4
75
@tugot17
Piotr Mazurek
4 months
I solved every single problem in the CUDA mode book. A quick thread summarizing this experience and what I learned 1/x
31
244
2K
@felix_red_panda
Felix
5 months
hacker news doing hacker news things 😄
2
0
21