Eric

@Ex0byt

Followers
11K
Following
812
Media
145
Statuses
789

Strategic Technologist. Doing my little part to democratize foundational, truly open AI for all. Knowledge is a passion. Lead by example. (Opinions are my own.)

New York, NY
Joined March 2009
@Ex0byt
Eric
4 days
As predicted… there it is. Good stuff, @Prince_Canuma
@Prince_Canuma
Prince Canuma
4 days
Here is a draft PR; there's still a lot to improve and change, and I will get to it later today. If you have a better idea or solution, benchmark it and send us an issue and PR. Enjoy! https://t.co/ctXtw0Syc7
1
0
9
@Ex0byt
Eric
5 days
It's the small acts of kindness that count. Thank you, Hugging Face team 🤗. Received some credits out of the blue to help offset out-of-pocket job-running, research-hosting, and open-model-storage costs. Not a sponsorship, but genuinely appreciated all the same. What an amazing
1
0
14
@Ex0byt
Eric
6 days
Was heads-down this weekend solving for much of the currently noted "disappointments" with the "Flash-MoE" hype, but on an NVIDIA GB10: zero-copy GPU reads direct from the mmap'd NVMe page cache (eliminating CPU trips, 1.94×), and trained/tested a pre-attention expert-prediction model
3
2
41
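The host-side half of the zero-copy idea can be sketched with the standard library. This is a minimal illustration with a stand-in checkpoint file, not the GB10 implementation: the mapping is served from the OS page cache, and a `memoryview` slices it without copying bytes. The actual GPU-direct path (e.g. registering the buffer for DMA) is out of scope here.

```python
import mmap
import os
import tempfile

def map_weights(path):
    # Map the checkpoint read-only: pages are served straight from the OS
    # page cache, with no read() copy into a Python-managed buffer.
    f = open(path, "rb")
    return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

def expert_slice(mm, index, nbytes):
    # Zero-copy view of one expert's contiguous block inside the mapping.
    return memoryview(mm)[index * nbytes : (index + 1) * nbytes]

# Demo with a stand-in two-expert "checkpoint" (4 KiB per expert).
fd, path = tempfile.mkstemp()
os.write(fd, b"\x01" * 4096 + b"\x02" * 4096)
os.close(fd)

mm = map_weights(path)
view = expert_slice(mm, 1, 4096)  # no bytes are copied here
print(view[0], len(view))  # 2 4096
```

A GPU framework could pin and register such a view for device reads instead of staging it through an intermediate host buffer.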
@Ex0byt
Eric
7 days
Inspired by all the community support, here's a thank-you and sweetener: the next 100 confirmed donations get access to https://t.co/GCgYkpSqqN ($250.00 value, and you support a good cause). It's our most powerful handcrafted PRISM model to date, with over-refusals, bias, and
huggingface.co
We're on a journey to advance and democratize artificial intelligence through open source and open science.
@0xSero
0xSero
7 days
In 72 hours I got over $100k of value:
1. Lambda gave me $5,000 in compute credits
2. NVIDIA offered me 8x H100s on the cloud ($20/h); idk for how long, but assuming 2 weeks that'd be ~$5,000
3. TNG Technology offered me 2 weeks of B200s, which is something like $12,000 in compute
7
3
45
@Ex0byt
Eric
7 days
Quick Update: currently testing on scitrera/dgx-spark-pytorch-runtime (DGX aarch64 compatibility headaches). Experimenting with an io_uring expert loader for Kimi-K2.5 to accompany the MoE selector. Python's GIL serializes threaded pread, so io_uring should bypass this entirely:
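There is no stdlib io_uring binding in Python, so as a point of comparison here is the threaded `os.pread` pattern the loader is replacing, with a hypothetical shard layout. Note that `os.pread` does release the GIL during the syscall, but each read is still its own kernel round trip; an io_uring loader would batch all of them into a single submission queue.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

EXPERT_BYTES = 4096  # hypothetical fixed size of one expert's shard

def read_expert(fd, expert_id):
    # One positional read per expert: a separate syscall per shard,
    # which io_uring would instead queue in one batched submission.
    return os.pread(fd, EXPERT_BYTES, expert_id * EXPERT_BYTES)

# Stand-in checkpoint holding 4 experts.
fd, path = tempfile.mkstemp()
os.write(fd, b"".join(bytes([i]) * EXPERT_BYTES for i in range(4)))
os.close(fd)

fd = os.open(path, os.O_RDONLY)
with ThreadPoolExecutor(max_workers=4) as pool:
    blocks = list(pool.map(lambda e: read_expert(fd, e), [3, 0, 2]))
os.close(fd)
print([b[0] for b in blocks])  # [3, 0, 2]
```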
@Ex0byt
Eric
12 days
Exciting Experiment Update: we ran StepFun_ai's Step-3.5-Flash (197B MoE) on 6.29 GB of GPU memory! Flat. Zero growth. Same footprint at token 1 as at token 100. The model's weights are ~105 GB in INT4 (394 GB original bf16!), and we're running it on 6.29 GB!! That's 1/16th the weight
4
3
36
@Ex0byt
Eric
7 days
Progress thrives in the open. You had us all worried for a bit. Thank you, MiniMax_AI!
@SkylerMiao7
Skyler Miao
7 days
M2.7 open weights coming in ~2 weeks. Still actively iterating; just updated to a new version yesterday. Noticeably better on OpenClaw.
0
0
33
@Ex0byt
Eric
7 days
@elonmusk, @nvidia, @MichaelDell: one unit. We'll make it count for everyone. In just a few weeks with scrappy hardware, we've shown 1T+ parameter intelligence can run on everyday consumer devices. Imagine what a single Dell Pro Max GB300 could unlock for the open-source
@0xSero
0xSero
10 days
@Ex0byt @sudoingX will be among those with open access. This will be pooled with my 3090s. I am e-begging, but I will make it up to you.
3
7
77
@Ex0byt
Eric
8 days
Qwen3.5 27B is awesome (the entire family above 9B is impressive). You can now try it directly in your browser at SOTA speeds with whatever GPU you have: https://t.co/avWxUd8vNL My previous research in practice: `Intel/Qwen3.5-27B-int4-AutoRound` is particularly good.
huggingface.co
This web app lets you type messages (and optionally add images) and have an AI respond in real time. First pick a model from the list, then enter your prompt and the assistant will generate a reply...
@0xSero
0xSero
8 days
A 27B model is #2 on pinch-bench. You'd need $150,000 in GPU-hours to train this from scratch (base + post-training), basically 1-2 weeks over 256 H100s. That is not unreasonable: you'd need 540B tokens for pre-training and a bit more for post-training. None of this is crazy
33
121
2K
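The estimate above is easy to sanity-check. The rental rate below is my assumption, not a figure from the thread, but at roughly $1.75/GPU-hour the numbers land where the tweet says:

```python
# Back-of-envelope check on the training-cost estimate.
gpus = 256
hours = 24 * 14                 # upper end of "1-2 weeks"
price_per_gpu_hour = 1.75       # hypothetical H100 rental rate, $/GPU-hour

gpu_hours = gpus * hours        # 86,016 GPU-hours
cost = gpu_hours * price_per_gpu_hour
print(gpu_hours, round(cost))   # 86016 150528, close to the $150k figure
```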
@Ex0byt
Eric
8 days
priori:
0
1
9
@ZixuanLi_
Zixuan Li
9 days
Don't panic. GLM-5.1 will be open source.
274
427
8K
@Ex0byt
Eric
10 days
My handcrafted local AI homeboy J.A.R.V.I.S. is gassing us up… go off, little king!
2
4
43
@Ex0byt
Eric
10 days
Get Excited: @0xSero and I are close. A B300 is currently training a tiny (15M-param) side-loaded neural network that helps select, load, and cache the correct MoE experts for Kimi-K2.5 (a 1T-param MoE model running on 25 GB of memory). Once experiments are done, will share
@0xSero
0xSero
10 days
@pierrelezan Yes, @Ex0byt is working on this.
9
22
240
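The 15M-parameter predictor itself isn't published, but the surrounding select/load/cache loop can be sketched in pure Python. Everything here is a stand-in, not the actual implementation: the centroid scoring replaces the learned predictor, and `ExpertCache` with its `loader` is a hypothetical name for the fixed-budget cache the tweet describes.

```python
from collections import OrderedDict

EXPERT_CENTROIDS = {0: (1.0, 0.0), 1: (0.0, 1.0), 2: (0.7, 0.7)}

def predict_experts(hidden, k=2):
    # Stand-in for the tiny pre-attention predictor: score each expert
    # against the hidden state and keep the top-k. The real model learns
    # these scores; here they're dot products against fixed vectors.
    scores = {e: sum(h * w for h, w in zip(hidden, centroid))
              for e, centroid in EXPERT_CENTROIDS.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

class ExpertCache:
    """LRU cache holding at most `budget` experts in (GPU) memory."""
    def __init__(self, budget, loader):
        self.budget, self.loader = budget, loader
        self.cache = OrderedDict()

    def get(self, expert_id):
        if expert_id in self.cache:
            self.cache.move_to_end(expert_id)    # refresh recency
        else:
            if len(self.cache) >= self.budget:
                self.cache.popitem(last=False)   # evict least recent
            self.cache[expert_id] = self.loader(expert_id)
        return self.cache[expert_id]

cache = ExpertCache(budget=2, loader=lambda e: f"weights[{e}]")
chosen = predict_experts((0.9, 0.1))   # leans toward experts 0 and 2
for e in chosen:
    cache.get(e)
print(chosen, list(cache.cache))  # [0, 2] [0, 2]
```

Predicting experts before attention gives the loader a head start: the shards can be fetched while the attention block is still computing.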
@Ex0byt
Eric
11 days
Kimi-K2.5 (1T-parameter MoE) running coherently on 25GB of GPU memory (on a unified 128 GB machine)!
36
23
563
@Ex0byt
Eric
12 days
Okay, I've had it sitting around since the 13th. I think it's time to get this M5 Max 18-core CPU/40-core GPU, 128GB RAM, 4TB SSD baby monster out of the box and see what it can do?
13
1
109
@Ex0byt
Eric
12 days
Exciting Experiment Update: we ran StepFun_ai's Step-3.5-Flash (197B MoE) on 6.29 GB of GPU memory! Flat. Zero growth. Same footprint at token 1 as at token 100. The model's weights are ~105 GB in INT4 (394 GB original bf16!), and we're running it on 6.29 GB!! That's 1/16th the weight
37
37
517
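A flat footprint implies a fixed resident budget of expert weights rather than paging in the whole checkpoint; the ratio works out as claimed using only the numbers from the post:

```python
# Ratio of checkpoint size to resident GPU memory, using the post's numbers.
total_int4_gb = 105.0   # ~105 GB of INT4 weights
resident_gb = 6.29      # flat GPU footprint observed while decoding

ratio = total_int4_gb / resident_gb
print(round(ratio, 1))  # 16.7: roughly 1/16th of the weights resident at once
```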
@Ex0byt
Eric
13 days
Take my money @MichaelDell
10
1
48
@Ex0byt
Eric
13 days
Y'all know how maniacal I am about speed, efficiency, and OSS. Check this puppy out: ~900 tok/s. Will give it a try and share some thoughts.
@ant_oss
Ant Open Source
13 days
⚡️ 892 tokens/s: our 100B diffusion LLM, LLaDA2.1-flash, is now live on @ZenMuxAI! With Token Editing, LLaDA 2.1 goes from research breakthrough to production-ready speed. Diffusion models just got real. Try it via API or Chat 👇 https://t.co/8ObarWTPio #LLaDA #ZenMux #AI #dLLM
1
2
7
@Ex0byt
Eric
13 days
Everyone is working towards more efficient MoEs. An elegant and practical attention architecture/implementation from Kimi.
@Kimi_Moonshot
Kimi.ai
13 days
Introducing π‘¨π’•π’•π’†π’π’•π’Šπ’π’ π‘Ήπ’†π’”π’Šπ’…π’–π’‚π’π’”: Rethinking depth-wise aggregation. Residual connections have long relied on fixed, uniform accumulation. Inspired by the duality of time and depth, we introduce Attention Residuals, replacing standard depth-wise recurrence with
0
0
4