Mobius Labs

@Mobius_Labs

Followers: 3K · Following: 1K · Media: 144 · Statuses: 398

Multimodal AI at the world's scale. Proponents of Open Source and Open Intelligence. See https://t.co/1nC6r8hOrE for some of our recent work.

Berlin, Germany
Joined April 2018
@Mobius_Labs
Mobius Labs
2 days
RT @mobicham: Been playing with low-bit training this morning (3-bit, 2-bit, 1.58-bit). Simple key trick to make it work: dampen the gradient…
gist.github.com
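The RT above is truncated, so the exact trick isn't visible. As a hedged illustration only: one plausible reading of "dampen the gradient" is to shrink the straight-through-estimator gradient where the quantization error is large, so low-bit weights stop oscillating between adjacent bins. Everything below (the names, the dampening rule) is an assumption, not mobicham's code:

```python
import torch

class DampenedFakeQuant(torch.autograd.Function):
    """Symmetric uniform fake-quant with a dampened straight-through estimator."""

    @staticmethod
    def forward(ctx, w, n_bits=2, dampen=0.1):
        qmax = max(2 ** (n_bits - 1) - 1, 1)   # e.g. 1 for 2-bit, 3 for 3-bit
        scale = w.abs().amax().clamp_min(1e-8) / qmax
        w_q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
        ctx.save_for_backward(w, w_q)
        ctx.dampen = dampen
        return w_q

    @staticmethod
    def backward(ctx, grad_out):
        w, w_q = ctx.saved_tensors
        # A plain STE would pass grad_out through unchanged. Instead, shrink
        # the gradient where the quantization error is large, so weights near
        # bin boundaries stop flip-flopping between adjacent levels.
        err = (w - w_q).abs()
        damp = torch.ones_like(err)
        damp[err > err.mean()] = ctx.dampen
        return grad_out * damp, None, None

# usage: w_q = DampenedFakeQuant.apply(weight, 2, 0.1)
```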
@Mobius_Labs
Mobius Labs
14 days
RT @mobicham: Simple and fast MXFP8 activation quant kernel:
✓ Padding-aware for arbitrary seq lens
✓ SM-aware unrolling to improve occup…
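The kernel itself is Triton; as a hedged reference, the PyTorch sketch below shows the math an MXFP8 activation quantizer computes: blocks of 32 values share one power-of-two (E8M0) scale and the scaled elements are stored as FP8 e4m3. Function names are mine, and the padding- and SM-aware tricks from the post are kernel-level details not reproduced here:

```python
import torch
import torch.nn.functional as F

def mxfp8_quant(x: torch.Tensor, block: int = 32):
    """Blockwise MXFP8: each block of 32 values shares one power-of-two
    (E8M0) scale; the scaled elements are cast to FP8 e4m3."""
    orig_len = x.shape[-1]
    pad = (-orig_len) % block                  # handle arbitrary sequence lengths
    x = F.pad(x, (0, pad))
    xb = x.reshape(*x.shape[:-1], -1, block)
    amax = xb.abs().amax(dim=-1, keepdim=True)
    # shared exponent: floor(log2(amax)) minus e4m3's max exponent (8)
    scale = torch.exp2(torch.floor(torch.log2(amax + 1e-30)) - 8)
    q = torch.clamp(xb / scale, -448.0, 448.0).to(torch.float8_e4m3fn)
    return q, scale

def mxfp8_dequant(q, scale, orig_len: int):
    x = q.to(torch.float32) * scale
    return x.reshape(*x.shape[:-2], -1)[..., :orig_len]
```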
@Mobius_Labs
Mobius Labs
17 days
RT @mobicham: Here's how to write a fast MXFP4/NVFP4 dequant GEMV kernel for batch-size=1:
- Use a mapping + tl.gather to map the quant matr…
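The RT is cut off, but the visible idea (a mapping plus tl.gather) amounts to a 16-entry lookup table from 4-bit codes to FP4 (E2M1) values. Below is a hedged plain-PyTorch reference of that dequant-then-GEMV step, with the Triton specifics (tl.gather, tiling) omitted; the function name and shapes are illustrative:

```python
import torch

# The 16 values representable in FP4 E2M1, the element format of MXFP4/NVFP4
FP4_E2M1 = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
                         -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0])

def fp4_dequant_gemv(x, codes, scales, block=32):
    """y = W @ x for batch-size=1, where W is stored as 4-bit codes plus
    per-block scales. x: [K], codes: [N, K] with values 0..15,
    scales: [N, K // block]."""
    lut = FP4_E2M1.to(dtype=x.dtype, device=x.device)
    w = lut[codes.long()]                          # table lookup == the gather
    w = w.reshape(codes.shape[0], -1, block) * scales.unsqueeze(-1)
    return w.reshape(codes.shape[0], -1) @ x       # the GEMV itself
```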
@Mobius_Labs
Mobius Labs
22 days
RT @Mobius_Labs: FP4 weights meet high accuracy: logit-distillation bias correction for MXFP4 & NVFP4. On Llama-3.1-8B it recovers ≥99% relat…
mobiusml.github.io
@Mobius_Labs
Mobius Labs
24 days
FP4 weights meet high accuracy: logit-distillation bias correction for MXFP4 & NVFP4. On Llama-3.1-8B it recovers ≥99% relative quality. Details at:
mobiusml.github.io
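As a hedged sketch of what "logit-distillation bias correction" could look like in practice, assuming the quantized model still exposes its layers as torch.nn.Linear: attach a learnable per-output-channel bias to each quantized layer and train only those biases to match the full-precision teacher's logits under a KL loss. Illustrative code, not the implementation behind the mobiusml.github.io post:

```python
import torch
import torch.nn.functional as F

def attach_correction_biases(model):
    """Give every bias-free Linear a learnable per-output-channel bias."""
    params = []
    for m in model.modules():
        if isinstance(m, torch.nn.Linear) and m.bias is None:
            m.bias = torch.nn.Parameter(
                torch.zeros(m.out_features, dtype=m.weight.dtype,
                            device=m.weight.device))
            params.append(m.bias)
    return params

def distill_step(teacher, student, input_ids, opt, T=1.0):
    """One step: pull the quantized student's logits toward the
    full-precision teacher's logits with a KL loss."""
    with torch.no_grad():
        t_logits = teacher(input_ids).logits
    s_logits = student(input_ids).logits
    loss = F.kl_div(F.log_softmax(s_logits / T, dim=-1),
                    F.softmax(t_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# usage: opt = torch.optim.AdamW(attach_correction_biases(student), lr=1e-4)
```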
@Mobius_Labs
Mobius Labs
1 month
RT @mobicham: GemLite runs fast on the MI300X, but there's still plenty of performance left to unlock
@Mobius_Labs
Mobius Labs
1 month
RT @mobicham: ROCm support added 👀, mainly focusing on the MI300X.
github.com
Adds ROCm support, focusing on the MI300X
@Mobius_Labs
Mobius Labs
1 month
RT @mobicham: Damn. Luckily, we have HQQ, which takes only 5 secs to quantize an 8B model; super useful to get started right away with any mo…
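HQQ is calibration-free, which is why quantizing an 8B model takes seconds rather than hours. A minimal sketch of getting started through the transformers HqqConfig integration; the model id and the nbits/group_size settings are examples, not values from the post:

```python
from transformers import AutoModelForCausalLM, HqqConfig

# nbits=4 / group_size=64 are typical HQQ settings; adjust as needed
quant_config = HqqConfig(nbits=4, group_size=64)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",   # example id; any causal LM works
    torch_dtype="auto",
    device_map="cuda",
    quantization_config=quant_config,      # weights are quantized at load time
)
```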
@Mobius_Labs
Mobius Labs
2 months
RT @mobicham: GemLite 0.4.7 is out 🔥. It boosts performance by 5-10 tokens/sec end-2-end by using an interesting trick which might seem wei…
github.com
Fast low-bit matmul kernels in Triton.
@Mobius_Labs
Mobius Labs
2 months
RT @mobicham: @main_horse @gaunernst Llama3.1 8B running A16W4 with FP16 accumulation on the 5090 RTX
@Mobius_Labs
Mobius Labs
2 months
RT @mobicham: Wait, Triton can be faster than Cutlass ? 🧐
@Mobius_Labs
Mobius Labs
2 months
RT @mobicham: We just made inference 1.5x faster with larger batch-sizes compared to last week 🤯 - work in progress
@Mobius_Labs
Mobius Labs
3 months
RT @mobicham: Well-optimized Triton kernels can perform very well end-2-end, even competing with highly optimized kernels like Marlin. http…
@Mobius_Labs
Mobius Labs
3 months
GemLite: I feel the need, the need for speed!
@mobicham
mobicham
3 months
GemLite is significantly outperforming the default A16W4 vLLM kernel on the MI300X 🚀
@Mobius_Labs
Mobius Labs
3 months
RT @mobicham: GemLite is significantly outperforming the default A16W4 vLLM kernel on the MI300X 🚀
@Mobius_Labs
Mobius Labs
3 months
Initial attempts at bringing GemLite to the @AMD MI300X.
@mobicham
mobicham
3 months
Hitting ~200 tokens/sec with Llama3-8B (bs=1) using GemLite (after changes) on the AMD MI300X. Performance lands between the A100 and H100 — given the price point, the MI300X is shaping up to be a very compelling inference option!
@Mobius_Labs
Mobius Labs
4 months
RT @mobicham: So I ran an evaluation on Gemma 3 12B QAT vs. HQQ. HQQ takes a few seconds to quantize the model and outperforms the QAT versi…
@Mobius_Labs
Mobius Labs
4 months
Thanks to bfloat16 support, GemLite now runs models like Google's Gemma 3, avoiding the usual accuracy loss from fp16 conversion.
huggingface.co
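For context: bfloat16 keeps float32's 8-bit exponent range, so activations that would overflow fp16's ~65504 maximum survive intact; casting a bf16-trained model such as Gemma 3 down to fp16 is where the usual accuracy loss comes from. A minimal hedged sketch of keeping bf16 end to end (the model id is an example; the GemLite kernel wiring itself is not shown):

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-12b-it",          # example id; any bf16-trained model applies
    torch_dtype=torch.bfloat16,       # keep the training dtype end to end
    device_map="cuda",
)
```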