Qubitium

@qubitium

Followers: 1K
Following: 3K
Media: 576
Statuses: 6K

Building GPT-QModel, ModelCloudAI. OSS contributor to SGLang, vLLM, HF and more. AI SW/HW { Python, Go, Kotlin } Quantization Accelerator.

Earth
Joined February 2020
@qubitium
Qubitium
1 month
🥳 GPT-QModel v5.4.0 just released. New AWQ fused hw-accelerated kernel for Intel Xeon CPU and XPU. The fused kernel has been validated on AMD EPYC as well. AWQ MoE model compat fixes and much more. https://t.co/r8vvNAYrXM
github.com
Notable Changes: AWQ Torch Fused Kernel by @Qubitium in #2190 Make torch fused op compilable by @jiqing-feng in #2182 [FIX] AWQ MoE by @ZX-ModelCloud in #2171 add :? capture only syntax by @Qubiti...
1
1
3
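For context, the basic GPT-QModel quantize flow below follows the project README; treat the quant_method="awq" selector as my assumption for routing to the new AWQ kernel path, not a confirmed parameter name.

```python
from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig

# Minimal sketch of the GPT-QModel quantize flow (per the project README).
# NOTE: quant_method="awq" is an assumption for selecting the new AWQ path;
# check the v5.4.0 release notes for the confirmed knob.
calibration = load_dataset(
    "allenai/c4",
    data_files="en/c4-train.00001-of-01024.json.gz",
    split="train",
).select(range(256))["text"]

config = QuantizeConfig(bits=4, group_size=128, quant_method="awq")

model = GPTQModel.load("meta-llama/Llama-3.2-1B-Instruct", config)
model.quantize(calibration, batch_size=2)  # runs on CUDA/XPU/CPU backends
model.save("Llama-3.2-1B-Instruct-awq-4bit")
```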
@qubitium
Qubitium
2 hours
Nvidia GPU (pro/ent) prices have dropped ~10% in the last 4 weeks, per my casual observations of the second-hand market. Supply is exceeding demand.
0
0
0
@qubitium
Qubitium
2 hours
...of FailSafe quantization (RTN by default). Different FailSafeStrategy and Smoothers can be selected. The threshold to activate FailSafe can also be customized. Eval tests show minimal model-level degradation vs full GPTQ quantization for these MoE modules which have extreme bias
0
0
0
@qubitium
Qubitium
2 hours
🎉 New FailSafe config and FailSafeStrategy, auto-enabled by default, to address uneven routing of MoE experts resulting in quantization issues for some MoE modules. 👇 Smoother operations are also introduced to FailSafeStrategy to reduce the impact of outliers... https://t.co/tLOo3ANOyg
github.com
LLM model quantization (compression) toolkit with hw acceleration support for Nvidia CUDA, AMD ROCm, Intel XPU and Intel/AMD/Apple CPU via HF, vLLM, and SGLang. - ModelCloud/GPTQModel
1
0
1
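A rough sketch of how these knobs could surface in a QuantizeConfig; every FailSafe-related parameter name below is hypothetical, inferred from the two posts above rather than taken from the shipped GPT-QModel API, so check the linked release notes for the real signature.

```python
from gptqmodel import GPTQModel, QuantizeConfig

# Hypothetical FailSafe configuration: the failsafe* parameter names are
# guesses based on the posts above (FailSafe config, FailSafeStrategy,
# Smoothers, activation threshold), not the published API.
config = QuantizeConfig(
    bits=4,
    group_size=128,
    failsafe=True,                # auto-enabled by default per the release post
    failsafe_strategy="rtn",      # RTN is the stated default strategy
    failsafe_smoother="default",  # smoothing reduces outlier impact
    failsafe_threshold=0.05,      # activation threshold is customizable
)

# An MoE model with unevenly routed experts is the stated target case.
model = GPTQModel.load("Qwen/Qwen1.5-MoE-A2.7B", config)
```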
@qubitium
Qubitium
6 hours
I have to admit, the bullseye, wind speed, plane speed, and altitude auto-adjustments by computer lead to sub-meter accuracy. Ridiculous.
@Southcom
U.S. Southern Command
9 hours
On Dec. 22, at the direction of @SecWar Pete Hegseth, Joint Task Force Southern Spear conducted a lethal kinetic strike on a low-profile vessel operated by Designated Terrorist Organizations in international waters. Intelligence confirmed the low-profile vessel was transiting
0
0
0
@qubitium
Qubitium
1 day
Can someone explain to me why ARC Challenge scores can sit at the ~0.20 level when all the weights are effectively zero? How can absolutely wrong outputs score anything not approaching 0.00?
0
0
0
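One plausible mechanism, sketched below under the assumption of lm-eval-style multiple-choice scoring: ARC Challenge accuracy comes from picking the highest-scoring answer option out of roughly four choices (a few items have 3 or 5), so even score vectors carrying zero signal land near chance (~0.20-0.25), never near 0.00.

```python
import random

# Sketch (assumes lm-eval-style multiple-choice scoring, not any specific
# harness): accuracy is the rate at which the argmax over per-choice scores
# hits the correct option. A degenerate model with meaningless scores still
# hits ~1/n_choices of the time.
random.seed(0)
n_questions, n_choices = 10_000, 4
hits = 0
for _ in range(n_questions):
    correct = random.randrange(n_choices)
    scores = [random.random() for _ in range(n_choices)]  # meaningless "logits"
    pick = max(range(n_choices), key=scores.__getitem__)
    hits += pick == correct
print(f"accuracy ~= {hits / n_questions:.3f}")  # prints ~0.25, not ~0.00
```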
@qubitium
Qubitium
1 day
We all know small-core ARM IPC is off the charts relative to power, but what makes it very strange to me: 1. ARM has lost the core-count battle to x86. Why? 2. ARM has even lost the many-core power-efficiency battle to x86 in recent years. Why? 3. ARM server chips
1
0
0
@qubitium
Qubitium
1 day
Btw, the bug is not fixed, only worked around. The core fix needs to be pushed into the compilers. Kudos to Nvidia for getting on top of this critical bug. It took 4 days for bug verification and bug-source validation. Let's see how long it takes them to push out the real fix. AMD needs to
0
0
0
@qubitium
Qubitium
1 day
I was right about ARM's toolchain immaturity. Read the post-mortem of this bug and ask yourself: do you have a technical team that can deal with this? These types of bugs are what nightmares are made of.
@qubitium
Qubitium
5 days
Thinking about ARM + GPU, aka Grace Hopper/Blackwell? I personally would not touch ARM + GPU unless Nvidia/Qualcomm packages a free ARM engineer with their GPU. There are just too many of those 1% missing ops/libs/optimizations that crash the entire stack.
1
0
0
@qubitium
Qubitium
2 days
👀👀👀👀
@davidmbudden
Budden
3 days
@IsaacKing314 Define "discovered". If your definition precludes a few hundred hours of my life in addition, then no. But yes.
0
0
0
@qubitium
Qubitium
2 days
PowerPC is the future. RISC is the future. Inference powered by clustered Mac Minis is the future.
@garyfung
gary IH fung
3 days
@zephyr_z9 x86 is a dead end If you're to hoard, hoard ram and ARM chips
0
0
2
@qubitium
Qubitium
3 days
Confirmed with a clean, freshly booted Ubuntu 24.04 VM. The ROCm installation is indirectly packaging 32-bit libs. Whether this is necessary needs to be decided by AMD devs. I don't think we need them. https://t.co/LuPhQJnhrZ
github.com
Problem Description Why is the ROCm install on a 64-bit system bundling and requiring 32-bit libraries? Can we remove these as the pkg 24GB total install is already on the very large side and should be tr...
0
0
0
@qubitium
Qubitium
3 days
I have updated the issue to showcase that in a clean VM, the ROCm install does not install any 32-bit libraries, but on the nearly identical VM host, something is causing the apt dependency resolution to pull in 80+ extra packages which include 32-bit libraries.
1
0
0
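For diffing the two machines, a quick sketch (my own diagnostic, not an official ROCm tool) that lists every installed :i386 package:

```python
import subprocess

# List installed :i386 packages on a 64-bit Ubuntu host so the clean VM and
# the VM host can be diffed to find which apt dependency drags in the
# 32-bit libraries.
out = subprocess.run(
    ["dpkg-query", "-W", "-f=${Package}:${Architecture}\n"],
    capture_output=True, text=True, check=True,
).stdout
i386 = sorted(line for line in out.splitlines() if line.endswith(":i386"))
print(f"{len(i386)} i386 packages installed")
print("\n".join(i386))
```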
@qubitium
Qubitium
3 days
ROCm 7.1.1 wants to install many lib32 (32-bit) libraries on my 64-bit-only Ubuntu 24.04 system. Not sure if this is a bug or a feature (32-bit system support). Full install size is 24.4GB. Better than before but still needs work. Hacking off the i386 libs may help as they also force
github.com
Problem Description Why is the ROCm install on a 64-bit system bundling and requiring 32-bit libraries? Can we remove these as the pkg 24GB total install is already on the very large side and should be tr...
1
0
2
@qubitium
Qubitium
3 days
Merry Xmas!
0
0
1
@qubitium
Qubitium
4 days
If I were the reviewer, I would ask why appending "_runtime_" to all the new library methods is necessary. Is there such a thing as a non-runtime library? It's like naming a call method def call_method().
@YouJiacheng
You Jiacheng
4 days
Damn, AMD people renamed "libcuda" from cuda_library to cuda_runtime_library and got this code merged into pytorch.
1
0
1
@qubitium
Qubitium
4 days
NO! 2TB is not unified memory if it is a sum of parts connected by TB5. Even Nvidia doesn't call NVLinked GPUs unified memory.
@alexocheema
Alex Cheema - e/acc
5 days
Total unified memory: 2TB @ 3.2TB/s. Apple Silicon leads in memory / memory bandwidth unit economics. This is what matters for local AI where batch_size is small and workloads are memory-bound.
0
0
0
@qubitium
Qubitium
4 days
Exolabs is great, but an H200 141GB is $30K on the used market. Clustering Macs is negative value in everything but power usage. Slow RAM vs HBM3 is not comparable.
@mweinbach
Max Weinbach
5 days
While it seems extreme to build this, it's roughly the same price as a single H100 and runs at less power than an RTX 5090. The compute is obviously lower, but the sheer amount of memory makes it interesting for 1T+ parameter models for testing/local deployments
1
0
1
@qubitium
Qubitium
4 days
Do people realize that if OpenAI does not use the memory wafers they have hoarded, and does not turn them into memory modules and use them, for whatever reason, they can't sell them back to the market without a huge loss and triggering the mother of all memory pricing unwinds? I personally think the
0
0
0