Explore tweets tagged as #TurboSparse
⭐️ From PowerInfer-2: Fast Large Language Model Inference on a Smartphone (June 10, 2024): PowerInfer-2 runs the TurboSparse-Mixtral-47B model on a smartphone at 11.68 tokens/sec, achieving up to 29.2× speedup over existing frameworks
Model sparsity is the key to PowerInfer-2, and TurboSparse makes it possible. We've pushed the FFN sparsity of Mistral and Mixtral to 90% and 97%, with even higher performance. Dive into the details at https://t.co/ChsDCZyxgI and get the models today: https://t.co/us1sgubiip.
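A minimal sketch of what the FFN sparsity figure above measures, assuming a dReLU-style gated FFN as described in the TurboSparse paper; the weight shapes are illustrative (Mistral-7B-like), and this is not code from the PowerInfer-2 or TurboSparse repositories:

import torch

def ffn_sparsity(hidden: torch.Tensor, w_gate: torch.Tensor,
                 w_up: torch.Tensor, eps: float = 1e-6) -> float:
    # Fraction of FFN intermediate neurons whose activation is ~0.
    # Exactly-zero activations let an inference engine skip the matching
    # rows of w_gate/w_up and columns of w_down at decode time.
    gate = torch.relu(hidden @ w_gate.T)   # dReLU applies ReLU to the gate branch
    up = torch.relu(hidden @ w_up.T)       # ...and to the up branch as well
    inter = gate * up                      # elementwise gating
    return (inter.abs() <= eps).float().mean().item()

# Toy usage: random weights give only ~75% zeros; the ~90%/97% figures
# quoted above come from trained TurboSparse models.
h = torch.randn(4, 4096)
print(ffn_sparsity(h, torch.randn(14336, 4096), torch.randn(14336, 4096)))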
Decoding speeds of PowerInfer-2, llama.cpp, and MLC-LLM on TurboSparse-Mistral-7B with different offloading setups. "50% offload" means 50% of the FFN blocks' weights are offloaded to flash storage. "No offload" means all model parameters are resident in memory. A red label of
Model sparsity is the key to PowerInfer-2, and TurboSparse makes it possible to run such a huge model on a mobile phone. According to the PowerInfer-2 paper, they have pushed the FFN sparsity of Mistral and Mixtral to 90% and 97%, with even higher performance.
In their paper, the researchers introduced two models: TurboSparse-Mistral-7B and TurboSparse-Mixtral-47B. These models are sparsified versions of Mistral and Mixtral, respectively, ensuring not only enhanced model performance but also higher predictable sparsity. Notably,
This AI paper from China proposes a new dReLU-based sparsification method that raises model sparsity to 90% while maintaining performance, achieving a 2-5x inference speedup - MarkTechPost #LLMs #ConditionalComputation #SparsityEfficiency #TurboSparse
https://t.co/U9VasCDObB
@IlyasHairline @wey_gu If the foundation models used these activation functions and were pretrained from scratch, it would be ideal. We have demonstrated their negligible loss/perplexity gap compared to SwiGLU, and the trained models exhibited very sparse FFNs in the TurboSparse paper and
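Since the reply above contrasts dReLU with SwiGLU, a side-by-side sketch may help. It follows the dReLU formulation from the TurboSparse paper (ReLU applied to both the gate and up projections) next to a standard SwiGLU block; this is an illustrative reconstruction, not the authors' code:

import torch
import torch.nn.functional as F

def swiglu_ffn(x, w_gate, w_up, w_down):
    # SwiGLU: SiLU(x W_gate) * (x W_up), then a down-projection.
    # Intermediate values are rarely exactly zero, so there is little
    # activation sparsity to exploit.
    return (F.silu(x @ w_gate.T) * (x @ w_up.T)) @ w_down.T

def drelu_ffn(x, w_gate, w_up, w_down):
    # dReLU: ReLU on both branches. A neuron zeroed in either branch
    # contributes nothing, which predictors and weight offloading exploit.
    return (F.relu(x @ w_gate.T) * F.relu(x @ w_up.T)) @ w_down.T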
Introducing PowerInfer-2 from SJTU-IPADS Labs! Revolutionary LLM inference engine for mobile devices delivers a 47B model with a 29x speedup on smartphones! Discover the innovations: heterogeneous computing, I/O-Compute pipelining, and TurboSparse with up to 97% sparsity!
Excited to introduce PowerInfer-2: A game-changing LLM inference engine for mobile devices by the #PowerInfer team. It smoothly runs a 47B model with a staggering 29x speedup on smartphones! Watch our demo to see it in action! Technical details at: https://t.co/7bx5EnzWCs
@wey_gu Those ground-breaking speedups are all based on intrinsic sparsity and depend on ReLU more or less. We have confirmed some alternatives, like ReLU^2 and the dReLU proposed in TurboSparse. They are very promising but not yet adopted by mainstream LLMs. Retraining is still essential
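ReLU^2 (squared ReLU), mentioned in the reply above as another sparsity-friendly alternative, is simple enough to state inline; a hedged one-liner in the same PyTorch style as the earlier sketches:

import torch

def relu_squared(x: torch.Tensor) -> torch.Tensor:
    # ReLU^2 keeps the exact zeros of ReLU (hence the sparsity) while
    # squaring the positive part; it has been explored as a drop-in
    # activation in some LLM work.
    return torch.relu(x) ** 2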
@hodlenx This is TurboSparse-Mixtral-47B in int4 demoed on that phone, right? Could you kindly provide a reference for the quantization? I don't see it, only half-precision weights