Armand Joulin
@armandjoulin
7 months
Using parallel decoding to speed up LLM inference: ✅ no need for a second model ✅ no finetuning ✅ negligible memory overhead
Giovanni Monea
@giomonea
7 months
🎉 Unveiling PaSS: Parallel Speculative Sampling 🚀 Need faster LLM decoding? 🔗 Check out our new 1-model speculative sampling algorithm based on parallel decoding with look-ahead tokens. 🤝 In collaboration with
@armandjoulin
and
@EXGRV
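The single-model draft-then-verify idea mentioned above can be illustrated with a toy sketch. This is not the paper's implementation: the "model" is a hypothetical deterministic scorer over a tiny vocabulary, and the draft step is a deliberately cheaper approximation standing in for the look-ahead-token pass. The point is the acceptance logic: drafted tokens are checked against the full model, the longest agreeing run is kept, and the output is guaranteed identical to plain greedy decoding.

```python
VOCAB_SIZE = 8

def model_logits(prefix):
    # Toy stand-in for the full LLM forward pass (hypothetical):
    # a deterministic next-token score conditioned on the whole prefix.
    h = sum(prefix) % VOCAB_SIZE
    return [1.0 if t == h else 0.1 for t in range(VOCAB_SIZE)]

def draft_logits(prefix):
    # Cheaper approximate distribution (hypothetical): looks only at the
    # last token, so it sometimes disagrees with the full model. In PaSS
    # the draft comes from the same model via look-ahead tokens.
    h = (prefix[-1] * 3) % VOCAB_SIZE
    return [1.0 if t == h else 0.1 for t in range(VOCAB_SIZE)]

def greedy(logits_fn, prefix):
    scores = logits_fn(prefix)
    return max(range(VOCAB_SIZE), key=lambda t: scores[t])

def draft(prefix, k):
    # Draft k look-ahead tokens with the cheap approximation.
    ctx = list(prefix)
    for _ in range(k):
        ctx.append(greedy(draft_logits, ctx))
    return ctx[len(prefix):]

def verify(prefix, drafted):
    # Conceptually a single parallel verification pass: accept drafted
    # tokens while the full model agrees; on the first mismatch, emit
    # the full model's own token instead and stop.
    ctx, accepted = list(prefix), []
    for t in drafted:
        best = greedy(model_logits, ctx)
        accepted.append(best)
        ctx.append(best)
        if best != t:
            break
    return accepted

def generate(prefix, n, k=4):
    # Speculative loop: draft k tokens, verify, repeat until n emitted.
    out = list(prefix)
    while len(out) < len(prefix) + n:
        out.extend(verify(out, draft(out, k)))
    return out[len(prefix):len(prefix) + n]

def generate_greedy(prefix, n):
    # Baseline: plain one-token-at-a-time greedy decoding.
    out = list(prefix)
    for _ in range(n):
        out.append(greedy(model_logits, out))
    return out[len(prefix):]
```

By construction every emitted token equals the full model's greedy choice, so `generate` matches `generate_greedy` exactly; the speedup in the real method comes from verifying the drafted run in one forward pass instead of k.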