Official account for @Microsoft DeepSpeed, a library that enables unprecedented scale and speed for deep learning training + inference.
Japanese: @MSFTDeepSpeedJP
Introducing Mixtral, Phi2, Falcon, and Qwen support in #DeepSpeed-FastGen!
- Up to 2.5x faster LLM inference
- Optimized SplitFuse and token sampling
- Exciting new features like a RESTful API and more!
For more details:
#DeepSpeed #AI
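As a rough illustration of what serving one of the newly supported models looks like, here is a minimal sketch using DeepSpeed-MII's non-persistent pipeline API; the model id, prompt, and generation settings are illustrative assumptions rather than details from the announcement.

```python
# Minimal sketch: running a newly supported model through DeepSpeed-FastGen via
# the DeepSpeed-MII pipeline API. Model id and generation settings are
# illustrative assumptions. Launch with, e.g., `deepspeed --num_gpus <N> run.py`.
import mii

# Non-persistent inference pipeline backed by DeepSpeed-FastGen.
pipe = mii.pipeline("mistralai/Mixtral-8x7B-v0.1")

# Generate a short completion for a single prompt.
outputs = pipe(["DeepSpeed-FastGen is"], max_new_tokens=64)
print(outputs)
```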
Introducing DeepSpeed-FastGen 🚀
Serve LLMs and generative AI models with
- 2.3x higher throughput
- 2x lower average latency
- 4x lower tail latency
w. Dynamic SplitFuse batching
Auto TP, load balancing w. perfect linear scaling, plus easy-to-use API
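For the persistent serving mode behind the easy-to-use API mentioned above, the sketch below shows the general shape of a DeepSpeed-MII deployment; the model id and prompt are placeholder assumptions.

```python
# Minimal sketch: a persistent DeepSpeed-FastGen deployment via DeepSpeed-MII.
# The model id and prompt are placeholder assumptions.
import mii

# Start a persistent model server (loads the model once, serves many requests).
client = mii.serve("mistralai/Mistral-7B-v0.1")

# Send a generation request to the running deployment.
response = client.generate(["Seattle is"], max_new_tokens=128)
print(response)

# Shut the deployment down when finished.
client.terminate_server()
```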
Want to train 10B+ ChatGPT-style models on a single GPU and 100B+ on multi-GPU systems? Introducing DeepSpeed-Chat, an easy (single script), fast, and low-cost solution for training high-quality ChatGPT-style models with RLHF, 15x faster than SoTA.
Blog:
DeepSpeed-Chat aims to provide a highly efficient pipeline to help you explore RLHF training. Towards this aim, we are releasing training logs and our experiences in a new tutorial:
(🧵 thread 1/3)
🚀 Announcing DeepSpeed ZeRO-Offload++
- 6x higher training throughput via collaborative CPU/GPU Twin-Flow 🔥
- Systematic optimizations with no loss of data precision
- Performance gains hold for both single-node and multi-node cases
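As a loose sketch of how a Twin-Flow style partial optimizer offload might be expressed in a DeepSpeed config, the snippet below places a fraction of the optimizer work on CPU; the `ratio` field and its meaning are our reading of the announcement and should be treated as an assumption (check the release notes for the exact option), and the model and hyperparameters are placeholders.

```python
# Loose sketch of a ZeRO-3 config with partial optimizer offload in the spirit
# of Twin-Flow. The "ratio" knob (fraction of optimizer work offloaded to CPU)
# is an assumption based on the post, not verified documentation.
# Run under the DeepSpeed launcher, e.g. `deepspeed train.py`.
import torch
import deepspeed

model = torch.nn.Linear(4096, 4096)
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "bf16": {"enabled": True},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {
            "device": "cpu",
            "pin_memory": True,
            "ratio": 0.3,  # assumed Twin-Flow knob: keep the rest of the work on GPU
        },
    },
}
engine, _, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)
```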
DeepSpeed v0.10.0 release! Includes our ZeRO++ release, H100 support, and many bug fixes/updates. Special thanks to our wonderful community of contributors!
ZeRO++ paper:
ZeRO++ blog:
v0.10.0 details:
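For anyone trying ZeRO++ from this release, the sketch below shows the kind of ZeRO-3 config extensions it adds (quantized weight and gradient communication, hierarchical partitioning); the flag names follow our reading of the ZeRO++ tutorial, and the model and values (e.g. an hpZ partition size matching GPUs per node) are illustrative assumptions.

```python
# Sketch of enabling ZeRO++ on top of a ZeRO-3 config. Flag names follow our
# reading of the ZeRO++ tutorial; model and hyperparameters are placeholders.
# Run under the DeepSpeed launcher, e.g. `deepspeed train.py`.
import torch
import deepspeed

model = torch.nn.Linear(8192, 8192)
ds_config = {
    "train_micro_batch_size_per_gpu": 2,
    "bf16": {"enabled": True},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "zero_optimization": {
        "stage": 3,
        "zero_quantized_weights": True,    # qwZ: quantized weight communication
        "zero_quantized_gradients": True,  # qgZ: quantized gradient communication
        "zero_hpz_partition_size": 8,      # hpZ: secondary weight partition within a node
    },
}
engine, _, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)
```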
We recently finished a long-awaited sync between microsoft/Megatron-DeepSpeed and NVIDIA/Megatron-LM 🚀🚀🚀
This resulted in a ~10% throughput gain, together with support for FlashAttention (both 1 and 2) and Rotary Positional Embedding (RoPE)!
Details:
🚀 Exciting new updates to #DeepSpeed ZeRO-Inference with 20x faster generation!
- 4x lower memory usage through 4-bit weight quantization with no code changes needed.
- 4x larger batch sizes through KV cache offloading.
Available in DeepSpeed v0.10.3:
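For context on how ZeRO-Inference is typically driven, here is a sketch of a ZeRO-3 config with CPU parameter offload wrapped around a Hugging Face model; the model id, sizes, and prompt are placeholder assumptions, and the new 4-bit weight quantization and KV-cache offload are controlled by additional v0.10.3 config options that are not reproduced here.

```python
# Sketch of ZeRO-Inference-style generation: ZeRO-3 with CPU parameter offload
# around a Hugging Face model. Model id, prompt, and sizes are placeholder
# assumptions; the v0.10.3 4-bit quantization / KV-cache offload knobs are
# configured separately and not shown.
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.integrations import HfDeepSpeedConfig

model_id = "facebook/opt-13b"  # placeholder
ds_config = {
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "cpu", "pin_memory": True},
    },
    "train_micro_batch_size_per_gpu": 1,
}

# Must exist before from_pretrained so weights are partitioned as they load;
# keep the object alive for the lifetime of the model.
dschf = HfDeepSpeedConfig(ds_config)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

engine = deepspeed.initialize(model=model, config=ds_config)[0]
engine.module.eval()

inputs = tokenizer("DeepSpeed ZeRO-Inference", return_tensors="pt").to("cuda")
with torch.no_grad():
    out = engine.module.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0]))
```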
Want to train a GPT-like model with a 1-million-token context length (all 7 of the Harry Potter books! 📚) on 64 GPUs?
Announcing DeepSpeed-Ulysses🚀
This release enables highly efficient and scalable LLM training with extremely long sequence lengths🤯
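At the core of DeepSpeed-Ulysses is an attention wrapper that shards the sequence dimension across GPUs and exchanges shards with all-to-all communication. The sketch below shows that pattern with DeepSpeed's DistributedAttention; the process-group setup, the toy local attention module, and the tensor shapes are simplified assumptions (in real training this wiring is handled by the Megatron-DeepSpeed integration).

```python
# Sketch of the DeepSpeed-Ulysses pattern: wrap a per-rank attention module with
# DistributedAttention so each rank holds only a shard of a very long sequence.
# Shapes, group setup, and the toy attention below are simplified assumptions.
# Launch across GPUs with torchrun or the deepspeed launcher; the number of
# heads must be divisible by the sequence-parallel group size.
import torch
import torch.distributed as dist
from deepspeed.sequence.layer import DistributedAttention


class LocalAttention(torch.nn.Module):
    """Toy per-rank attention: takes and returns [seq, batch, heads, head_dim]."""

    def forward(self, q, k, v):
        q, k, v = (x.permute(1, 2, 0, 3) for x in (q, k, v))  # -> [b, h, s, d]
        out = torch.nn.functional.scaled_dot_product_attention(q, k, v)
        return out.permute(2, 0, 1, 3).contiguous()           # -> [s, b, h, d]


dist.init_process_group(backend="nccl")  # normally done by the launcher
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())
seq_group = dist.new_group(ranks=list(range(dist.get_world_size())))

dist_attn = DistributedAttention(LocalAttention(), seq_group)

# Each rank holds only its local shard of the sequence: [seq/P, batch, heads, dim].
q = k = v = torch.randn(512, 1, 16, 64, device="cuda")
out = dist_attn(q, k, v)  # returns the same sharded shape on each rank
```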
We've released v0.9.0 to PyPI, which includes support for DeepSpeed Chat. You can now `pip install deepspeed` for all your RLHF needs 🚀🚀
v0.9.0 also includes several bug fixes (e.g., Stable Diffusion fixes) and updates (e.g., dropping torch 1.8 support)
🚀 Excited to announce our paper "ZeRO++: Extremely Efficient Collective Communication for Large Model Training" has been accepted at #ICLR2024!
🔍 ZeRO++ significantly reduces communication volume by 4x, achieving up to 3.3x speedup.
#DeepSpeed
#AI
DeepSpeed v0.9.2 release! Includes several bug fixes and added features across both ZeRO and HybridEngine (HE) used w. DS-Chat (see release notes)
Inference support for accelerating LLaMA coming in v0.9.3 :) For early access, watch our main branch!
Announcing DeepSpeed4Science 🚀
We are building unique capabilities through AI system technologies to help domain experts solve society's most pressing science challenges, from drug design to renewable energy.
MSR Blog:
Website:
We're excited to see the community has started to upload DeepSpeed Chat checkpoints to
@huggingface
!
@amjeeek
has uploaded all model checkpoints from steps 1, 2, and 3 for OPT-1.3B along with his experience and training logs:
DeepSpeed +
@berkeley_ai
explore the effectiveness of MoE in scaling vision-language models, demonstrating its potential to achieve state-of-the-art performance on a range of benchmarks over dense models w. equivalent compute costs.
More coming soon!
Scaling Large-Scale Generative Mixture-of-Expert Multimodal Model With VL-MoE
Do you want to scale up your vision and language models? Take a look at our blog for details!
Ammar (
@ammar_awan
) from the DeepSpeed team will be visiting
@kaust_corelabs
to share our latest features and how they enable trillion-parameter-scale model training and inference for everyone!
The
@cemseKAUST
division and
@KAUST_News
Supercomputing Core Lab are hosting a talk on the usage of the
@Microsoft
DeepSpeed library on supercomputers.
While this event is hosted on campus, attendees outside of KAUST are warmly welcome to join virtually.
Details below!
Full house at the hands-on session diving into our HelloDeepSpeed example (). Everyone was super excited to see how ZeRO allowed them to magically train models that previously OOM'd!
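For readers who missed the session, the heart of the HelloDeepSpeed exercise is handing an existing model to deepspeed.initialize with a ZeRO config; the toy model and hyperparameters below are illustrative assumptions, not the actual example code.

```python
# Minimal sketch of the ZeRO setup exercised in HelloDeepSpeed: wrap a model
# with deepspeed.initialize and a ZeRO config. Toy model and hyperparameters
# are illustrative. Run under the DeepSpeed launcher, e.g. `deepspeed train.py`.
import torch
import deepspeed

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096), torch.nn.ReLU(), torch.nn.Linear(4096, 1024)
)
ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "fp16": {"enabled": True},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "zero_optimization": {"stage": 2},  # partition optimizer states and gradients
}
engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

# One training step: forward, backward, and step on the DeepSpeed engine.
x = torch.randn(8, 1024, device=engine.device, dtype=torch.half)
loss = engine(x).float().pow(2).mean()
engine.backward(loss)
engine.step()
```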
Highlight 2:
Scientists can now train their large science models like Argonne's GenSLM COVID models with very long sequences
- 2X higher training throughput 🚀
- 13X longer sequence lengths achieved compared to SOTA training frameworks like Megatron-LM
Highlight 1:
Eliminating memory explosion problems for scaling Evoformer-centric structural biology models🧬
Today we're releasing a set of highly memory-efficient Evoformer attention kernels that reduce peak memory for training and inference by 13x! 🚀
2/3 RLHF training is an active field; we welcome contributions to explore this new direction together. You can find our corresponding training recipe scripts and logs for our OPT-1.3B actor + OPT-350M critic run:
Microsoft DeepSpeed open-source technologies empower researchers in Japan to train a SOTA Japanese LLM, an effort led by the National Institute of Informatics (NII), universities, and companies aiming to continuously develop publicly available Japanese LLMs.