DeepSpeed

@DeepSpeedAI

Followers: 4K · Following: 86 · Media: 22 · Statuses: 94

Official account for DeepSpeed, a library that enables unprecedented scale and speed for deep learning training + inference. Japanese: @DeepSpeedAI_JP

Joined May 2020
@DeepSpeedAI
DeepSpeed
3 days
It's exciting to see DeepSpeed leveraged by Ray in disaggregated hybrid parallelism for multimodal training. Blog: https://t.co/CzvjpyGXsr Congrats to Masahiro Tanaka (@toh_tana) and @anyscalecompute friends.
anyscale.com
Powered by Ray, Anyscale empowers AI builders to run and scale all ML and AI workloads on any cloud and on-prem.
0
5
27
@PyTorch
PyTorch
6 days
Zhipeng (Jason) Wang, PhD (@PKUWZP) explains how @DeepSpeedAI supports ML training research and why joining PyTorch Foundation benefits researchers and developers working on AI training workloads. 🔗 https://t.co/6FfXB98gb2 #PyTorch #DeepSpeed #OpenSourceAI #AIInfrastructure
0
13
114
@DeepSpeedAI
DeepSpeed
2 months
It was great to share the most recent updates from the DeepSpeed project at #PyTorchCon. We will continue pushing the boundaries of LLM distributed training for the OSS community.
@PyTorch
PyTorch
3 months
🎙️ Mic check: Tunji Ruwase, Lead, DeepSpeed Project & Principal Engineer at Snowflake, is bringing the 🔥 to the keynote stage at #PyTorchCon! Get ready for big ideas and deeper learning October 22–23 in San Francisco. 👀 Speakers: https://t.co/SOqCY9k7Wz 🎟️
0
1
7
@DeepSpeedAI
DeepSpeed
2 months
UIUC, Anyscale, and Snowflake significantly enhanced LLM offloading for the Superchip era!
@_Minjia_Zhang_
Minjia Zhang
2 months
🚀 SuperOffload: Unleashing the Power of Large-Scale LLM Training on Superchips Superchips like the NVIDIA GH200 offer tightly coupled GPU-CPU architectures for AI workloads. But most existing offloading techniques were designed for traditional PCIe-based systems. Are we truly
0
3
12
@anyscalecompute
Anyscale
2 months
🚨Meetup Alert🚨 Join us for @raydistributed × @DeepSpeedAI Meetup: AI at Scale, including talks from researchers and engineers at @LinkedIn, @anyscalecompute and @Snowflake. Learn how leading AI teams are scaling efficiently with Ray’s distributed framework and DeepSpeed’s
1
2
10
@DeepSpeedAI
DeepSpeed
3 months
Step into the future of AI at #PyTorchCon 2025, Oct 22–23 in San Francisco 🔥 Join the DeepSpeed keynote and technical talks. Register: https://t.co/6iogY2eetT + Oct 21 co-located events: Measuring Intelligence, Open Agent & AI Infra Summits / Startup Showcase & PyTorch Training
events.linuxfoundation.org
PyTorch Conference 2025: Join AI leaders, ML engineers & researchers in San Francisco, Oct 22-23. Experience the future of machine learning & deep learning.
0
2
7
@StasBekman
Stas Bekman
4 months
The @DeepSpeedAI team would like to thank @modal for sponsoring our GPUs for CI. This is an amazing contribution to our AI-democratizing open source project. https://t.co/OyzMClojB8 The Modal team is outstanding in their support - speed, expertise, and a human experience!
github.com
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective. - deepspeedai/DeepSpeed
1
8
69
@DeepSpeedAI
DeepSpeed
4 months
ZenFlow is a massive improvement to DeepSpeed Offloading. Courtesy of an excellent collaboration among University of Virginia, UC Merced, Argonne National Laboratory, Microsoft, and Snowflake.
@PyTorch
PyTorch
4 months
Introducing #ZenFlow: No Compromising Speed for #LLM Training w/ Offloading 5× faster LLM training with offloading 85% less GPU stalls 2× lower I/O overhead 🚀 Blog: https://t.co/emUxxDQoGI 🚀 Try ZenFlow and experience 5× faster training with offloading: https://t.co/ppcoI4Af7V
0
0
10
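For context, ZenFlow builds on DeepSpeed's ZeRO offloading path. Below is a minimal sketch of the ZeRO-2 + CPU optimizer-offload configuration that ZenFlow speeds up; the `zenflow` sub-section and its key names are assumptions based on the announcement, not a verified API, while the remaining keys are standard DeepSpeed config.

```python
# Minimal ZeRO-2 + CPU optimizer-offload config, the setup ZenFlow targets.
# The "zenflow" sub-section is an assumption based on the announcement; its key
# names are illustrative. Everything else is standard DeepSpeed config.
import deepspeed

ds_config = {
    "train_batch_size": 16,
    "bf16": {"enabled": True},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        # Hypothetical ZenFlow knobs: apply the "hot" top-k gradients on GPU every
        # step and fold the rest into the offloaded optimizer asynchronously.
        "zenflow": {"topk_ratio": 0.1, "update_interval": 4},
    },
}

# Under the deepspeed launcher (`deepspeed train.py`):
# engine, _, _, _ = deepspeed.initialize(
#     model=model, model_parameters=model.parameters(), config=ds_config
# )
```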
@DeepSpeedAI
DeepSpeed
5 months
Kudos to Xinyu for giving an excellent presentation of the DeepSpeed Universal Checkpointing (UCP) paper at USENIX ATC 2025.
@_Minjia_Zhang_
Minjia Zhang
5 months
📢 Yesterday at USENIX ATC 2025, Xinyu Lian from UIUC SSAIL Lab presented our paper on Universal Checkpointing (UCP). UCP is a new distributed checkpointing system designed for today's large-scale DNN training, where models often use complex forms of parallelism, including data,
2
2
17
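UCP's key idea is that a checkpoint saved under one parallelism layout can be resumed under another. A rough sketch of the resume path is below; the conversion script path and the `load_universal` key follow my reading of the UCP tutorial and should be treated as assumptions, while `load_checkpoint()` is the standard DeepSpeed engine API.

```python
# Sketch of the Universal Checkpointing (UCP) resume path (assumptions noted below).
#
# Step 1 (offline): convert a parallelism-specific ZeRO checkpoint to the universal
# format, e.g. (assumed script path/flags from the UCP tutorial):
#   python -m deepspeed.checkpoint.ds_to_universal \
#       --input_folder ckpt/global_step1000 --output_folder ckpt/universal_step1000
#
# Step 2: resume under a possibly different TP/PP/DP layout with the flag below.
ds_config = {
    "train_batch_size": 32,
    "zero_optimization": {"stage": 1},
    "checkpoint": {"load_universal": True},  # assumed key: load the layout-agnostic format
}

# engine, _, _, _ = deepspeed.initialize(model=model,
#                                        model_parameters=model.parameters(),
#                                        config=ds_config)
# engine.load_checkpoint("ckpt", tag="universal_step1000")
```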
@StasBekman
Stas Bekman
6 months
My first project at @Snowflake AI Research is complete! I present to you Arctic Long Sequence Training (ALST) Paper: https://t.co/rpJ3WPipSK Blog: https://t.co/qxjHtKVx5q ALST is a set of modular, open-source techniques that enable training on sequences up to 15 million
16
66
379
@DeepSpeedAI
DeepSpeed
6 months
Improved DeepNVMe: Affordable I/O Scaling for AI - Faster I/O with PCIe Gen5 - 20x faster model checkpointing - Low-budget SGLang inference via NVMe offloading - Pinned memory for CPU-only workloads - Zero-copy tensor type casting Blog: https://t.co/y7qn1qm1Fb
4
14
68
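The checkpointing speedup comes from keeping the GPU out of the I/O path: stage state into pinned host memory, then let the disk write proceed while training continues. The sketch below illustrates that pattern in plain PyTorch only; it is not the DeepNVMe API, and the function and paths are hypothetical.

```python
# Conceptual sketch (plain PyTorch, not the DeepNVMe API): speed up checkpointing by
# staging GPU tensors into pinned host memory and writing to NVMe off the main thread.
import threading
import torch

def async_checkpoint(state_gpu, path):
    """Copy GPU tensors into pinned CPU buffers, then write them in a background thread."""
    pinned = {
        k: torch.empty(v.shape, dtype=v.dtype, device="cpu", pin_memory=True)
        for k, v in state_gpu.items()
    }
    for k, v in state_gpu.items():
        pinned[k].copy_(v, non_blocking=True)  # asynchronous device-to-host copy
    torch.cuda.synchronize()  # ensure the copies landed before the writer reads them

    writer = threading.Thread(target=torch.save, args=(pinned, path), daemon=True)
    writer.start()
    return writer  # caller should join() before exiting or reusing the buffers

# Hypothetical usage:
# t = async_checkpoint({"weight": torch.randn(4096, 4096, device="cuda")}, "/nvme/ckpt.pt")
# ... keep training while the write proceeds ...
# t.join()
```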
@PyTorch
PyTorch
8 months
PyTorch Foundation has expanded into an umbrella foundation. @vllm_project and @DeepSpeedAI have been accepted as hosted projects, advancing community-driven AI across the full lifecycle. Supporting quotes provided by the following members: @AMD, @Arm, @AWS, @Google, @Huawei,
8
46
236
@DeepSpeedAI
DeepSpeed
8 months
Come hear all the exciting DeepSpeed updates at the upcoming PyTorch Day France 2025: DeepSpeed – Efficient Training Scalability for Deep Learning Models - https://t.co/FVJA1cKBfn @sched
pytorchdayfrance2025.sched.com
View more about this event at PyTorch Day France
0
1
5
@DeepSpeedAI
DeepSpeed
8 months
Introducing 🚀DeepCompile🚀: compiler-based distributed training optimizations. - Automatic parallelization & profile-guided optimizations - Enable ZeRO1, ZeRO3, Offloading, etc. via compiler passes - 1.2X-7X speedups over manual ZeRO1/ZeRO3/Offloading https://t.co/1DzW7buCO6
1
52
306
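DeepCompile applies ZeRO-style optimizations as compiler passes rather than runtime hooks, so from the user's side it is mostly a config switch. A minimal sketch under stated assumptions: the `"compile"`/`"deepcompile"` key and the `engine.compile()` call follow the announcement blog and are assumptions here; the ZeRO-3 keys are ordinary DeepSpeed config.

```python
# Sketch: standard ZeRO-3 config plus the DeepCompile toggle described in the blog.
import deepspeed

ds_config = {
    "train_batch_size": 8,
    "bf16": {"enabled": True},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "zero_optimization": {"stage": 3},
    "compile": {"deepcompile": True},  # assumed flag: apply ZeRO-3/offload as compiler passes
}

# Under the deepspeed launcher (`deepspeed train.py`):
# engine, _, _, _ = deepspeed.initialize(
#     model=model, model_parameters=model.parameters(), config=ds_config
# )
# engine.compile()  # assumed entry point that triggers the profile-guided passes
```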
@DeepSpeedAI
DeepSpeed
9 months
AutoTP + ZeRO Training for HF Models - Enhance HF post-training with larger models, batches, & contexts - 4x faster LLAMA3 fine-tuning with TP=2 vs TP=1 - No code changes needed Blog: https://t.co/ZlCG2Aq5K5
0
20
75
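"No code changes needed" means the switch lives in the DeepSpeed config passed to the Hugging Face Trainer. A sketch of that config follows; the `"tensor_parallel"`/`"autotp_size"` keys follow the blog post and are assumptions, while passing a DeepSpeed config dict to `TrainingArguments` is standard HF usage.

```python
# Sketch: AutoTP (TP=2) combined with ZeRO for Hugging Face fine-tuning.
ds_config = {
    "train_batch_size": 32,
    "bf16": {"enabled": True},
    "zero_optimization": {"stage": 1},
    "tensor_parallel": {"autotp_size": 2},  # assumed key: shard each layer across 2 GPUs
}

# from transformers import TrainingArguments
# args = TrainingArguments(output_dir="out", bf16=True, deepspeed=ds_config)
# ...then build the Trainer exactly as before; no model-code changes are required.
```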
@xariusrke
xr-5 🐀
10 months
1/4⚡️nanotron now supports DoMiNo with intra-layer communication overlapping, achieving 60% communication hiding for tensor parallelism (TP) in both the forward and backward passes while maintaining the same training loss.
4
15
73
@LFAIDataFdn
LF AI & Data Foundation
11 months
🚀 Excited to introduce DeepSpeed, a deep learning optimization library from @Microsoft! It simplifies distributed training and inference, making AI scaling more efficient and cost-effective. Learn more 👉 https://t.co/LIFjumeAgb #DeepSpeed #AI #OpenSource #LFAIData
1
9
34
@DeepSpeedAI
DeepSpeed
1 year
🚀Introducing Ulysses-Offload🚀 - Unlock the power of long context LLM training and finetuning with our latest system optimizations - Train LLaMA3-8B on 2M tokens context using 4xA100-80GB - Achieve over 55% MFU Blog: https://t.co/AoGSqyKb1E Tutorial: https://t.co/6YHSA5iOop
1
30
96
@DeepSpeedAI
DeepSpeed
1 year
Introducing Domino: a novel zero-cost communication tensor parallelism (TP) training engine for both single node and multi-node settings. - Near-complete communication hiding - Novel multi-node scalable TP solution Blog: https://t.co/08bPanyr9M
0
69
206
@DeepSpeedAI
DeepSpeed
1 year
Great to see the amazing DeepSpeed optimizations from @Guanhua_Wang_, Heyang Qin, @toh_tana, @QuentinAnthon15, and @samadejacobs presented by @ammar_awan at MUG '24.
@mvapich
MVAPICH
1 year
Dr. Ammar Ahmad Awan from Microsoft DeepSpeed giving a presentation at MUG '24 on trillion-parameter LLMs and optimization with MVAPICH. @OSUengineering @Microsoft @OhTechCo @mvapich @MSFTDeepSpeed @MSFTDeepSpeedJP #MUG24 #MPI #AI #LLM #DeepSpeed
0
4
9