DeepSpeed (日本語アカウント)
@DeepSpeedAI_JP
Followers: 1K · Following: 127 · Media: 17 · Statuses: 134
This is the official account sharing news in Japanese about DeepSpeed, a library for optimizing deep learning. DeepSpeed makes large-scale distributed training and inference fast and easy. On this account we introduce the latest DeepSpeed news, such as new features and papers. English Twitter account: @DeepSpeedAI
Joined March 2023
On 10/22-23, the DeepSpeed team will give a keynote speech at the PyTorch Conference in San Francisco. If you are attending the PyTorch Conference, please come and listen. https://t.co/6AfzqLYGtJ
events.linuxfoundation.org
PyTorch Conference 2025: Join AI leaders, ML engineers & researchers in San Francisco, Oct 22-23. Experience the future of machine learning & deep learning.
Step into the future of AI at #PyTorchCon 2025, Oct 22–23 in San Francisco 🔥 Join the DeepSpeed keynote and technical talks. Register: https://t.co/6iogY2eetT + Oct 21 co-located events: Measuring Intelligence, Open Agent & AI Infra Summits / Startup Showcase & PyTorch Training
Zhipeng (Jason) Wang, PhD (@PKUWZP) explains how @DeepSpeedAI supports ML training research and why joining PyTorch Foundation benefits researchers and developers working on AI training workloads. 🔗 https://t.co/6FfXB98gb2
#PyTorch
#DeepSpeed #OpenSourceAI #AIInfrastructure
UIUC, AnyScale, and Snowflake significantly enhanced LLM offloading for the Superchip era!
🚀 SuperOffload: Unleashing the Power of Large-Scale LLM Training on Superchips Superchips like the NVIDIA GH200 offer tightly coupled GPU-CPU architectures for AI workloads. But most existing offloading techniques were designed for traditional PCIe-based systems. Are we truly
Our paper on DeepSpeed's Universal Checkpointing was presented at ATC, a top conference in the field of software systems.
📢 Yesterday at USENIX ATC 2025, Xinyu Lian from UIUC SSAIL Lab presented our paper on Universal Checkpointing (UCP). UCP is a new distributed checkpointing system designed for today's large-scale DNN training, where models often use complex forms of parallelism, including data,
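The UCP workflow described above (save a normal sharded checkpoint, convert it to a parallelism-agnostic format, resume on a different parallelism layout) can be sketched roughly as follows. The `ds_to_universal` conversion entry point and the `"checkpoint"`/`"load_universal"` config keys follow DeepSpeed's documented usage, but treat the exact names as assumptions to verify against the current docs:

```python
# Sketch of the Universal Checkpointing (UCP) workflow (key names assumed).
#
# Step 1: after saving a regular ZeRO checkpoint, convert it to the
# universal format with the bundled script, e.g.:
#   python -m deepspeed.checkpoint.ds_to_universal \
#       --input_folder  ckpt/global_step100 \
#       --output_folder ckpt/global_step100_universal
#
# Step 2: resume on a *different* GPU count / parallelism layout by
# telling the engine that the checkpoint being loaded is universal.
ds_config = {
    "train_batch_size": 32,
    "zero_optimization": {"stage": 3},
    # Flag the incoming checkpoint as universal (parallelism-agnostic).
    "checkpoint": {"load_universal": True},
}

print(ds_config["checkpoint"]["load_universal"])  # True
```

With DeepSpeed installed, this config would be passed to `deepspeed.initialize(...)` before calling `load_checkpoint` as usual.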
PyTorch Day France marked the launch of a global PyTorch Day series—and the announcement of a major milestone: PyTorch Foundation is now an umbrella foundation. First new projects: @vllm_project + @DeepSpeedAI. Next Stop: PyTorch Day China, June 7 🇨🇳 https://t.co/n5tXI4Vipl
It has been announced that the DeepSpeed project is joining the PyTorch Foundation. Through open collaboration with a broad range of stakeholders, we will contribute to the community even further. Official announcement: https://t.co/6wZvnUZ7ZA
https://t.co/IkGegAQaFW
PyTorch Foundation has expanded into an umbrella foundation. @vllm_project and @DeepSpeedAI have been accepted as hosted projects, advancing community-driven AI across the full lifecycle. Supporting quotes provided by the following members: @AMD, @Arm, @AWS, @Google, @Huawei,
This is pretty neat. They hook into torch.compile and add profile-guided optimizations, as well as a number of other specific optimizations like offloading. Since torch.compile is all in Python, all their compiler passes are fairly accessible too! https://t.co/gxpcGQlILf
github.com
This PR introduces DeepCompile, a new feature that efficiently integrates compiler optimizations with other DeepSpeed features. DeepCompile utilizes torch's dynamo to capture the computatio...
Introducing 🚀DeepCompile🚀: compiler-based distributed training optimizations. - Automatic parallelization & profile-guided optimizations - Enable ZeRO1, ZeRO3, Offloading, etc. via compiler passes - 1.2X-7X speedups over manual ZeRO1/ZeRO3/Offloading https://t.co/1DzW7buCO6
We have released DeepCompile, a new DeepSpeed feature! ✅ Automatic, profile-guided optimization of parallel execution ✅ ZeRO and offloading realized as compiler optimization passes ✅ 1.2-7x speedups over manual ZeRO1 / ZeRO3 / offloading. See below for details. Blog (in English): https://t.co/ETSdQkWNQd
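Since DeepCompile applies ZeRO and offloading as compiler passes rather than eager-mode wrappers, enabling it is roughly a config flag plus a compile call on the engine. The `"compile"`/`"deepcompile"` keys and the `engine.compile()` step below mirror the release announcement's description, but are assumptions to check against the DeepSpeed docs:

```python
# Hypothetical sketch: enabling DeepCompile on top of an existing
# ZeRO-3 config (exact key names assumed, verify before use).
ds_config = {
    "train_batch_size": 64,
    "zero_optimization": {"stage": 3},
    # Ask DeepSpeed to apply ZeRO/offloading as torch.compile passes.
    "compile": {"deepcompile": True},
}

# With DeepSpeed installed, the engine would then be compiled once:
#   engine, _, _, _ = deepspeed.initialize(model=model, config=ds_config, ...)
#   engine.compile()  # capture the graph and run the optimization passes

print(ds_config["compile"])  # {'deepcompile': True}
```

The advertised 1.2-7x speedups come from the profile-guided passes, so no per-model tuning is implied by this config.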
We have released a feature that automatically applies tensor parallelism (TP) to Hugging Face models! - Train large models from the Hugging Face model hub with larger batch sizes and sequence lengths - 4x faster fine-tuning of Llama 3 - No code changes required on the user's side! Blog (in English):
AutoTP + ZeRO Training for HF Models - Enhance HF post-training with larger models, batches, & contexts - 4x faster LLAMA3 fine-tuning with TP=2 vs TP=1 - No code changes needed Blog: https://t.co/ZlCG2Aq5K5
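"No code changes needed" suggests AutoTP training is driven entirely from the DeepSpeed config. A minimal sketch of such a config, combining ZeRO-1 with TP=2 as in the tweet's comparison, might look like the following; the `"tensor_parallel"`/`"autotp_size"` keys follow the blog's example but should be treated as assumptions:

```python
# Hypothetical AutoTP + ZeRO training config for a Hugging Face model
# (key names assumed from the release blog; verify against the docs).
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    # ZeRO-1 shards optimizer state; AutoTP splits weights across 2 GPUs,
    # which is what enables the larger batch sizes and context lengths.
    "zero_optimization": {"stage": 1},
    "tensor_parallel": {"autotp_size": 2},
}

print(ds_config["tensor_parallel"]["autotp_size"])  # 2
```

The model itself stays an unmodified `transformers` model; the parallelization is injected when it is wrapped by `deepspeed.initialize(...)`.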
🚀 Excited to introduce DeepSpeed, a deep learning optimization library from @Microsoft! It simplifies distributed training and inference, making AI scaling more efficient and cost-effective. Learn more 👉 https://t.co/LIFjumeAgb
#DeepSpeed #AI #OpenSource #LFAIData
Microsoft Research congratulates Yasuyuki Matsushita on being named a 2025 IEEE Fellow for his outstanding contributions to photometric 3D modeling and computational photography. https://t.co/HjpHJfZLfs
We have released Ulysses-Offload, a new feature for training on very long sequences with limited GPU resources! - Train LLaMA3-8B with a sequence length of 2M tokens on just four A100-80GB GPUs - Achieves over 55% MFU Blog: https://t.co/S5cPAkwk4h Tutorial:
github.com
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective. - deepspeedai/DeepSpeed
🚀Introducing Ulysses-Offload🚀 - Unlock the power of long context LLM training and finetuning with our latest system optimizations - Train LLaMA3-8B on 2M tokens context using 4xA100-80GB - Achieve over 55% MFU Blog: https://t.co/AoGSqyKb1E Tutorial: https://t.co/6YHSA5iOop
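To see why offloading is essential at this scale, a back-of-the-envelope calculation helps. Under deliberately simplified assumptions (one fp16 activation tensor of width `hidden` per token per layer, which understates the true footprint), the activations alone already exceed the aggregate HBM of four A100-80GB GPUs:

```python
# Illustrative memory arithmetic for a 2M-token context on LLaMA3-8B
# (simplified model: one fp16 activation per token per layer).
seq_len    = 2_000_000   # tokens in the context
hidden     = 4096        # LLaMA3-8B hidden size
n_layers   = 32          # transformer layers
bytes_fp16 = 2

act_bytes = seq_len * hidden * n_layers * bytes_fp16
print(act_bytes / 2**30)  # 488.28125 GiB -- well beyond 4 x 80 GB of HBM,
                          # hence the chunked compute + host-memory offload.
```

Even this lower bound is ~1.5x the 320 GB of total GPU memory, which is why Ulysses-Offload streams activation chunks through host memory instead of keeping them resident.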
[Announcing Microsoft Research Asia - Tokyo] To strengthen AI research and innovation across the Asia-Pacific region, we are announcing the establishment of Microsoft Research Asia - Tokyo, a new research lab in Tokyo. https://t.co/jqbMFFZyQ7
Microsoft has released a new MoE model. It achieves high performance with few active parameters. Naturally, DeepSpeed is used for its memory-efficient, high-speed distributed training!
Microsoft releases GRIN😁 MoE GRadient-INformed MoE demo: https://t.co/JmoWsl2Z4M model: https://t.co/sAnyke7IKK github: https://t.co/KOGTLOYnfr With only 6.6B active parameters, GRIN MoE achieves exceptionally good performance across a diverse set of tasks, particularly in
One more JD: we are hiring at Tokyo Lab! https://t.co/Dt9HNu1zNX
Still hiring for our new Tokyo lab... https://t.co/rYnmyfKuc9