Ammar Ahmad Awan Profile
Ammar Ahmad Awan

@ammar_awan

Followers: 280 · Following: 847 · Media: 28 · Statuses: 479

DeepSpeed-er @Microsoft, @MSFTDeepSpeed, Father, PhD, Wanna-be Professor, Technology Enthusiast.

Bellevue, Washington
Joined September 2011
@ammar_awan
Ammar Ahmad Awan
2 years
Very excited to lead this effort at Microsoft! My first release as a manager. Feeling proud of myself and my team who worked very hard to make this happen in a short amount of time :-)
@DeepSpeedAI
DeepSpeed
2 years
Introducing Mixtral, Phi2, Falcon, and Qwen support in #DeepSpeed-FastGen! - Up to 2.5x faster LLM inference - Optimized SplitFuse and token sampling - Exciting new features like a RESTful API and more! For more details: https://t.co/386OvJtQLk #DeepSpeed #AI
1
1
9
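The FastGen features announced above are typically used through the DeepSpeed-MII front end. A minimal sketch, assuming the `mii.pipeline` entry point and an illustrative Mixtral checkpoint name (not taken from the tweet):

```python
# Hedged sketch: serve one of the newly supported models with
# DeepSpeed-FastGen via DeepSpeed-MII. Model id and generation
# settings are illustrative assumptions.
import mii

# Non-persistent pipeline; FastGen's SplitFuse scheduling and the
# optimized token sampling run under the hood.
pipe = mii.pipeline("mistralai/Mixtral-8x7B-v0.1")

responses = pipe(["DeepSpeed is"], max_new_tokens=64)
print(responses[0])
```

For a persistent deployment, `mii.serve` is the usual route to the RESTful API mentioned in the tweet.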
@DeepPsycho_HQ
Deep Psychology
3 months
This is literally Worth GOLD!
77
3K
18K
@TivadarDanka
Tivadar Danka
4 months
My feelings when working with tensors:
23
167
2K
@jeffra45
Jeff Rasley
7 months
🧵1/ New release from @Snowflake AI Research: Shift Parallelism is a new LLM inference technique built on top of vLLM, released through ArcticInference. It dramatically improves latency while preserving high throughput. Here’s what it looks like in action 👇
1
18
73
@minchoi
Min Choi
7 months
I asked ChatGPT o3 for the top 10 weirdest prompts people ask
34
32
257
@hsu_byron
Byron Hsu
1 year
(1/n) Training LLMs can be hindered by out-of-memory errors when scaling batch size and sequence length. Add one line to boost multi-GPU training throughput by 20% and reduce memory usage by 60%. Introducing Liger-Kernel: Efficient Triton Kernels for LLM Training. https://t.co/cgJXXqXpM4
19
173
963
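A minimal sketch of the "one line" the tweet refers to, assuming the `liger_kernel` package and its `apply_liger_kernel_to_llama` patching helper; the model id and dtype are illustrative:

```python
# Hedged sketch: patch a Hugging Face Llama model with Liger's fused
# Triton kernels (RMSNorm, RoPE, SwiGLU, fused cross-entropy) before
# loading it. Model id and dtype are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM
from liger_kernel.transformers import apply_liger_kernel_to_llama

apply_liger_kernel_to_llama()  # the advertised one-line patch

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.bfloat16
)
# ...continue with the usual multi-GPU training setup (FSDP, DeepSpeed, etc.)
```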
@DeepSpeedAI_JP
DeepSpeed (Japanese account)
1 year
At an event held at The Ohio State University, team member Ammar Ahmad Awan @ammar_awan gave a talk on DeepSpeed optimizations! Ohio State is widely known for its research on distributed and parallel processing, and many members of the DeepSpeed team are alumni.
@DeepSpeedAI
DeepSpeed
1 year
Great to see the amazing DeepSpeed optimizations from @Guanhua_Wang_, Heyang Qin, @toh_tana, @QuentinAnthon15, and @samadejacobs presented by @ammar_awan at MUG '24.
0
1
4
@ammar_awan
Ammar Ahmad Awan
1 year
Felt great to be back at OSU. Thank you @dhabalkpanda, @harisubramoni for inviting me and enabling me to share the awesome DeepSpeed work with the @mvapich team!
@DeepSpeedAI
DeepSpeed
1 year
Great to see the amazing DeepSpeed optimizations from @Guanhua_Wang_, Heyang Qin, @toh_tana, @QuentinAnthon15, and @samadejacobs presented by @ammar_awan at MUG '24.
0
0
2
@mvapich
MVAPICH
1 year
Dr. Ammar Ahmad Awan from Microsoft DeepSpeed giving a presentation at MUG '24 on trillion-parameter LLMs and optimization with MVAPICH. @OSUengineering @Microsoft @OhTechCo @mvapich @MSFTDeepSpeed @MSFTDeepSpeedJP #MUG24 #MPI #AI #LLM #DeepSpeed
1
5
8
@DeepSpeedAI
DeepSpeed
1 year
Announcing that DeepSpeed now runs natively on Windows. This exciting combination unlocks DeepSpeed optimizations for Windows users and empowers more people and organizations with AI innovations. - HF Inference & Finetuning - LoRA - CPU Offload Blog: https://t.co/LeNHlDZH3C
1
6
38
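A minimal sketch of the CPU offload feature listed above, assuming the documented ZeRO config keys; the batch size, dtype, and stage are illustrative:

```python
# Hedged sketch: a DeepSpeed config enabling ZeRO-2 with optimizer-state
# offload to CPU, one of the features listed for the Windows release.
# Values here are illustrative assumptions.
ds_config = {
    "train_batch_size": 8,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
}

# The config is then passed to deepspeed.initialize(...) or to the
# Hugging Face Trainer via its `deepspeed` argument.
```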
@DMogahed
Dalia Mogahed
2 years
Brought me to tears. She’s so respected by her peers. What an achievement. Academic excellence and ethical leadership. Asna Tabassum, we salute you.
2
36
134
@SebastienBubeck
Sebastien Bubeck
2 years
The phi-3 family is the work of an amazing team over many months, kudos to everyone!
4
4
110
@SebastienBubeck
Sebastien Bubeck
2 years
phi-3 is here, and it's ... good :-). I made a quick short demo to give you a feel of what phi-3-mini (3.8B) can do. Stay tuned for the open weights release and more announcements tomorrow morning! (And ofc this wouldn't be complete without the usual table of benchmarks!)
39
175
922
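A minimal sketch of trying phi-3-mini locally once the open weights landed, assuming the publicly released microsoft/Phi-3-mini-4k-instruct checkpoint and Hugging Face transformers; the prompt and generation settings are illustrative:

```python
# Hedged sketch: load phi-3-mini (3.8B) with transformers and generate.
# The checkpoint name is the released one; everything else is illustrative.
# Older transformers versions may also need trust_remote_code=True.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("Explain ZeRO offloading in one sentence.",
                   return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```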
@StasBekman
Stas Bekman
2 years
If you're trying to run MoE Mixtral-8x7b under @MSFTDeepSpeed, it's likely to hang on the first forward pass. The solution is here https://t.co/6O1JgJnMle and you need deepspeed>=0.13.0. Thanks to Masahiro Tanaka for the fix. edit: looks like someone codified it even better:
github.com
ZeRO3 does not work with MoE models because the order of executing modules can change at every forward/backward pass (#4094, #4808). This PR adds an API to stop breaking down a module for parameter...
2
9
79
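The fix in the linked PR is exposed as a small API call. A minimal sketch, assuming deepspeed>=0.13.0 and the Hugging Face Mixtral implementation:

```python
# Hedged sketch: tell ZeRO-3 to treat each sparse MoE block as a leaf
# module so it is not broken down further, avoiding the hang described
# above when expert execution order changes between passes.
from deepspeed.utils import set_z3_leaf_modules
from transformers import AutoModelForCausalLM
from transformers.models.mixtral.modeling_mixtral import MixtralSparseMoeBlock

model = AutoModelForCausalLM.from_pretrained("mistralai/Mixtral-8x7B-v0.1")
set_z3_leaf_modules(model, [MixtralSparseMoeBlock])

# ...then proceed with deepspeed.initialize(...) / the HF Trainer as usual.
```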
@ylecun
Yann LeCun
2 years
* Language is low bandwidth: less than 12 bytes/second. A person can read 270 words/minute, or 4.5 words/second, which is 12 bytes/s (assuming 2 bytes per token and 0.75 words per token). A modern LLM is typically trained with 1x10^13 two-byte tokens, which is 2x10^13 bytes.
@prmshra
Parmita Mishra
2 years
This is an essential point people seem to misrepresent.
558
2K
9K
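The bandwidth arithmetic in the quoted thread checks out with the figures it states (0.75 words per token, 2 bytes per token); a quick verification:

```python
# Reproduce the numbers from the thread above.
words_per_minute = 270
words_per_second = words_per_minute / 60          # 4.5 words/s
tokens_per_second = words_per_second / 0.75       # 6 tokens/s
reading_bytes_per_second = tokens_per_second * 2  # 12 bytes/s

training_tokens = 1e13                            # typical LLM training corpus
training_bytes = training_tokens * 2              # 2e13 bytes

print(reading_bytes_per_second, training_bytes)   # 12.0 2e+13
```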
@DeepSpeedAI
DeepSpeed
2 years
#DeepSpeed joins forces with @Sydney_Uni to unveil an exciting tech #FP6. Just supply your FP16 models, and we deliver: 🚀 1.5x performance boost for #LLMs serving on #GPUs 🚀 Innovative (4+2)-bit system design 🚀 Quality-preserving quantization link: https://t.co/m6vcmXaWxb
1
26
168
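The "(4+2)-bit" phrase refers to splitting each 6-bit weight into a 4-bit and a 2-bit segment so both can be stored with friendly memory alignment. A toy illustration of that split only, not the actual DeepSpeed/FP6 kernel layout:

```python
# Toy illustration (assumption: not the real FP6 kernel data layout):
# a 6-bit value can be stored as a 4-bit segment plus a 2-bit segment
# and reconstructed losslessly.
def split_6bit(value: int) -> tuple[int, int]:
    assert 0 <= value < 64       # 6-bit range
    high4 = value >> 2           # top 4 bits
    low2 = value & 0b11          # bottom 2 bits
    return high4, low2

def merge_6bit(high4: int, low2: int) -> int:
    return (high4 << 2) | low2

w = 0b101101                     # 45
assert merge_6bit(*split_6bit(w)) == w
```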
@StasBekman
Stas Bekman
2 years
The other news is the introduction of @MSFTDeepSpeed Meetups, which will be held about once every 3 months. The inaugural one will be on Feb 12, 6:00 PM - 8:00 PM, at the Redmond Reactor https://t.co/qikTXXkyrI Quote: "This will be the first ever meetup for the DeepSpeed
0
2
24
@DeepSpeedAI
DeepSpeed
2 years
Thanks @StasBekman! The DeepSpeed team is hiring for various engineering and research roles! Come join us and steer the future of large-scale AI training and inference.
@StasBekman
Stas Bekman
2 years
Finally I'm being told MSFT allocated more engineering positions on the @MSFTDeepSpeed team. As a long-time DeepSpeed user, for the first few years I had to fix many bugs myself since the team was so small, and finally the time has come where I can just report them and the
0
4
22
@DeepSpeedAI
DeepSpeed
2 years
Are you a #DeepSpeed user, fan, contributor, and/or advocate? Are you interested in meeting people behind @MSFTDeepSpeed tech? Are you interested in #AI? If yes, come and meet the team at our first in-person meetup in the Seattle area! Register here: https://t.co/C3acG4Wvx0
2
9
21
@QuentinAnthon15
Quentin Anthony
2 years
Getting the most out of your hardware when training transformers requires thinking about your model as a sequence of GPU kernel calls. This mindset, common in HPC, is rare in ML and leads to inefficiencies in LLM training. Learn more in our paper https://t.co/qy6Q2MEJpw
7
76
330
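One way to adopt the kernel-call mindset described above is to profile a layer and read the result as the list of GPU kernels it actually launches. A minimal sketch using torch.profiler; the layer shape and sizes are illustrative and a CUDA device is assumed:

```python
# Hedged sketch: view a transformer layer as the sequence of GPU kernels
# it launches. Sizes are illustrative; requires a CUDA device.
import torch
from torch.profiler import profile, ProfilerActivity

layer = torch.nn.TransformerEncoderLayer(d_model=1024, nhead=16).cuda().half()
x = torch.randn(128, 8, 1024, device="cuda", dtype=torch.half)

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    layer(x)

# Each row is a kernel; fusing or eliminating kernels is where the
# efficiency gains discussed in the paper come from.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```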
@StasBekman
Stas Bekman
2 years
If you were holding off on trying @MSFTDeepSpeed ZeRO++, it looks like deepspeed@master should work well now: https://t.co/Bzg5DNyxym ZeRO++'s main feature is allowing you to use a hybrid approach if you can fit a model on a single node of 8 GPUs. So it takes advantage of the super
3
12
78
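A minimal sketch of the hybrid setup (hpZ) described above, assuming the documented ZeRO++ config keys; the 8-GPU node size and the quantization flags are illustrative:

```python
# Hedged sketch: ZeRO-3 with ZeRO++ hierarchical (hybrid) partitioning.
# Weights keep a secondary shard within each 8-GPU node, while quantized
# communication reduces cross-node traffic. Values are illustrative.
ds_config = {
    "zero_optimization": {
        "stage": 3,
        "zero_hpz_partition_size": 8,      # secondary partition per 8-GPU node
        "zero_quantized_weights": True,    # quantized weight communication
        "zero_quantized_gradients": True,  # quantized gradient communication
    }
}
# Pass ds_config to deepspeed.initialize(...) or the HF Trainer as usual.
```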