Aurick Qiao @AurickQ X Profile

Aurick Qiao

@AurickQ

Followers

478

Following

187

Media

20

Statuses

141

@Snowflake AI Research | @LLM360 | Previously @PetuumInc | PhD @SCSatCMU | CS @UWaterloo

Pittsburgh, PA

Joined November 2016


Aurick Qiao

@AurickQ

4 days

Arctic Inference helps @allhands_ai complete real-world coding tasks 2x faster through faster LLM inference. Check it out!

All Hands AI

@allhands_ai

4 days

Imagine coding agents finishing your requests and sending a pull request in 30 seconds 🤯. Check out this new video of OpenHands + DevStral + @Snowflake’s new inference method ArcticInference. It speeds up coding agents by as much as 2x over vLLM (which is already fast).

0

8

23

Aurick Qiao

@AurickQ

12 days

RT @StasBekman: My first project at @Snowflake AI Research is complete! I present to you Arctic Long Sequence Training (ALST). Paper: ht…

0

63

0

Aurick Qiao

@AurickQ

17 days

RT @JiaZhihao: One of the best ways to reduce LLM latency is by fusing all computation and communication into a single GPU megakernel. But…

0

120

0

Aurick Qiao

@AurickQ

17 days

RT @vllm_project: vLLM has just reached 50K GitHub stars! Huge thanks to the community! 🚀 Together let's bring easy, fast, and cheap LLM ser…

0

21

0

Aurick Qiao

@AurickQ

22 days

RT @haoailab: [Lmgame Bench] o3-pro: A Milestone in LLM Gaming! 🕹️ The leap from o3 to o3-pro is bigger than you might have thought. We te…

0

109

0

Aurick Qiao

@AurickQ

1 month

RT @haoailab: 🚀 Dynasor is now production-ready in open-source stacks! @NVIDIA TensorRT-LLM, @Snowflake ArcticInference. Try it today ↓ Ten…

0

17

0

Aurick Qiao

@AurickQ

1 month

RT @HongyiWang10: Super excited to see GenBio featured in the @googlecloud blog on bio startups!

0

3

0

Aurick Qiao

@AurickQ

1 month

RT @jeffra45: 🧵1/ New release from @Snowflake AI Research: Shift Parallelism is a new LLM inference technique built on top of vLLM, relea…

0

18

0

Aurick Qiao

@AurickQ

1 month

This is the combined work of the amazing inference systems team at @Snowflake AI Research: @samyamrb @MertHidayetoglu @YeWang6626 @1a1a11a @jeffra45 Mike Wyatt @_charlesxu @JerryL411 @spacemanidol @yuxionghe and many others!

0

8

Aurick Qiao

@AurickQ

1 month

Arctic Inference is open-sourced, and more details can be found in our blog. Blog: Code:

1

2

17

Aurick Qiao

@AurickQ

1 month

Oh, and did I mention Arctic Inference also speeds up embeddings throughput on vLLM by 4.2-16x?

1

0

6

Aurick Qiao

@AurickQ

1 month

Together with SwiftKV and Speculative Decoding, a single deployment of Arctic Inference can achieve the trifecta of fast prefill (low response time), fast generation (high tokens/s), and high throughput (lower cost).

1

0

6

Aurick Qiao

@AurickQ

1 month

Shift Parallelism is especially great for production workloads that have both latency-sensitive requests and intermittent bursts of batch inference jobs. Now, both types of requests can perform at their best on a single vLLM deployment.

1

0

5

Aurick Qiao

@AurickQ

1 month

Shift Parallelism builds upon our Arctic Ulysses sequence-parallelism (great for throughput) and tensor-parallelism (great for latency). By adaptively switching between the two, Arctic Inference gets the best of both worlds! The key enabler is KV-Cache invariance.

1

2

11

Aurick Qiao

@AurickQ

1 month

Excited to open-source Shift Parallelism, developed at @Snowflake AI Research for LLM inference! With it, Arctic Inference + @vllm_project delivers:
🚀 3.4x faster e2e latency & 1.06x higher throughput
🚀 1.7x faster generation & 2.25x lower response time
🚀 16x higher throughput

2

39

165

Aurick Qiao

@AurickQ

1 month

Very proud of our recent work at @Snowflake AI Research, which spans from the systems layer to the application layer. Check out this article from @VentureBeat which highlights two of our major initiatives: LLM inference performance and Text-to-SQL!

VentureBeat

@VentureBeat

1 month

How Snowflake's open-source text-to-SQL and Arctic inference models solve enterprise AI's two biggest deployment headaches

0

2

14

Aurick Qiao

@AurickQ

1 month

RT @llm360: 📢📢 TxT360 has been updated to v1.1: 🌟 BestofWeb: high-quality doc set from the web. ❓ QA: Large Scale Synthetic Q&A dataset…

0

10

0

Aurick Qiao

@AurickQ

2 months

RT @probablybots: AIDO.ModelGenerator v0.1.2 is now on PyPI. Use the mgen CLI for no-code inference, embedding, and finetuning for the new…

0

6

0

Aurick Qiao

@AurickQ

2 months

RT @NVIDIAAIDev: 🎉 Congratulations to the FlashInfer team – their technical paper, "FlashInfer: Efficient and Customizable Attention Engine…

0

5

0

Aurick Qiao

@AurickQ

2 months

RT @haoailab: Announcing FastVideo V1, a unified framework for accelerating video generation. FastVideo V1 offers: - A simple, consistent…

0

43

0