Aurick Qiao Profile
Aurick Qiao

@AurickQ

Followers
478
Following
187
Media
20
Statuses
141

@Snowflake AI Research | @LLM360 | Previously @PetuumInc | PhD @SCSatCMU | CS @UWaterloo

Pittsburgh, PA
Joined November 2016
@AurickQ
Aurick Qiao
4 days
Arctic Inference helps @allhands_ai complete real-world coding tasks 2x faster through faster LLM inference. Check it out!
@allhands_ai
All Hands AI
4 days
Imagine coding agents finishing your requests and sending a pull request in 30 seconds 🤯. Check out this new video of OpenHands + DevStral + @Snowflake’s new inference method ArcticInference. It speeds up coding agents by as much as 2x over vLLM (which is already fast).
0
8
23
@AurickQ
Aurick Qiao
12 days
RT @StasBekman: My first project at @Snowflake AI Research is complete! I present to you Arctic Long Sequence Training (ALST). Paper: ht…
0
63
0
@AurickQ
Aurick Qiao
17 days
RT @JiaZhihao: One of the best ways to reduce LLM latency is by fusing all computation and communication into a single GPU megakernel. But…
0
120
0
@AurickQ
Aurick Qiao
17 days
RT @vllm_project: vLLM has just reached 50K GitHub stars! Huge thanks to the community! 🚀 Together let's bring easy, fast, and cheap LLM ser…
0
21
0
@AurickQ
Aurick Qiao
22 days
RT @haoailab: [Lmgame Bench] o3-pro: A Milestone in LLM Gaming! 🕹️ The leap from o3 to o3-pro is bigger than you might have thought. We te…
0
109
0
@AurickQ
Aurick Qiao
1 month
RT @haoailab: 🚀 Dynasor is now production-ready in open-source stacks! @NVIDIA TensorRT-LLM. @Snowflake ArcticInference. Try it today ↓ Ten…
0
17
0
@AurickQ
Aurick Qiao
1 month
RT @HongyiWang10: Super excited to see GenBio featured in the @googlecloud blog on bio startups!
0
3
0
@AurickQ
Aurick Qiao
1 month
RT @jeffra45: 🧵1/ New release from @Snowflake AI Research: Shift Parallelism is a new LLM inference technique built on top of vLLM, relea…
0
18
0
@AurickQ
Aurick Qiao
1 month
This is the combined work of the amazing inference systems team at @Snowflake AI Research: @samyamrb @MertHidayetoglu @YeWang6626 @1a1a11a @jeffra45 Mike Wyatt @_charlesxu @JerryL411 @spacemanidol @yuxionghe and many others!
0
0
8
@AurickQ
Aurick Qiao
1 month
Arctic Inference is open-sourced, and more details can be found in our blog. Blog: Code:
1
2
17
@AurickQ
Aurick Qiao
1 month
Oh, and did I mention Arctic Inference also speeds up embeddings throughput on vLLM by 4.2-16x?
1
0
6
@AurickQ
Aurick Qiao
1 month
Together with SwiftKV and Speculative Decoding, a single deployment of Arctic Inference can achieve the trifecta of fast prefill (low response time), fast generation (high tokens/s), and high throughput (lower cost).
1
0
6
@AurickQ
Aurick Qiao
1 month
Shift Parallelism is especially great for production workloads that have both latency-sensitive requests and intermittent bursts of batch inference jobs. Now, both types of requests can perform at their best on a single vLLM deployment.
1
0
5
@AurickQ
Aurick Qiao
1 month
Shift Parallelism builds upon our Arctic Ulysses sequence-parallelism (great for throughput) and tensor-parallelism (great for latency). By adaptively switching between the two, Arctic Inference gets the best of both worlds! The key enabler is KV-Cache invariance.
1
2
11
@AurickQ
Aurick Qiao
1 month
Excited to open-source Shift Parallelism, developed at @Snowflake AI Research for LLM inference! With it, Arctic Inference + @vllm_project delivers:
🚀 3.4x faster e2e latency & 1.06x higher throughput
🚀 1.7x faster generation & 2.25x lower response time
🚀 16x higher throughput
2
39
165
@AurickQ
Aurick Qiao
1 month
Very proud of our recent work at @Snowflake AI Research, which spans from the systems layer to the application layer. Check out this article from @VentureBeat, which highlights two of our major initiatives: LLM inference performance and Text-to-SQL!
@VentureBeat
VentureBeat
1 month
How Snowflake's open-source text-to-SQL and Arctic inference models solve enterprise AI's two biggest deployment headaches
0
2
14
@AurickQ
Aurick Qiao
1 month
RT @llm360: 📢📢 TxT360 has been updated to v1.1: 🌟 BestofWeb: high-quality doc set from the web. ❓ QA: Large Scale Synthetic Q&A dataset…
0
10
0
@AurickQ
Aurick Qiao
2 months
RT @probablybots: AIDO.ModelGenerator v0.1.2 is now on PyPI. Use the mgen CLI for no-code inference, embedding, and finetuning for the new…
0
6
0
@AurickQ
Aurick Qiao
2 months
RT @NVIDIAAIDev: 🎉 Congratulations to the FlashInfer team – their technical paper, "FlashInfer: Efficient and Customizable Attention Engine…
0
5
0
@AurickQ
Aurick Qiao
2 months
RT @haoailab: Announcing FastVideo V1, a unified framework for accelerating video generation. FastVideo V1 offers: - A simple, consistent…
0
43
0