
Alexey Tumanov
@alsched
Followers: 548 · Following: 874 · Media: 5 · Statuses: 247
Assistant Professor of Computer Science @gatech_scs @gtcomputing | postdoc @Berkeley_EECS @ucbrise | ML Systems
Atlanta, GA
Joined December 2012
RT @agrawalamey12: After hitting evaluation puzzles like this in our own work, we analyzed patterns across LLM inference papers and identif…
RT @agrawalamey12: Interesting work on long context inference from @nvidia, where they scale KV parallelism on GB200-NVL72 systems! To lear…
arxiv.org
As large language models (LLMs) handle increasingly longer contexts, serving long inference requests of millions of tokens presents unique challenges. We show that existing work for long context...
RT @gatech_scs: Congratulations 👏 to our faculty who were recognized on the Spring 2025 CIOS Honor Roll for their outstanding teaching and…
RT @SachitKuhar: Full code 🔓 Collaboration with @jinga_lala1 and @alsched. (6/6) #EfficientAI #EdgeAI #Quantizati…
github.com
Codebase for "PLUM: Improving Inference Efficiency By Leveraging Repetition-Sparsity Trade-Off" - sachitkuhar/PLUM
RT @agrawalamey12: Super excited to share another incredible system that we have built over the past two years! Training giant foundation…
RT @agrawalamey12: Super long-context models with context windows spanning millions of tokens are becoming commonplace (@GoogleDeepMind Gemi…
RT @agrawalamey12: Maya offers a transparent, accurate, and efficient way to model and optimize large-scale DL training without needing exp…
arxiv.org
Training large foundation models costs hundreds of millions of dollars, making deployment optimization critical. Current approaches require machine learning engineers to manually craft training...
RT @agrawalamey12: Sequence pipeline parallelism is being rapidly adopted for extreme long-context inference in industry! Check out our pap…
Super-charged technical program this year at @ACMSoCC. Looking forward to it! Hope to see you there! #socc24
We are just under a month away from SoCC'24! This year's conference will be held Nov 20-22 at the Microsoft Campus in Redmond, WA. Early bird registration is open until Nov 6. Make sure to register!
RT @agrawalamey12: ⚡ Speed Meets Accuracy: Unlike approximation-based methods, Mnemosyne achieves exact inference, ensuring that the genera…
RT @agrawalamey12: @Google has silently but surely developed an edge over @OpenAI. Long context processing seems to be the key to Google's…
RT @agrawalamey12: 🔗 Curious to learn more? Dive into our paper to explore the technical details behind Mnemosyne: …
arxiv.org
As large language models (LLMs) evolve to handle increasingly longer contexts, serving inference requests for context lengths in the range of millions of tokens presents unique challenges. While...
I'm serving as the #SOSP24 AEC Chair, and we're still looking for artifact evaluation reviewers. AE is indispensable to systems research and a valuable experience. Grad students and early-career researchers welcome! Expected load: 2 artifacts. Self-nominate!
sysartifacts.github.io
We are looking for members of the Artifact Evaluation Committee (AEC), who will contribute to SOSP’24 Artifact Evaluation (AE) process by reviewing submitted artifacts. AEC membership is especially...
Let's set the standard for the interactive performance of LLMs by capturing the nuances of user experience. While the latency/throughput tension is well known to the systems community, latency jitter is far less explored. The fluidity index and fluid token generation rate capture LLM performance more aptly.
🚀 Introducing Metron: Redefining LLM Serving Benchmarks! 📊 Tired of misleading metrics for LLM performance? Our new paper introduces a holistic framework that captures what really matters: the user experience! 🧠💬 #LLM #AI #Benchmark
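A minimal sketch of the idea in Python, assuming per-token arrival timestamps and a simple deadline-based definition (illustrative only; fluid_token_rate and target_tbt are hypothetical names, not Metron's actual formulation, which is defined in the paper):

from typing import List

def fluid_token_rate(arrivals: List[float], target_tbt: float) -> float:
    """Fraction of inter-token gaps that meet a target time-between-tokens
    (TBT) deadline. Mean throughput hides a mid-stream stall; this does not."""
    gaps = [b - a for a, b in zip(arrivals, arrivals[1:])]
    if not gaps:
        return 1.0
    return sum(g <= target_tbt for g in gaps) / len(gaps)

# A steady stream vs. one with the same token count but a mid-stream stall.
smooth = [0.05 * i for i in range(1, 21)]  # 20 tokens, 50 ms apart
stall = smooth[:10] + [smooth[9] + 0.55 + 0.005 * i for i in range(10)]
print(fluid_token_rate(smooth, 0.1))  # 1.0: every gap meets the deadline
print(fluid_token_rate(stall, 0.1))   # < 1.0: the stalled gap misses it

Two streams can have similar end-to-end latency and mean throughput yet deliver very different interactive experiences; a deadline-style rate like this surfaces the difference.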
Really proud of my PhD student's work developing a new mechanism and policy that significantly improves tail-latency performance in Large Language Model (LLM) inference without sacrificing throughput. It has already received 10+ citations; the source is open and has been adopted in industry.
Did you ever feel that @chatgpt is done generating your response and then suddenly a burst of tokens shows up? This happens when the serving system is prioritizing someone else's request before generating your response. But why? Well, to reduce cost. 🧵
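A toy sketch of why such bursts appear, assuming a scheduler that pauses ongoing decodes to run another request's full prefill (an illustration only, not the paper's actual mechanism; token_completion_times is a hypothetical name):

def token_completion_times(n_tokens: int, stall_at: int, stall_len: int):
    """Each decode step takes one tick; a monopolizing prefill of length
    stall_len preempts this user's decoding at tick stall_at."""
    times, t = [], 0
    for _ in range(n_tokens):
        if t == stall_at:
            t += stall_len  # this user waits while another request's prefill runs
        t += 1
        times.append(t)
    return times

print(token_completion_times(10, stall_at=5, stall_len=0))  # [1, 2, ..., 10]
print(token_completion_times(10, stall_at=5, stall_len=8))  # [1, ..., 5, 14, 15, ...]

On the client side, the tokens generated after the preemption flush together, which is exactly the stall-then-burst pattern the thread describes; avoiding such stalls is what trades off against batching efficiency, i.e., cost.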
RT @gatech_scs: Three SCS faculty members were recognized by their students for outstanding teaching and educational impact. Congratulation…
blog.ctl.gatech.edu
The Center for Teaching and Learning (CTL) and the Office of Academic Effectiveness (OAE) are thrilled to announce the Spring 2024 Course Instructor Opinion Survey (CIOS) Honor Roll. Faculty member…