brian stevens

@addvin

Followers
5K
Following
172
Media
59
Statuses
748

CEO, Neural Magic. Ex VP & CTO of Google Cloud and EVP & CTO of Red Hat. RPI and UNH alum, marathoner, Ironman, ADK MT 46er.

Joined January 2009
@addvin
brian stevens
6 months
Introducing llm-d on stage at Red Hat Summit was truly a privilege ...
@RedHat_AI
Red Hat AI
6 months
LLM inference is too slow, too expensive, and too hard to scale. 🚨 Introducing llm-d, a Kubernetes-native distributed inference framework, to change that—using vLLM (@vllm_project), smart scheduling, and disaggregated compute. Here’s how it works—and how you can use it today:
1
8
32
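The llm-d announcement above mentions disaggregated compute: splitting the compute-bound prefill phase from the memory-bound decode phase so each can be scheduled and scaled separately. As a toy illustration of that routing idea only, here is a minimal sketch; all class and pool names are invented for this example, and this is not the llm-d or vLLM API.

```python
"""Toy sketch of disaggregated LLM serving, loosely inspired by the
llm-d announcement. Names are invented for illustration; NOT the
llm-d or vLLM API."""

from dataclasses import dataclass, field
from typing import List


@dataclass
class Request:
    prompt: str
    generated: List[str] = field(default_factory=list)


class Pool:
    """A named group of workers (in llm-d these would be vLLM pods)."""

    def __init__(self, name: str):
        self.name = name
        self.handled = 0

    def run(self, req: Request, stage: str) -> None:
        self.handled += 1
        req.generated.append(f"{stage}@{self.name}")


class DisaggregatedScheduler:
    """Route prefill and decode to separate pools so each phase can be
    scaled independently."""

    def __init__(self):
        self.prefill_pool = Pool("prefill")
        self.decode_pool = Pool("decode")

    def serve(self, req: Request, decode_steps: int = 3) -> Request:
        self.prefill_pool.run(req, "prefill")  # one pass over the prompt
        for _ in range(decode_steps):          # token-by-token decoding
            self.decode_pool.run(req, "decode")
        return req


sched = DisaggregatedScheduler()
done = sched.serve(Request("hello"))
print(done.generated)  # prefill once, then every decode step on the other pool
```

The point of the sketch is only the control flow: one prefill hop, then repeated decode hops on a different pool, which is what makes independent scaling of the two phases possible.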
@SemiAnalysis_
SemiAnalysis
17 days
The @RedHat_AI team contributes a lot to vLLM and does amazing work for the open-source community. Great to see vLLM performing so well compared to TRT-LLM on H200! vLLM comes pretty close to B200, with the @NVIDIAAI team working on closing the gap for GPTOSS within the next
@RedHat_AI
Red Hat AI
17 days
InferenceMAX, vLLM TPU, compressed-tensors, MoE support via transformers, DeepSeek-OCR, and more. Here’s what’s new in the @vllm_project community over the past two weeks:
3
16
101
@RedHat_AI
Red Hat AI
2 months
4 tracks. 12 sessions. 1 day of learning. Join us on Oct. 16 for Red Hat AI Day of Learning, a free virtual event for developers, engineers & practitioners. Tracks: ⚡ Fast & efficient inference 🎯 Model customization 🤖 Agentic AI 🌐 Scaling AI over hybrid cloud Sessions
redhat.com
Join us for the Red Hat AI Day of Learning, a virtual event designed for developers, engineers, and technical practitioners who want to deepen their expertise in AI inference, model optimization,...
1
17
39
@RedHat_AI
Red Hat AI
2 months
Qwen3-Next dropped yesterday and you can run it with Red Hat AI today. ✅ Day-zero support in vLLM ✅ Day-one deployment with Red Hat AI Step-by-step guide: https://t.co/ZjLJyfmMJm The future of AI is open.
developers.redhat.com
Key takeaways
0
5
18
@RedHat_AI
Red Hat AI
5 months
Thanks to the @lmcache team for joining forces with Red Hat on llm-d! llm-d is a new open source project for scalable, efficient distributed LLM inference with @vllm_project. Learn more about our collaboration here:
blog.lmcache.ai
We’re delighted to announce that LMCache is joining forces with Red Hat and other industry leaders on some exciting open source project collaborations. LMCache has been selected to be a core compon...
0
8
28
@sparkycollier
Mark Collier 柯理怀
6 months
Really excited to see the emergence of llm-d @addvin ! Inference is the biggest workload in human history and the open source tools need to keep evolving to serve it
@NVIDIAAIDev
NVIDIA AI Developer
6 months
The llm-d project is a major step forward for the #opensource AI ecosystem, and we are proud to be one of the founding contributors, reflecting our commitment to collaboration as a catalyst for innovation in generative AI. As generative and agentic AI continue to evolve,
0
2
11
@addvin
brian stevens
7 months
And it was great to see the Red Hat and Google effort announced by my friend the brilliant Amin Vahdat.
@woosuk_k
Woosuk Kwon
7 months
Huge congrats to all the @googlecloud and @RedHat_AI team members who drove this effort!
0
0
6
@RedHat_AI
Red Hat AI
9 months
DeepSeek’s Open Source Week drops A LOT of exciting goodies! We’re hosting vLLM Office Hours tomorrow—learn what they are, how they integrate with vLLM, & ask questions! Date: Thursday, Feb 27 Time: 2PM ET / 11AM PT Register: https://t.co/uCYCpHLH87 #DeepSeek #AI
0
2
9
@matthicksj
Matt Hicks
10 months
At @RedHat, we believe the future of AI is open. That's why I'm incredibly excited about our acquisition of @NeuralMagic. Together, we're furthering our commitment to our customers and the open source community to deliver on the future of AI—and that starts today.
@RedHat
Red Hat
10 months
Today, Red Hat completed the acquisition of @NeuralMagic, a pioneer in software and algorithms that accelerate #GenAI inference workloads. Read how we are accelerating our vision for #AI’s future: https://t.co/PkGfC48tAt.
0
28
82
@addvin
brian stevens
10 months
Today it became official: Neural Magic is now a part of Red Hat.
2
8
38
@RedHat_AI
Red Hat AI
11 months
If you are at #NeurIPS2024 this week, stop by the Neural Magic booth #307 and talk to us about the @vllm_project! vLLM core committer @mgoin_ will be there, ready to hear your ideas and share them with the team. The best feature requests always come from in-person chats!
1
1
7
@scaleml
Scale ML
11 months
For our last seminar of the year we will end with Lucas Wilkinson from @neuralmagic presenting! Machete: a cutting-edge mixed-input GEMM GPU kernel targeting NVIDIA Hopper GPUs Time: Dec 4, 3pm EST Sign up via https://t.co/EvbCJnxpr8 to join our mailing list for the zoom link
0
4
17
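The Machete seminar above concerns a mixed-input GEMM: activations stay in floating point while weights are stored as low-bit integers plus a scale, dequantized on the fly inside the matrix multiply. As a plain-Python illustration of that arithmetic only (the real kernel fuses this on Hopper tensor cores; the per-column scaling scheme here is a simplifying assumption):

```python
"""Toy mixed-input GEMM: float activations x integer weights with a
per-column dequantization scale. Illustrative only; not the Machete
kernel's actual layout or API."""


def mixed_input_gemm(x, w_int, scales):
    """x: MxK floats; w_int: KxN small ints; scales: N per-column floats."""
    m, k, n = len(x), len(w_int), len(w_int[0])
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            # Accumulate in the integer-weight domain...
            acc = sum(x[i][t] * w_int[t][j] for t in range(k))
            # ...and dequantize once per output element.
            out[i][j] = acc * scales[j]
    return out


x = [[1.0, 2.0]]
w_int = [[3, -1],
         [2, 4]]
scales = [0.5, 0.25]
print(mixed_input_gemm(x, w_int, scales))  # [[3.5, 1.75]]
```

Deferring the scale multiply to after accumulation is what lets a kernel keep the inner loop in cheap low-bit arithmetic.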
@addvin
brian stevens
1 year
I’m thrilled to announce that Neural Magic has signed a definitive agreement to join forces with Red Hat, Inc. At Neural Magic our vision is that the future of AI is open, and we have been on a mission to enable enterprises to capture the powerful innovation from AI, while at
17
35
128
@addvin
brian stevens
1 year
Quantization of LLM models is critical for efficient deployments. But how to avoid any negative impact of quantization on model capability? Our latest research across Llama variants will serve as a great guide.
@_EldarKurtic
Eldar Kurtić
1 year
vLLM + Quantization: We investigated the impact of quantization across all Llama sizes to come up with a set of practical guidelines for deployment across various use cases and GPU architectures in @vllm_project. There are some interesting findings relative to "well-known" things: 👇
0
1
4
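The quantization work discussed above rests on one basic operation: mapping floating-point weights to a small integer grid via a scale, then multiplying back at inference time. A minimal pure-Python sketch of symmetric round-to-nearest quantization, assuming a single per-tensor scale (real pipelines such as llm-compressor operate on tensors and add calibration, but the math is the same):

```python
"""Minimal symmetric round-to-nearest weight quantization sketch.
Per-tensor scale is a simplifying assumption for illustration."""


def quantize(weights, num_bits=8):
    qmax = 2 ** (num_bits - 1) - 1  # e.g. 127 for int8
    scale = max(abs(w) for w in weights) / qmax
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    return [qi * scale for qi in q]


w = [0.12, -0.5, 0.33, 1.0, -0.98]
q, s = quantize(w, num_bits=8)
w_hat = dequantize(q, s)
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
print(q, round(max_err, 4))  # round-trip error is bounded by scale / 2
```

The scale/2 error bound is why the capability impact the thread investigates depends so heavily on the dynamic range of each layer's weights.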
@addvin
brian stevens
1 year
Quantized versions of Llama-3.2 now available ...
@_EldarKurtic
Eldar Kurtić
1 year
@AIatMeta just released new Llama-3.2 models (~3h ago), and as usual, our team at @neuralmagic was quick to quantize them to FP8 with llm-compressor for even more efficient inference with vLLM! 1. https://t.co/X8PoYbd9DV 2.
0
0
0
@RedHat_AI
Red Hat AI
1 year
Sparse-Marlin is here and integrated into @vllm_project! This GPU-optimized kernel accelerates matrix multiplication with 4-bit quantized weights and 2:4 sparsity, achieving 5.3x speedups on NVIDIA GPUs (Ampere/Ada). Maintains efficiency with batch sizes up to 32. Links below.
2
21
96
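The 2:4 sparsity that Sparse-Marlin exploits means: in every contiguous group of four weights, at most two are non-zero, a pattern NVIDIA Ampere/Ada tensor cores can skip in hardware. Magnitude pruning, sketched below, is one simple way to impose the pattern (illustrative only; not the kernel's code):

```python
"""Impose 2:4 semi-structured sparsity by keeping the two
largest-magnitude weights in each group of four. Toy sketch."""


def prune_2_4(weights):
    assert len(weights) % 4 == 0
    out = []
    for i in range(0, len(weights), 4):
        group = weights[i:i + 4]
        # Indices of the two largest-magnitude weights in this group.
        keep = sorted(range(4), key=lambda j: abs(group[j]), reverse=True)[:2]
        out.extend(g if j in keep else 0.0 for j, g in enumerate(group))
    return out


w = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.25, 0.01]
print(prune_2_4(w))  # [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.25, 0.0]
```

Because the zeros land in a fixed 2-of-4 pattern rather than at arbitrary positions, the hardware can store only the surviving values plus a small index, which is where the speedup comes from.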
@addvin
brian stevens
1 year
Very cool!
@charles_irl
Charles 🎉 Frye
1 year
Will share more technical details in the coming days. Initial estimate is we got @neuralmagic's FP8 of LLaMA 3.1 405B on @vllm_project to process 2,500,000 tokens per hour per 8xA100-80 node on @modal_labs, simulating 1995 at a rate of 10 days per hour (240x real time).
0
1
11
@RedHat_AI
Red Hat AI
1 year
vLLM is the leading open-source inference server with 24k GitHub stars. Join us for bi-weekly vLLM Office Hours to learn about the project, get involved, and provide feedback. Here's what to expect this week: 1. Get the latest updates from Neural Magic’s Engineering Lead and top
0
3
11