Eldar Kurtić (@_EldarKurtic)
Principal Research Scientist @RedHat_AI & Dan Alistarh's group @ISTAustria
Joined July 2018 · 757 followers · 867 following · 65 media · 311 statuses
Due to high demand, we’re live streaming the first official vLLM meetup in Europe from Zürich! Tune in to hear about the @vllm_project roadmap, quantization, and distributed inference from @RedHat_AI, @IBMResearch on hybrid models, and @MistralAI on their open-source…
Good news - we'll be live streaming the first official @vllm_project meetup in Europe from Zürich. Thu, Nov 6 at 11:30am ET / 8:30am PT / 5:30pm CET. Hear from vLLM maintainers and contributors at @RedHat, @IBM, and @MistralAI covering quantization, hybrid models, distributed…
You can now find compressed-tensors at its new address:
github.com
A safetensors extension to efficiently store sparse quantized tensors on disk - vllm-project/compressed-tensors
BIG NEWS! 🎉 Compressed Tensors is officially joining the @vllm_project! Built on top of the excellent @huggingface safetensors framework, Compressed Tensors extends it with efficient storage and management of compressed tensor data for model quantization and sparsity. Why…
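For a feel of the storage layer involved, here is a minimal sketch of saving quantized weights with @huggingface safetensors, the format compressed-tensors extends. The tensor names and the per-tensor int8 scheme are illustrative, not the library's actual schema:

```python
# Minimal sketch: store int8-quantized weights plus their scale with
# safetensors, the base format compressed-tensors builds on.
# Tensor names below are hypothetical, not compressed-tensors' schema.
import torch
from safetensors.torch import save_file, load_file

w = torch.randn(256, 256)
scale = w.abs().max() / 127.0                       # per-tensor symmetric scale
q = torch.clamp((w / scale).round(), -128, 127).to(torch.int8)

save_file(
    {"layer0.weight_packed": q, "layer0.weight_scale": scale.reshape(1)},
    "layer0.safetensors",
)

tensors = load_file("layer0.safetensors")
w_hat = tensors["layer0.weight_packed"].float() * tensors["layer0.weight_scale"]
print((w - w_hat).abs().max())                      # worst-case quantization error
```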
vLLM on TPU keeps getting better 🔥 Now we lower your PyTorch model definitions to JAX (or just write in JAX directly) for the best performance!
Announcing the completely reimagined vLLM TPU! In collaboration with @Google, we've launched a new high-performance TPU backend unifying @PyTorch and JAX under a single lowering path for amazing performance and flexibility. 🚀 What's New?
- JAX + PyTorch: Run PyTorch models on…
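As a toy illustration of the "write in JAX directly" path (not the vLLM TPU backend itself), a jitted function like this is what XLA compiles into fused TPU kernels:

```python
# Toy illustration of the JAX programming model the new backend lowers onto.
# This is not vLLM TPU code, just the shape of a jitted compute kernel.
import jax
import jax.numpy as jnp

@jax.jit
def attention_scores(q, k):
    # (batch, seq, dim) x (batch, seq, dim) -> (batch, seq, seq)
    return jax.nn.softmax(jnp.einsum("bqd,bkd->bqk", q, k) / jnp.sqrt(q.shape[-1]))

key = jax.random.PRNGKey(0)
q = jax.random.normal(key, (1, 128, 64))
k = jax.random.normal(key, (1, 128, 64))
print(attention_scores(q, k).shape)  # (1, 128, 128)
```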
Join us for the first @vllm_project meetup in Europe!
The first vLLM Meetup in Europe is on the calendar! 🇨🇭 📍 Zürich | 🗓️ 6 Nov 2025 Hosted by @RedHat, @IBM & @MistralAI Join core @vllm_project committers for talks on quantization, hybrid models, Mistral AI work w/ vLLM, distributed inference, and more. https://t.co/kUeGXtutvq
Happy that InferenceMAX is here because it signals a milestone for vLLM's SOTA performance on NVIDIA Blackwell! 🥳 It has been a pleasure to deeply collaborate with @nvidia in @vllm_project, and we have much more to do Read about the work we did here:
blog.vllm.ai
Introduction
Today we are launching InferenceMAX! We have support from Nvidia, AMD, OpenAI, Microsoft, PyTorch, SGLang, vLLM, Oracle, CoreWeave, TogetherAI, Nebius, Crusoe, HPE, SuperMicro, Dell. It runs every day on the latest software (vLLM, SGLang, etc.) across hundreds of GPUs, $10Ms of…
🚀 We are releasing state-of-the-art post-training quantization (PTQ) algorithms for Microscaling FP4, together with kernels:
- First study focused on MXFP4/NVFP4 PTQ for LLMs
- New Micro-Rotated (MR) format and GPTQ algorithm
- QuTLASS GPU kernels with up to 3.6x speedups
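For intuition, here is a hedged baseline sketch of microscaling FP4 round-to-nearest quantization in PyTorch. Block size 32 and the E2M1 value grid follow the MX convention; the MR transform and GPTQ variants from the paper go well beyond this baseline:

```python
# Baseline MXFP4 round-to-nearest sketch (illustrative, not QuTLASS):
# blocks of 32 values share one power-of-two scale, and each value snaps
# to the nearest FP4 (E2M1) grid point.
import torch

FP4_GRID = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # E2M1 magnitudes

def mxfp4_quantize(x, block=32):
    flat = x.reshape(-1, block)
    amax = flat.abs().amax(dim=1, keepdim=True).clamp(min=1e-12)
    # shared power-of-two scale per block, chosen so the block max
    # lands inside the representable FP4 range [-6, 6]
    scale = torch.exp2(torch.ceil(torch.log2(amax / 6.0)))
    scaled = flat / scale
    # round each magnitude to the nearest grid point, keep the sign
    idx = (scaled.abs().unsqueeze(-1) - FP4_GRID).abs().argmin(dim=-1)
    return (FP4_GRID[idx] * scaled.sign() * scale).reshape(x.shape)

w = torch.randn(4, 64)
print((w - mxfp4_quantize(w)).pow(2).mean())  # quantization MSE
```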
Introducing LLM.Q: quantized LLM training in pure CUDA/C++! With LLM.Q, you can train your own LLM with natively quantized matmuls on a single consumer-GPU workstation. No datacenter required. Inspired by @karpathy's llm.c, but natively quantized.
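LLM.Q itself is CUDA/C++, but the core trick translates. Here is a PyTorch analogue (my sketch, not the project's code) of training through a quantized matmul: quantize weights in the forward pass, pass gradients straight through to the full-precision master weights:

```python
# Sketch of a quantized-matmul training step with a straight-through
# estimator (STE). Illustrative analogue of the idea, not LLM.Q itself.
import torch

class QuantMatmul(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, w):
        ctx.save_for_backward(x, w)
        scale = w.abs().max() / 127.0
        w_q = torch.clamp((w / scale).round(), -128, 127) * scale  # fake-quant int8
        return x @ w_q.t()

    @staticmethod
    def backward(ctx, grad_out):
        x, w = ctx.saved_tensors
        # STE: ignore the rounding step, differentiate as if w were used directly
        return grad_out @ w, grad_out.t() @ x

x = torch.randn(8, 16, requires_grad=True)
w = torch.randn(32, 16, requires_grad=True)   # full-precision master weights
QuantMatmul.apply(x, w).sum().backward()
print(w.grad.shape)  # (32, 16)
```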
A control dashboard built for your LLM deployments
🚀 Thrilled to announce GuideLLM v0.3.0! This release is highlighted by a brand new Web UI, containerized benchmarking, and powerful dataset preprocessing. GuideLLM GitHub: https://t.co/0iSPUOqmch (Thread 👇)
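GuideLLM automates far more than this, but as a minimal sketch of the underlying measurement it performs, here is a latency probe against an OpenAI-compatible vLLM endpoint. The URL and model name are placeholders, and this is not GuideLLM's API:

```python
# Minimal latency probe for an OpenAI-compatible endpoint (placeholder URL
# and model name). GuideLLM automates this kind of sweep with proper
# load patterns, warmup, and reporting; this is just the core idea.
import time
import requests

URL = "http://localhost:8000/v1/completions"  # hypothetical deployment

def probe(prompt, n=10):
    latencies = []
    for _ in range(n):
        t0 = time.perf_counter()
        r = requests.post(URL, json={
            "model": "my-model",              # placeholder model id
            "prompt": prompt,
            "max_tokens": 64,
        })
        r.raise_for_status()
        latencies.append(time.perf_counter() - t0)
    latencies.sort()
    print(f"p50={latencies[n // 2]:.3f}s  p90={latencies[int(n * 0.9)]:.3f}s")

probe("Explain KV caching in one sentence.")
```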
We're releasing the DASLab GGUF Quantization Toolkit! 🚀 First open-source toolkit bringing GPTQ + EvoPress to @ggerganov's GGUF format, enabling heterogeneous quantization based on importance. Result: Better models at the same file size. [1/5]
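The heterogeneous-quantization idea can be sketched greedily (a toy stand-in, not the actual EvoPress search): give more bits to the layers whose measured importance is highest, under a fixed average-bits budget:

```python
# Toy greedy bit allocation under an average-bits budget (illustrative,
# not the EvoPress algorithm). Layer names and scores are hypothetical.
def allocate_bits(importance, choices=(2, 3, 4, 5), budget_avg=3.5):
    n = len(importance)
    bits = {layer: min(choices) for layer in importance}  # start everyone low
    # repeatedly upgrade the most important layer that can still grow
    while sum(bits.values()) / n < budget_avg:
        candidates = [l for l in bits if bits[l] < max(choices)]
        if not candidates:
            break
        best = max(candidates, key=lambda l: importance[l] / bits[l])
        bits[best] = choices[choices.index(bits[best]) + 1]
    return bits

importance = {"attn.0": 9.0, "mlp.0": 4.0, "attn.1": 7.5, "mlp.1": 2.0}
print(allocate_bits(importance))
```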
🚀 LLM Compressor v0.7.0 is here! This release brings powerful new features for quantizing large language models, including transform support (QuIP, SpinQuant), mixed precision compression, improved MoE handling with Llama4 support, and more. Full blog:
developers.redhat.com
LLM Compressor has recently released version 0.7.0, which introduces a range of significant enhancements designed to improve the performance of quantizing and deploying large language models. This…
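A minimal usage sketch, assuming the oneshot entry point and GPTQModifier recipe shown in the llm-compressor README; check the v0.7.0 release notes for the exact import paths and supported schemes. Model id, dataset, and output dir here are placeholders:

```python
# Assumed from the llm-compressor README: one-shot W4A16 GPTQ quantization.
# Import path and arguments may differ by release; treat as a sketch.
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import oneshot

oneshot(
    model="meta-llama/Llama-3.1-8B-Instruct",   # placeholder model id
    dataset="open_platypus",                    # placeholder calibration set
    recipe=GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"]),
    output_dir="Llama-3.1-8B-Instruct-W4A16",
    max_seq_length=2048,
    num_calibration_samples=512,
)
```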
My recent talk at "Open Source @Siemens 2025" is now live on YouTube:
Apparently, we’re trending in SF as well
Let @mgoin_ show you what the de facto industry standard for running agents looks like with @vllm_project
We’re looking forward to participating in the Agentic AI Summit 2025 by @BerkeleyRDI on August 2nd. Find us onsite to speak to experts in PyTorch & vLLM plus see you at our: ➡️ Focus talk on how the Linux Foundation and PyTorch Foundation are accelerating open source AI and…
You can now find GuideLLM at its new address:
github.com
Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs - vllm-project/guidellm
BIG NEWS! 🎉 GuideLLM is officially joining the @vllm_project! This combines vLLM's high-speed inference with a powerful, dedicated toolkit for real-world performance validation. Moving from PoC to production just got a lot more scientific. Here's how, 1/7:
.@vllm_project office hours return next week! Alongside project updates from @mgoin_, vLLM committers and HPC experts @robertshaw21 + @tms_jr will share how to scale MoE models with llm-d and lessons from real world multi-node deployments. Register: https://t.co/X8hAHYR3rl
FP4 models and inference kernels ready for Blackwell GPUs! GPTQ and Hadamard for accuracy, and fused Hadamard for runtime. Check out more details about our work in the thread below 👇
Announcing our early work on FP4 inference for LLMs!
- QuTLASS: low-precision kernel support for Blackwell GPUs
- FP-Quant: a flexible quantization harness for Llama/Qwen
We reach 4x speedup vs BF16, with good accuracy through MXFP4 microscaling + fused Hadamard rotations.
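Why rotations help, in one sketch: an orthogonal Hadamard transform spreads outlier columns across all coordinates, flattening the dynamic range the quantizer must cover, and loses nothing because the rotation inverts exactly. Illustrative code, not QuTLASS or FP-Quant:

```python
# Demonstration of the Hadamard-rotation effect on weight outliers
# (illustrative, not the QuTLASS/FP-Quant implementation).
import torch

def hadamard(n):
    # Sylvester construction; n must be a power of two
    H = torch.ones(1, 1)
    while H.shape[0] < n:
        H = torch.cat([torch.cat([H, H], dim=1), torch.cat([H, -H], dim=1)], dim=0)
    return H / H.shape[0] ** 0.5  # orthonormal: H @ H.T == I

W = torch.randn(64, 64)
W[:, 0] *= 50                                       # plant an outlier column
H = hadamard(64)
print(W.abs().max() / W.abs().mean())               # large outlier ratio
print((W @ H).abs().max() / (W @ H).abs().mean())   # much flatter after rotation
# (W @ H) @ H.T recovers W exactly, so inference applies H to the
# activations instead; fusing that multiply into the kernel is the runtime win.
```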
The @huggingface folks deserve far more credit for being a pillar of open-source and still managing to push out SOTA results across the board, along with a full write-up of the entire model’s lifecycle.
We just released the best 3B model, 100% open-source, open dataset, architecture details, exact data mixtures and full training recipe including pre-training, mid-training, post-training, and synthetic data generation for everyone to train their own. Let's go open-source AI!