Eldar Kurtić Profile
Eldar Kurtić

@_EldarKurtic

Followers: 757 · Following: 867 · Media: 65 · Statuses: 311

Principal Research Scientist at @RedHat_AI and member of Dan Alistarh's group at @ISTAustria

Joined July 2018
@_EldarKurtic
Eldar Kurtić
7 days
Due to high demand, we’re live streaming the first official vLLM meetup in Europe from Zürich! Tune in to hear about the @vllm_project roadmap, quantization, and distributed inference from @RedHat_AI, @IBMResearch on hybrid models, and @MistralAI on their open-source
@RedHat_AI
Red Hat AI
8 days
Good news - we'll be live streaming the first official @vllm_project meetup in Europe from Zürich. Thu, Nov 6 at 11:30am ET / 8:30am PT / 5:30pm CET Hear from vLLM maintainers and contributors at @RedHat, @IBM, and @MistralAI covering quantization, hybrid models, distributed
@_EldarKurtic
Eldar Kurtić
24 days
You can now find compressed-tensors at its new address:
github.com
A safetensors extension to efficiently store sparse quantized tensors on disk - vllm-project/compressed-tensors
@RedHat_AI
Red Hat AI
24 days
BIG NEWS! 🎉 Compressed Tensors is officially joining the @vllm_project! Built on top of the excellent @huggingface safetensors framework, Compressed Tensors extends it with efficient storage and management of compressed tensor data for model quantization and sparsity. Why
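The idea behind storing sparse quantized tensors can be sketched in a few lines: keep a bitmask of nonzero positions plus the quantized nonzero values and a single scale. The following is a minimal pure-Python illustration of that concept only; it is not the actual compressed-tensors on-disk format, and `compress`/`decompress` are hypothetical names.

```python
def compress(weights, num_bits=8):
    """Quantize nonzero entries symmetrically and record a bitmask."""
    nonzero = [w for w in weights if w != 0.0]
    max_abs = max(abs(w) for w in nonzero)
    qmax = 2 ** (num_bits - 1) - 1          # e.g. 127 for int8
    scale = max_abs / qmax
    mask = [w != 0.0 for w in weights]      # which positions are stored
    qvals = [round(w / scale) for w in nonzero]
    return mask, qvals, scale

def decompress(mask, qvals, scale):
    """Re-expand to a dense list, dequantizing on the fly."""
    it = iter(qvals)
    return [next(it) * scale if m else 0.0 for m in mask]

weights = [0.0, 1.5, 0.0, -0.75, 3.0, 0.0]
mask, qvals, scale = compress(weights)
restored = decompress(mask, qvals, scale)
```

Only three of the six entries are stored; zeros cost one bit of mask each, which is where the disk savings for sparse models come from.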
@mgoin_
Michael Goin
28 days
vLLM on TPU keeps getting better 🔥 Now we lower your PyTorch model definitions to JAX (or just write in JAX directly) for the best performance!
@vllm_project
vLLM
28 days
Announcing the completely reimagined vLLM TPU! In collaboration with @Google, we've launched a new high-performance TPU backend unifying @PyTorch and JAX under a single lowering path for amazing performance and flexibility. 🚀 What's New? - JAX + Pytorch: Run PyTorch models on
@_EldarKurtic
Eldar Kurtić
1 month
Join us for the first @vllm_project meetup in Europe!
@RedHat_AI
Red Hat AI
1 month
The first vLLM Meetup in Europe is on the calendar! 🇨🇭 📍 Zürich | 🗓️ 6 Nov 2025 Hosted by @RedHat, @IBM & @MistralAI Join core @vllm_project committers for talks on quantization, hybrid models, Mistral AI work w/ vLLM, distributed inference, and more. https://t.co/kUeGXtutvq
@mgoin_
Michael Goin
1 month
Happy that InferenceMAX is here because it signals a milestone for vLLM's SOTA performance on NVIDIA Blackwell! 🥳 It has been a pleasure to deeply collaborate with @nvidia in @vllm_project, and we have much more to do Read about the work we did here:
blog.vllm.ai
@dylan522p
Dylan Patel
1 month
Today we are launching InferenceMAX! We have support from Nvidia, AMD, OpenAI, Microsoft, Pytorch, SGLang, vLLM, Oracle, CoreWeave, TogetherAI, Nebius, Crusoe, HPE, SuperMicro, Dell It runs every day on the latest software (vLLM, SGLang, etc) across hundreds of GPUs, $10Ms of
@DAlistarh
Dan Alistarh
1 month
🚀 We are releasing state-of-the-art post-training quantization (PTQ) algorithms for Microscaling FP4, together with kernels: - First study focused on MXFP4/NVFP4 PTQ for LLMs - New Micro-Rotated (MR) format and GPTQ algorithm - QuTLASS GPU kernels with up to 3.6x speedups.
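For context on what "microscaling FP4" means in practice: a block of values shares one power-of-two scale, and each value is snapped to the nearest point of the small FP4 (E2M1) grid. Here is a deliberately simplified sketch, reduced to a single group; it is not the Micro-Rotated format or the GPTQ algorithm from the announcement.

```python
import math

FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # E2M1 magnitudes

def snap(x):
    """Round a scaled value to the nearest FP4 grid point, keeping sign."""
    sign = -1.0 if x < 0 else 1.0
    mag = min(FP4_GRID, key=lambda g: abs(g - abs(x)))
    return sign * mag

def quantize_group(values):
    """Pick a power-of-two scale so max |value| fits under 6.0 (the
    largest FP4 magnitude), then snap every value to the grid."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return [0.0] * len(values), 1.0
    scale = 2.0 ** math.ceil(math.log2(max_abs / 6.0))
    return [snap(v / scale) for v in values], scale

vals = [0.1, -0.4, 0.9, 2.5, -6.0, 3.3]
q, s = quantize_group(vals)
deq = [x * s for x in q]
```

The power-of-two scale is what makes this "microscaling": it can itself be stored in a tiny exponent-only format per group.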
@DAlistarh
Dan Alistarh
2 months
Introducing LLM.Q: Quantized LLM training in pure CUDA/C++! With LLM.Q, you can train your own LLM on consumer GPUs with natively quantized matmuls, on single workstations. No datacenter required. Inspired by @karpathy's llm.c, but natively quantized.
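The "natively quantized matmuls" that LLM.Q advertises can be illustrated in miniature: store int8 weights with one scale per row, accumulate products over the integer weights, and dequantize once per output element. This toy Python stand-in only shows the arithmetic (the real work is CUDA/C++ kernels); activations are left in float for simplicity.

```python
def quantize_rows(matrix, qmax=127):
    """Symmetric per-row int8 quantization: int rows plus one scale each."""
    qrows, scales = [], []
    for row in matrix:
        max_abs = max(abs(x) for x in row)
        scale = max_abs / qmax if max_abs > 0 else 1.0
        qrows.append([round(x / scale) for x in row])
        scales.append(scale)
    return qrows, scales

def qmatvec(qrows, scales, vec):
    """Dot products over quantized weights, dequantized once per row."""
    return [s * sum(q * v for q, v in zip(qrow, vec))
            for qrow, s in zip(qrows, scales)]

W = [[1.0, -2.0], [0.5, 0.5]]
qrows, scales = quantize_rows(W)
y = qmatvec(qrows, scales, [2.0, 1.0])   # approximates W @ [2, 1]
```

Deferring the scale multiply to the end is the point: the inner loop touches only small integers, which is what makes quantized training fit on consumer GPUs.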
@_EldarKurtic
Eldar Kurtić
2 months
A control dashboard built for your LLM deployments
@RedHat_AI
Red Hat AI
2 months
🚀 Thrilled to announce GuideLLM v0.3.0! This release is highlighted by a brand new Web UI, containerized benchmarking, and powerful dataset preprocessing. GuideLLM GitHub: https://t.co/0iSPUOqmch (Thread 👇)
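What a benchmarking dashboard like this fundamentally reports is throughput plus latency tails. A self-contained toy along those lines follows; the simulated workload and the `simulate_request`/`run_benchmark` names are made up for illustration and are not GuideLLM's API, and throughput here assumes sequential execution.

```python
import random
import statistics

def simulate_request(rng):
    """Pretend per-request inference latency in seconds (hypothetical)."""
    return 0.05 + rng.random() * 0.2

def run_benchmark(num_requests=200, seed=0):
    rng = random.Random(seed)
    latencies = sorted(simulate_request(rng) for _ in range(num_requests))
    def pct(p):
        # nearest-rank percentile over the sorted sample
        return latencies[min(len(latencies) - 1, int(p / 100 * len(latencies)))]
    return {
        "mean_s": statistics.mean(latencies),
        "p50_s": pct(50),
        "p95_s": pct(95),
        "p99_s": pct(99),
        "req_per_s": num_requests / sum(latencies),
    }

report = run_benchmark()
```

Tail percentiles (p95/p99) rather than the mean are what decide whether a deployment meets its SLO, which is why tools in this space surface them prominently.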
@DAlistarh
Dan Alistarh
2 months
We're releasing the DASLab GGUF Quantization Toolkit! 🚀 First open-source toolkit bringing GPTQ + EvoPress to @ggerganov's GGUF format, enabling heterogeneous quantization based on importance. Result: Better models at the same file size. [1/5]
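The "heterogeneous quantization based on importance" idea is easy to sketch: within a fixed total bit budget, layers with higher importance scores get wider formats. Below is a deliberately naive greedy version; EvoPress itself uses an evolutionary search, and the layer names, scores, and bit-width choices here are invented for illustration.

```python
def allocate_bits(importance, budget_bits, choices=(2, 3, 4, 8)):
    """Round-robin greedy allocation: start every layer at the narrowest
    width, then repeatedly upgrade layers one step each, most important
    first, while the total stays within budget."""
    alloc = {name: choices[0] for name in importance}
    order = sorted(importance, key=importance.get, reverse=True)
    changed = True
    while changed:
        changed = False
        for name in order:
            i = choices.index(alloc[name])
            if i + 1 < len(choices):
                cost = choices[i + 1] - alloc[name]
                if sum(alloc.values()) + cost <= budget_bits:
                    alloc[name] = choices[i + 1]
                    changed = True
    return alloc

alloc = allocate_bits({"attn.0": 5.0, "attn.1": 3.0, "mlp.0": 1.0},
                      budget_bits=16)
```

With a 16-bit budget over three layers, the most important layer lands at 8 bits while the others stay at 4, matching the "better models at the same file size" framing.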
@vllm_project
vLLM
3 months
🚀 LLM Compressor v0.7.0 is here! This release brings powerful new features for quantizing large language models, including transform support (QuIP, SpinQuant), mixed precision compression, improved MoE handling with Llama4 support, and more. Full blog:
developers.redhat.com
LLM Compressor has recently released version 0.7.0, which introduces a range of significant enhancements designed to improve the performance of quantizing and deploying large language models. This
@_EldarKurtic
Eldar Kurtić
3 months
My recent talk at "Open Source @Siemens 2025" is now live on YouTube:
@_EldarKurtic
Eldar Kurtić
3 months
Apparently, we’re trending in SF as well
@mgoin_
Michael Goin
3 months
SF is where paper titles go to die
@_EldarKurtic
Eldar Kurtić
3 months
Let @mgoin_ show you what the de facto industry standard for running agents looks like with @vllm_project
@PyTorch
PyTorch
3 months
We’re looking forward to participating in the Agentic AI Summit 2025 by @BerkeleyRDI on August 2nd. Find us onsite to speak to experts in PyTorch & vLLM plus see you at our: ➡️ Focus talk on how the Linux Foundation and PyTorch Foundation are accelerating open source AI and
@RedHat_AI
Red Hat AI
4 months
[vLLM Office Hours #29] Scaling MoE with llm-d
@_EldarKurtic
Eldar Kurtić
4 months
You can now find GuideLLM at its new address:
github.com
Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs - vllm-project/guidellm
@RedHat_AI
Red Hat AI
4 months
BIG NEWS! 🎉 GuideLLM is officially joining the @vLLM_project! This combines vLLM's high-speed inference with a powerful, dedicated toolkit for real-world performance validation. Moving from PoC to production just got a lot more scientific. Here's how, 1/7:
@RedHat_AI
Red Hat AI
4 months
.@vllm_project office hours return next week! Alongside project updates from @mgoin_, vLLM committers and HPC experts @robertshaw21 + @tms_jr will share how to scale MoE models with llm-d and lessons from real world multi-node deployments. Register: https://t.co/X8hAHYR3rl
@_EldarKurtic
Eldar Kurtić
4 months
FP4 models and inference kernels ready for Blackwell GPUs! GPTQ and Hadamard for accuracy, and fused Hadamard for runtime. Check out more details about our work in the thread below 👇
@DAlistarh
Dan Alistarh
4 months
Announcing our early work on FP4 inference for LLMs! - QuTLASS: low-precision kernel support for Blackwell GPUs - FP-Quant: a flexible quantization harness for Llama/Qwen We reach 4x speedup vs BF16, with good accuracy through MXFP4 microscaling + fused Hadamard rotations.
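Why Hadamard rotations help at 4-bit: an orthonormal Hadamard transform smears a single activation outlier across all coordinates, so the quantizer no longer has to stretch its range to cover one huge value. A pure-Python fast Walsh-Hadamard sketch of that effect follows; real kernels such as QuTLASS fuse the rotation into the matmul rather than running it separately like this.

```python
import math

def hadamard_rotate(vec):
    """Apply the normalized Hadamard transform (length must be 2^k)."""
    n = len(vec)
    assert n & (n - 1) == 0, "length must be a power of two"
    out = list(vec)
    h = 1
    while h < n:                       # in-place fast Walsh-Hadamard
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                a, b = out[j], out[j + h]
                out[j], out[j + h] = a + b, a - b
        h *= 2
    norm = math.sqrt(n)                # make the transform orthonormal
    return [x / norm for x in out]

x = [8.0, 0.1, -0.1, 0.1, 0.1, -0.1, 0.1, 0.1]   # one big outlier
y = hadamard_rotate(x)
```

The rotated vector has the same energy as the input but a much smaller maximum magnitude, which directly shrinks the quantization step needed at FP4.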
@_EldarKurtic
Eldar Kurtić
4 months
The @huggingface folks deserve far more credit for being a pillar of open-source and still managing to push out SOTA results across the board, along with a full write-up of the entire model’s lifecycle.
@ClementDelangue
clem 🤗
4 months
We just released the best 3B model, 100% open-source, open dataset, architecture details, exact data mixtures and full training recipe including pre-training, mid-training, post-training, and synthetic data generation for everyone to train their own. Let's go open-source AI!