Eldar Kurtić (@_EldarKurtic)
Principal Research Scientist @RedHat_AI & Dan Alistarh's group @ISTAustria
Joined July 2018 · 757 followers · 867 following · 65 media · 311 statuses
Due to high demand, we’re live streaming the first official vLLM meetup in Europe from Zürich! Tune in to hear about the @vllm_project roadmap, quantization, and distributed inference from @RedHat_AI, @IBMResearch on hybrid models, and @MistralAI on their open-source…
Good news - we'll be live streaming the first official @vllm_project meetup in Europe from Zürich. Thu, Nov 6 at 11:30am ET / 8:30am PT / 5:30pm CET. Hear from vLLM maintainers and contributors at @RedHat, @IBM, and @MistralAI covering quantization, hybrid models, distributed…
You can now find compressed-tensors at its new address:
github.com
A safetensors extension to efficiently store sparse quantized tensors on disk - vllm-project/compressed-tensors
BIG NEWS! 🎉 Compressed Tensors is officially joining the @vllm_project! Built on top of the excellent @huggingface safetensors framework, Compressed Tensors extends it with efficient storage and management of compressed tensor data for model quantization and sparsity. Why…
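For a feel of the storage layer involved, here is a minimal sketch of saving quantized weights with @huggingface safetensors, the format compressed-tensors extends. The tensor names and the per-tensor int8 scheme are illustrative, not the library's actual schema:

```python
# Minimal sketch: store int8-quantized weights plus their scale with
# safetensors, the base format compressed-tensors builds on.
# Tensor names below are hypothetical, not compressed-tensors' schema.
import torch
from safetensors.torch import save_file, load_file

w = torch.randn(256, 256)
scale = w.abs().max() / 127.0                       # per-tensor symmetric scale
q = torch.clamp((w / scale).round(), -128, 127).to(torch.int8)

save_file(
    {"layer0.weight_packed": q, "layer0.weight_scale": scale.reshape(1)},
    "layer0.safetensors",
)

tensors = load_file("layer0.safetensors")
w_hat = tensors["layer0.weight_packed"].float() * tensors["layer0.weight_scale"]
print((w - w_hat).abs().max())                      # worst-case quantization error
```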
vLLM on TPU keeps getting better 🔥 Now we lower your PyTorch model definitions to JAX (or just write in JAX directly) for the best performance!
Announcing the completely reimagined vLLM TPU! In collaboration with @Google, we've launched a new high-performance TPU backend unifying @PyTorch and JAX under a single lowering path for amazing performance and flexibility. 🚀 What's New?
- JAX + PyTorch: Run PyTorch models on…
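As a toy illustration of the "write in JAX directly" path (not the vLLM TPU backend itself), a jitted function like this is what XLA compiles into fused TPU kernels:

```python
# Toy illustration of the JAX programming model the new backend lowers onto.
# This is not vLLM TPU code, just the shape of a jitted compute kernel.
import jax
import jax.numpy as jnp

@jax.jit
def attention_scores(q, k):
    # (batch, seq, dim) x (batch, seq, dim) -> (batch, seq, seq)
    return jax.nn.softmax(jnp.einsum("bqd,bkd->bqk", q, k) / jnp.sqrt(q.shape[-1]))

key = jax.random.PRNGKey(0)
q = jax.random.normal(key, (1, 128, 64))
k = jax.random.normal(key, (1, 128, 64))
print(attention_scores(q, k).shape)  # (1, 128, 128)
```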
Join us for the first @vllm_project meetup in Europe!
The first vLLM Meetup in Europe is on the calendar! 🇨🇭 📍 Zürich | 🗓️ 6 Nov 2025 Hosted by @RedHat, @IBM & @MistralAI Join core @vllm_project committers for talks on quantization, hybrid models, Mistral AI work w/ vLLM, distributed inference, and more. https://t.co/kUeGXtutvq
Happy that InferenceMAX is here because it signals a milestone for vLLM's SOTA performance on NVIDIA Blackwell! 🥳 It has been a pleasure to deeply collaborate with @nvidia in @vllm_project, and we have much more to do Read about the work we did here:
blog.vllm.ai
Introduction
Today we are launching InferenceMAX! We have support from Nvidia, AMD, OpenAI, Microsoft, PyTorch, SGLang, vLLM, Oracle, CoreWeave, TogetherAI, Nebius, Crusoe, HPE, SuperMicro, Dell. It runs every day on the latest software (vLLM, SGLang, etc.) across hundreds of GPUs, $10Ms of…
🚀 We are releasing state-of-the-art post-training quantization (PTQ) algorithms for Microscaling FP4, together with kernels:
- First study focused on MXFP4/NVFP4 PTQ for LLMs
- New Micro-Rotated (MR) format and GPTQ algorithm
- QuTLASS GPU kernels with up to 3.6x speedups
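For intuition, here is a hedged baseline sketch of microscaling FP4 round-to-nearest quantization in PyTorch. Block size 32 and the E2M1 value grid follow the MX convention; the MR transform and GPTQ variants from the paper go well beyond this baseline:

```python
# Baseline MXFP4 round-to-nearest sketch (illustrative, not QuTLASS):
# blocks of 32 values share one power-of-two scale, and each value snaps
# to the nearest FP4 (E2M1) grid point.
import torch

FP4_GRID = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # E2M1 magnitudes

def mxfp4_quantize(x, block=32):
    flat = x.reshape(-1, block)
    amax = flat.abs().amax(dim=1, keepdim=True).clamp(min=1e-12)
    # shared power-of-two scale per block, chosen so the block max
    # lands inside the representable FP4 range [-6, 6]
    scale = torch.exp2(torch.ceil(torch.log2(amax / 6.0)))
    scaled = flat / scale
    # round each magnitude to the nearest grid point, keep the sign
    idx = (scaled.abs().unsqueeze(-1) - FP4_GRID).abs().argmin(dim=-1)
    return (FP4_GRID[idx] * scaled.sign() * scale).reshape(x.shape)

w = torch.randn(4, 64)
print((w - mxfp4_quantize(w)).pow(2).mean())  # quantization MSE
```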
Introducing LLM.Q: quantized LLM training in pure CUDA/C++! With LLM.Q, you can train your own LLM with natively quantized matmuls on a single consumer-GPU workstation. No datacenter required. Inspired by @karpathy's llm.c, but natively quantized.
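LLM.Q itself is CUDA/C++, but the core trick translates. Here is a PyTorch analogue (my sketch, not the project's code) of training through a quantized matmul: quantize weights in the forward pass, pass gradients straight through to the full-precision master weights:

```python
# Sketch of a quantized-matmul training step with a straight-through
# estimator (STE). Illustrative analogue of the idea, not LLM.Q itself.
import torch

class QuantMatmul(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, w):
        ctx.save_for_backward(x, w)
        scale = w.abs().max() / 127.0
        w_q = torch.clamp((w / scale).round(), -128, 127) * scale  # fake-quant int8
        return x @ w_q.t()

    @staticmethod
    def backward(ctx, grad_out):
        x, w = ctx.saved_tensors
        # STE: ignore the rounding step, differentiate as if w were used directly
        return grad_out @ w, grad_out.t() @ x

x = torch.randn(8, 16, requires_grad=True)
w = torch.randn(32, 16, requires_grad=True)   # full-precision master weights
QuantMatmul.apply(x, w).sum().backward()
print(w.grad.shape)  # (32, 16)
```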
A control dashboard built for your LLM deployments
🚀 Thrilled to announce GuideLLM v0.3.0! This release is highlighted by a brand new Web UI, containerized benchmarking, and powerful dataset preprocessing. GuideLLM GitHub: https://t.co/0iSPUOqmch (Thread 👇)
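GuideLLM automates far more than this, but as a minimal sketch of the underlying measurement it performs, here is a latency probe against an OpenAI-compatible vLLM endpoint. The URL and model name are placeholders, and this is not GuideLLM's API:

```python
# Minimal latency probe for an OpenAI-compatible endpoint (placeholder URL
# and model name). GuideLLM automates this kind of sweep with proper
# load patterns, warmup, and reporting; this is just the core idea.
import time
import requests

URL = "http://localhost:8000/v1/completions"  # hypothetical deployment

def probe(prompt, n=10):
    latencies = []
    for _ in range(n):
        t0 = time.perf_counter()
        r = requests.post(URL, json={
            "model": "my-model",              # placeholder model id
            "prompt": prompt,
            "max_tokens": 64,
        })
        r.raise_for_status()
        latencies.append(time.perf_counter() - t0)
    latencies.sort()
    print(f"p50={latencies[n // 2]:.3f}s  p90={latencies[int(n * 0.9)]:.3f}s")

probe("Explain KV caching in one sentence.")
```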
We're releasing the DASLab GGUF Quantization Toolkit! 🚀 First open-source toolkit bringing GPTQ + EvoPress to @ggerganov's GGUF format, enabling heterogeneous quantization based on importance. Result: Better models at the same file size. [1/5]
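The heterogeneous-quantization idea can be sketched greedily (a toy stand-in, not the actual EvoPress search): give more bits to the layers whose measured importance is highest, under a fixed average-bits budget:

```python
# Toy greedy bit allocation under an average-bits budget (illustrative,
# not the EvoPress algorithm). Layer names and scores are hypothetical.
def allocate_bits(importance, choices=(2, 3, 4, 5), budget_avg=3.5):
    n = len(importance)
    bits = {layer: min(choices) for layer in importance}  # start everyone low
    # repeatedly upgrade the most important layer that can still grow
    while sum(bits.values()) / n < budget_avg:
        candidates = [l for l in bits if bits[l] < max(choices)]
        if not candidates:
            break
        best = max(candidates, key=lambda l: importance[l] / bits[l])
        bits[best] = choices[choices.index(bits[best]) + 1]
    return bits

importance = {"attn.0": 9.0, "mlp.0": 4.0, "attn.1": 7.5, "mlp.1": 2.0}
print(allocate_bits(importance))
```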
🚀 LLM Compressor v0.7.0 is here! This release brings powerful new features for quantizing large language models, including transform support (QuIP, SpinQuant), mixed precision compression, improved MoE handling with Llama4 support, and more. Full blog:
developers.redhat.com
LLM Compressor has recently released version 0.7.0, which introduces a range of significant enhancements designed to improve the performance of quantizing and deploying large language models. This…
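A minimal usage sketch, assuming the oneshot entry point and GPTQModifier recipe shown in the llm-compressor README; check the v0.7.0 release notes for the exact import paths and supported schemes. Model id, dataset, and output dir here are placeholders:

```python
# Assumed from the llm-compressor README: one-shot W4A16 GPTQ quantization.
# Import path and arguments may differ by release; treat as a sketch.
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import oneshot

oneshot(
    model="meta-llama/Llama-3.1-8B-Instruct",   # placeholder model id
    dataset="open_platypus",                    # placeholder calibration set
    recipe=GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"]),
    output_dir="Llama-3.1-8B-Instruct-W4A16",
    max_seq_length=2048,
    num_calibration_samples=512,
)
```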
My recent talk at "Open Source @Siemens 2025" is now live on YouTube:
Apparently, we’re trending in SF as well
Let @mgoin_ show you what the de facto industry standard for running agents looks like with @vllm_project
We’re looking forward to participating in the Agentic AI Summit 2025 by @BerkeleyRDI on August 2nd. Find us onsite to speak to experts in PyTorch & vLLM plus see you at our: ➡️ Focus talk on how the Linux Foundation and PyTorch Foundation are accelerating open source AI and…
You can now find GuideLLM at its new address:
github.com
Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs - vllm-project/guidellm
BIG NEWS! 🎉 GuideLLM is officially joining the @vllm_project! This combines vLLM's high-speed inference with a powerful, dedicated toolkit for real-world performance validation. Moving from PoC to production just got a lot more scientific. Here's how, 1/7:
.@vllm_project office hours return next week! Alongside project updates from @mgoin_, vLLM committers and HPC experts @robertshaw21 + @tms_jr will share how to scale MoE models with llm-d and lessons from real world multi-node deployments. Register: https://t.co/X8hAHYR3rl
FP4 models and inference kernels ready for Blackwell GPUs! GPTQ and Hadamard for accuracy, and fused Hadamard for runtime. Check out more details about our work in the thread below 👇
Announcing our early work on FP4 inference for LLMs!
- QuTLASS: low-precision kernel support for Blackwell GPUs
- FP-Quant: a flexible quantization harness for Llama/Qwen
We reach 4x speedup vs BF16, with good accuracy through MXFP4 microscaling + fused Hadamard rotations.
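Why rotations help, in one sketch: an orthogonal Hadamard transform spreads outlier columns across all coordinates, flattening the dynamic range the quantizer must cover, and loses nothing because the rotation inverts exactly. Illustrative code, not QuTLASS or FP-Quant:

```python
# Demonstration of the Hadamard-rotation effect on weight outliers
# (illustrative, not the QuTLASS/FP-Quant implementation).
import torch

def hadamard(n):
    # Sylvester construction; n must be a power of two
    H = torch.ones(1, 1)
    while H.shape[0] < n:
        H = torch.cat([torch.cat([H, H], dim=1), torch.cat([H, -H], dim=1)], dim=0)
    return H / H.shape[0] ** 0.5  # orthonormal: H @ H.T == I

W = torch.randn(64, 64)
W[:, 0] *= 50                                       # plant an outlier column
H = hadamard(64)
print(W.abs().max() / W.abs().mean())               # large outlier ratio
print((W @ H).abs().max() / (W @ H).abs().mean())   # much flatter after rotation
# (W @ H) @ H.T recovers W exactly, so inference applies H to the
# activations instead; fusing that multiply into the kernel is the runtime win.
```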
The @huggingface folks deserve far more credit for being a pillar of open-source and still managing to push out SOTA results across the board, along with a full write-up of the entire model’s lifecycle.
We just released the best 3B model, 100% open-source, open dataset, architecture details, exact data mixtures and full training recipe including pre-training, mid-training, post-training, and synthetic data generation for everyone to train their own. Let's go open-source AI!