Red Hat AI
@RedHat_AI
Followers: 8K · Following: 1K · Media: 455 · Statuses: 2K
Accelerating AI innovation with open platforms and community. The future of AI is open.
Joined May 2018
Red Hat AI gives teams a place to experiment with newly released models, including Mistral 3, on the day they arrive. Our Day Zero guide shows how to run Mistral 3 today using the Red Hat AI Inference Server and Red Hat OpenShift AI: https://t.co/Sr79WfbrPw Happy experimenting!
developers.redhat.com
Key takeaways
Introducing the Mistral 3 family of models: Frontier intelligence at all sizes. Apache 2.0. Details in 🧵
1
6
30
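For readers who want to try this before opening the full guide, here is a minimal sketch of querying a Mistral 3 model once it is served by the Red Hat AI Inference Server, which exposes a vLLM-style, OpenAI-compatible endpoint. The base URL and model id are placeholders, not values from the guide; substitute your own deployment's settings.

# Minimal sketch: query a Mistral 3 model behind an OpenAI-compatible endpoint.
# The base_url and model id below are assumed placeholders -- use your deployment's values.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed inference-server endpoint
    api_key="EMPTY",                      # vLLM-style servers accept any key by default
)

response = client.chat.completions.create(
    model="mistral-3",  # hypothetical model name; use the id your server registers
    messages=[{"role": "user", "content": "Summarize the Mistral 3 release in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)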
Many more details are in our quick-start blog:
developers.redhat.com
AutoRound, a state-of-the-art post-training quantization (PTQ) algorithm developed by Intel, is now integrated into LLM Compressor. This collaboration delivers
0
0
1
The workflow is simple and fast. Load your model, apply the AutoRound modifier, compress the model, and serve it in @vllm_project. Support for Llama, Qwen, and other open-weight LLMs means you can start experimenting right away.
1
1
1
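As a rough illustration of that load, compress, and serve flow, the sketch below follows LLM Compressor's usual oneshot() recipe pattern. The AutoRoundModifier class name and its arguments are assumptions, not confirmed API; check the LLM Compressor documentation for the exact recipe syntax.

# Sketch of the workflow above, under the assumptions noted in the comments.
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import AutoRoundModifier  # assumed class name

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # any supported open-weight LLM
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# W4A16: 4-bit weights, 16-bit activations
recipe = AutoRoundModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"])

oneshot(
    model=model,
    recipe=recipe,
    dataset="open_platypus",      # small calibration set
    max_seq_length=2048,
    num_calibration_samples=512,
)

model.save_pretrained("Llama-3.1-8B-Instruct-W4A16", save_compressed=True)
tokenizer.save_pretrained("Llama-3.1-8B-Instruct-W4A16")

The saved directory can then be passed straight to vLLM for serving.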
AutoRound learns how each tensor should round and clip for the best quality possible. This delivers standout low-bit performance in formats like W4A16.
1
0
1
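For readers who want the intuition behind "learns how each tensor should round and clip", here is a simplified view (notation ours, not from the thread): AutoRound treats the rounding decision and the clipping range as trainable quantities and tunes them with signed gradient descent to minimize the quantized layer's output error.

% Simplified sketch of the per-tensor objective: v is a learned rounding offset, and
% alpha, beta rescale the clipping range used to derive the scale s and zero point z.
\[
\tilde{W} \;=\; s \cdot \Big( \operatorname{clip}\!\big( \lfloor W/s + v \rceil,\; q_{\min},\; q_{\max} \big) - z \Big),
\qquad
\min_{v,\;\alpha,\;\beta}\; \big\lVert W X \;-\; \tilde{W} X \big\rVert_F^2
\]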
Red Hat AI and @intel have teamed up to bring a major upgrade to low-bit LLMs. AutoRound is now integrated directly into LLM Compressor, giving developers a powerful way to shrink models, boost speed, and keep accuracy. And it runs smoothly with @vllm_project. A quick 🧵:
1
3
7
Low-bit LLM quantization doesn't have to mean painful accuracy trade-offs or massive tuning runs. Intel's AutoRound PTQ algorithm is now integrated into LLM Compressor, producing W4A16 compressed-tensor checkpoints you can serve directly with vLLM across Intel Xeon, Gaudi, Arc
1
36
232
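Once a W4A16 compressed-tensors checkpoint exists, serving it in vLLM's offline API looks like the sketch below. The model path is a hypothetical local directory or Hugging Face repo id; vLLM detects the compressed-tensors format from the checkpoint config.

# Load and run a W4A16 compressed-tensors checkpoint with vLLM (path is a placeholder).
from vllm import LLM, SamplingParams

llm = LLM(model="Llama-3.1-8B-Instruct-W4A16")  # hypothetical local path / repo id
params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Explain W4A16 quantization in one sentence."], params)
print(outputs[0].outputs[0].text)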
Thanks to the great collaboration with the @vllm_project, LLM Compressor, and @RedHat_AI teams for making this happen! If you want a smaller model with high accuracy, deployed with vLLM, AutoRound is your best choice. Give it a try:
github.com
Advanced quantization toolkit for LLMs and VLMs. Native support for WOQ, MXFP4, NVFP4, GGUF, Adaptive Schemes and seamless integration with Transformers, vLLM, SGLang, and llm-compressor - intel/au...
Low-bit LLM quantization doesn't have to mean painful accuracy trade-offs or massive tuning runs. Intel's AutoRound PTQ algorithm is now integrated into LLM Compressor, producing W4A16 compressed-tensor checkpoints you can serve directly with vLLM across Intel Xeon, Gaudi, Arc
1
5
18
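The linked intel/auto-round repo can also be used on its own, outside LLM Compressor. A sketch of that path, following the repo's README pattern; the constructor arguments and export format string are assumptions, so check the current docs before relying on them.

# Standalone auto-round sketch; argument names follow the repo's documented pattern
# but should be treated as assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_id = "Qwen/Qwen2.5-7B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

autoround = AutoRound(model, tokenizer, bits=4, group_size=128, sym=True)
autoround.quantize()
autoround.save_quantized("Qwen2.5-7B-Instruct-int4", format="auto_round")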
If you built agents with Llama Stack's original Agent APIs, you've probably seen that they are being deprecated in favor of the OpenAI-compatible Responses API. Migrating does not require starting over. There are two practical paths you can take. Approach 1 is a
github.com
Contribute to opendatahub-io/agents development by creating an account on GitHub.
0
3
10
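To give a sense of the destination, here is a minimal Responses API call made with the standard OpenAI Python client pointed at a Llama Stack server. The base URL path and model id are assumptions about a typical deployment, not values from the migration guide.

# Minimal sketch of the post-migration call path: OpenAI client -> Llama Stack's
# OpenAI-compatible Responses endpoint. base_url and model id are assumed placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8321/v1/openai/v1",  # assumed Llama Stack compat path
    api_key="none",
)

response = client.responses.create(
    model="llama3.2:3b",  # hypothetical model id registered with the stack
    input="What changed between the Agent APIs and the Responses API?",
)
print(response.output_text)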
Milestone Unlocked! The InferenceOps: State of the Model Serving Communities newsletter from @RedHat_AI just reached 1,000 subscribers in only 5 months! A huge thank-you to everyone who's
1
2
13
Are you running vLLM on Kubernetes and tired of guessing concurrency thresholds? This new Red Hat article walks through how to autoscale vLLM on OpenShift AI using real service metrics instead of generic request counts. KServe and KEDA work together to scale GPU model servers
developers.redhat.com
In my previous blog, How to set up KServe autoscaling for vLLM with KEDA, we explored the foundational setup of vLLM autoscaling in Open Data Hub (ODH) using KEDA and the custom metrics autoscaler
0
7
20
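The production setup in the article is KServe plus KEDA, but the key idea is the scaling signal itself: vLLM's own Prometheus metrics (queue depth and in-flight requests) rather than generic request counts. Below is a small illustration of reading that signal directly, assuming a vLLM server exposing /metrics on port 8000.

# Illustration only -- the article wires this signal through KServe + KEDA instead of
# a hand-rolled loop. This just shows which per-service metrics drive the decision.
import urllib.request

METRICS_URL = "http://localhost:8000/metrics"  # assumed vLLM metrics endpoint

def scrape_metric(name: str) -> float:
    """Return the first sample of a Prometheus metric exposed by vLLM."""
    body = urllib.request.urlopen(METRICS_URL).read().decode()
    for line in body.splitlines():
        if line.startswith(name):
            return float(line.rsplit(" ", 1)[-1])
    return 0.0

waiting = scrape_metric("vllm:num_requests_waiting")   # queued requests
running = scrape_metric("vllm:num_requests_running")   # requests currently decoding
print(f"queue depth={waiting}, in flight={running}")
# A KEDA ScaledObject keyed on the same Prometheus metric adds or removes replicas.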
📢 vLLM v0.12.0 is now available. For inference teams running vLLM at the center of their stack, this release refreshes the engine, extends long-context and speculative decoding capabilities, and moves us to a PyTorch 2.9.0 / CUDA 12.9 baseline for future work.
4
20
148
It may be my first time at #AWSreInvent but it sure isn't for @RedHat! We're out here in Vegas running live demos all week, including my favorite Blackjack + AI game powered by @vllm_project for model inference ⚡️ and #ModelContextProtocol Agents 🤖
1
10
16
Red Hat's expanded collaboration with @awscloud is empowering IT decision-makers to run high-performance, efficient #AI inference at scale with @RedHat_AI. Check it out. #AWSreInvent
redhat.com
Red Hat today announced an expanded collaboration with Amazon Web Services (AWS) to power enterprise-grade generative AI (gen AI) on AWS with Red Hat AI and AWS AI silicon.
0
1
1
Our latest PyTorch Foundation Spotlight features @RedHat's Joseph Groenenboom and Stephen Watt on the importance of optionality, open collaboration, and strong governance in building healthy and scalable AI ecosystems. In this Spotlight filmed during PyTorch Conference 2025,
2
7
58
Congratulations to the Mistral team on launching the Mistral 3 family! We're proud to share that @MistralAI, @NVIDIAAIDev, @RedHat_AI, and vLLM worked closely together to deliver full Day-0 support for the entire Mistral 3 lineup. This collaboration enabled: • NVFP4
Introducing the Mistral 3 family of models: Frontier intelligence at all sizes. Apache 2.0. Details in 🧵
8
42
493
Congrats to @MistralAI on launching the Mistral 3 family under the Apache 2.0 license. We worked together to enable upstream @vllm_project support and collaborated on creating the FP8 and NVFP4 Mistral Large 3 checkpoints through llm-compressor for efficient deployment.
Introducing the Mistral 3 family of models: Frontier intelligence at all sizes. Apache 2.0. Details in 🧵
0
3
14
We @RedHat_AI have partnered with Mistral to make Mistral Large 3 more accessible to the open-source community. High-quality FP8 and NVFP4 models, built with our llm-compressor! Expect models that are 2–3.5x smaller with competitive accuracy across a wide range of evals.
Introducing the Mistral 3 family of models: Frontier intelligence at all sizes. Apache 2.0. Details in 🧵
1
1
9
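For anyone who wants to try the compressed checkpoints, loading them in vLLM looks like the sketch below. The repo id and parallelism setting are placeholders, since a Mistral Large-class model needs multi-GPU serving sized to your hardware.

# Serving a compressed Mistral Large 3 checkpoint with vLLM. The repo id is a
# hypothetical placeholder -- substitute the published FP8 or NVFP4 checkpoint name.
from vllm import LLM, SamplingParams

llm = LLM(
    model="RedHatAI/Mistral-Large-3-FP8",  # hypothetical repo id
    tensor_parallel_size=8,                # assumption: large models need multiple GPUs
)
out = llm.generate(["Hello, Mistral 3!"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)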
Explore some of OpenShift #AI's capabilities for scaling LLM model servers with #KServe and #vLLM. While autoscaling has its limitations, it can be a valuable tool for an IT team trying to optimize the costs of the models they are serving. https://t.co/k57HhY2Iyp
developers.redhat.com
vLLM lets you serve nearly any LLM on a wide variety of hardware. However, that hardware can be quite expensive, and you don't want to be burning money with idle GPU resources. Instead, you can
0
9
26