MLCommons @MLCommons X Profile

MLCommons

@MLCommons

Followers

3K

Following

605

Media

59

Statuses

686

Better Artificial Intelligence for Everyone

https://t.co/LPAHnlYX8W

Joined September 2020

Don't wanna be here? Send us removal request.

MLCommons

@MLCommons

1 day

1/ MLPerf Client v1.5 is here! New AI PC benchmarking features: -Windows ML support for improved GPU/NPU performance -Expanded platforms: Windows x64, Windows on Arm, macOS, Linux, plus iPad app -Experimental power/energy measurement GUI improvements out of beta Download:

github.com

What’s new in MLPerf Client v1.5 MLPerf Client v1.5 has a host of new features and capabilities, including: Acceleration updates Support for Windows ML Runtime updates from IHVs GUI updates Vari...

1

3

Scott Wasson

@scottwasson

1 day

MLPerf Client v1.5 is out now. Updates include Windows ML support, GUI improvements, and experimental support for Linux and power-efficiency test. We support a broad range of GPUs and NPUs across PCs, Macs, and iPad Pro. Free to download & run! @MLCommons

mlcommons.org

MLPerf Client v1.5 introduces Windows ML support, expanded platform coverage (including iPad), and experimental power measurement, setting a new, comprehensive standard for benchmarking AI performa...

0

2

3

MLCommons

@MLCommons

1 day

2/ Free download measures how laptops, desktops, and workstations run local AI workloads like LLMs for summarization, content creation, and code analysis. Thanks to our collaborators: AMD, Intel, Microsoft, NVIDIA, Qualcomm, and PC OEMs Learn more: https://t.co/1mzKzqNNH4

0

1

MLCommons

@MLCommons

6 days

4/4 Thanks to all submitters: @AMD, @ASUS_OFFICIAL, @Cisco, @DataCrunch_io, @Dell, @GigaComputing, @HPE, @krai4ai, @LambdaAPI, @Lenovo, @MangoBoost_Inc, @MiTACcomputing, @nebiusai, @nvidia, @Oracle, @QuantaQCT, @Supermicro, @uflorida, @WiwynnCorp

0

4

MLCommons

@MLCommons

6 days

3/4 #GenAI benchmarks show performance improvements outpacing Moore's Law, with 24% increase in #llama 2 70B LoRA submissions.

1

3

MLCommons

@MLCommons

6 days

2/4 Two new benchmarks debut: Llama 3.1 8B: #LLM pretraining (replaces BERT) Flux.1: Transformer-based text-to-image (replaces Stable Diffusion v2)

1

3

MLCommons

@MLCommons

6 days

MLPerf Training v5.1 results are live! Record participation: 20 organizations submitted 65 unique systems featuring 12 different accelerators. Multi-node submissions increased 86% over last year, showing the industry's focus on scale. https://t.co/PBO3v98PwZ #MLPerf 1/4

1

3

8

MLCommons

@MLCommons

8 days

Don’t miss #MLCommons Endpoints in San Diego, Dec 1–2! Learn, connect, and shape the future of AI with top experts at @Qualcomm Hall. 🗓 Dec 1–2 | 🎟 Free tickets available now! https://t.co/HoLgXEawi3 #AI #MachineLearning #SanDiego

0

1

3

MLCommons

@MLCommons

14 days

1.1M tokens/sec achieved using MLPerf Inference v5.1 🚀 Microsoft Azure's result on ND GB300 v6 shows why standardized benchmarks matter: transparent, reproducible performance measurement across the industry. Explore all results →

mlcommons.org

MLCommons ML benchmarks help balance the benefits and risks of AI through quantitative tools that guide responsible AI development.

Satya Nadella

@satyanadella

15 days

1.1M tokens/sec on just one rack of GB300 GPUs in our Azure fleet. An industry record made possible by our longstanding co-innovation with NVIDIA and expertise of running AI at production scale! https://t.co/1qLvoS2z70

0

Scaleway

@Scaleway

18 days

Every week, a new model claims to reason, code, or plan better than the last. But how do we really know what’s true? From ImageNet to today’s multimodal and reasoning systems, benchmarks have been the invisible engine of AI advancement, making progress measurable, reproducible,

2

1

2

MLCommons

@MLCommons

18 days

"Meanwhile, with its 125+ founding Members and Affiliates all working towards a shared goal of open industry-standard benchmarks, MLCommons stands as perhaps the best example of industry-wide collaboration"

Scaleway

@Scaleway

18 days

Every week, a new model claims to reason, code, or plan better than the last. But how do we really know what’s true? From ImageNet to today’s multimodal and reasoning systems, benchmarks have been the invisible engine of AI advancement, making progress measurable, reproducible,

0

1

Flower

@flwrlabs

21 days

🧠 Upcoming Talk at the @MLCommons Medical Working Group: Federated AI using Flower Patrick Foley, Principal Engineer and Dimitris Stripelis (@dstripelis_), ML Engineer, will be presenting this Wednesday, October 29, at 10am PT. The Flower Team will cover the fundamentals of

0

3

5

MLCommons

@MLCommons

19 days

MLPerf Training v5.1 now features Llama 3.1 8B, a new pretraining benchmark! This brings modern LLM evaluation to single-node systems, lowering the barrier to entry while maintaining relevance to current AI development. https://t.co/mMAvuHEMOG #MLPerf #LLaMA3_1 #AI

mlcommons.org

MLPerf Training v5.1 introduces the Llama 3.1 8B benchmark, offering an accessible way to evaluate modern LLM pretraining on single-node systems, replacing the outdated BERT model.

0

1

MLCommons

@MLCommons

20 days

🚀MLCommons MLPerf Training v5.1 introduces Flux.1, a 11.9B parameter transformer model for text-to-image generation. Setting a new standard, replacing SDv2, reflecting the latest in generative AI. Read all about it! https://t.co/29ucM3GOSW #MLPerf #AI #GenerativeAI #Flux1

0

2

MLCommons

@MLCommons

1 month

The next generation of AI won't just be innovative—it'll be resilient. Access the benchmark and full findings: https://t.co/uEQhisP9cU Join the conversation! 6/6 #AIRiskandReliability #AISecurity

mlcommons.org

MLCommons AI jailbreak benchmark introduces the Resilience Gap metric, the first industry standard to measure AI safety under attack. Learn how it protects critical systems.

0

1

MLCommons

@MLCommons

1 month

Why this matters: → Developers get standardized metrics to find and fix vulnerabilities → Policymakers get transparent, reproducible data → Users get systems they can actually trust We're making hidden risks visible and measurable. 5/6

1

0

MLCommons

@MLCommons

1 month

The Jailbreak Benchmark v0.5 tests AI resilience across: -Text-to-text scenarios -Multimodal scenarios -12 hazard categories (violent crimes, CBRNE, child exploitation, suicide/self-harm, and more) Built on our AILuminate safety benchmark methodology. 4/6

1

0

MLCommons

@MLCommons

1 month

What is jailbreaking? It's when users manipulate AI systems to bypass safety filters and produce harmful, unintended, or policy-violating content. It's not theoretical. It's happening now. 3/6

1

0

MLCommons

@MLCommons

1 month

The gap between AI safety and security is real—and dangerous. 89% of models showed degraded safety performance when exposed to common jailbreak techniques. As AI powers healthcare, finance, and critical infrastructure, this vulnerability can't be ignored. 2/6

1

0

MLCommons

@MLCommons

1 month

🚨 NEW: We tested 39 AI models for security vulnerabilities. Not a single one was as secure as it was "safe." Today, @MLCommons is releasing the industry's first standardized jailbreak benchmark. Here's what we found 🧵 1/6

1

3