#BenchMark X Hashtag

Explore tweets tagged as #BenchMark

Vals AI

@_valsai

2 hours

Stop vibe checking your vibe code! We just released Vibe Code Bench: the first benchmark that tests whether AI models can actually build complete web applications from scratch. Featured today in @Inc (1/6)

15

28

85

Walter Laurito

@walterlaurito

3 hours

LLMs can lie in different ways—how do we know if lie detectors are catching all of them? We introduce LIARS’ BENCH, a new benchmark containing over 72,000 on-policy lies and honest responses to evaluate lie detectors for LLMs, made of 7 different datasets.

1

2

10

Alex

@MissBenchmark

19 hours

Watching everyone somehow defend Gretchen in this #RHOC reunion.

11

69

1K

dupontregistry

@duPontREGISTRY

2 days

1992 Ferrari F40 | Asking Price: $3,250,990. The Ferrari F40 remains the benchmark for pure driver engagement. With its featherweight chassis, twin-turbo punch, and iconic rear wing, it represents the essence of Ferrari’s golden era and continues to command respect from

16

151

1K

Shadrack Amonoo Crabe 👁‍🗨

@ShadrackAmonooC

3 days

The “Benchmark” 🐐 👑💥👏🏿

10

124

3K

Manasi Sharma

@ManasiSharma_

9 days

🚀 New @scale_AI paper: ResearchRubrics, a benchmark for evaluating Deep Research (DR) agents. Even top agents like Gemini & OpenAI DR achieve <68% rubric compliance. We built 2.5K+ expert rubrics with 2.8K+ hrs of human labor to measure why.

12

30

198

Benchmark

@BenchmarkEmail

4 hours

Real marketers. Real results. See what our users are saying about how Benchmark Email helps them send smarter, faster, and better campaigns. ✨ Because your success is the best story we can tell. #benchmarkemail #emailmarketing #customerlove #emailstrategy

0

Crypto Holding™ 💎

@Crypto_Holding_

8 days

🚨 BREAKING: #Binance is #1 in #CoinDesk’s 2025 Exchange Benchmark, the ONLY exchange with 90+ scores in BOTH spot (93.4) & derivatives (93.65), earning an AA rating! 💪 Leads in Market Quality, Security & Transparency. 26% global spot volume. Deeper liquidity = tighter spreads,

92

596

922

Altcoin Vector

@altcoinvector

2 hours

If $WIF is a behavior benchmark for the broader altcoin market, then Alts are sitting exactly on support, holding their April-bottom structure despite BTC’s heavy sell pressure. Alts, which were nuked in October, are resisting the final rounds far better than expected. 🧵

8

11

83

#JaiBabu 🦅

@Urs_Ramchandra

10 days

Real ayyagaru. Benchmark for heroism and elevation in TFI started here ⛓️ #Shiva4K

12

215

3K

alphaXiv

@askalphaxiv

3 days

Introducing Gemini 3 Pro for understanding research papers 🚀 Highlight any section of a paper to ask questions and “@” other papers for quick context, comparisons, and benchmark references

42

129

913

Tharun Billa

@Tharun_billa_

1 day

Next-Level Intensity: The Hummer Fight Sequence Sets A New Benchmark In Cinematic Action 😮‍💨🔥

7

150

1K

Craig Fuller 🛩🚛🚂⚓️

@FreightAlley

7 days

Freight continues its epic collapse, with the Cass Shipment Index giving off some of the most significant warning signs about the state of the goods economy. The Cass Shipment Index, the benchmark freight index, has dropped to October 2009 levels, the height of the Great

45

255

853

Dr. Datta M.D. (AIIMS Delhi)

@DrDatta_AIIMS

2 days

🔥 Gemini 3.0 vs Radiologists: RadLE Benchmark Results Are OUT! ☠️ Is it game over for Radiology? Let us find out! ⬇️ 🫨 Since yesterday, Gemini 3.0 has been everywhere for crushing benchmarks. My inbox exploded asking: “But how did it do on the hardest visual reasoning

68

158

1K

Muaxh03

@muaxh03

6 hours

Asus is lazy. I showed them full video proof, live footage of the crash, dump files and every detail possible. My 5090 keeps crashing randomly and all they do is launch a 15 minute benchmark and send it back. This has been going on for 3 months. I tell them it’s not fixed so they

7

4

120

Sam Hogan 🇺🇸

@samhogan

3 days

The founder of Google flying his $150M blimp over San Francisco on the day Gemini 3 beats nearly every model benchmark is the exact type of big baller energy this city loves. “I’m still daddy” - Sergey Brin, probably

80

107

4K

NPF-FPN

@npffpn

47 minutes

Surrey residents deserve clear facts. According to the Surrey Police Service (SPS), 608 SPS officers are currently deployed. Combined with remaining RCMP officers, Surrey has the highest number in the city’s history, above the benchmark agreed upon by both the Province and the

0

1

5

Derya Unutmaz, MD

@DeryaTR_

3 days

Gemini 3.0 Pro absolutely dominates every benchmark! The jump from 2.5 is nuts! Its scores on the most difficult benchmarks suggest this is essentially baby AGI! Humanity's Last Exam: 37.5%; ARC-AGI-2: 31.1%; LiveCodeBench Pro: 2439; MathArena Apex: 23.4%; SimpleQA: 72.1%

99

88

939

Artificial Analysis

@ArtificialAnlys

4 days

Announcing AA-Omniscience, our new benchmark for knowledge and hallucination across >40 topics, where all but three models are more likely to hallucinate than give a correct answer. Embedded knowledge in language models is important for many real-world use cases. Without

42

116

686