#benchmarks X Hashtag

Explore tweets tagged as #benchmarks

Banh Bao 🟣

@BanhBao565

13 minutes

🃏 House of TEN: Where AI doesn’t just compute — it bluffs 🎭. I joined House of TEN @tenprotocol thinking it was just poker. But turns out… I walked into a bluffing arena for AI agents. And it’s brilliant. Here’s the deal: 🧠 Most AI benchmarks? They’re like school exams —

3

0

5

Hao AI Lab

@haoailab

8 days

Grok-4 has been thoroughly evaluated on math and coding benchmarks, but its performance in gaming environments is untested. We evaluate Grok-4 on the lmgame bench and find that it emerges as a leading model with superior gaming capabilities, ranking #2 on our leaderboard 🥈. In

2

14

45

𝚐𝔪𝟾𝚡𝚡𝟾

@gm8xx8

57 minutes

Seed-Prover’s 30 / 42-point silver-medal performance at IMO 2025.
- Fully solved 4/6 problems.
- Included 3-day proofs for P3 (2000-line Lean) & P4 (4000-line Lean).
- Geometry problem solved in 2 seconds via Seed-Geometry.
New SOTA Across Benchmarks.
- 100% MiniF2F-valid.
- 99%

1

0

8

NEARWEEK

@NEARWEEK

2 hours

“Your brain is the agent. Your muscle is the smart contract.” @ilblackdragon broke down the June wins across the @near_ai ecosystem, highlighting @proximityfi's Shade Agent Sandbox, @PublicAI_'s $10M raise for human-in-the-loop labeling, @SilverstreamAI's automation benchmarks,

27

3

22

Backpack for Laravel

@laravelbackpack

6 days

#Laravel Tip. Did you know? Laravel has a Benchmark class that lets you measure the time of any task.

3

18

124

Sergey

@SergeyCYW

3 days

The Rule of 40 is a pivotal financial metric used to evaluate the performance of SaaS and other growth-oriented software companies. It serves as a benchmark that balances revenue growth and profitability, providing a comprehensive snapshot of a company's overall financial health.

1

7

33
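As a rough illustration of the Rule of 40 described in the post above: the rule is commonly stated as "revenue growth rate plus profit margin should total at least 40%". The post does not say which profit margin to use, so the sketch below assumes a generic margin expressed in percent. The function names and the sample figures are hypothetical and chosen only for illustration.

```python
def rule_of_40_score(revenue_growth_pct: float, profit_margin_pct: float) -> float:
    """Return the Rule of 40 score: growth rate plus profit margin, in percentage points."""
    return revenue_growth_pct + profit_margin_pct


def passes_rule_of_40(
    revenue_growth_pct: float,
    profit_margin_pct: float,
    threshold: float = 40.0,
) -> bool:
    """Return True when the combined score meets or exceeds the threshold (40 by convention)."""
    return rule_of_40_score(revenue_growth_pct, profit_margin_pct) >= threshold


if __name__ == "__main__":
    # Hypothetical companies: (revenue growth %, profit margin %).
    examples = {
        "steady grower": (30.0, 15.0),
        "slow and modestly profitable": (10.0, 20.0),
        "fast but loss-making": (60.0, -20.0),
    }
    for name, (growth, margin) in examples.items():
        score = rule_of_40_score(growth, margin)
        verdict = "passes" if passes_rule_of_40(growth, margin) else "falls short"
        print(f"{name}: score {score:.1f} ({verdict})")
```

The third sample shows the trade-off the metric is meant to capture: a loss-making company can still meet the bar if its growth is high enough to offset the negative margin.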

Fabius DeFi

@FabiusDefi

17 hours

➠ Everyone’s building agents, but no one’s asking:
- Does this thing actually work?
- Is the dev team legit?
- Should I trust it with my time or tokens?
I think @InferiumAI is building something that’s massively underrated. A benchmark layer for the agent economy – the AI

44

8

121

🦚Onobaby🦚

@vibequeen01

2 days

BitTorrent: Redefining the Future of Decentralized Infrastructure. As the world’s fastest decentralized, driverless system, BitTorrent is setting a new benchmark for Web3 speed, scalability, and innovation. Powered by BTFS, it’s revolutionizing data sharing and storage for the

11

74

209

Karl Pertsch

@KarlPertsch

8 days

We’re organizing the RoboArena Challenge at CoRL this year! Show the performance of your best generalist policy in a fair, open benchmark for the robotics community! 🤖 Sign up, even if you don’t have a robot! More details in 🧵👇

2

17

112

François Chollet

@fchollet

5 days

Today we're releasing a developer preview of our next-gen benchmark, ARC-AGI-3. The goal of this preview, leading up to the full version launch in early 2026, is to collaborate with the community. We invite you to provide feedback to help us build the most robust and effective

209

979

3K

Ray Amjad

@theramjad

4 hours

Qwen3-Coder by @Alibaba_Qwen came out a few hours ago, and unfortunately, in a production codebase, it underperforms Kimi K2 by @Kimi_Moonshot. That's despite performing better on the benchmarks. I think it's becoming increasingly clear models are "benchmark

2

0

2

Regional Manager/Hyderabad Region(City operations)

@rmhrtgsrtc

1 hour

🔥 A new benchmark in public service! Telangana women have availed 200 crore zero-fare journeys under the transformative Mahalaxmi free travel scheme 🚌🌼. 💷 ₹6680 crore saved. 📆 As on 23.07.2025. A true leap for equality, access, and dignity. #TGSRTC #MahalakshmiScheme

0

12

17

Pirat_Nation 🔴

@Pirat_Nation

8 days

Grok 4 and Grok 4 Heavy Benchmark Results

10

8

116

91mobiles

@91mobiles

2 hours

We skipped the benchmarks and went straight to real gaming. COD Mobile, Genshin, BGMI: one phone came out on top 🎮. Watch our Reno14 Pro vs iQOO Neo 10 vs vivo V50 gaming test!

1

13

Financial Times

@FT

30 minutes

Breaking news: Former UBS and Citigroup trader Tom Hayes has emerged victorious in his decade-long battle to clear his name for rigging Libor benchmark rates after the UK’s highest court quashed his conviction

11

19

40

Shashank

@shawshank_v

2 days

Can open-data models beat DINOv2? Today we release Franca, a fully open-sourced vision foundation model. Franca with a ViT-G backbone matches (and often beats) proprietary models like SigLIPv2, CLIP, and DINOv2 on various benchmarks, setting a new standard for open-source research 🧵

11

49

251

definikola

@definikola

2 days

we're seeing the largest funding rate divergence from benchmark interest rates in the past year.
> funding - 12%
> borrow - 5.5%
> supply - 3.8%
been up-only for over a week now

2

1

11

Eric Chan 😻 ERC-520 Maneki-Meow

@canmasu

4 hours

What a magical number — 520 attendees! #FOMO MY — AI. Art. Web3 — has set a new benchmark for the Malaysian Web3 community as the largest crowd to date. Huge thanks to our panelists, AI artists, guests, community partners, and of course, @Olofvw and @rene05. Together, we made

1

6

Max Weinbach

@MaxWinebach

6 days

Apple published benchmarks for their on-device model on OS 26 platforms vs. other SLMs. Seems to do VERY well! The server model also seems to be competitive and very good! Both server and on-device have great quantization optimizations, losing little quality going from 16-bit to 2-bit

8

10

147