Explore tweets tagged as #benchmarks
@BanhBao565
Banh Bao 🟣
13 minutes
🃏 House of TEN: Where AI doesn’t just compute — it bluffs 🎭
I joined House of TEN @tenprotocol thinking it was just poker. But turns out… I walked into a bluffing arena for AI agents. And it’s brilliant. Here’s the deal:
🧠 Most AI benchmarks? They’re like school exams —
3
0
5
@haoailab
Hao AI Lab
8 days
Grok-4 has been thoroughly evaluated on math and coding benchmarks, but its performance in gaming environments is untested. We evaluate Grok-4 on the lmgame bench and find that it emerges as a leading model with superior gaming capabilities, ranking #2 on our leaderboard 🥈. In
2
14
45
@gm8xx8
𝚐𝔪𝟾𝚡𝚡𝟾
57 minutes
Seed-Prover’s 30 / 42-point silver-medal performance at IMO 2025.
- Fully solved 4/6 problems.
- Included 3-day proofs for P3 (2000-line Lean) & P4 (4000-line Lean).
- Geometry problem solved in 2 seconds via Seed-Geometry.
New SOTA Across Benchmarks.
- 100% MiniF2F-valid.
- 99%
1
0
8
@NEARWEEK
NEARWEEK
2 hours
“Your brain is the agent. Your muscle is the smart contract.” @ilblackdragon broke down the June wins across the @near_ai ecosystem, highlighting @proximityfi's Shade Agent Sandbox, @PublicAI_'s $10M raise for human-in-the-loop labeling, @SilverstreamAI's automation benchmarks,
27
3
22
@laravelbackpack
Backpack for Laravel
6 days
#Laravel Tip: Did you know Laravel has a Benchmark class that lets you measure the time of any task?
3
18
124
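The idea behind the tip above, timing an arbitrary task, can be sketched in a few lines. Laravel's Benchmark class is PHP; the Python below is only an analogous illustration, not Laravel's API. The `measure` helper and its `iterations` parameter are hypothetical names chosen for this sketch.

```python
import time
from typing import Callable


def measure(task: Callable[[], object], iterations: int = 1) -> float:
    """Return the average wall-clock time of `task` in milliseconds.

    Hypothetical helper for illustration; not part of Laravel or any library.
    """
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    start = time.perf_counter()
    for _ in range(iterations):
        task()
    elapsed_seconds = time.perf_counter() - start
    # Convert to milliseconds and average over all runs.
    return elapsed_seconds * 1000 / iterations


# Example: average time to sum the first 100,000 integers over 10 runs.
average_ms = measure(lambda: sum(range(100_000)), iterations=10)
print(f"{average_ms:.3f} ms per run")
```

Averaging over several iterations smooths out one-off noise; the printed figure will vary by machine.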
@SergeyCYW
Sergey
3 days
The Rule of 40 is a pivotal financial metric used to evaluate the performance of SaaS and other growth-oriented software companies. It serves as a benchmark that balances revenue growth and profitability, providing a comprehensive snapshot of a company's overall financial health.
1
7
33
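As commonly stated, the Rule of 40 adds a company's revenue growth rate to its profit margin (both in percent) and checks whether the sum reaches 40. A minimal sketch of that arithmetic follows; the function names and the two example companies are hypothetical, and which margin to use (EBITDA, free cash flow, or another) is left to the reader.

```python
def rule_of_40_score(revenue_growth_pct: float, profit_margin_pct: float) -> float:
    """Sum of revenue growth and profit margin, both expressed in percent."""
    return revenue_growth_pct + profit_margin_pct


def passes_rule_of_40(
    revenue_growth_pct: float,
    profit_margin_pct: float,
    threshold: float = 40.0,
) -> bool:
    """True when the combined score meets or exceeds the threshold."""
    return rule_of_40_score(revenue_growth_pct, profit_margin_pct) >= threshold


# Hypothetical figures for illustration only: (growth %, margin %).
examples = {
    "fast_grower": (55.0, -10.0),   # high growth, operating at a loss
    "steady_earner": (15.0, 20.0),  # modest growth, solid margin
}

for name, (growth, margin) in examples.items():
    score = rule_of_40_score(growth, margin)
    verdict = "passes" if passes_rule_of_40(growth, margin) else "falls short"
    print(f"{name}: score {score:.1f} {verdict}")
```

The first example shows how a loss-making company can still pass on growth alone, which is the balance between growth and profitability that the metric is meant to capture.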
@FabiusDefi
Fabius DeFi
17 hours
➠ Everyone’s building agents, but no one’s asking:
– Does this thing actually work?
– Is the dev team legit?
– Should I trust it with my time or tokens?
I think @InferiumAI is building something that’s massively underrated. A benchmark layer for the agent economy – the AI
44
8
121
@vibequeen01
🦚Onobaby🦚
2 days
BitTorrent: Redefining the Future of Decentralized Infrastructure. As the world’s fastest decentralized, driverless system, BitTorrent is setting a new benchmark for Web3 speed, scalability, and innovation. Powered by BTFS, it’s revolutionizing data sharing and storage for the
11
74
209
@KarlPertsch
Karl Pertsch
8 days
We’re organizing the RoboArena Challenge at CoRL this year!
Show the performance of your best generalist policy in a fair, open benchmark for the robotics community! 🤖
Sign up, even if you don’t have a robot!
More details in 🧵👇
2
17
112
@fchollet
François Chollet
5 days
Today we're releasing a developer preview of our next-gen benchmark, ARC-AGI-3. The goal of this preview, leading up to the full version launch in early 2026, is to collaborate with the community. We invite you to provide feedback to help us build the most robust and effective
209
979
3K
@theramjad
Ray Amjad
4 hours
Qwen3-Coder by @Alibaba_Qwen came out a few hours ago, and unfortunately, in a production codebase, underperforms when compared to Kimi K2 by @Kimi_Moonshot. That's despite performing better on the benchmarks. I think it's becoming increasingly clear models are "benchmark
2
0
2
@rmhrtgsrtc
Regional Manager/Hyderabad Region(City operations)
1 hour
🔥 A new benchmark in public service!
Telangana women have availed 200 crore zero-fare journeys under the transformative Mahalaxmi free travel scheme 🚌🌼
💷 ₹6680 crore saved
📆 As on 23.07.2025
A true leap for equality, access, and dignity. #TGSRTC #MahalakshmiScheme
0
12
17
@Pirat_Nation
Pirat_Nation 🔴
8 days
Grok 4 and Grok 4 Heavy Benchmark Results
10
8
116
@91mobiles
91mobiles
2 hours
We skipped the benchmarks and went straight to real gaming. COD Mobile, Genshin, BGMI: one phone came out on top🎮. Watch our Reno14 Pro vs iQOO Neo 10 vs vivo V50 gaming test!
1
1
13
@FT
Financial Times
30 minutes
Breaking news: Former UBS and Citigroup trader Tom Hayes has emerged victorious in his decade-long battle to clear his name for rigging Libor benchmark rates after the UK’s highest court quashed his conviction
11
19
40
@shawshank_v
Shashank
2 days
Can open-data models beat DINOv2? Today we release Franca, a fully open-sourced vision foundation model. Franca with ViT-G backbone matches (and often beats) proprietary models like SigLIPv2, CLIP, DINOv2 on various benchmarks setting a new standard for open-source research🧵
11
49
251
@definikola
definikola
2 days
we're seeing the largest funding rate divergence from benchmark interest rates in the past year.
> funding - 12%
> borrow - 5.5%
> supply - 3.8%
been up-only for over a week now
2
1
11
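One way to read the divergence in the post above is as the gap, in percentage points, between the funding rate and each benchmark rate. The sketch below uses the three figures quoted in the post. Treating "divergence" as a simple difference is an assumption of this example, not something the post states, and `spread_points` is a hypothetical helper name.

```python
def spread_points(rate_pct: float, benchmark_pct: float) -> float:
    """Difference between two rates, in percentage points (assumed definition)."""
    # Round to two decimals to hide binary floating-point noise.
    return round(rate_pct - benchmark_pct, 2)


# Rates quoted in the post, in percent.
funding = 12.0
borrow = 5.5
supply = 3.8

spreads = {
    "funding_vs_borrow": spread_points(funding, borrow),
    "funding_vs_supply": spread_points(funding, supply),
}

for name, points in spreads.items():
    print(f"{name}: {points} percentage points")
```

With these inputs the funding rate sits 6.5 points above the borrow rate and 8.2 points above the supply rate.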
@canmasu
Eric Chan 😻 ERC-520 Maneki-Meow
4 hours
What a magical number — 520 attendees!
#FOMO MY — AI. Art. Web3 — has set a new benchmark for the Malaysian Web3 community as the largest crowd to date. Huge thanks to our panelists, AI artists, guests, community partners, and of course, @Olofvw and @rene05. Together, we made
1
1
6
@MaxWinebach
Max Weinbach
6 days
Apple published benchmarks for their on-device model in OS 26 platforms vs. other SLMs. Seems to do VERY well! The server model also seems to be competitive and very good! Both server and on-device have great quantization optimizations, losing not much quality from 16 to 2 bit
8
10
147