Aarush Sah
@aarush
Followers: 6K
Following: 12K
Media: 362
Statuses: 3K
Head of Evals @GroqInc | Building openbench
SF
Joined September 2022
OPENBENCH 0.5.0 IS HERE. It’s our biggest release yet: we added 350+ new evals, ARC-AGI support, a plugin system for external benchmarks, provider routing, coding harnesses you can mix and match, tool-calling evals, and more. Details in thread 🧵
6
16
93
(1/3) developing good intuition and "feel" for concepts in ai (architectures, theory, etc) is crucial in order to be productive, but not many talk about how to build it. wrote a quick read on my <30 min process for building robust intuition, quickly:
1
1
5
Great blog post from my good friend @minisounds on learning intuition with AI. Would strongly recommend reading!
0
0
1
Tomorrow morning - openbench 0.5.3 :)
chat, help me convince @aarush to cut an openbench release after he's back from neurips pls
1
0
13
Congrats @LandoNorris on the WDC 🎉 🎉🎉
1
0
19
I’m at the Groq booth at NeurIPS! Swing by and say hi to the team - we’re right by Google
5
3
80
Three years ago today, our lives changed more than we could possibly imagine!
today we launched ChatGPT. try talking with it here: https://t.co/uWra8LKFMN
1
0
10
Want to influence AI development? Build evals, not models. How:
• Find questions frontier models struggle with (<70% accuracy)
• Test GPT-5, Claude, Qwen, DeepSeek, etc.
• Open source the dataset
• Write up your findings
Labs actively track and optimize for public
openbench.dev
Provider-agnostic, open-source evaluation infrastructure for language models
1
1
7
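A minimal sketch of the first two steps in the eval-building post above (testing several models and checking whether they all stay under the 70% accuracy mark). Everything here is illustrative: the model names, the graded results, and the helper functions are hypothetical, and this is not openbench's API.

```python
from typing import Dict, List

THRESHOLD = 0.70  # the "<70% accuracy" cutoff mentioned in the post


def accuracy(graded: List[bool]) -> float:
    """Fraction of questions answered correctly (0.0 for an empty list)."""
    return sum(graded) / len(graded) if graded else 0.0


def model_accuracies(results: Dict[str, List[bool]]) -> Dict[str, float]:
    """Map each model name to its accuracy on the question set."""
    return {model: accuracy(graded) for model, graded in results.items()}


def is_hard_for_all(results: Dict[str, List[bool]],
                    threshold: float = THRESHOLD) -> bool:
    """True when even the best-scoring model stays below the threshold."""
    accs = model_accuracies(results)
    return bool(accs) and max(accs.values()) < threshold


# Hypothetical graded outputs: one bool per question, per model.
results = {
    "model-a": [True, False, False, True, False],
    "model-b": [True, True, False, False, False],
}

if __name__ == "__main__":
    for name, acc in model_accuracies(results).items():
        print(f"{name}: {acc:.0%}")
    print("worth publishing:", is_hard_for_all(results))
```

In practice the graded lists would come from running each model on the candidate questions. The check uses the best model's score, so a dataset only counts as hard when no tested model clears the cutoff.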
I wonder how much economic value is lost due to the Caltrain having spotty WiFi
3
0
30
can we, as a collective, start prioritizing actually useful sample categorization in benchmark datasets (pretty please)? by no means a panacea, but would help move evals past superficial metrics and into deeply understanding model performance on underlying task axes.
0
1
1
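To illustrate the point about sample categorization in the post above, here is a small sketch of how a category label on each sample lets you break a single accuracy number into per-category scores. The field names (`category`, `correct`) and the sample data are assumptions made for the example, not a schema from any particular benchmark.

```python
from collections import defaultdict
from typing import Dict, List


def accuracy_by_category(samples: List[dict]) -> Dict[str, float]:
    """Compute accuracy separately for each category label."""
    totals: Dict[str, int] = defaultdict(int)
    correct: Dict[str, int] = defaultdict(int)
    for sample in samples:
        totals[sample["category"]] += 1
        correct[sample["category"]] += int(sample["correct"])
    return {cat: correct[cat] / totals[cat] for cat in totals}


# Hypothetical graded samples, each tagged with a task category.
samples = [
    {"category": "algebra", "correct": True},
    {"category": "algebra", "correct": True},
    {"category": "geometry", "correct": False},
    {"category": "geometry", "correct": True},
]

if __name__ == "__main__":
    overall = sum(s["correct"] for s in samples) / len(samples)
    print(f"overall: {overall:.0%}")
    for cat, acc in accuracy_by_category(samples).items():
        print(f"{cat}: {acc:.0%}")
```

The overall score here is 75%, which hides that one category is solved and the other is at 50%. That gap is only visible when the dataset carries useful category labels.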
Programming without AI is a refreshing experience. Full control over every decision, every line of code - it's a nice change of pace from outsourcing implementation details to models
4
1
25
Would you guys like us to?
20
1
116
The only reason I don't use ChatGPT for every question I have is latency. Google wins on speed - the moment ChatGPT is as fast as Google it's over
4
2
18
Why did "Costco" show up so much in $CRM's latest earnings call?
1
3
14
One of the sad (but amazing!) parts of working at a rapidly growing company is losing visibility into everything happening across the org. When I joined Groq, I knew every event, every project, every deal. The team was small, communication was effortless, and staying in sync
2
2
84
october was craaaazy:
- Groq 🤝 IBM partnership
- McLaren keeps winning with Groq on the halo
- gpt-oss-safeguard day 0 on GroqCloud
- gpt-oss 120/20B major price drop + prompt caching
- Huge openbench v0.5.0 release with 350+ new evals
- @JonathanRoss321 speaks at @FIIKSA
- Groq
8
9
136