Aarush Sah

@aarush

Followers: 6K
Following: 12K
Media: 362
Statuses: 3K

Head of Evals @GroqInc | Building openbench

SF
Joined September 2022
@aarush
Aarush Sah
2 months
OPENBENCH 0.5.0 IS HERE It’s our biggest release yet - We added 350+ new evals, added ARC-AGI support, a plugin system for external benchmarks, provider routing, coding harnesses you can mix and match, tool‑calling evals, and more. Details in thread 🧵
6
16
93
@minisounds
Jason Zhang
2 hours
(1/3) developing good intuition and "feel" for concepts in ai (architectures, theory, etc) is crucial in order to be productive, but not many talk about how to build it. wrote a quick read on my <30 min process for building robust intuition, quickly:
1
1
5
@aarush
Aarush Sah
1 hour
Great blog post from my good friend @minisounds on learning intuition with AI. Would strongly recommend reading!
@minisounds
Jason Zhang
2 hours
(1/3) developing good intuition and "feel" for concepts in ai (architectures, theory, etc) is crucial in order to be productive, but not many talk about how to build it. wrote a quick read on my <30 min process for building robust intuition, quickly:
0
0
1
@aarush
Aarush Sah
4 days
Tomorrow morning - openbench 0.5.3 :)
@pingToven
Toven
4 days
chat, help me convince @aarush to cut an openbench release after he's back from neurips pls
1
0
13
@aarush
Aarush Sah
4 days
Congrats @LandoNorris on the WDC 🎉 🎉🎉
@GroqInc
Groq Inc
4 days
Mega congrats to @LandoNorris, 2025 Drivers' World Champion! 🧡🏆 @McLarenF1
1
0
19
@aarush
Aarush Sah
9 days
I’m at the Groq booth at NeurIPS! Swing by and say hi to the team - we’re right by Google
5
3
80
@aarush
Aarush Sah
11 days
Another day of @GuillaumeLample being at NeurIPS 2024
0
0
10
@aarush
Aarush Sah
11 days
Three years ago today, our lives changed more than we could possibly imagine!
@sama
Sam Altman
3 years
today we launched ChatGPT. try talking with it here: https://t.co/uWra8LKFMN
1
0
10
@aarush
Aarush Sah
17 days
Someone's benchmarking GLM-4.6 through @OpenRouterAI with openbench 👀
0
2
22
@aarush
Aarush Sah
18 days
Orange is the new black. @GroqInc 🤝 @McLarenF1
5
1
76
@shaunakjoshi
Shaunak Joshi
26 days
Want to influence AI development? Build evals, not models. How: • Find questions frontier models struggle with (<70% accuracy) • Test GPT-5, Claude, Qwen, DeepSeek, etc. • Open source the dataset • Write up your findings Labs actively track and optimize for public
openbench.dev
Provider-agnostic, open-source evaluation infrastructure for language models
1
1
7
@aarush
Aarush Sah
29 days
GPT-5.1 is in ChatGPT 👀
1
0
6
@aarush
Aarush Sah
1 month
I wonder how much economic value is lost due to the Caltrain having spotty WiFi
3
0
30
@natashaamayorga
Natasha Mayorga
1 month
can we, as a collective, start prioritizing actually useful sample categorization in benchmark datasets (pretty please)? by no means a panacea, but would help move evals past superficial metrics and into deeply understanding model performance on underlying task axes.
0
1
1
@aarush
Aarush Sah
1 month
Programming without AI is a refreshing experience. Full control over every decision, every line of code - it's a nice change of pace from outsourcing implementation details to models
4
1
25
@aarush
Aarush Sah
1 month
Would you guys like us to?
@scaling01
Lisan al Gaib
1 month
I can't wait for Groq to host Kimi-K2 Thinking at 500 tokens/s
20
1
116
@aarush
Aarush Sah
1 month
The only reason I don't use ChatGPT for every question I have is latency. Google wins on speed - the moment ChatGPT is as fast as Google it's over
4
2
18
@aarush
Aarush Sah
1 month
One of the sad (but amazing!) parts of working at a rapidly growing company is losing visibility into everything happening across the org. When I joined Groq, I knew every event, every project, every deal. The team was small, communication was effortless, and staying in sync
2
2
84
@GroqInc
Groq Inc
1 month
october was craaaazy: - Groq 🤝 IBM partnership - McLaren keeps winning with Groq on the halo - gpt-oss-safeguard day 0 on GroqCloud - gpt-oss 120/20B major price drop + prompt caching - Huge openbench v0.5.0 release with 350+ new evals - @JonathanRoss321 speaks at @FIIKSA - Groq
8
9
136
@aarush
Aarush Sah
1 month
It’s time to verify the unverifiable
4
0
19
@aarush
Aarush Sah
1 month
New handle what are we thinking guys
12
0
89