Aarush Sah @aarush X Profile

Aarush Sah

@aarush

Followers

6K

Following

12K

Media

362

Statuses

3K

Head of Evals @GroqInc | Building openbench

https://t.co/vQSrADv7OU

SF

Joined September 2022

Aarush Sah

@aarush

2 months

OPENBENCH 0.5.0 IS HERE It’s our biggest release yet - We added 350+ new evals, added ARC-AGI support, a plugin system for external benchmarks, provider routing, coding harnesses you can mix and match, tool‑calling evals, and more. Details in thread 🧵

6

16

93

Jason Zhang

@minisounds

2 hours

(1/3) developing good intuition and "feel" for concepts in ai (architectures, theory, etc) is crucial in order to be productive, but not many talk about how to build it. wrote a quick read on my <30 min process for building robust intuition, quickly:

1

5

Aarush Sah

@aarush

1 hour

Great blog post from my good friend @minisounds on learning intuition with AI. Would strongly recommend reading!

Jason Zhang

@minisounds

2 hours

(1/3) developing good intuition and "feel" for concepts in ai (architectures, theory, etc) is crucial in order to be productive, but not many talk about how to build it. wrote a quick read on my <30 min process for building robust intuition, quickly:

0

1

Aarush Sah

@aarush

4 days

Tomorrow morning - openbench 0.5.3 :)

Toven

@pingToven

4 days

chat, help me convince @aarush to cut an openbench release after he's back from neurips pls

1

0

13

Aarush Sah

@aarush

4 days

Congrats @LandoNorris on the WDC 🎉 🎉🎉

Groq Inc

@GroqInc

4 days

Mega congrats to @LandoNorris, 2025 Drivers' World Champion! 🧡🏆 @McLarenF1

1

0

19

Aarush Sah

@aarush

9 days

I’m at the Groq booth at NeurIPS! Swing by and say hi to the team - we’re right by Google

5

3

80

Aarush Sah

@aarush

11 days

Another day of @GuillaumeLample being at NeurIPS 2024

0

10

Aarush Sah

@aarush

11 days

Three years ago today, our lives changed more than we could possibly imagine!

Sam Altman

@sama

3 years

today we launched ChatGPT. try talking with it here: https://t.co/uWra8LKFMN

1

0

10

Aarush Sah

@aarush

17 days

Someone's benchmarking GLM-4.6 through @OpenRouterAI with openbench 👀

0

2

22

Aarush Sah

@aarush

18 days

Orange is the new black. @GroqInc 🤝 @McLarenF1

5

1

76

Shaunak Joshi

@shaunakjoshi

26 days

Want to influence AI development? Build evals, not models. How: • Find questions frontier models struggle with (<70% accuracy) • Test GPT-5, Claude, Qwen, DeepSeek, etc. • Open source the dataset • Write up your findings Labs actively track and optimize for public

openbench.dev

Provider-agnostic, open-source evaluation infrastructure for language models

1

7

Aarush Sah

@aarush

29 days

GPT-5.1 is in ChatGPT 👀

1

0

6

Aarush Sah

@aarush

1 month

I wonder how much economic value is lost due to the Caltrain having spotty WiFi

3

0

30

Natasha Mayorga

@natashaamayorga

1 month

can we, as a collective, start prioritizing actually useful sample categorization in benchmark datasets (pretty please)? by no means a panacea, but would help move evals past superficial metrics and into deeply understanding model performance on underlying task axes.

0

1

Aarush Sah

@aarush

1 month

Programming without AI is a refreshing experience. Full control over every decision, every line of code - it's a nice change of pace from outsourcing implementation details to models

4

1

25

Aarush Sah

@aarush

1 month

Would you guys like us to?

Lisan al Gaib

@scaling01

1 month

I can't wait for Groq to host Kimi-K2 Thinking at 500 tokens/s

20

1

116

Aarush Sah

@aarush

1 month

The only reason I don't use ChatGPT for every question I have is latency. Google wins on speed - the moment ChatGPT is as fast as Google it's over

4

2

18

Aarush Sah

@aarush

1 month

One of the sad (but amazing!) parts of working at a rapidly growing company is losing visibility into everything happening across the org. When I joined Groq, I knew every event, every project, every deal. The team was small, communication was effortless, and staying in sync

2

84

Groq Inc

@GroqInc

1 month

october was craaaazy: - Groq 🤝 IBM partnership - McLaren keeps winning with Groq on the halo - gpt-oss-safeguard day 0 on GroqCloud - gpt-oss 120/20B major price drop + prompt caching - Huge openbench v0.5.0 release with 350+ new evals - @JonathanRoss321 speaks at @FIIKSA - Groq

8

9

136

Aarush Sah

@aarush

1 month

It’s time to verify the unverifiable

4

0

19

Aarush Sah

@aarush

1 month

New handle what are we thinking guys

12

0

89
