Aarush Sah
@aarush
Followers: 8K
Following: 12K
Media: 367
Statuses: 3K
@NVIDIA | prev. Head of Evals @GroqInc | Creator of OpenBench
South Bay
Joined September 2022
I haven't even been out and about, and I'm feeling sick. What is going around SF right now?
Powered by @GroqInc 🫡
And finally, my favorite change. The default model is now Kimi K2. I love how it writes. I love talking to it. I've found it to be a significantly better chat model than anything else I've tried. It's so good that I'm scared it will hurt conversion to paid tiers.
Anecdotally, GPT-5.2 seems to hallucinate a lot more than GPT-5.1 and GPT-5 when used as a chat model. Has anyone else noticed this?
(1/3) developing good intuition and "feel" for concepts in ai (architectures, theory, etc) is crucial in order to be productive, but not many talk about how to build it. wrote a quick read on my <30 min process for building robust intuition, quickly:
Great blog post from my good friend @minisounds on learning intuition with AI. Would strongly recommend reading!
Tomorrow morning - openbench 0.5.3 :)
chat, help me convince @aarush to cut an openbench release after he's back from neurips pls
Congrats @LandoNorris on the WDC!
I'm at the Groq booth at NeurIPS! Swing by and say hi to the team. We're right by Google.
Three years ago today, our lives changed more than we could possibly imagine!
today we launched ChatGPT. try talking with it here: https://t.co/uWra8LKFMN
Want to influence AI development? Build evals, not models.
How:
• Find questions frontier models struggle with (<70% accuracy)
• Test GPT-5, Claude, Qwen, DeepSeek, etc.
• Open source the dataset
• Write up your findings
Labs actively track and optimize for public
openbench.dev
Provider-agnostic, open-source evaluation infrastructure for language models
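A minimal sketch of the first step in the evals tweet, finding questions that frontier models struggle with. It assumes you already have per-question pass/fail results for several models, and it reads "<70% accuracy" as the mean accuracy on each question across models; the tweet may instead mean dataset-level accuracy. The function name `hard_questions`, the result layout, and the model/question IDs are all hypothetical and are not part of OpenBench's API.

```python
from typing import Dict

# Hypothetical layout: results[model_name][question_id] = True if that model answered correctly.
Results = Dict[str, Dict[str, bool]]


def hard_questions(results: Results, threshold: float = 0.70) -> Dict[str, float]:
    """Return {question_id: mean accuracy across models} for questions below `threshold`."""
    # Collect every question ID that any model was scored on.
    question_ids = set()
    for per_model in results.values():
        question_ids.update(per_model)

    hard: Dict[str, float] = {}
    for qid in sorted(question_ids):
        # Only count models that actually have a score for this question.
        scores = [per_model[qid] for per_model in results.values() if qid in per_model]
        accuracy = sum(scores) / len(scores)
        if accuracy < threshold:
            hard[qid] = accuracy
    return hard


if __name__ == "__main__":
    # Placeholder results, not real benchmark data.
    results = {
        "model_a": {"q1": True, "q2": False, "q3": True},
        "model_b": {"q1": True, "q2": False, "q3": False},
        "model_c": {"q1": True, "q2": True, "q3": False},
    }
    print(hard_questions(results))
```

With these placeholder results, q1 is answered by every model and is dropped, while q2 and q3 (each about 33% accuracy) are kept as candidates for a harder eval set.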
I wonder how much economic value is lost due to the Caltrain having spotty WiFi