Jasper
@zjasper
Followers
15K
Following
7K
Media
421
Statuses
5K
Co-founder and CEO @Hyperbolic_Labs. ex-@avax & ex-@citsecurities. Finished Math PhD in 2yrs @UCBerkeley. Math Olympiad Gold Medalist. Highest honor @PKU1898
California, USA
Joined November 2018
AI is great at hitting explicit goals, but often at the cost of the hidden ones. Terence Tao just wrote about this. He points out: AI is the ultimate executor of Goodhart’s law, i.e. when a measure becomes the target, it stops measuring what we care about. Take a call center.
65
120
954
Check out our latest report on math evaluations 😉
If you care about systematic professional evaluation of mathematical capabilities of AI models, you can always find the latest at the following sites, managed by the GAUSS team: Blog: https://t.co/6hG5sKIZ8j Full report: https://t.co/Ff2WyIsKaQ Github:
2
0
8
Evaluation of mathematical capabilities by the latest LLM models, compared with evaluation by humans.
[2/n] Benchmarking results: DeepSeek-Math-V2 wins on accuracy and mean absolute error (MAE); GPT-5 wins on Pearson correlation; Gemini-3-Pro is within the top 3 on three metrics. See more in: Blog: https://t.co/1KXUxRUpex Full report: https://t.co/NvktvxMI87 Github:
1
6
23
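The thread's headline metrics (accuracy aside) can be sketched in a few lines of plain Python. The grade lists below are hypothetical, not the report's data; `mae` and `pearson` are illustrative helpers, not the GAUSS team's code.

```python
# Sketch of two LLM-as-a-judge benchmark metrics: mean absolute error (MAE)
# and Pearson correlation between model grades and human reference grades.
# Grade values are made up for illustration (e.g. 0-7 olympiad points).
from statistics import mean

def mae(model, human):
    """Mean absolute error between model and human grades."""
    return mean(abs(m - h) for m, h in zip(model, human))

def pearson(model, human):
    """Pearson correlation coefficient between model and human grades."""
    mm, mh = mean(model), mean(human)
    cov = sum((m - mm) * (h - mh) for m, h in zip(model, human))
    sd_m = sum((m - mm) ** 2 for m in model) ** 0.5
    sd_h = sum((h - mh) ** 2 for h in human) ** 0.5
    return cov / (sd_m * sd_h)

human = [0, 7, 2, 0, 5, 7]   # hypothetical human grades
model = [0, 7, 3, 1, 5, 6]   # hypothetical LLM-judge grades
print(f"MAE = {mae(model, human):.2f}, Pearson r = {pearson(model, human):.2f}")
# -> MAE = 0.50, Pearson r = 0.98
```

A judge can win on one metric and lose on another: MAE punishes any absolute disagreement, while Pearson only cares whether the judge ranks answers the way humans do, which is why the report tracks both.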
Now you can set up organizations for your team on @hyperbolic_labs!
Hyperbolic Organizations are now live. 👇🏻 A unified, secure way for teams to build AI together without shared credentials, scattered billing, or unclear usage. Organizations centralize access, governance, and spend across all AI workflows.
1
0
5
[6/n] Work done with a great team: @tianzhec, @jiaxin, @liao_zhen53785, Qiuyu Ren, Tahsin Saffat, @ZitongYang0, and @YiMaTweets
@hyperbolic_labs's H200 GPU node made it happen: fast & bug-free deployment of DeepSeek-Math-V2.
1
0
6
[5/n] Finding 3: LLMs grade more diversely than humans. DeepSeek-Math-V2 aligns exceptionally well with human graders on this metric (both give a lot of 0 grades 🤣), but all other models grade more diversely, with a 1.6–2.0 Entropy Ratio and 1.3–1.9 Relative Variance.
1
0
5
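The two diversity metrics in Finding 3 can be sketched under assumed definitions: Entropy Ratio as H(model grades) / H(human grades) and Relative Variance as Var(model) / Var(human). The report's exact formulas may differ, and the grade lists are illustrative.

```python
# Sketch of grade-diversity metrics, assuming:
#   Entropy Ratio     = H(model grade distribution) / H(human grade distribution)
#   Relative Variance = Var(model grades) / Var(human grades)
# A strict human grader hands out mostly 0s; a "diverse" LLM judge spreads
# its grades across the scale. Both ratios come out > 1 for such a judge.
from collections import Counter
from math import log2
from statistics import pvariance

def entropy(grades):
    """Shannon entropy (bits) of the empirical grade distribution."""
    counts = Counter(grades)
    n = len(grades)
    return -sum((c / n) * log2(c / n) for c in counts.values())

human = [0, 0, 0, 0, 7, 0, 0, 0]   # strict, low-diversity human grades
model = [1, 0, 6, 7, 2, 7, 0, 5]   # more spread-out LLM-judge grades

print("Entropy Ratio:", entropy(model) / entropy(human))
print("Relative Variance:", pvariance(model) / pvariance(human))
```

A ratio near 1 on both metrics is what "aligns with human grading diversity" means here; values well above 1 flag a judge that smears partial credit across the scale.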
[4/n] Finding 2: Grading precision correlates with the problem. Overall, good (GPT-5), medium (DeepSeek-Chat-V3.1), and bad (Qwen3-235B-A22B-Thinking) models share a similar precision trend: all are relatively good on P2, P3, and P5 but bad on P4.
1
0
3
[3/n] Finding 1: Most LLMs are lenient, but DeepSeek-Math-V2 is strict. LLMs (on average) grade less accurately (higher MAE) on the subset with no valid reasoning in the answers. Three example models tend to give higher scores than humans, according to the confusion matrices.
1
0
4
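The leniency check behind Finding 1 can be sketched as a human-vs-model confusion matrix over grades: mass above the diagonal (model grade > human grade) means a lenient judge. The grades below are made up for illustration.

```python
# Sketch of a leniency check via a grade confusion matrix.
# Each cell (h, m) counts submissions the human graded h and the model graded m;
# m > h cells indicate leniency, m < h cells indicate strictness.
from collections import Counter

human = [0, 0, 2, 7, 0, 5]
model = [1, 2, 2, 7, 1, 6]   # a lenient judge: often grades above the human

confusion = Counter(zip(human, model))   # (human_grade, model_grade) -> count
lenient = sum(c for (h, m), c in confusion.items() if m > h)
strict  = sum(c for (h, m), c in confusion.items() if m < h)
print(f"lenient cells: {lenient}, strict cells: {strict}")
# -> lenient cells: 4, strict cells: 0
```

For a strict judge like DeepSeek-Math-V2, the imbalance flips: the below-diagonal count dominates.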
[2/n] Benchmarking results: DeepSeek-Math-V2 wins on accuracy and mean absolute error (MAE); GPT-5 wins on Pearson correlation; Gemini-3-Pro is within the top 3 on three metrics. See more in: Blog: https://t.co/1KXUxRUpex Full report: https://t.co/NvktvxMI87 Github:
1
0
7
Is @deepseek_ai the new king of math grading? By benchmarking LLM-as-a-judge on USAMO 2025, we find that: 🟠 DeepSeek-Math-V2 achieves the highest accuracy and most closely aligns with human graders when the submitted answer shows no meaningful progress. 🔵 Gemini-3-Pro
5
6
105
Congrats on the launch and excited to support!
Introducing Lux, the most powerful and fastest Computer Use model, built by OpenAGI Foundation @agiopen_org. Lux outperforms Google Gemini CUA, OpenAI Operator, and Anthropic Claude on a benchmark with 300 real-world tasks. Try our developer-friendly SDK to build powerful,
1
0
7
@hyperbolic_labs Here are deepseek-math-v2's solutions to CMO 2025. h/t @TianzheC
https://t.co/ZkESnPjpN3
github.com · Gauss-Math/DeepSeek-math-v2-results
1
0
0
We got deepseek-math-v2 running on an 8×H200 node on @hyperbolic_labs' on-demand GPU platform. Feel free to reply with any math problems you'd like solved and I'll share the answers. An exciting time to own the brain of one of the best mathematicians!
As far as I know, there isn't any chatbot or API that gives you access to an IMO 2025 gold-medalist model. Not only does this change today, but you get to download the weights with the Apache 2.0 open-source release of @deepseek_ai Math-V2 on @huggingface! Imagine owning the
30
16
240
DeepSeek dropped their latest AI research again on a holiday. DeepSeek-Math-V2 is the first open AI model that can win gold at IMO 2025 and beat Gemini on IMO-ProofBench. They're using a generator–verifier architecture that feels like GANs in their early days. - first train a
We just shared some thoughts and results on self-verifiable mathematical reasoning. The released model, DeepSeekMath-V2, is strong on IMO-ProofBench and competitions like IMO 2025 (5/6 problems) and Putnam 2024 (a near-perfect score of 118/120). Github: https://t.co/4dMEqWxXfU
8
19
185
Excited to be a launch partner for @nvidia Nemotron Nano 2 VL! Looking forward to seeing the creative use cases built on top of this model 🔥
Excited to announce NVIDIA’s Nemotron Models (@nvidia) on Hyperbolic! A powerful new family of open models, datasets, and techniques designed to help teams build high-accuracy, specialized agentic AI.
1
0
15
Ready to Compete for Compute? ♠️💻 Join us Oct 28th in SF for Poker Night [Compute Edition], hosted by @hyperbolic_ai × @join_ef × @_ai_collective. No buy-ins. No stakes. Just (free) compute and fun! 🎟 Apply to join → https://t.co/lPSjBmGCyV
3
3
18
Only in SF: Had breakfast with @matistanis, cofounder & CEO of ElevenLabs, and learned how they scaled the team to 300+. He shared his weekly breakdown: • 25% hiring • 25–50% sales (and generating product insights) • 25% misc And every Saturday, he personally tests new product
6
6
125
🚀 Introducing rLLM v0.2 - train arbitrary agentic programs with RL, with minimal code changes. Most RL training systems adopt the agent-environment abstraction. But what about complex workflows? Think solver-critique pairs collaborating, or planner agents orchestrating multiple
3
29
138
Andrej Karpathy released nanochat, ~8K lines of minimal code that do pretrain + midtrain + SFT + RL + inference + ChatGPT-like webUI. It trains a 560M LLM in ~4 hrs on 8×H100. I trained and hosted it on Hyperbolic GPUs ($48). First prompt reminded me how funny tiny LLMs are.
66
139
3K