Jasper

@zjasper

Followers
15K
Following
7K
Media
421
Statuses
5K

Co-founder and CEO @Hyperbolic_Labs. ex-@avax & ex-@citsecurities. Finished Math PhD in 2yrs @UCBerkeley. Math Olympiad Gold Medalist. Highest honor @PKU1898

California, USA
Joined November 2018
@zjasper
Jasper
3 months
AI is great at hitting explicit goals, but often at the cost of the hidden ones. Terence Tao just wrote about this. He points out: AI is the ultimate executor of Goodhart’s law, i.e. when a measure becomes the target, it stops measuring what we care about. Take a call center.
65
120
954
@zjasper
Jasper
3 days
Check out the latest report that we have about math evaluations 😉
@YiMaTweets
Yi Ma
3 days
If you care about systematic professional evaluation of mathematical capabilities of AI models, you can always find the latest at the following sites, managed by the GAUSS team: Blog: https://t.co/6hG5sKIZ8j Full report: https://t.co/Ff2WyIsKaQ Github:
2
0
8
@YiMaTweets
Yi Ma
3 days
Evaluation of the mathematical capabilities of the latest LLMs, compared with those of humans.
@zjasper
Jasper
4 days
[2/n] Benchmarking results: DeepSeek-Math-V2 wins on accuracy and mean absolute error (MAE); GPT-5 wins on Pearson correlation; Gemini-3-Pro is within the top 3 on three metrics. See more in: Blog: https://t.co/1KXUxRUpex Full report: https://t.co/NvktvxMI87 Github:
1
6
23
@zjasper
Jasper
3 days
Now you can set up organizations for your team on @hyperbolic_labs!
@hyperbolic_labs
Hyperbolic
3 days
Hyperbolic Organizations are now live. 👇🏻 A unified, secure way for teams to build AI together without shared credentials, scattered billing, or unclear usage. Organizations centralize access, governance, and spend across all AI workflows.
1
0
5
@zjasper
Jasper
4 days
[6/n] Work done with a great team: @tianzhec @jiaxin @liao_zhen53785, Qiuyu Ren, Tahsin Saffat, @ZitongYang0 and @YiMaTweets. @hyperbolic_labs's H200 GPU node made it happen: fast and bug-free while deploying DeepSeek-Math-V2.
1
0
6
@zjasper
Jasper
4 days
[5/n] Finding 3: LLMs grade more diversely than humans. DeepSeek-Math-V2 aligns exceptionally well with human graders on this metric (both give a lot of 0 grades 🤣), but all other models grade more diversely, with an Entropy Ratio of 1.6–2.0 and a Relative Variance of 1.3–1.9.
1
0
5
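A rough sketch of how grade-diversity numbers like the ones in [5/n] could be computed. The thread does not define Entropy Ratio or Relative Variance, so the definitions here are assumptions: entropy of the model's grade distribution divided by that of the human grades, and variance of the model's grades divided by the human variance. The function names and the sample grades are hypothetical.

```python
import math
from collections import Counter

def grade_entropy(grades):
    """Shannon entropy (in nats) of the empirical grade distribution."""
    n = len(grades)
    return -sum((c / n) * math.log(c / n) for c in Counter(grades).values())

def variance(grades):
    """Population variance of a list of grades."""
    mean = sum(grades) / len(grades)
    return sum((g - mean) ** 2 for g in grades) / len(grades)

def entropy_ratio(model_grades, human_grades):
    # ASSUMED definition: H(model) / H(human); > 1 means the model's
    # grades are spread over more distinct values than the human's.
    return grade_entropy(model_grades) / grade_entropy(human_grades)

def relative_variance(model_grades, human_grades):
    # ASSUMED definition: Var(model) / Var(human).
    return variance(model_grades) / variance(human_grades)

# Hypothetical grades on a 0-7 scale, not data from the report.
human = [0, 0, 0, 7]
model = [0, 1, 3, 7]
print(entropy_ratio(model, human), relative_variance(model, human))
```

Under these assumed definitions, a human grader who mostly gives 0 has low entropy, so any model that spreads its grades across more values ends up with a ratio above 1. Both ratios are undefined when all human grades are identical.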
@zjasper
Jasper
4 days
[4/n] Finding 2: Grading precision correlates with the problem. Overall good (GPT-5), medium (DeepSeek-Chat-V3.1), and bad (Qwen3-235B-A22B-Thinking) models show a similar precision trend: all of them are relatively good on P2, P3, and P5 but bad on P4.
1
0
3
@zjasper
Jasper
4 days
[3/n] Finding 1: Most LLMs are lenient, but DeepSeek-Math-V2 is strict. LLMs (on average) grade less accurately (higher MAE) on the subset of answers with no valid reasoning. Three example models tend to give higher scores than human graders, according to the confusion matrices.
1
0
4
@zjasper
Jasper
4 days
[2/n] Benchmarking results: DeepSeek-Math-V2 wins on accuracy and mean absolute error (MAE); GPT-5 wins on Pearson correlation; Gemini-3-Pro is within the top 3 on three metrics. See more in: Blog: https://t.co/1KXUxRUpex Full report: https://t.co/NvktvxMI87 Github:
1
0
7
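For readers unfamiliar with the three metrics named in [2/n], here is a minimal sketch of how accuracy, MAE, and Pearson correlation can be computed between an LLM judge's grades and human grades. This is not the report's evaluation code; the function names and the sample grades (on a 0-7 scale) are made up for illustration.

```python
import math

def accuracy(model_grades, human_grades):
    """Fraction of submissions where the model's grade equals the human's."""
    hits = sum(m == h for m, h in zip(model_grades, human_grades))
    return hits / len(human_grades)

def mean_absolute_error(model_grades, human_grades):
    """Average absolute gap between model and human grades (lower is better)."""
    gaps = (abs(m - h) for m, h in zip(model_grades, human_grades))
    return sum(gaps) / len(human_grades)

def pearson(model_grades, human_grades):
    """Pearson correlation coefficient between the two grade lists."""
    n = len(human_grades)
    mean_m = sum(model_grades) / n
    mean_h = sum(human_grades) / n
    cov = sum((m - mean_m) * (h - mean_h)
              for m, h in zip(model_grades, human_grades))
    norm_m = math.sqrt(sum((m - mean_m) ** 2 for m in model_grades))
    norm_h = math.sqrt(sum((h - mean_h) ** 2 for h in human_grades))
    return cov / (norm_m * norm_h)

# Hypothetical grades for six submissions, not data from the report.
human = [0, 7, 1, 0, 7, 2]
model = [0, 7, 3, 1, 7, 2]
print(accuracy(model, human), mean_absolute_error(model, human), pearson(model, human))
```

The metrics can disagree: a judge that is consistently one point too generous has zero accuracy but perfect Pearson correlation, which is one reason different models can win on different metrics.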
@zjasper
Jasper
4 days
Is @deepseek_ai the new king of Math grading? By benchmarking LLM-as-a-judge on USAMO 2025, we find that: 🟠 DeepSeek-Math V2 achieves the highest accuracy and most closely aligns with human graders when the submitted answer shows no meaningful progress. 🔵 Gemini-3-Pro
5
6
105
@zjasper
Jasper
4 days
Congrats on the launch and excited to support!
@qinzytech
Zengyi Qin
5 days
Introducing Lux, the most powerful and fastest Computer Use model, built by OpenAGI Foundation @agiopen_org Lux outperforms Google Gemini CUA, OpenAI Operator and Anthropic Claude on benchmark with 300 real-world tasks. Try our developer-friendly SDK to build powerful,
1
0
7
@zjasper
Jasper
8 days
We got deepseek-math-v2 running on an 8xH200 node on the @hyperbolic_labs on-demand GPU platform. Feel free to reply with any math problems you want solved and I can share the answers. An exciting time to own the brain of one of the best mathematicians!
@ClementDelangue
clem 🤗
9 days
As far as I know, there isn't any chatbot or API that gives you access to an IMO 2025 gold-medalist model. Not only does this change today, but you get to download the weights with the Apache 2.0 open-source release of @deepseek_ai Math-V2 on @huggingface! Imagine owning the
30
16
240
@zjasper
Jasper
9 days
DeepSeek dropped their latest AI research on a holiday again. DeepSeek-Math-V2 is the first open AI model that can win gold at IMO 2025 and beat Gemini on IMO-ProofBench. They’re using a generator–verifier architecture that feels like GANs in the early days. - first train a
@zhs05232838
Zhihong Shao
9 days
We just shared some thoughts and results on self-verifiable mathematical reasoning. The released model, DeepSeekMath-V2, is strong on IMO-ProofBench and competitions like IMO 2025 (5/6 problems) and Putnam 2024 (a near-perfect score of 118/120). Github: https://t.co/4dMEqWxXfU
8
19
185
@hyperbolic_labs
Hyperbolic
19 days
⚡ Flash Sale (Until we sell out again)
2
2
12
@zjasper
Jasper
1 month
Excited to be a launch partner for @nvidia Nemotron Nano 2 VL! Looking forward to seeing the creative use cases built on top of this model 🔥
@hyperbolic_labs
Hyperbolic
1 month
Excited to announce NVIDIA’s Nemotron Models (@nvidia) on Hyperbolic! A powerful new family of open models, datasets, and techniques designed to help teams build high-accuracy, specialized agentic AI.
1
0
15
@zjasper
Jasper
1 month
Grab these H100/H200 while they are still available!
@hyperbolic_labs
Hyperbolic
1 month
We've been at capacity the past couple days, but just launched more supply in our on-demand: > H100s SXM @ $1.49/hr > H200s SXM @ $2.00/hr
1
0
8
@hyperbolic_labs
Hyperbolic
2 months
Ready to Compete for Compute? ♠️💻 Join us Oct 28th in SF for Poker Night [Compute Edition], hosted by @hyperbolic_ai × @join_ef × @_ai_collective. No buy-ins. No stakes. Just (free) compute and fun! 🎟 Apply to join → https://t.co/lPSjBmGCyV
3
3
18
@zjasper
Jasper
2 months
Only in SF: Had breakfast with @matistanis, cofounder & CEO of ElevenLabs and learned how they scaled the team to 300+. He shared his weekly breakdown: • 25% hiring • 25–50% sales (and generate product insights) • 25% misc And every Saturday, he personally tests new product
6
6
125
@rllm_project
rLLM
2 months
🚀 Introducing rLLM v0.2 - train arbitrary agentic programs with RL, with minimal code changes. Most RL training systems adopt the agent-environment abstraction. But what about complex workflows? Think solver-critique pairs collaborating, or planner agents orchestrating multiple
3
29
138
@Yuchenj_UW
Yuchen Jin
2 months
Andrej Karpathy released nanochat, ~8K lines of minimal code that do pretrain + midtrain + SFT + RL + inference + ChatGPT-like webUI. It trains a 560M LLM in ~4 hrs on 8×H100. I trained and hosted it on Hyperbolic GPUs ($48). First prompt reminded me how funny tiny LLMs are.
66
139
3K