Vivek Verma
@vcubingx
Followers
13K
Following
3K
Media
71
Statuses
310
math youtuber & researcher @openai
Joined January 2017
New video! The attention mechanism is well known for its use in Transformers. But where does it come from? Its origins lie in fixing a strange problem of RNNs. Watch the video to learn about it! https://t.co/W6nqLH859P
4
67
527
@nelvOfficial ignore the gpt-5 name, o1/o3 were undeniably gpt-5 level and it just took us time to have confidence to bump the name.
12
7
210
Benchmarking model intelligence, particularly models' ability to generalize robustly across diverse stateful and long-horizon tasks, was the focus of our new paper: Measuring General Intelligence with Generated Games.
2
1
6
Unfortunately this means that the videos will have to take a back seat in the meanwhile. Math content creation is still a huge passion of mine, so I’ll be trying my best to push out content as time permits 🫡
2
0
45
Life update: I’ve taken up a job as a researcher on the post-training team at @openai, working on reinforcement learning, function calling and other efforts! I’ve also graduated from @UCBerkeley where I’m grateful for a wonderful four years of learning and fun 😊
52
17
2K
Generation of verifiable gyms is here. New hill to climb.
[LG] Measuring General Intelligence with Generated Games V Verma, D Huang, W Chen, D Klein... [UC Berkeley] (2025) https://t.co/GN7LZWj2LD
4
8
119
The long-term goal of AI is to build models that can handle arbitrary tasks, not just ones they’ve been trained on. We hope our new *benchmark generator* can help measure progress toward this vision
🎮 Excited to announce gg-bench, a fully synthetic benchmark for LLMs consisting of games generated entirely by LLMs!! This benchmark centers around the fact that LLMs are capable of generating complex tasks that they themselves cannot even solve. 📄: https://t.co/kddoCgDkvd
4
30
182
This is joint work with @davidhuang33176, William Chen, Dan Klein and @NickATomlin, who have been fantastic collaborators in this project. Please do support and follow them!
0
0
4
We release the generated games, data generation process, and evaluation code in order to support future modeling work and expansion of our benchmark. 💻Check it out and give it a ⭐️at https://t.co/XB7lCsQrAD!
github.com
Measuring General Intelligence With Generated Games (Preprint) - vivek3141/gg-bench
1
0
7
gg-bench is challenging: state-of-the-art LLMs such as GPT-4o and Claude 3.7 Sonnet achieve winrates of 7-9% on gg-bench using in-context learning, while reasoning models such as o1, o3-mini and DeepSeek-R1 achieve average winrates of 31-36%.
1
0
4
gg-bench is created by (1) generating natural language descriptions of novel games, (2) generating implementations of each game in code as a Gym environment, and (3) training RL agents via self-play on the generated games. We measure the average winrate across all generated games.
1
0
5
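To make the three stages concrete, here's a minimal self-contained Python sketch of the evaluation shape. In gg-bench the game description and environment code are LLM-generated and the opponent is trained with RL via self-play; here a hand-written subtraction game and random policies stand in for both, and every name below is illustrative rather than the actual gg-bench API.

```python
import random

class SubtractionGame:
    """Gym-style two-player game: players alternately take 1-3 tokens
    from a pile of 21; whoever takes the last token wins."""

    def reset(self):
        self.tokens = 21
        self.current_player = 0
        return self.tokens

    def step(self, action):
        assert action in (1, 2, 3) and action <= self.tokens
        self.tokens -= action
        done = self.tokens == 0
        reward = 1.0 if done else 0.0  # +1 to the player who just moved
        mover = self.current_player
        self.current_player = 1 - self.current_player
        return self.tokens, reward, done, {"mover": mover}

def random_policy(obs):
    return random.choice([a for a in (1, 2, 3) if a <= obs])

def winrate(env, agent, opponent, episodes=1000):
    """Average winrate of `agent` (player 0) -- the per-game metric."""
    wins = 0
    for _ in range(episodes):
        obs = env.reset()
        done = False
        while not done:
            policy = agent if env.current_player == 0 else opponent
            obs, reward, done, info = env.step(policy(obs))
        wins += int(info["mover"] == 0)  # player 0 made the winning move
    return wins / episodes

print(winrate(SubtractionGame(), random_policy, random_policy))
```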
We believe the future of benchmarks is not static lists of questions but data-generating processes, such that individual task instances can be regenerated at will. As such, (1) contaminated data points can be regenerated and (2) tasks get more difficult as LLMs get better.
1
0
5
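As a toy illustration of that idea (mine, not from the paper): a benchmark defined as a seeded generator rather than a fixed question list, so any individual instance can be resampled on demand. The arithmetic task here is just a stand-in for generated games.

```python
import random

def generate_task(seed):
    """One benchmark instance, deterministically derived from a seed."""
    rng = random.Random(seed)
    a, b = rng.randint(100, 999), rng.randint(100, 999)
    return {"prompt": f"What is {a} * {b}?", "answer": a * b}

# The benchmark is the generator plus a seed list; replacing a seed
# regenerates that data point (e.g. if it leaked into training data)
# without touching the rest of the suite.
suite = [generate_task(seed) for seed in range(10)]
print(suite[0])
```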
🎮 Excited to announce gg-bench, a fully synthetic benchmark for LLMs consisting of games generated entirely by LLMs!! This benchmark centers around the fact that LLMs are capable of generating complex tasks that they themselves cannot even solve. 📄: https://t.co/kddoCgDkvd
3
25
147
I just put up a new video, which was a collaboration with Terence Tao about the cosmic distance ladder. You can find the full video on YouTube, and here's a bit of extra footage that didn't make it into the final cut.
91
610
6K
This is honestly so incredibly tragic. I didn't know Suchir personally - but he was someone I looked up to a lot and viewed as a role model. I'm extremely sad to hear this news, and I can't imagine the pain his family is going through. Rest in peace 🙏
OpenAI whistleblower Suchir Balaji, who accused the company of breaking copyright law, found dead in apparent suicide
5
3
40
Likewise, if I'm trying to model language, I want to structure my model in a way that satisfies its properties. That way, it's able to generalize beyond the training data better. Do check out the three-part language modeling series!
1
1
10
Sometimes, getting data is really hard. But, if I knew beforehand that I was dealing with a pendulum, then I'd probably choose from a set of periodic functions when modeling its position.
1
0
4
But, if I add more points, it's pretty apparent I'm trying to model some periodic function.
1
0
2
If I have a couple data points telling me the pendulum's x position over time, then there are plenty of functions that "look" like they fit the data. These functions perfectly model the data I have, but are way off for points in-between.
1
0
1
A small tidbit I cut out of my recent series on language modeling on why we need "simpler" models. Let's say I'm trying to model the behavior of a pendulum, which for small angles, looks like a sine wave.
1
1
34
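The whole pendulum thread fits in a few lines of numpy. This sketch (mine, not from the video) fits the same six samples of sin(t) two ways, a degree-5 polynomial and a sin/cos basis with the frequency assumed known, and measures how far each model drifts at the points in between.

```python
import numpy as np

t_train = np.linspace(0, 10, 6)    # a handful of measurements
x_train = np.sin(t_train)          # pendulum displacement (small-angle regime)
t_test = np.linspace(0, 10, 200)   # the points "in between"

# Flexible model: a degree-5 polynomial passes through all 6 samples exactly...
poly = np.polyfit(t_train, x_train, deg=5)
poly_err = np.max(np.abs(np.polyval(poly, t_test) - np.sin(t_test)))

# ...while the structured model is least squares over {sin t, cos t}:
# just two parameters, chosen because we know we're modeling a pendulum.
basis = np.column_stack([np.sin(t_train), np.cos(t_train)])
coef, *_ = np.linalg.lstsq(basis, x_train, rcond=None)
sine_pred = coef[0] * np.sin(t_test) + coef[1] * np.cos(t_test)
sine_err = np.max(np.abs(sine_pred - np.sin(t_test)))

# Both fit the training points; only the sine model generalizes between them.
print(f"max error between samples: poly {poly_err:.2f}, sine {sine_err:.2e}")
```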
Well, waddya know -- looks like a NEW VIDEO! https://t.co/sAgMVYwTTj
A few years ago, I learned a theorem called "Riemann's Existence Theorem". It literally took my breath away. It was so shocking and unexpected -- drawing a bridge between two distant continents of math. I knew in that moment that I had to make a video about it. But as I
3
41
331