Vivek Verma Profile
Vivek Verma

@vcubingx

Followers
13K
Following
3K
Media
71
Statuses
310

math youtuber & researcher @openai

Joined January 2017
@vcubingx
Vivek Verma
2 years
New video! The attention mechanism is well known for its use in Transformers. But where does it come from? Its origins lie in fixing a strange problem of RNNs. Watch the video to learn about it! https://t.co/W6nqLH859P
4
67
527
@willdepue
will depue
3 months
@nelvOfficial ignore the gpt-5 name, o1/o3 were undeniably gpt-5 level and it just took us time to have confidence to bump the name.
12
7
210
@davidhuang33176
David Huang
3 months
Benchmarking model intelligence, particularly their ability to generalize robustly across diverse stateful and long-horizon tasks, was the focus of our new paper: Measuring General Intelligence with Generated Games.
2
1
6
@vcubingx
Vivek Verma
5 months
Unfortunately this means that the videos will have to take a back seat in the meanwhile. Math content creation is still a huge passion of mine, so I’ll be trying my best to push out content as time permits 🫡
2
0
45
@vcubingx
Vivek Verma
5 months
Life update: I’ve taken up a job as a researcher on the post-training team at @openai, working on reinforcement learning, function calling and other efforts! I’ve also graduated from @UCBerkeley, and I’m grateful for four wonderful years of learning and fun 😊
52
17
2K
@teortaxesTex
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
6 months
Generation of verifiable gyms is here. New hill to climb.
@fly51fly
fly51fly
6 months
[LG] Measuring General Intelligence with Generated Games V Verma, D Huang, W Chen, D Klein... [UC Berkeley] (2025) https://t.co/GN7LZWj2LD
4
8
119
@NickATomlin
Nicholas Tomlin
6 months
The long-term goal of AI is to build models that can handle arbitrary tasks, not just ones they’ve been trained on. We hope our new *benchmark generator* can help measure progress toward this vision
@vcubingx
Vivek Verma
6 months
🎮 Excited to announce gg-bench, a fully synthetic benchmark for LLMs consisting of games generated entirely by LLMs!! This benchmark centers around the fact that LLMs are capable of generating complex tasks that they themselves cannot even solve. 📄: https://t.co/kddoCgDkvd
4
30
182
@vcubingx
Vivek Verma
6 months
This is joint work with @davidhuang33176, William Chen, Dan Klein and @NickATomlin, who have been fantastic collaborators in this project. Please do support and follow them!
0
0
4
@vcubingx
Vivek Verma
6 months
We release the generated games, data generation process, and evaluation code in order to support future modeling work and expansion of our benchmark. 💻Check it out and give it a ⭐️at https://t.co/XB7lCsQrAD!
github.com
Measuring General Intelligence With Generated Games (Preprint) - vivek3141/gg-bench
1
0
7
@vcubingx
Vivek Verma
6 months
gg-bench is challenging: state-of-the-art LLMs such as GPT-4o and Claude 3.7 Sonnet achieve winrates of 7-9% on gg-bench using in-context learning, while reasoning models such as o1, o3-mini and DeepSeek-R1 achieve average winrates of 31-36%.
1
0
4
@vcubingx
Vivek Verma
6 months
gg-bench is created by (1) generating natural language descriptions of novel games, (2) generating implementations of each game in code as a Gym environment, and (3) training RL agents via self-play on the generated games. We measure the average winrate across all generated games.
1
0
5
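The "average winrate across all generated games" metric in the tweet above could be computed roughly as follows. This is only a minimal sketch, not gg-bench's evaluation code: the game names, the outcome lists, and the choice to average per-game winrates with equal weight are all assumptions made for illustration.

```python
from statistics import mean


def average_winrate(results):
    """Average per-game winrates, giving every game equal weight.

    `results` maps a game name to a list of match outcomes
    (True = the LLM won). This layout is a hypothetical one,
    not the format used by the gg-bench repository.
    """
    per_game = {
        game: sum(outcomes) / len(outcomes)
        for game, outcomes in results.items()
        if outcomes  # skip games with no recorded matches
    }
    overall = mean(list(per_game.values()))
    return overall, per_game


# Hypothetical outcomes for two made-up games.
results = {
    "game_a": [True, False, False, False],
    "game_b": [True, True, False, False],
}
overall, per_game = average_winrate(results)
print(f"overall winrate: {overall:.3f}")
```

Averaging per game, rather than pooling all matches, keeps a game with many recorded matches from dominating the score; whether the paper weights games this way is not stated in the thread.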
@vcubingx
Vivek Verma
6 months
We believe the future of benchmarks is not static lists of questions but data-generating processes, such that individual task instances can be regenerated at will. As such, (1) contaminated data points can be regenerated and (2) tasks get more difficult as LLMs get better.
1
0
5
@3blue1brown
Grant Sanderson
9 months
I just put up a new video, which was a collaboration with Terence Tao about the cosmic distance ladder. You can find the full video on YouTube, and here's a bit of extra footage that didn't make it into the final.
91
610
6K
@vcubingx
Vivek Verma
11 months
This is honestly so incredibly tragic. I didn't know Suchir personally, but he was someone I looked up to a lot and viewed as a role model. I'm extremely sad to hear this news; I can't imagine the pain his family is going through. Rest in peace 🙏
@BNONews
BNO News
11 months
OpenAI whistleblower Suchir Balaji, who accused the company of breaking copyright law, found dead in apparent suicide
5
3
40
@vcubingx
Vivek Verma
1 year
Likewise, if I'm trying to model language, I want to structure my model in a way that satisfies its properties. That way, it's able to generalize beyond the training data better. Do check out the three-part language modeling series!
1
1
10
@vcubingx
Vivek Verma
1 year
Sometimes, getting data is really hard. But if I knew beforehand that I was dealing with a pendulum, then I'd probably choose from a set of periodic functions when modeling its position.
1
0
4
@vcubingx
Vivek Verma
1 year
But, if I add more points, it's pretty apparent I'm trying to model some periodic function.
1
0
2
@vcubingx
Vivek Verma
1 year
If I have a couple of data points telling me the pendulum's x position over time, then there are plenty of functions that "look" like they fit the data. These functions perfectly model the data I have, but are way off for points in between.
1
0
1
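The point in the tweet above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the pendulum position is taken to be sin(t), the samples are spaced 1.5 time units apart, and the "wiggly" model is a hand-built function that adds a term vanishing exactly at the sample times.

```python
import math

DT = 1.5  # assumed spacing between measurements
sample_times = [k * DT for k in range(4)]
# Assumed small-angle pendulum: x(t) = sin(t).
data = [(t, math.sin(t)) for t in sample_times]


def periodic_model(t):
    """The simple model: a plain sine wave."""
    return math.sin(t)


def wiggly_model(t):
    """Matches every sample, but adds a term that is zero only at the sample times."""
    return math.sin(t) + 2.0 * math.sin(math.pi * t / DT)


def max_error(model, points):
    """Largest absolute gap between the model and the given (t, x) points."""
    return max(abs(model(t) - x) for t, x in points)


# True positions halfway between consecutive measurements.
midpoints = [(t + DT / 2, math.sin(t + DT / 2)) for t in sample_times[:-1]]

fit_err = max_error(wiggly_model, data)       # error on the points we have
gap_err = max_error(wiggly_model, midpoints)  # error between those points
print(f"wiggly model: error on data {fit_err:.1e}, error in between {gap_err:.2f}")
print(f"periodic model: error in between {max_error(periodic_model, midpoints):.2f}")
```

Both models agree with the four measurements up to floating-point rounding, so the data alone cannot tell them apart; only the prior knowledge that the system is a pendulum favors the plain sine.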
@vcubingx
Vivek Verma
1 year
A small tidbit I cut out of my recent series on language modeling, on why we need "simpler" models. Let's say I'm trying to model the behavior of a pendulum, which, for small angles, looks like a sine wave.
1
1
34
@00aleph00
adithya
2 years
Well, waddya know -- looks like a NEW VIDEO! https://t.co/sAgMVYwTTj
@00aleph00
adithya
2 years
A few years ago, I learned a theorem called "Riemann's Existence Theorem". It literally took my breath away. It was so shocking and unexpected -- drawing a bridge between two distant continents of math. I knew in that moment that I had to make a video about it. But as I
3
41
331