
Siddarth Venkatraman (@siddarthv66)
PhD at Mila | RL and other stuff I find interesting
Joined September 2023 · Followers: 270 · Following: 646 · Media: 12 · Statuses: 196
RT @PrimeIntellect: Introducing the Environments Hub. RL environments are the key bottleneck to the next wave of AI progress, but big labs…
RT @IAmTimNguyen: I respectfully disagree with Ed. Was Kepler's planetary analysis "real" mathematics or just astronomy? Are IMO problems…
This is so fucking cool.
GPT-5 Plays Pokémon Crystal - Update 🔥. GPT-5 earned its 7th badge at 3,321 steps. A huge improvement over o3's 11,910 steps! That's roughly a 3.6× speedup (11,910 / 3,321 ≈ 3.6), similar to what we saw in the Pokémon Red run. From my observations, spatial reasoning is what makes GPT-5 so much faster.
RT @Clad3815: GPT-5 has reached Victory Road! This is the last challenge before the Elite Four. GPT-5 reached this part almost three times…
How do the last-layer image features of large generative multimodal VLMs (like GPT-4o) compare against models like DINOv3?
Introducing DINOv3: a state-of-the-art computer vision model trained with self-supervised learning (SSL) that produces powerful, high-resolution image features. For the first time, a single frozen vision backbone outperforms specialized solutions on multiple long-standing dense…
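One concrete way to ask that question: extract features for the same images from both frozen backbones and measure representational similarity, e.g. with linear CKA. A minimal sketch, assuming you already have a way to pool one feature vector per image from each model; `extract_features` and the model handles are hypothetical stand-ins, while `linear_cka` is the standard formula and not tied to either backbone.

```python
import torch

def linear_cka(X: torch.Tensor, Y: torch.Tensor) -> float:
    """Linear Centered Kernel Alignment between two feature matrices.

    X: (n, d1) and Y: (n, d2) hold one feature vector per image;
    the feature dimensions d1 and d2 may differ.
    """
    X = X - X.mean(dim=0, keepdim=True)  # center each feature dimension
    Y = Y - Y.mean(dim=0, keepdim=True)
    num = (Y.T @ X).norm(p="fro") ** 2
    den = (X.T @ X).norm(p="fro") * (Y.T @ Y).norm(p="fro")
    return (num / den).item()

# Hypothetical usage -- extract_features stands in for whatever pooling
# you use to get one vector per image from each frozen backbone:
# feats_vlm  = extract_features(vlm_vision_tower, images)   # (n, d1)
# feats_dino = extract_features(dinov3_backbone, images)    # (n, d2)
# print(linear_cka(feats_vlm, feats_dino))  # 1.0 = identical geometry
```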
That’s actually super cool! So if this works without finetuning, does that mean LLM layers share a sort of “universal functional structure” where weights from different models can be swapped and still make sense computationally? This seems important…
it was not a waste of time; i've successfully made a super weird Qwen thing! it's approx 14.5B params, made from Qwen3-8B and Qwen3-235B-A22B mixed together by doing a super cursed process; this was only possible because both of these models share the same hidden size! (1/2)
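The thread doesn't say what the "super cursed process" actually is, so the sketch below is only an illustration of the kind of surgery a shared hidden size makes possible: grafting decoder blocks from one checkpoint into another's stack. The model IDs are the real checkpoints named in the tweet; which blocks get spliced, and how many, is an arbitrary choice for illustration, not the author's recipe.

```python
import torch
from transformers import AutoModelForCausalLM

# Illustrative only -- NOT the author's actual method.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-8B", torch_dtype=torch.bfloat16)
donor = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-235B-A22B", torch_dtype=torch.bfloat16)

# Every decoder block maps a (batch, seq, hidden) tensor back to the same
# shape, so blocks from either model can sit in one stack -- that is all
# the shared hidden size buys you. (This also assumes the rotary/head
# dims agree; if they don't, more surgery is needed.)
mid = len(base.model.layers) // 2
spliced = (list(base.model.layers[:mid])
           + list(donor.model.layers[:4])   # graft 4 donor blocks (arbitrary)
           + list(base.model.layers[mid:]))
base.model.layers = torch.nn.ModuleList(spliced)
base.config.num_hidden_layers = len(spliced)
```

Whether the grafted blocks compute anything sensible is exactly the "universal functional structure" question raised in the reply above.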
RT @WenzeChen2: [0/3] 🚀 Introducing Verlog – an open-source RL framework built specifically for training long-horizon, multi-turn LLM agen…
RT @lchen915: Self-Questioning Language Models: LLMs that learn to generate their own questions and answers via asymmetric self-play RL. T…
RT @DimitrisPapail: It was fun when we had llama and qwen 2.5 that one could run RL on, and see reward go up; now all the reasoning models…
RT @dwarkesh_sp: I filmed a video version of my post 'Why I Don’t Think AGI Is Right Around The Corner' so I could show it to my YouTube au…
RT @makingAGI: 🚀 Introducing Hierarchical Reasoning Model 🧠🤖. Inspired by the brain's hierarchical processing, HRM delivers unprecedented reasoni…
We (and others) encounter only the masks we wear. What we call a “true self” is merely the story our masks tell one another about what lies beneath. But behind each mask is only another mask, and beneath them all: “the void”. Our ego is a stable, low-energy local-minimum state.
"personality is just the average of everyone's model of you fed back into itself until it stabilizes into something that feels like you" - Opus 4.
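Taken literally, Opus 4's line describes a fixed-point iteration. A toy sketch, purely illustrative and nothing from the thread: if each observer's model of you blends who you currently are with their own prior expectation, repeatedly averaging those models and feeding the result back in stabilizes at the average of everyone's priors.

```python
import numpy as np

rng = np.random.default_rng(0)
priors = rng.normal(size=(5, 3))  # 5 observers, 3 hypothetical "trait" dims

def observers_model(x, prior, alpha=0.6):
    # each observer blends your current state with who they expect you to be
    return alpha * x + (1 - alpha) * prior

x = rng.normal(size=3)            # initial "self"
for _ in range(100):
    x = np.mean([observers_model(x, p) for p in priors], axis=0)

# The update x <- alpha*x + (1-alpha)*mean(priors) is a contraction, so it
# converges to x* = mean(priors): the average of everyone's model of you.
print(np.allclose(x, priors.mean(axis=0)))  # True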
RT @siddarthv66: @ajwagenmaker Congratulations on your work, Andrew! This is actually highly related to our work (just presented at ICML) Ou…
RT @JainMoksh: As the field moves towards agents doing science, the ability to understand novel environments through interaction becomes cr…
RT @g_k_swamy: Recent work has seemed somewhat magical: how can RL with *random* rewards make LLMs reason? We pull back the curtain on thes…
Come check out our poster this Wednesday at 4:30pm @icmlconf!! Happy to chat about diffusion, GFlowNets and RL!
Is there a universal strategy to turn any generative model (GANs, VAEs, diffusion models, or flows) into a conditional sampler, or to fine-tune it to optimize a reward function? Yes! Outsourced Diffusion Sampling (ODS), accepted to @icmlconf, does exactly that!
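For readers who want the one-screen version of what "turn any generative model into a conditional sampler" means: keep the generator frozen and work in its latent space, targeting the prior tilted by a reward on the outputs. The toy below uses plain self-normalized importance sampling (sampling-importance-resampling) for that target; it illustrates the problem setup, not the ODS method itself, and `G` / `log_reward` are hypothetical stand-ins.

```python
import torch

def reward_tilted_samples(G, log_reward, latent_dim,
                          n_proposals=4096, n_out=16):
    """Approximate samples from p(z) * exp(log_reward(G(z))), pushed
    through a frozen generator G, via sampling-importance-resampling.

    G maps Gaussian noise (n, latent_dim) -> samples; log_reward maps
    samples -> (n,) un-normalized log-scores. Both are assumptions.
    """
    z = torch.randn(n_proposals, latent_dim)  # draw latents from the prior
    x = G(z)                                  # frozen generator, no finetuning
    logw = log_reward(x)                      # un-normalized log-weights
    probs = torch.softmax(logw, dim=0)        # self-normalized weights
    idx = torch.multinomial(probs, n_out, replacement=True)
    return x[idx]

# Hypothetical usage: with a classifier log-probability for a target class
# as log_reward, an unconditional GAN/VAE/flow decoder G becomes a crude
# conditional sampler.
```

The catch is that naive importance sampling degrades badly in high dimensions; closing that gap with an amortized sampler over the latents is the kind of problem the poster addresses.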