Rahul Venkatesh Profile
Rahul Venkatesh

@Rahul_Venkatesh

Followers: 192
Following: 73
Media: 23
Statuses: 67

CS Ph.D. student at Stanford @NeuroAILab @StanfordAILab

Joined November 2009
@Rahul_Venkatesh
Rahul Venkatesh
5 months
AI models segment scenes based on how things appear, but babies segment based on what moves together. We utilize a visual world model that our lab has been developing, to capture this concept — and what's cool is that it beats SOTA models on zero-shot segmentation and physical
6
15
55
@dyamins
Daniel Yamins
2 months
A couple of months ago @KordingLab trolled (in the best possible sense of that term) my happy-go-lucky @StanfordBrain podcast on Brain simulations. Actually he had some interesting points.... We decided to have an in-depth "podcast" about it, for your listening pleasure:
3
15
94
@XihuiLiu
Xihui Liu
2 months
Our part-aware 3D generation work, OmniPart, is accepted by Siggraph Asia 2025. Code and model released! Paper: https://t.co/vEAyV5kqD2 Project page: https://t.co/ovnAysSa7I Code: https://t.co/dToyRki7R8 Demo: https://t.co/9gcBmo2NdP
1
40
299
@RishubhParihar
Rishubh Parihar
2 months
“Make it red.” “No! More red!” “Ughh… slightly less red.” “Perfect!” ♥️ 🎚️Kontinuous Kontext adds slider-based control over edit strength to instruction-based image editing, enabling smooth, continuous transformations!
16
36
158
@keshigeyan
Keshigeyan Chandrasegaran
3 months
Super excited about this line of work! 🚀 A simple, scalable recipe for training diffusion language models using autoregressive models. We're releasing our tech report, model weights, and inference code!
@RadicalNumerics
Radical Numerics
3 months
Introducing RND1, the most powerful base diffusion language model (DLM) to date. RND1 (Radical Numerics Diffusion) is an experimental DLM with 30B params (3B active) with a sparse MoE architecture. We are making it open source, releasing weights, training details, and code to
0
9
68
@cogphilosopher
Imran Thobani
3 months
1/x Our new method, the Inter-Animal Transform Class (IATC), is a principled way to compare neural network models to the brain. It's the first to ensure both accurate brain activity predictions and specific identification of neural mechanisms. Preprint: https://t.co/hPqo5PrZoc
3
14
46
@recursus
Daniel Bear
3 months
These are the most impressive examples of physical understanding I've seen from a computer vision model (let alone one that's not hooked up to an LLM). And IMO the first good explanation of *how* physical understanding can arise without supervision.
@KlemenKotar
Klemen Kotar
3 months
PSI enables some cool zero-shot applications: visual Jenga, physical video editing, and motion estimation for robotics.
0
2
12
@KlemenKotar
Klemen Kotar
3 months
1/ A good world model should be promptable like an LLM, offering flexible control and zero-shot answers to many questions. Language models have benefited greatly from this fact, but it's been slow to come to vision. We introduce PSI: a path to truly interactive visual world
3
35
131
@KlemenKotar
Klemen Kotar
3 months
PSI enables some cool zero-shot applications: visual Jenga, physical video editing, and motion estimation for robotics.
1
2
13
@Rahul_Venkatesh
Rahul Venkatesh
3 months
(4/) If you'd like to explore this more, we provide code and models here with detailed documentation:
github.com/neuroailab/SpelkeNet
0
0
1
@Rahul_Venkatesh
Rahul Venkatesh
3 months
(3/) It also turns out that our segments are really useful for complex physical object manipulation.
1
0
0
@Rahul_Venkatesh
Rahul Venkatesh
3 months
(2/) This capability helps discover more physically meaningful object segments than those from state-of-the-art models like Segment Anything (SAM).
1
0
1
@Rahul_Venkatesh
Rahul Venkatesh
3 months
(1/) Once optical flow is integrated, it allows us to interact with the scene through virtual pokes.
1
0
1
@Rahul_Venkatesh
Rahul Venkatesh
3 months
Excited to share PSI — our new framework for building pure vision foundation world models via Probabilistic Structure Integration! World models that rely on language conditioning often fall short in enabling physical interaction. Integrating structured signals like optical flow
@KlemenKotar
Klemen Kotar
3 months
1/ A good world model should be promptable like an LLM, offering flexible control and zero-shot answers to many questions. Language models have benefited greatly from this fact, but it's been slow to come to vision. We introduce PSI: a path to truly interactive visual world
1
3
4
@dyamins
Daniel Yamins
3 months
Here is our best thinking about how to make world models. I would apologize for it being a massive 40-page behemoth, but it's worth reading.
@KlemenKotar
Klemen Kotar
3 months
1/ A good world model should be promptable like an LLM, offering flexible control and zero-shot answers to many questions. Language models have benefited greatly from this fact, but it's been slow to come to vision. We introduce PSI: a path to truly interactive visual world
5
41
220
@GretaTuckute
Greta Tuckute
4 months
Humans largely learn language through speech. In contrast, most LLMs learn from pre-tokenized text. In our #Interspeech2025 paper, we introduce AuriStream: a simple, causal model that learns phoneme, word & semantic information from speech. Poster P6, Aug 19 at 13:30, Foyer 2.2!
8
32
194
@judyefan
Judy Fan
5 months
Thrilled to welcome members of @cogsci_soc to the SF/Bay area for #CogSci2025 this week! Here's a preview of what the Cognitive Tools Lab 🧠🛠️ @Stanford @StanfordPsych will be presenting!
3
16
149
@GordonWetzstein
Gordon Wetzstein
5 months
🚀 Just published in Nature Photonics: synthetic aperture waveguide holography—a new path toward ultra-thin, high-quality 3D mixed reality displays. 📄 https://t.co/sjX7HpqvrT #Photonics #Holography #MR 1/5
10
65
362
@Rahul_Venkatesh
Rahul Venkatesh
5 months
I’m one of those people who still enjoys the archaic thrill of coding without AI tools: just me and the editor. But I recently tried Anycoder by @_akhaliq and was genuinely impressed. You describe the app you want, and it builds it and deploys it on Hugging Face:
0
1
14
@_akhaliq
AK
5 months
Discovering and using Spelke segments
1
3
39