Vidit Goel
@ViditGoel7
Followers
480
Following
4K
Media
18
Statuses
307
GenerativeML and 3D @Snap prev @PicsartAI | @IITKGP '21 | Computer Vision, Deep learning
New York, USA
Joined November 2018
Check our latest updates and improved model for PAIR Diffusion: A Comprehensive Multimodal Object-Level Image Editor🚀🚀 Project page: https://t.co/6aIxv2MAy2 ArXiv: https://t.co/ShNOGF7Ntz We show that 👇👇
PAIR-Diffusion: Object-Level Image Editing with Structure-and-Appearance Paired Diffusion Models @Gradio demo is out on @huggingface Spaces demo: https://t.co/hR63tPMl5i
3
25
105
Possibly, we can store only high-level information like semantics, plus some 3D representation, in an efficient manner and use generative models to decode it into a high-quality 3D representation of the world where our system can take actions.
0
0
0
Even with this much storage, representation is highly important. If we record 30 fps video at 512 x 512 resolution, we will only be able to record ~1-2 years of data even if we use the brain's entire 1-2 PB of storage. Generative models might be useful for compressing data efficiently
1
0
0
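A rough check of the storage estimate in the post above, as a minimal sketch. The post does not state a pixel format or compression, so this assumes raw, uncompressed 8-bit RGB frames (3 bytes per pixel) and decimal petabytes; the function names are illustrative only.

```python
# Back-of-envelope check: how many years of 512 x 512, 30 fps video fit in 1-2 PB?
# Assumptions (not from the post): uncompressed 8-bit RGB, 1 PB = 10**15 bytes.
WIDTH, HEIGHT = 512, 512
BYTES_PER_PIXEL = 3
FPS = 30
SECONDS_PER_YEAR = 365 * 24 * 60 * 60
PETABYTE = 10**15


def bytes_per_year(width=WIDTH, height=HEIGHT, bpp=BYTES_PER_PIXEL, fps=FPS):
    """Bytes needed to store one year of continuous raw video."""
    return width * height * bpp * fps * SECONDS_PER_YEAR


def years_of_video(capacity_pb):
    """Years of raw video that fit in the given capacity (in petabytes)."""
    return capacity_pb * PETABYTE / bytes_per_year()


for capacity in (1, 2):
    print(f"{capacity} PB holds about {years_of_video(capacity):.1f} years of raw video")
```

Under these assumptions one year of video takes roughly 0.74 PB, so 1-2 PB holds on the order of one to three years, consistent with the ~1-2 year figure. Any real codec would change the numbers substantially, which is the point the post makes about compression.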
Interesting read. Further, I recently noticed that although the brain draws only about 20 W of power, it can have 1-2 petabytes of storage! Moving forward, I think we should relax some constraints on how much long-term memory a world model can store.
Representation representation representation #SpatialAI See the SLAM Handbook Chapter 18 for my views! https://t.co/EdTa9zcl5F
1
0
14
This morning on the way to school, my 8-year-old daughter and I talked about fame and impact. We started with MrBeast and internet celebrities—whose work she knows well—but then I introduced Einstein, whose discoveries shaped the technology we use every day. Her big question:
1
1
26
RT @humphrey_shi: Multi-agent coding systems (e.g., Claude Code) are sweeping the world like a storm this summer. The success rests on a si…
0
1
0
Hi all, I will be at CVPR in Nashville from 10-15 June. Let's meet! Also drop by the poster for our paper, Wonderland: Navigating 3D Scenes from a Single Image https://t.co/Qe43iqeNrJ When - Friday morning session Where - ExHall D Poster #59
#CVPR25
snap-research.github.io
Wonderland: Navigating 3D Scenes from a Single Image
1
0
2
talk by @jon_barron. Completely agree. Further, if we move towards spatial computing, 3D would definitely be needed, but again it is a long-term bet. Full talk: https://t.co/7tUzEKrA3K
0
0
2
Wondering what's happening with NATTEN in 2025? Check out Generalized Neighborhood Attention! Spoiler: NATTEN gets a new stride parameter, we made a simulator for all your analytical studies, AND a Blackwell kernel! Keep reading for more... (1 / 5)
1
6
27
Hi @CVPR, all of my saved reviews for all the papers have been deleted. Can you please look into it? It would be very difficult for me to write them again in the next few days
1
0
1
https://t.co/yPqUUUt3M9 They show that we can represent fine details of identity in a SINGLE token. The key idea is to extract query-dependent values from the learned single token using attention and then use them in Cross-Attention, hence the name Nested Attention.
0
0
6
Introducing, The Heist. Every shot was done via text-to-video with Google Veo 2. I did all the sound design, editing and music. I can't wait to show you what's in store next year at @secret__level! 4K version here: https://t.co/oLgxWxYKCC
494
806
7K
I have been trying to find a specific library that I used but can't remember the name. I just described what the library did in https://t.co/T4wK7V1nCC and voilà, within secs I had the exact GitHub link of the library I was looking for. Can't imagine my work life without such tools
perplexity.ai
Perplexity is a free AI-powered answer engine that provides accurate, trusted, and real-time answers to any question.
0
0
2
Check out this amazing work! Now, you can precisely control the sequence of events using a video generation model.
📢MinT: Temporally-Controlled Multi-Event Video Generation📢 https://t.co/gEnm4DnkAC TL;DR: We identify a fundamental failure mode of existing video generators: they cannot produce videos with sequential events. MinT unlocks this capability with temporal grounding of events. 🧵
0
0
1
Introducing 3D Capture in the latest Lens Studio 🎉. Convert a video capture of any object to splats and create AR experiences using Snapchat Lens. Use it for fun, advertising and much more #GaussianSplatting #snapchat
New in Lens Studio: 3D Capture! 🎉 Take a video of any real life object and Lens Studio will reconstruct it as a Gaussian splat to use in your Lenses. Download the latest version of Lens Studio and start building now: https://t.co/B01USJitGF
0
0
6
We present Cavia, the first framework that enables users to generate multiple videos of the same scene with precise control over camera motion, while simultaneously preserving object motion. ✨ https://t.co/id9p1OKlti (1/9)
3
25
138
LLMs when used in real life 🙃
0
1
3
5. VideoGen would help get 3D representations easily rather than relying on skilled captures. What are your thoughts?
0
0
4
3. AR platforms such as @Snap would benefit from 3D 4. Interactive scenes like games would definitely benefit from 3D. Rendering every viewpoint using VideoGen is so inefficient compared to having a few MB of assets that can be rendered from infinite camera trajectories
1
0
4