
Prafull Sharma
@prafull7
Followers: 1K · Following: 4K · Media: 34 · Statuses: 352
World models, Computer Vision, Graphics, AI. PostDoc @MIT with Josh Tenenbaum and Phillip Isola. PhD @MIT with Bill Freeman and Fredo Durand. Undergrad @Stanford.
Cambridge, MA
Joined September 2010
Over the past year, my lab has been working on fleshing out theory/applications of the Platonic Representation Hypothesis. Today I want to share two new works on this topic: Eliciting higher alignment: https://t.co/KY4fjNeCBd Unpaired rep learning: https://t.co/vJTMoyJj5J 1/9
9 · 115 · 669
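The alignment work above hinges on measuring how similar two representation spaces are. Below is a minimal sketch of one common way to do that, a mutual k-nearest-neighbor score; the function names and random data are illustrative assumptions, not the papers' released code.

```python
# Sketch: mutual k-NN alignment between two representation spaces
# (illustrative assumption; not the papers' released code).
import numpy as np

def knn_sets(feats: np.ndarray, k: int) -> list:
    """Indices of the k nearest neighbors (cosine similarity) for each row."""
    x = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = x @ x.T
    np.fill_diagonal(sim, -np.inf)                 # exclude self-matches
    nearest = np.argsort(-sim, axis=1)[:, :k]
    return [set(row) for row in nearest]

def mutual_knn_alignment(feats_a: np.ndarray, feats_b: np.ndarray, k: int = 10) -> float:
    """Average fraction of neighbors shared by the two spaces (0 = none, 1 = identical)."""
    sets_a, sets_b = knn_sets(feats_a, k), knn_sets(feats_b, k)
    return float(np.mean([len(sa & sb) / k for sa, sb in zip(sets_a, sets_b)]))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=(500, 64))                       # shared latent structure
    feats_vision = z @ rng.normal(size=(64, 128))        # toy "vision" embedding
    feats_text = z @ rng.normal(size=(64, 96))           # toy "text" embedding
    print(mutual_knn_alignment(feats_vision, feats_text))  # high score: shared structure
```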
A hallmark of human intelligence is the capacity for rapid adaptation, solving new problems quickly under novel and unfamiliar conditions. How can we build machines to do so? In our new preprint, we propose that any general intelligence system must have an adaptive world model,
14 · 104 · 506
Our computer vision textbook is now available for free online here: https://t.co/ERy2Spc7c2 We are working on adding some interactive components like search and (beta) integration with LLMs. Hope this is useful, and feel free to submit GitHub issues to help us improve the text!
visionbook.mit.edu
35 · 620 · 3K
Imagine a Van Gogh-style teapot turning into glass with one simple slider. Introducing MARBLE, material edits by simply changing the CLIP embedding! https://t.co/VOHGwUGFVZ Internship project with @prafull7, @markb_boss, @jampani_varun at @StabilityAI
1 · 5 · 25
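A minimal sketch of the slider idea the MARBLE tweet describes, assuming material changes can be expressed as directions in CLIP embedding space; the model name, prompts, and generator hook below are illustrative assumptions, not the authors' implementation.

```python
# Sketch: represent a material change as a direction in CLIP embedding space and
# expose its strength as a slider. Illustrative assumption, not MARBLE's code.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def text_embedding(prompt: str) -> torch.Tensor:
    inputs = processor(text=[prompt], return_tensors="pt", padding=True)
    with torch.no_grad():
        emb = model.get_text_features(**inputs)
    return emb / emb.norm(dim=-1, keepdim=True)

# A "material direction": from an opaque ceramic description to a glass one (assumed prompts).
direction = text_embedding("a glass teapot") - text_embedding("a ceramic teapot")

def edited_embedding(image_emb: torch.Tensor, slider: float) -> torch.Tensor:
    """Shift an image embedding along the material direction; slider in [0, 1]."""
    out = image_emb + slider * direction
    return out / out.norm(dim=-1, keepdim=True)

# The edited embedding would then condition an image generator that accepts CLIP
# embeddings (e.g., an IP-Adapter-style model) to render the material change.
```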
Image-text alignment is hard, especially as multimodal data gets more detailed. Most methods rely on human labels or proprietary feedback (e.g., GPT-4V). We introduce: 1. CycleReward: a new alignment metric focused on detailed captions, trained without human supervision. 2.
4 · 38 · 197
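A hedged sketch of how a cycle-consistency signal could stand in for human labels in a CycleReward-style metric; the callables are hypothetical stand-ins, and the actual training recipe may differ.

```python
# Sketch: score a caption by how well the image it induces matches the original.
# The captioning / generation / embedding callables are hypothetical stand-ins.
from typing import Callable
import numpy as np

def cycle_consistency_score(
    image: np.ndarray,
    caption: str,
    text_to_image: Callable[[str], np.ndarray],   # caption -> reconstructed image
    embed_image: Callable[[np.ndarray], np.ndarray],
) -> float:
    """Cosine similarity between the original image and its caption-induced reconstruction."""
    reconstructed = text_to_image(caption)
    a, b = embed_image(image), embed_image(reconstructed)
    a, b = a / np.linalg.norm(a), b / np.linalg.norm(b)
    return float(a @ b)

# Pairs (image, better_caption, worse_caption) ranked by this score could then
# supervise a reward model -- no human preference labels required.
```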
Excited to share our position paper on the Fractured Entangled Representation (FER) Hypothesis! We hypothesize that the standard paradigm of training networks today, while producing impressive benchmark results, is still failing to create a well-organized internal
Could a major opportunity to improve representation in deep learning be hiding in plain sight? Check out our new position paper: Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis. The idea stems from a little-known
5 · 38 · 246
Excited to share our ICLR 2025 paper, I-Con, a unifying framework that ties together 23 methods across representation learning, from self-supervised learning to dimensionality reduction and clustering. Website: https://t.co/QD6OciHzmt A thread. 1/n
1 · 24 · 92
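A minimal sketch of the flavor of unifying objective the I-Con thread gestures at: an average KL divergence between a supervisory neighborhood distribution and one induced by the learned embedding. The Gaussian-style kernel and temperature below are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch: many representation-learning losses can be read as KL(p(.|i) || q(.|i))
# averaged over anchors i. Kernel and temperature here are assumed for illustration.
import numpy as np

def neighborhood_distribution(feats: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Row-stochastic matrix p(j|i): softmax over negative squared distances."""
    d2 = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(-1)
    logits = -d2 / temperature
    np.fill_diagonal(logits, -np.inf)              # no self-neighbors
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def icon_style_loss(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """Average KL divergence between the supervisory and learned neighborhood distributions."""
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(100, 32))                       # "data-space" features
    z = x[:, :8] + 0.1 * rng.normal(size=(100, 8))       # toy learned embedding
    print(icon_style_loss(neighborhood_distribution(x), neighborhood_distribution(z)))
```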
I just wrote my first blog post in four years! It is called "Deriving Muon". It covers the theory that led to Muon and how, for me, Muon is a meaningful example of theory leading practice in deep learning (1/11)
13 · 135 · 1K
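For readers who have not seen Muon, here is a minimal sketch of the update the blog post derives: momentum on a weight matrix, approximately orthogonalized with a Newton-Schulz iteration before the step. Hyperparameters below are assumptions; see the post for the derivation.

```python
# Sketch of a Muon-style step: orthogonalize the momentum of a 2D weight matrix
# via a Newton-Schulz iteration instead of an exact SVD. Illustrative only;
# learning rate, beta, and iteration count are assumed.
import torch

def newton_schulz_orthogonalize(g: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximate U V^T of the SVD of g with an iterative polynomial scheme."""
    a, b, c = 3.4445, -4.7750, 2.0315          # commonly published quintic coefficients
    x = g / (g.norm() + 1e-7)                   # bound the spectral norm
    transposed = x.shape[0] > x.shape[1]
    if transposed:
        x = x.T
    for _ in range(steps):
        s = x @ x.T
        x = a * x + (b * s + c * s @ s) @ x
    return x.T if transposed else x

def muon_style_step(weight: torch.Tensor, grad: torch.Tensor, momentum: torch.Tensor,
                    lr: float = 0.02, beta: float = 0.95) -> None:
    """One update: accumulate momentum, then descend along its orthogonalized version."""
    momentum.mul_(beta).add_(grad)
    weight.add_(newton_schulz_orthogonalize(momentum), alpha=-lr)

if __name__ == "__main__":
    w = torch.randn(256, 128)
    m = torch.zeros_like(w)
    muon_style_step(w, torch.randn_like(w), m)
```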
We wrote a new video diffusion paper! @kiwhansong0 and @BoyuanChen0 and co-authors did absolutely amazing work here. Apart from really working, the method of "variable-length history guidance" is really cool and based on some deep truths about sequence generative modeling....
Announcing Diffusion Forcing Transformer (DFoT), our new video diffusion algorithm that generates ultra-long videos of 800+ frames. DFoT enables History Guidance, a simple add-on to any existing video diffusion models for a quality boost. Website: https://t.co/wdZ19yCgjJ (1/7)
3 · 13 · 124
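A hedged sketch of what "history guidance" could look like at sampling time, assuming a classifier-free-guidance-style blend of history-conditioned and history-free predictions; the denoiser signature and the idea of dropping the history are hypothetical stand-ins, not the authors' API.

```python
# Sketch: combine a denoiser's prediction with and without history conditioning,
# classifier-free-guidance style. Signatures and masking are assumed for illustration.
from typing import Callable, Optional
import torch

def history_guided_prediction(
    denoiser: Callable[[torch.Tensor, torch.Tensor, Optional[torch.Tensor]], torch.Tensor],
    noisy_future: torch.Tensor,        # (batch, frames, ...) frames being generated
    t: torch.Tensor,                   # diffusion timestep(s)
    history: torch.Tensor,             # clean past frames used as conditioning
    guidance_scale: float = 2.0,
) -> torch.Tensor:
    cond = denoiser(noisy_future, t, history)   # history-conditioned prediction
    uncond = denoiser(noisy_future, t, None)    # history dropped / masked out
    return uncond + guidance_scale * (cond - uncond)
```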
There's a new kind of coding I call "vibe coding", where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. Also I just talk to Composer with SuperWhisper
1K · 3K · 30K
As a kid I was fascinated by the Search for Extraterrestrial Intelligence (SETI). Now we live in an era when it's becoming meaningful to search for "extraterrestrial life" not just in our universe but in simulated universes as well. This project provides new tools toward that dream:
Introducing ASAL: Automating the Search for Artificial Life with Foundation Models https://t.co/uUq63UNrjv Artificial Life (ALife) research holds key insights that can transform and accelerate progress in AI. By speeding up ALife discovery with AI, we accelerate our
4 · 19 · 215
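A minimal sketch of one way a foundation model can drive this kind of search, assuming rollout frames are scored against a text description with a vision-language model; the embedding callables and the scoring rule are illustrative assumptions about one mode of such a system.

```python
# Sketch: score an artificial-life simulation rollout with a vision-language model
# so a search over simulation rules can be automated. Callables are hypothetical.
from typing import Callable, Sequence
import numpy as np

def score_simulation(
    frames: Sequence[np.ndarray],
    prompt: str,
    embed_image: Callable[[np.ndarray], np.ndarray],
    embed_text: Callable[[str], np.ndarray],
) -> float:
    """Mean cosine similarity between rollout frames and a target text description."""
    t = embed_text(prompt)
    t = t / np.linalg.norm(t)
    sims = []
    for frame in frames:
        v = embed_image(frame)
        sims.append(float(v @ t / np.linalg.norm(v)))
    return float(np.mean(sims))

# An outer search loop would propose simulation parameters (e.g., CA rules),
# roll them out, and keep the candidates with the highest score.
```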
Current vision systems use fixed-length representations for all images. In contrast, human intelligence or LLMs (e.g., OpenAI o1) adjust compute budgets based on the input. Since different images demand different amounts of processing and memory, how can we enable vision systems to be adaptive?
10 · 67 · 482
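A hedged sketch of the adaptive-compute idea in the tweet above, assuming the token budget grows per image until a quality criterion is met; the encoder/decoder callables and the stopping rule are hypothetical, not the paper's method.

```python
# Sketch: give each image only as many tokens as it needs by growing the
# representation until reconstruction is good enough. Purely illustrative.
from typing import Callable
import numpy as np

def adaptive_tokenize(
    image: np.ndarray,
    encode_k_tokens: Callable[[np.ndarray, int], np.ndarray],  # image, k -> (k, d) tokens
    reconstruct: Callable[[np.ndarray], np.ndarray],           # tokens -> image
    budgets=(32, 64, 128, 256),
    tol: float = 0.01,
) -> np.ndarray:
    """Return the smallest token set whose reconstruction error falls below `tol`."""
    for k in budgets:
        tokens = encode_k_tokens(image, k)
        err = float(np.mean((reconstruct(tokens) - image) ** 2))
        if err < tol:
            return tokens            # easy image: stop early and save compute
    return tokens                    # hard image: use the full budget
```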
Had a lot of fun working on this. Stay tuned for more research on how human listeners reverse-engineer the physics of the world using the sounds they hear.
We just wrote a primer on how the physics of sound constrains auditory perception: https://t.co/NLgb4Q1ixj Covers sound propagation and object interactions, and touches on their relevance to music and film. I enjoyed working on this with @vin_agarwal and James Traer.
1 · 4 · 30
We just wrote a primer on how the physics of sound constrains auditory perception: https://t.co/NLgb4Q1ixj Covers sound propagation and object interactions, and touches on their relevance to music and film. I enjoyed working on this with @vin_agarwal and James Traer.
5 · 38 · 124
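As a small illustration of the object-interaction physics such a primer covers: impact sounds are often approximated as sums of exponentially decaying sinusoids (modes), whose frequencies and decay rates reflect an object's material and shape. The mode parameters below are made up for illustration.

```python
# Sketch: modal synthesis of a simple impact sound as a sum of damped sinusoids.
# Mode frequencies, decay rates, and amplitudes are invented for illustration.
import numpy as np

def impact_sound(duration: float = 0.5, sr: int = 44100) -> np.ndarray:
    t = np.linspace(0.0, duration, int(sr * duration), endpoint=False)
    # (frequency in Hz, decay rate in 1/s, amplitude) for a few modes
    modes = [(440.0, 8.0, 1.0), (883.0, 12.0, 0.6), (1320.0, 20.0, 0.3)]
    signal = sum(a * np.exp(-d * t) * np.sin(2 * np.pi * f * t) for f, d, a in modes)
    return signal / np.max(np.abs(signal))   # normalized waveform at sample rate sr
```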
What happens when models see the world as humans do? In our #NeurIPS2024 paper we show that aligning to human perceptual preferences can *improve* general-purpose representations! https://t.co/IPfJUos2O5 https://t.co/RWjqXmfUiy https://t.co/XsoJ2cbYDA (1/n)
8 · 85 · 454
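A minimal sketch of the kind of training signal such perceptual alignment typically uses: human two-alternative-forced-choice (2AFC) judgments over image triplets. The loss form and temperature below are illustrative assumptions, not necessarily the paper's exact objective.

```python
# Sketch: fine-tune a backbone so the reference embedding sits closer to whichever
# candidate humans judged more similar. Loss form and temperature are assumed.
import torch
import torch.nn.functional as F

def two_afc_loss(ref: torch.Tensor, img_a: torch.Tensor, img_b: torch.Tensor,
                 human_choice: torch.Tensor) -> torch.Tensor:
    """
    ref, img_a, img_b: (batch, d) embeddings.
    human_choice: (batch,) long tensor; 0 if people picked A as more similar, 1 for B.
    """
    sim_a = F.cosine_similarity(ref, img_a, dim=-1)
    sim_b = F.cosine_similarity(ref, img_b, dim=-1)
    logits = torch.stack([sim_a, sim_b], dim=-1) / 0.07   # temperature assumed
    return F.cross_entropy(logits, human_choice)
```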
.@MIT_CSAIL PhD student Marianne Rakic's most recent project, Tyche, is a medical image segmentation model that aims at generalizing to new tasks and capturing uncertainty in medical images. Learn more about Marianne and her recent projects:
cap.csail.mit.edu
0 · 4 · 7
Sakana AI announces The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery. Discuss: https://t.co/JsqbVgcLHz One of the grand challenges of artificial general intelligence is developing agents capable of conducting scientific research and discovering new
8 · 84 · 381
New paper and pip package, modula: "Scalable Optimization in the Modular Norm" https://t.co/ztWVPShp1p https://t.co/UnVL9iY8kB We rewrote the @pytorch module tree so that training automatically scales across width and depth.
8 · 37 · 176
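A hedged sketch of the general idea of controlling update size in a per-module norm so learning rates transfer across width and depth; the spectral-norm proxy and scaling rule below are illustrative assumptions, not the modula package's implementation.

```python
# Sketch: rescale each weight-matrix update to a fixed operator-norm "size" so the
# step behaves consistently as layers get wider or deeper. Illustrative only.
import torch

def normalized_update(grad: torch.Tensor, target_norm: float = 1.0) -> torch.Tensor:
    """Rescale a 2D update to a fixed spectral norm (largest singular value)."""
    spectral = torch.linalg.matrix_norm(grad, ord=2)
    return grad * (target_norm / (spectral + 1e-12))

def apply_scaled_step(model: torch.nn.Module, lr: float = 0.1) -> None:
    for p in model.parameters():
        if p.grad is None:
            continue
        if p.ndim == 2:                            # linear layers: norm-controlled step
            p.data.add_(normalized_update(p.grad), alpha=-lr)
        else:                                      # biases, norm params: plain step
            p.data.add_(p.grad, alpha=-lr)
```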
Presenting a novel approach that harnesses generative text-to-image models to enable users to precisely edit specific material properties (like roughness and transparency) of objects in images while retaining their original shape. Learn more: https://t.co/hF9nkgj3WP
27 · 102 · 403
Introducing Diffusion Forcing, which unifies next-token prediction (e.g., LLMs) and full-sequence diffusion (e.g., Sora)! It offers improved performance & new sampling strategies in vision and robotics, such as stable, infinite video generation, better diffusion planning, and more! (1/8)
12 · 214 · 1K
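A minimal sketch of the training trick at the heart of the thread above: each token in a sequence gets its own independently sampled noise level, so next-token prediction (future fully noised) and full-sequence diffusion (one shared level) fall out as special cases. The noise schedule and the model call in the comment are illustrative assumptions, not the paper's exact code.

```python
# Sketch: per-token independent noise levels for sequence diffusion training.
# The linear schedule below is an illustrative assumption.
import torch

def per_token_noising(x: torch.Tensor, num_levels: int = 1000):
    """
    x: (batch, seq_len, dim) clean sequence.
    Returns noisy tokens, the noise used, and each token's noise level.
    """
    b, t, _ = x.shape
    levels = torch.randint(0, num_levels, (b, t))              # independent per token
    alpha = 1.0 - levels.float() / num_levels                   # simple linear schedule
    noise = torch.randn_like(x)
    noisy = alpha.sqrt()[..., None] * x + (1 - alpha).sqrt()[..., None] * noise
    return noisy, noise, levels

# Training would then ask a causal denoiser to predict the noise per token given
# the noisy sequence and the per-token levels, e.g. (assumed model interface):
#   noisy, noise, levels = per_token_noising(batch)
#   loss = F.mse_loss(model(noisy, levels), noise)
```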