
Prafull Sharma
@prafull7
Followers: 1K · Following: 4K · Media: 34 · Statuses: 352
World models, Computer Vision, Graphics, AI. PostDoc @MIT with Josh Tenenbaum and Phillip Isola. PhD @MIT with Bill Freeman and Fredo Durand. Undergrad @Stanford.
Cambridge, MA
Joined September 2010
Over the past year, my lab has been working on fleshing out theory/applications of the Platonic Representation Hypothesis. Today I want to share two new works on this topic: Eliciting higher alignment: https://t.co/KY4fjNeCBd Unpaired rep learning: https://t.co/vJTMoyJj5J 1/9
9 · 115 · 669
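The alignment work above hinges on measuring how similar two representation spaces are. Below is a minimal sketch of one common way to do that, a mutual k-nearest-neighbor score; the function names and random data are illustrative assumptions, not the papers' released code.

```python
# Sketch: mutual k-NN alignment between two representation spaces
# (illustrative assumption; not the papers' released code).
import numpy as np

def knn_sets(feats: np.ndarray, k: int) -> list:
    """Indices of the k nearest neighbors (cosine similarity) for each row."""
    x = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = x @ x.T
    np.fill_diagonal(sim, -np.inf)                 # exclude self-matches
    nearest = np.argsort(-sim, axis=1)[:, :k]
    return [set(row) for row in nearest]

def mutual_knn_alignment(feats_a: np.ndarray, feats_b: np.ndarray, k: int = 10) -> float:
    """Average fraction of neighbors shared by the two spaces (0 = none, 1 = identical)."""
    sets_a, sets_b = knn_sets(feats_a, k), knn_sets(feats_b, k)
    return float(np.mean([len(sa & sb) / k for sa, sb in zip(sets_a, sets_b)]))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=(500, 64))                       # shared latent structure
    feats_vision = z @ rng.normal(size=(64, 128))        # toy "vision" embedding
    feats_text = z @ rng.normal(size=(64, 96))           # toy "text" embedding
    print(mutual_knn_alignment(feats_vision, feats_text))  # high score: shared structure
```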
A hallmark of human intelligence is the capacity for rapid adaptation, solving new problems quickly under novel and unfamiliar conditions. How can we build machines to do so? In our new preprint, we propose that any general intelligence system must have an adaptive world model,
14 · 104 · 506
Our computer vision textbook is now available for free online here: https://t.co/ERy2Spc7c2 We are working on adding some interactive components like search and (beta) integration with LLMs. Hope this is useful, and feel free to submit GitHub issues to help us improve the text!
visionbook.mit.edu
35 · 620 · 3K
Imagine a Van Gogh-style teapot turning into glass with one simple slider. Introducing MARBLE, material edits by simply changing the CLIP embedding! https://t.co/VOHGwUGFVZ Internship project with @prafull7, @markb_boss, @jampani_varun at @StabilityAI
1 · 5 · 25
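A minimal sketch of the slider idea the MARBLE tweet describes, assuming material changes can be expressed as directions in CLIP embedding space; the model name, prompts, and generator hook below are illustrative assumptions, not the authors' implementation.

```python
# Sketch: represent a material change as a direction in CLIP embedding space and
# expose its strength as a slider. Illustrative assumption, not MARBLE's code.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def text_embedding(prompt: str) -> torch.Tensor:
    inputs = processor(text=[prompt], return_tensors="pt", padding=True)
    with torch.no_grad():
        emb = model.get_text_features(**inputs)
    return emb / emb.norm(dim=-1, keepdim=True)

# A "material direction": from an opaque ceramic description to a glass one (assumed prompts).
direction = text_embedding("a glass teapot") - text_embedding("a ceramic teapot")

def edited_embedding(image_emb: torch.Tensor, slider: float) -> torch.Tensor:
    """Shift an image embedding along the material direction; slider in [0, 1]."""
    out = image_emb + slider * direction
    return out / out.norm(dim=-1, keepdim=True)

# The edited embedding would then condition an image generator that accepts CLIP
# embeddings (e.g., an IP-Adapter-style model) to render the material change.
```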
Image-text alignment is hard, especially as multimodal data gets more detailed. Most methods rely on human labels or proprietary feedback (e.g., GPT-4V). We introduce: 1. CycleReward: a new alignment metric focused on detailed captions, trained without human supervision. 2.
4 · 38 · 197
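A hedged sketch of how a cycle-consistency signal could stand in for human labels in a CycleReward-style metric; the callables are hypothetical stand-ins, and the actual training recipe may differ.

```python
# Sketch: score a caption by how well the image it induces matches the original.
# The captioning / generation / embedding callables are hypothetical stand-ins.
from typing import Callable
import numpy as np

def cycle_consistency_score(
    image: np.ndarray,
    caption: str,
    text_to_image: Callable[[str], np.ndarray],   # caption -> reconstructed image
    embed_image: Callable[[np.ndarray], np.ndarray],
) -> float:
    """Cosine similarity between the original image and its caption-induced reconstruction."""
    reconstructed = text_to_image(caption)
    a, b = embed_image(image), embed_image(reconstructed)
    a, b = a / np.linalg.norm(a), b / np.linalg.norm(b)
    return float(a @ b)

# Pairs (image, better_caption, worse_caption) ranked by this score could then
# supervise a reward model -- no human preference labels required.
```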
Excited to share our position paper on the Fractured Entangled Representation (FER) Hypothesis! We hypothesize that the standard paradigm of training networks today, while producing impressive benchmark results, is still failing to create a well-organized internal
Could a major opportunity to improve representation in deep learning be hiding in plain sight? Check out our new position paper: Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis. The idea stems from a little-known
5 · 38 · 246
Excited to share our ICLR 2025 paper, I-Con, a unifying framework that ties together 23 methods across representation learning, from self-supervised learning to dimensionality reduction and clustering. Website: https://t.co/QD6OciHzmt A thread. 1/n
1 · 24 · 92
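A minimal sketch of the flavor of unifying objective the I-Con thread gestures at: an average KL divergence between a supervisory neighborhood distribution and one induced by the learned embedding. The Gaussian-style kernel and temperature below are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch: many representation-learning losses can be read as KL(p(.|i) || q(.|i))
# averaged over anchors i. Kernel and temperature here are assumed for illustration.
import numpy as np

def neighborhood_distribution(feats: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Row-stochastic matrix p(j|i): softmax over negative squared distances."""
    d2 = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(-1)
    logits = -d2 / temperature
    np.fill_diagonal(logits, -np.inf)              # no self-neighbors
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def icon_style_loss(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """Average KL divergence between the supervisory and learned neighborhood distributions."""
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(100, 32))                       # "data-space" features
    z = x[:, :8] + 0.1 * rng.normal(size=(100, 8))       # toy learned embedding
    print(icon_style_loss(neighborhood_distribution(x), neighborhood_distribution(z)))
```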
I just wrote my first blog post in four years! It is called "Deriving Muon". It covers the theory that led to Muon and how, for me, Muon is a meaningful example of theory leading practice in deep learning (1/11)
13 · 135 · 1K
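For readers who have not seen Muon, here is a minimal sketch of the update the blog post derives: momentum on a weight matrix, approximately orthogonalized with a Newton-Schulz iteration before the step. Hyperparameters below are assumptions; see the post for the derivation.

```python
# Sketch of a Muon-style step: orthogonalize the momentum of a 2D weight matrix
# via a Newton-Schulz iteration instead of an exact SVD. Illustrative only;
# learning rate, beta, and iteration count are assumed.
import torch

def newton_schulz_orthogonalize(g: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximate U V^T of the SVD of g with an iterative polynomial scheme."""
    a, b, c = 3.4445, -4.7750, 2.0315          # commonly published quintic coefficients
    x = g / (g.norm() + 1e-7)                   # bound the spectral norm
    transposed = x.shape[0] > x.shape[1]
    if transposed:
        x = x.T
    for _ in range(steps):
        s = x @ x.T
        x = a * x + (b * s + c * s @ s) @ x
    return x.T if transposed else x

def muon_style_step(weight: torch.Tensor, grad: torch.Tensor, momentum: torch.Tensor,
                    lr: float = 0.02, beta: float = 0.95) -> None:
    """One update: accumulate momentum, then descend along its orthogonalized version."""
    momentum.mul_(beta).add_(grad)
    weight.add_(newton_schulz_orthogonalize(momentum), alpha=-lr)

if __name__ == "__main__":
    w = torch.randn(256, 128)
    m = torch.zeros_like(w)
    muon_style_step(w, torch.randn_like(w), m)
```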
We wrote a new video diffusion paper! @kiwhansong0 and @BoyuanChen0 and co-authors did absolutely amazing work here. Apart from really working, the method of "variable-length history guidance" is really cool and based on some deep truths about sequence generative modeling....
Announcing Diffusion Forcing Transformer (DFoT), our new video diffusion algorithm that generates ultra-long videos of 800+ frames. DFoT enables History Guidance, a simple add-on to any existing video diffusion models for a quality boost. Website: https://t.co/wdZ19yCgjJ (1/7)
3 · 13 · 124
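A hedged sketch of what "history guidance" could look like at sampling time, assuming a classifier-free-guidance-style blend of history-conditioned and history-free predictions; the denoiser signature and the idea of dropping the history are hypothetical stand-ins, not the authors' API.

```python
# Sketch: combine a denoiser's prediction with and without history conditioning,
# classifier-free-guidance style. Signatures and masking are assumed for illustration.
from typing import Callable, Optional
import torch

def history_guided_prediction(
    denoiser: Callable[[torch.Tensor, torch.Tensor, Optional[torch.Tensor]], torch.Tensor],
    noisy_future: torch.Tensor,        # (batch, frames, ...) frames being generated
    t: torch.Tensor,                   # diffusion timestep(s)
    history: torch.Tensor,             # clean past frames used as conditioning
    guidance_scale: float = 2.0,
) -> torch.Tensor:
    cond = denoiser(noisy_future, t, history)   # history-conditioned prediction
    uncond = denoiser(noisy_future, t, None)    # history dropped / masked out
    return uncond + guidance_scale * (cond - uncond)
```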
There's a new kind of coding I call "vibe coding", where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. Also I just talk to Composer with SuperWhisper
1K · 3K · 30K
As a kid I was fascinated by the Search for Extraterrestrial Intelligence (SETI). Now we live in an era when it's becoming meaningful to search for "extraterrestrial life" not just in our universe but in simulated universes as well. This project provides new tools toward that dream:
Introducing ASAL: Automating the Search for Artificial Life with Foundation Models https://t.co/uUq63UNrjv Artificial Life (ALife) research holds key insights that can transform and accelerate progress in AI. By speeding up ALife discovery with AI, we accelerate our
4 · 19 · 215
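A minimal sketch of one way a foundation model can drive this kind of search, assuming rollout frames are scored against a text description with a vision-language model; the embedding callables and the scoring rule are illustrative assumptions about one mode of such a system.

```python
# Sketch: score an artificial-life simulation rollout with a vision-language model
# so a search over simulation rules can be automated. Callables are hypothetical.
from typing import Callable, Sequence
import numpy as np

def score_simulation(
    frames: Sequence[np.ndarray],
    prompt: str,
    embed_image: Callable[[np.ndarray], np.ndarray],
    embed_text: Callable[[str], np.ndarray],
) -> float:
    """Mean cosine similarity between rollout frames and a target text description."""
    t = embed_text(prompt)
    t = t / np.linalg.norm(t)
    sims = []
    for frame in frames:
        v = embed_image(frame)
        sims.append(float(v @ t / np.linalg.norm(v)))
    return float(np.mean(sims))

# An outer search loop would propose simulation parameters (e.g., CA rules),
# roll them out, and keep the candidates with the highest score.
```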
Current vision systems use fixed-length representations for all images. In contrast, human intelligence or LLMs (e.g., OpenAI o1) adjust compute budgets based on the input. Since different images demand different amounts of processing and memory, how can we enable vision systems to be adaptive?
10 · 67 · 482
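A hedged sketch of the adaptive-compute idea in the tweet above, assuming the token budget grows per image until a quality criterion is met; the encoder/decoder callables and the stopping rule are hypothetical, not the paper's method.

```python
# Sketch: give each image only as many tokens as it needs by growing the
# representation until reconstruction is good enough. Purely illustrative.
from typing import Callable
import numpy as np

def adaptive_tokenize(
    image: np.ndarray,
    encode_k_tokens: Callable[[np.ndarray, int], np.ndarray],  # image, k -> (k, d) tokens
    reconstruct: Callable[[np.ndarray], np.ndarray],           # tokens -> image
    budgets=(32, 64, 128, 256),
    tol: float = 0.01,
) -> np.ndarray:
    """Return the smallest token set whose reconstruction error falls below `tol`."""
    for k in budgets:
        tokens = encode_k_tokens(image, k)
        err = float(np.mean((reconstruct(tokens) - image) ** 2))
        if err < tol:
            return tokens            # easy image: stop early and save compute
    return tokens                    # hard image: use the full budget
```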
Had a lot of fun working on this. Stay tuned for more research on how human listeners reverse-engineer the physics of the world using the sounds they hear.
We just wrote a primer on how the physics of sound constrains auditory perception: https://t.co/NLgb4Q1ixj Covers sound propagation and object interactions, and touches on their relevance to music and film. I enjoyed working on this with @vin_agarwal and James Traer.
1 · 4 · 30
We just wrote a primer on how the physics of sound constrains auditory perception: https://t.co/NLgb4Q1ixj Covers sound propagation and object interactions, and touches on their relevance to music and film. I enjoyed working on this with @vin_agarwal and James Traer.
5 · 38 · 124
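As a small illustration of the object-interaction physics such a primer covers: impact sounds are often approximated as sums of exponentially decaying sinusoids (modes), whose frequencies and decay rates reflect an object's material and shape. The mode parameters below are made up for illustration.

```python
# Sketch: modal synthesis of a simple impact sound as a sum of damped sinusoids.
# Mode frequencies, decay rates, and amplitudes are invented for illustration.
import numpy as np

def impact_sound(duration: float = 0.5, sr: int = 44100) -> np.ndarray:
    t = np.linspace(0.0, duration, int(sr * duration), endpoint=False)
    # (frequency in Hz, decay rate in 1/s, amplitude) for a few modes
    modes = [(440.0, 8.0, 1.0), (883.0, 12.0, 0.6), (1320.0, 20.0, 0.3)]
    signal = sum(a * np.exp(-d * t) * np.sin(2 * np.pi * f * t) for f, d, a in modes)
    return signal / np.max(np.abs(signal))   # normalized waveform at sample rate sr
```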
What happens when models see the world as humans do? In our #NeurIPS2024 paper we show that aligning to human perceptual preferences can *improve* general-purpose representations! https://t.co/IPfJUos2O5 https://t.co/RWjqXmfUiy https://t.co/XsoJ2cbYDA (1/n)
8 · 85 · 454
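A minimal sketch of the kind of training signal such perceptual alignment typically uses: human two-alternative-forced-choice (2AFC) judgments over image triplets. The loss form and temperature below are illustrative assumptions, not necessarily the paper's exact objective.

```python
# Sketch: fine-tune a backbone so the reference embedding sits closer to whichever
# candidate humans judged more similar. Loss form and temperature are assumed.
import torch
import torch.nn.functional as F

def two_afc_loss(ref: torch.Tensor, img_a: torch.Tensor, img_b: torch.Tensor,
                 human_choice: torch.Tensor) -> torch.Tensor:
    """
    ref, img_a, img_b: (batch, d) embeddings.
    human_choice: (batch,) long tensor; 0 if people picked A as more similar, 1 for B.
    """
    sim_a = F.cosine_similarity(ref, img_a, dim=-1)
    sim_b = F.cosine_similarity(ref, img_b, dim=-1)
    logits = torch.stack([sim_a, sim_b], dim=-1) / 0.07   # temperature assumed
    return F.cross_entropy(logits, human_choice)
```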
.@MIT_CSAIL PhD student Marianne Rakic's most recent project, Tyche, is a medical image segmentation model that aims at generalizing to new tasks and capturing uncertainty in medical images. Learn more about Marianne and her recent projects:
cap.csail.mit.edu
0 · 4 · 7
Sakana AI announces The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery. Discuss: https://t.co/JsqbVgcLHz One of the grand challenges of artificial general intelligence is developing agents capable of conducting scientific research and discovering new
8 · 84 · 381
New paper and pip package, modula: "Scalable Optimization in the Modular Norm" https://t.co/ztWVPShp1p https://t.co/UnVL9iY8kB We rewrote the @pytorch module tree so that training automatically scales across width and depth.
8 · 37 · 176
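A hedged sketch of the general idea of controlling update size in a per-module norm so learning rates transfer across width and depth; the spectral-norm proxy and scaling rule below are illustrative assumptions, not the modula package's implementation.

```python
# Sketch: rescale each weight-matrix update to a fixed operator-norm "size" so the
# step behaves consistently as layers get wider or deeper. Illustrative only.
import torch

def normalized_update(grad: torch.Tensor, target_norm: float = 1.0) -> torch.Tensor:
    """Rescale a 2D update to a fixed spectral norm (largest singular value)."""
    spectral = torch.linalg.matrix_norm(grad, ord=2)
    return grad * (target_norm / (spectral + 1e-12))

def apply_scaled_step(model: torch.nn.Module, lr: float = 0.1) -> None:
    for p in model.parameters():
        if p.grad is None:
            continue
        if p.ndim == 2:                            # linear layers: norm-controlled step
            p.data.add_(normalized_update(p.grad), alpha=-lr)
        else:                                      # biases, norm params: plain step
            p.data.add_(p.grad, alpha=-lr)
```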
Presenting a novel approach that harnesses generative text-to-image models to enable users to precisely edit specific material properties (like roughness and transparency) of objects in images while retaining their original shape. Learn more: https://t.co/hF9nkgj3WP
27 · 102 · 403
Introducing Diffusion Forcing, which unifies next-token prediction (e.g., LLMs) and full-sequence diffusion (e.g., Sora)! It offers improved performance & new sampling strategies in vision and robotics, such as stable, infinite video generation, better diffusion planning, and more! (1/8)
12 · 214 · 1K
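A minimal sketch of the training trick at the heart of the thread above: each token in a sequence gets its own independently sampled noise level, so next-token prediction (future fully noised) and full-sequence diffusion (one shared level) fall out as special cases. The noise schedule and the model call in the comment are illustrative assumptions, not the paper's exact code.

```python
# Sketch: per-token independent noise levels for sequence diffusion training.
# The linear schedule below is an illustrative assumption.
import torch

def per_token_noising(x: torch.Tensor, num_levels: int = 1000):
    """
    x: (batch, seq_len, dim) clean sequence.
    Returns noisy tokens, the noise used, and each token's noise level.
    """
    b, t, _ = x.shape
    levels = torch.randint(0, num_levels, (b, t))              # independent per token
    alpha = 1.0 - levels.float() / num_levels                   # simple linear schedule
    noise = torch.randn_like(x)
    noisy = alpha.sqrt()[..., None] * x + (1 - alpha).sqrt()[..., None] * noise
    return noisy, noise, levels

# Training would then ask a causal denoiser to predict the noise per token given
# the noisy sequence and the per-token levels, e.g. (assumed model interface):
#   noisy, noise, levels = per_token_noising(batch)
#   loss = F.mse_loss(model(noisy, levels), noise)
```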