Thomas O'Connell Profile
Thomas O'Connell

@thomaspocon

Followers
244
Following
838
Media
11
Statuses
79

cognitive neuroscientist @KanwisherLab, @mitcocosci

Cambridge, MA
Joined January 2017
@thomaspocon
Thomas O'Connell
2 years
🔥Preprint Alert🔥 Excited to share some new work modeling human 3D shape perception! "Approaching human 3D shape perception with neurally mappable models" w/ @tylerraye, @_yonifriedman, @_atewari, Josh Tenenbaum, @vincesitzmann, @Nancy_Kanwisher 🧵
4
50
156
@thomaspocon
Thomas O'Connell
8 months
RT @patrickmineault: Excited to release what we’ve been working on at Amaranth Foundation, our latest whitepaper, NeuroAI for AI safety! A…
0
97
0
@thomaspocon
Thomas O'Connell
10 months
RT @tylerraye: do large-scale vision models represent the 3D structure of objects? excited to share our benchmark: multiview object consis…
0
88
0
@thomaspocon
Thomas O'Connell
2 years
RT @sucholutsky: 🧵🎉 Our new preprint is up, and we’d love your feedback! We're "Getting Aligned on Representational Alignment" - the degree…
0
119
0
@thomaspocon
Thomas O'Connell
2 years
Thanks for reading! Please be in touch with any questions, ideas, desires to chat, etc.!
0
0
2
@thomaspocon
Thomas O'Connell
2 years
Finally, check out related work from my awesome co-author @tylerraye implicating the medial temporal lobe in supporting human 3D inferences. He suggests computations beyond standard DNNs are needed to model this process. Hmmm potential synergy? 🤔
@tylerraye
tyler bonnen
2 years
excited to share my last experimental project from graduate school 😅🥹 "Medial temporal cortex supports compositional visual inferences", with my PhD advisors Anthony Wagner & Dan Yamins. 1/🧠
1
0
6
@thomaspocon
Thomas O'Connell
2 years
Shout out to @KordingLab for tweeting out the CCN-version of the preprint! If you downloaded the earlier version, there are some minor updates in the one linked in this thread (addition of Stylized-ImageNet models, additional discussion).
@KordingLab
Kording Lab 🦖
2 years
Replicating human 3D shape perception abilities - our field is changing so fast, I am having whiplash:
1
0
3
@thomaspocon
Thomas O'Connell
2 years
If watching talks is more your speed, check out our presentation from CCN 2023.
1
0
8
@thomaspocon
Thomas O'Connell
2 years
tldr: DNNs trained with multi-view objectives make human-aligned 3D shape judgements! Lots more work to do here (biologically-plausible learning, generalization) to reach full human capabilities, but we take an important step toward closing the gap between human and model 3D inferences.
1
0
5
@thomaspocon
Thomas O'Connell
2 years
So, are we done? Not quite… none of these models generalize well to novel categories not included in their training set. Closing this generalization gap will be a focus of future research.
1
0
5
@thomaspocon
Thomas O'Connell
2 years
Remarkably, even a standard resnet50 CNN architecture showed a marked jump in alignment to humans when trained with a multi-view learning objective (Multi-View CNN).
1
0
3
@thomaspocon
Thomas O'Connell
2 years
Models trained with a 3D multi-view objective (Multi-View Autoencoder & Multi-View CNN), in which two images depicting the same object from different viewpoints must be associated, were markedly more aligned to humans, approaching the performance of the 3D LFNs!
1
0
6
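For readers who want the mechanics, the multi-view association idea above can be sketched as a contrastive (InfoNCE-style) loss over paired views; this is an illustrative numpy reconstruction, not the paper's exact training objective (the function name and temperature are assumptions):

```python
import numpy as np

def multiview_contrastive_loss(emb_a, emb_b, temperature=0.1):
    """Illustrative InfoNCE-style loss: row i of emb_a and row i of emb_b
    embed two viewpoints of the same object and should be associated;
    all other rows in the batch serve as negatives."""
    # L2-normalize so dot products are cosine similarities
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature               # (n, n) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # the positive for view i is the matching object's other view (the diagonal)
    return -np.mean(np.diag(log_probs))
```

Correctly paired views yield a much lower loss than mismatched pairs, which is what pushes the network toward viewpoint-invariant object representations.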
@thomaspocon
Thomas O'Connell
2 years
We rule out training on rendered shapes (all control models), generative capabilities (Autoencoder), and viewpoint supervision (Autoencoder+Viewpoints) as sufficient for learning human-aligned 3D shape representations (see plot below).
1
0
4
@thomaspocon
Thomas O'Connell
2 years
Now for the fun part! What drives alignment between 3D LFNs and humans? To figure this out, we trained a series of control models, each incorporating different aspects of 3D LFNs into more standard architectures.
1
0
4
@thomaspocon
Thomas O'Connell
2 years
To ensure these results aren’t driven by the category structure of ShapeNet, we repeat the procedure using abstract procedurally-generated shapes (top) created with the ShapeGenerator plugin for Blender. Again, we find the 3D LFN is most aligned to human 3D shape judgements (bottom).
1
0
4
@thomaspocon
Thomas O'Connell
2 years
Next, we construct adversarial match-to-sample trials by minimizing the accuracy for 25 ImageNet CNNs and selecting trials across 5 difficulty conditions. Even for these adversarially-defined trials, alignment between human and 3D LFN 3D shape judgements holds.
1
0
4
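The adversarial trial selection described above (minimize CNN accuracy, then split across difficulty conditions) can be sketched roughly as below; the rank-and-bin logic is an assumption for illustration, not the exact selection procedure:

```python
import numpy as np

def select_adversarial_trials(cnn_correct, n_conditions=5, per_condition=10):
    """Illustrative sketch. cnn_correct: (n_models, n_trials) boolean array
    of per-trial correctness for a bank of ImageNet CNNs. Rank trials by
    mean CNN accuracy (hardest first), keep the hardest ones, and split
    them into difficulty conditions."""
    mean_acc = cnn_correct.mean(axis=0)  # per-trial accuracy across models
    order = np.argsort(mean_acc)         # ascending: hardest trials first
    chosen = order[: n_conditions * per_condition]
    # condition 0 = hardest ... condition n-1 = easiest of the selected set
    return chosen.reshape(n_conditions, per_condition)
```

The point of the manipulation is that trials chosen to defeat standard CNNs still leave human-LFN agreement intact.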
@thomaspocon
Thomas O'Connell
2 years
….and find that features from the 3D LFN (yellow, pink) are much more aligned to human 3D shape judgements than our baseline models for random within-category pairs of objects!
1
0
4
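One simple way to read a trial-wise shape judgement out of frozen model features on a match-to-sample trial is a nearest-neighbor choice in feature space; this readout is a hypothetical sketch (the paper's actual decision rule may differ):

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def model_choice(sample_feat, match_feat, lure_feat):
    """Match-to-sample readout: pick whichever candidate image lies closer
    to the sample in feature space (returns 1 for the match, 0 for the lure)."""
    return int(cosine(sample_feat, match_feat) >= cosine(sample_feat, lure_feat))
```

Running this rule over every trial produces a model response vector that can then be compared trial-by-trial against human judgements.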
@thomaspocon
Thomas O'Connell
2 years
We train a 3D Light Field Network (@vincesitzmann) on ShapeNet using a multi-view rendering objective…
1
0
5
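As a toy picture of what a light field network computes: it maps a camera ray (e.g. 6-D Plücker coordinates) plus a per-object latent code directly to a pixel color, with no volume integration along the ray; the multi-view rendering objective then compares predicted pixels against renders from many viewpoints. All shapes and weights below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def lfn_forward(ray_plucker, latent, W1, W2):
    """Toy light-field-style readout: a tiny MLP maps a ray (6-D Plücker
    coordinates) concatenated with an object latent code straight to an
    RGB value, one pass per pixel, no volumetric integration."""
    x = np.concatenate([ray_plucker, latent])
    h = np.maximum(0.0, W1 @ x)  # ReLU hidden layer
    return W2 @ h                # RGB output

# hypothetical sizes: 6-D ray, 32-D latent, 64 hidden units
W1 = rng.normal(size=(64, 6 + 32)) * 0.1
W2 = rng.normal(size=(3, 64)) * 0.1
rgb = lfn_forward(rng.normal(size=6), rng.normal(size=32), W1, W2)
```

In training, the latent code is inferred per object and the predicted colors are regressed against ground-truth multi-view renders, which is what forces the latent to encode 3D shape.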
@thomaspocon
Thomas O'Connell
2 years
Much progress has been driven by 3D neural fields, which learn the continuous function defining the shape of an object. We focus on conditional neural fields that compute 3D shape from images, rather than NeRF-style models that optimize directly on many viewpoints of one scene.
1
0
3
@thomaspocon
Thomas O'Connell
2 years
But should we expect DNNs trained on large corpora of natural images to incidentally learn 3D shape? Recent progress in 3D graphics and computer vision suggests not: additional inductive biases appear necessary to capture 3D geometry.
1
0
3
@thomaspocon
Thomas O'Connell
2 years
For standard DNNs, we use 25 ImageNet CNNs, 25 ImageNet ViTs, and 3 Stylized-ImageNet CNNs (Geirhos et al. 2019). We evaluate accuracy (x-axis) and trial-wise similarity to humans (y-axis). While humans perform well, standard DNNs struggle to make human-like 3D inferences.
1
0
3
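A minimal sketch of one trial-wise similarity score: raw agreement between human and model correctness across trials. (Error consistency in the sense of Geirhos et al. additionally corrects for the agreement expected by chance; this simplified version is only illustrative.)

```python
import numpy as np

def trialwise_alignment(human_correct, model_correct):
    """Illustrative trial-wise similarity: fraction of trials on which the
    model's correctness matches the humans'. A chance-corrected variant
    (error consistency) is the standard published metric."""
    human_correct = np.asarray(human_correct, dtype=bool)
    model_correct = np.asarray(model_correct, dtype=bool)
    return float(np.mean(human_correct == model_correct))
```

Scoring each model this way gives the y-axis quantity described above, with per-trial accuracy averaged across trials giving the x-axis.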