Thomas O'Connell
@thomaspocon
245 Followers · 824 Following · 11 Media · 79 Statuses
cognitive neuroscientist @KanwisherLab, @mitcocosci
Cambridge, MA
Joined January 2017
🔥Preprint Alert🔥 Excited to share some new work modeling human 3D shape perception! “Approaching human 3D shape perception with neurally mappable models” w/ @tylerraye, @_yonifriedman, @_atewari, Josh Tenenbaum, @vincesitzmann, @Nancy_Kanwisher
https://t.co/uLMC76i4TP 🧵
Excited to release what we’ve been working on at Amaranth Foundation, our latest whitepaper, NeuroAI for AI safety! A detailed, ambitious roadmap for how neuroscience research can help build safer AI systems while accelerating both virtual neuroscience and neurotech. 1/N
do large-scale vision models represent the 3D structure of objects? excited to share our benchmark: multiview object consistency in humans and image models (MOCHI) with @xkungfu @YutongBAI1002 @thomaspocon @_yonifriedman @Nancy_Kanwisher Josh Tenenbaum and Alexei Efros 1/👀
🧵🎉 Our new preprint is up, and we’d love your feedback! We're "Getting Aligned on Representational Alignment" - the degree to which internal representations of different (biological & artificial) information processing systems agree. 🧠🤖🔬🔍 #CognitiveScience #Neuroscience #AI
Thanks for reading! Please be in touch with any questions, ideas, desires to chat, etc!
Finally, check out related work from my awesome co-author @tylerraye implicating the medial temporal lobe in supporting human 3D inferences. He suggests computations beyond standard DNNs are needed to model this process. Hmmm potential synergy? 🤔 https://t.co/OOnfqDeX3e
excited to share my last experimental project from graduate school 😅🥹 "Medial temporal cortex supports compositional visual inferences" with my PhD advisors Anthony Wagner & Dan Yamins 1/🧠 https://t.co/XvcFuNqlSb
Shout out to @KordingLab for tweeting out the CCN-version of the preprint! If you downloaded the earlier version, there are some minor updates in the one linked in this thread (addition of Stylized-ImageNet models, additional discussion) https://t.co/KrZzKPUqG7
Replicating human 3D shape perception abilities - our field is changing so fast, I am having whiplash:
If watching talks is more your speed, check out our presentation from CCN 2023 https://t.co/rgIeamg6Sc
tldr: DNNs trained with multi-view objectives make human-aligned 3D shape judgements! Lots more work to do here (biologically-plausible learning, generalization) to reach full human capabilities, but we take an important step toward closing the gap between human and model 3D inferences
So, are we done? Not quite…none of these models generalize well to novel categories not included in their training set. Closing this generalization gap will be a focus of future research
Remarkably, even a standard resnet50 CNN architecture showed a marked jump in alignment to humans when trained with a multi-view learning objective (Multi-View CNN)
Models trained with a 3D multi-view objective (Multi-View Autoencoder & Multi-View CNN), in which two images depicting the same object from different viewpoints must be associated, were markedly more aligned to humans, approaching the performance of the 3D LFNs!
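To make the multi-view objective concrete: one common way to "associate" two images of the same object from different viewpoints is a contrastive loss over paired embeddings. A minimal numpy sketch of an InfoNCE-style version (my own illustration; the function name and details are assumptions, not the paper's training code):

```python
import numpy as np

def multiview_contrastive_loss(z1, z2, temperature=0.1):
    """Contrastive multi-view objective sketch: row i of z1 and row i of z2
    are embeddings of the SAME object seen from two viewpoints; they should
    be more similar to each other than to embeddings of other objects."""
    # L2-normalize each embedding row
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    # Similarity of every view-1 embedding to every view-2 embedding
    logits = z1 @ z2.T / temperature
    # Row i's correct match is column i (same object, different viewpoint)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

The loss is low when matched views map to nearby embeddings and high when the pairing is scrambled, which is the pressure that plausibly pushes representations toward viewpoint-invariant 3D shape.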
We rule out training on rendered shapes (all control models), generative capabilities (Autoencoder), and viewpoint supervision (Autoencoder+Viewpoints) as sufficient for learning human-aligned 3D shape representations (see plot below)
Now for the fun part! What drives alignment between 3D LFNs and humans? To figure this out, we trained a series of control models, each incorporating different aspects of 3D LFNs into more standard architectures
To ensure these results aren’t driven by the category structure of ShapeNet, we repeat the procedure using abstract procedurally-generated shapes (top) created with the ShapeGenerator plugin for Blender. Again, the 3D LFN is most aligned to human 3D shape judgements (bottom)
Next, we construct adversarial match-to-sample trials by minimizing the accuracy for 25 ImageNet CNNs and selecting trials across 5 difficulty conditions. Even for these adversarially-defined trials, alignment between human and 3D LFN 3D shape judgements holds
…and find that features from the 3D LFN (yellow, pink) are much more aligned to human 3D shape judgements than our baseline models for random within-category pairs of objects!
We train a 3D Light Field Network (@vincesitzmann) on ShapeNet using a multi-view rendering objective…
Much progress has been driven by 3D neural fields, which learn the continuous function defining the shape of an object. We focus on conditional neural fields that compute 3D shape from images, rather than NeRF-style models that optimize directly on many viewpoints of one scene
But should we expect DNNs trained on large corpora of natural images to incidentally learn 3D shape? Recent progress in 3D graphics and computer vision suggests not: additional inductive biases appear necessary to capture 3D geometry...
For standard DNNs, we use 25 ImageNet CNNs, 25 ImageNet ViTs, and 3 Stylized-ImageNet CNNs (Geirhos et al. 2019). We evaluate accuracy (x-axis) and trial-wise similarity to humans (y-axis). While humans perform well, standard DNNs struggle to make human-like 3D inferences