KonstantinWille Profile Banner
Konstantin Willeke Profile
Konstantin Willeke

@KonstantinWille

Followers
1K
Following
22K
Media
11
Statuses
221

@Stanford

Joined November 2013
Don't wanna be here? Send us removal request.
@KonstantinWille
Konstantin Willeke
3 years
New paper out now in @Nature🥳. We use CNNs, population recording of mouse V1 & pharmacology to show how feature tuning in visual cortex changes /w internal brain state. Shared work with @kfrankelab . Fantastic collaboration between @sinzlab & @AToliasLab.
Tweet card summary image
nature.com
Nature - Computational modelling and functional imaging of awake, active mice show that behaviour directly changes neuronal tuning in the visual cortex through pupil dilation.
@kfrankelab
Katrin Franke
3 years
So excited this is out @Nature 🎉. Pupil dilation with internal brain state rapidly shifts neural feature/color selectivity in mouse visual cortex by switching between rod&cone photoreceptors👁️🧠. Shared work with @KonstantinWille, @AToliasLab & @sinzlab
5
15
97
@KonstantinWille
Konstantin Willeke
7 days
RT @kellerjordan0: New NanoGPT training speed record: 3.28 FineWeb val loss in 2.966 minutes on 8xH100. New record-holder: Vishal Agrawal (….
0
39
0
@KonstantinWille
Konstantin Willeke
2 months
For anyone making it this far in the thread: We're training frontier models on 1T++ tokens of brain data, and we're looking for engineers and scientists 🧠 DM's are open!. [7/6] fin.
2
3
23
@KonstantinWille
Konstantin Willeke
2 months
Thanks to the enigma ml team: @AdrianoCardace @alexrgil14 @AtakanBedel @vedanglad 💪. Special thanks to our partner @mlfoundry for the support and their affordable H100 nodes! 💙. [6/6].
2
1
17
@KonstantinWille
Konstantin Willeke
2 months
We also use torch's asynchronous all_reduce operation, so that we can do useful training housekeeping while we wait for the gradients to arrive, e.g. logging and stepping of schedulers. [5/6].
1
0
6
@KonstantinWille
Konstantin Willeke
2 months
Our new method is very simple. We first build buckets with ideal sizes for current hardware, and fill them up with parameters from 10-20 layers. Then, after the backwards() call is complete, we only have to communicate ~10 buckets, with 64MB worth of gradients each. [4/6]
Tweet media one
1
0
13
@KonstantinWille
Konstantin Willeke
2 months
@AToliasLab @naturecomputes The previous elegant state of the art by @YouJiacheng bypasses torch DDP, and manually reduces the gradients after the backwards is completed, resulting in a significant efficiency gain. However, the gradients of each layer are communicated for each layer independently.
Tweet media one
1
0
14
@KonstantinWille
Konstantin Willeke
2 months
@AToliasLab @naturecomputes In torch's DDP, as the gradients get computed during the backwards() call, they are communicated across all devices via all_reduce. However, when using torch.compile, this overlap of communication and computation is not as efficient as it can be. [2/6]
Tweet media one
1
1
15
@KonstantinWille
Konstantin Willeke
2 months
New NanoGPT training speed world record from the Enigma Project 🎉 (@AToliasLab, @naturecomputes, . We improve the efficiency of gradient all_reduce. Short explainer of our method 👇. [1/6].
@kellerjordan0
Keller Jordan
2 months
New NanoGPT training speed record: 3.28 FineWeb val loss in 2.990 minutes on 8xH100. Previous record: 3.014 minutes (1.44s slower).Changelog: Accelerated gradient all-reduce. New record-holders: @KonstantinWille et al. of The Enigma project
Tweet media one
Tweet media two
1
17
154
@KonstantinWille
Konstantin Willeke
2 months
RT @MoonL620922: (1/6) Thrilled to share our triple-N dataset (Non-human Primate Neural Responses to Natural Scenes)! It captures thousands….
0
43
0
@KonstantinWille
Konstantin Willeke
3 months
RT @naturecomputes: Enigma's excited to co-sponsor Foundry's AI for Science Symposium in San Francisco on May 16th. Join us for an exciting….
0
5
0
@KonstantinWille
Konstantin Willeke
3 months
RT @AToliasLab: After 7 years, thrilled to finally share our #MICrONS functional connectomics results!. We recorded activity from ~75K neur….
0
95
0
@KonstantinWille
Konstantin Willeke
4 months
RT @dyamins: New paper on 3D scene understanding for static images with a novel large-scale video prediction model. .
0
8
0
@KonstantinWille
Konstantin Willeke
4 months
RT @KlemenKotar: 🚀 Excited to share our new paper!. We introduce the first autoregressive model that natively handles:.🎥 Novel view synthes….
0
5
0
@KonstantinWille
Konstantin Willeke
4 months
RT @dyamins: New paper on self-supervised optical flow and occlusion estimation from video foundation models. @sstj389 @jiajunwu_cs @SeKim….
0
18
0
@KonstantinWille
Konstantin Willeke
6 months
RT @AToliasLab: With Nvidia’s shares plummeting by ~17% and other big tech taking hits, today alone saw a market value loss exceeding $589….
0
11
0
@KonstantinWille
Konstantin Willeke
7 months
RT @naturecomputes: Enigma's hosting a post-workshop social with @newtheoryai tonight: NEURREPS x UNIREPS X NEUROAI 7-11pm. RSVP link below….
0
6
0
@KonstantinWille
Konstantin Willeke
7 months
Come find us at NeurIPS to chat about large scale multimodal models of the brain.
@naturecomputes
Sophia Sanborn
7 months
Enigma's at NeurIPS - come find us. New jobs posted on our website for research scientists & engineers in multi-modal modeling & interpretability.
Tweet media one
Tweet media two
0
2
22
@KonstantinWille
Konstantin Willeke
8 months
RT @karpathy: People have too inflated sense of what it means to "ask an AI" about something. The AI are language models trained basically….
0
2K
0
@KonstantinWille
Konstantin Willeke
8 months
RT @exnx: Evo has been published in @Science! A true privilege to work with such an amazing team!. So many exciting new experimental result….
0
55
0
@KonstantinWille
Konstantin Willeke
8 months
RT @orvieto_antonio: Only a few more days to apply to our amazing PhD program!.
0
6
0