Luke McDermott

@lukemcdermotttt

Followers: 312 · Following: 531 · Media: 10 · Statuses: 66

AI Researcher at Modern Intelligence, focusing on Efficient Deep Learning & Adaptive AI | Incoming PhD @UCSanDiego

San Diego, CA
Joined May 2023
Pinned Tweet
@lukemcdermotttt
Luke McDermott
2 months
Wooooh! Happy to officially say I’ll be staying in San Diego for a PhD in ML & Data Science @ UCSD ECE
[image]
4 · 0 · 47
@lukemcdermotttt
Luke McDermott
1 month
update: pushed changes to the bio
[image]
(quoting the "My PhD acceptance is in superposition" tweet below)
16 · 51 · 4K
@lukemcdermotttt
Luke McDermott
1 month
My PhD acceptance is in superposition
[2 images]
17 · 24 · 2K
@lukemcdermotttt
Luke McDermott
1 month
@herbschang It’s been my dream ever since I was a little quark
0 · 0 · 180
@lukemcdermotttt
Luke McDermott
1 month
@AgPedram This was actually after I attended the workshop haha
1 · 0 · 160
@lukemcdermotttt
Luke McDermott
8 months
@kellerjordan0 Random features are still quite powerful: you can get near-optimal accuracy on ResNet-18/CIFAR-10 (95%) by training only the first few layers + the last layer, roughly 15% of the weights. In this setting, the gradients are only large for the first and last layers.
1 · 0 · 9
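
A minimal sketch of the setup described in the reply above, assuming torchvision's ResNet-18 with a CIFAR-10 head; the exact layer split (and therefore the trainable fraction printed) is illustrative, not the thread's exact experiment:

```python
# Sketch: freeze everything except the stem, the first residual stage, and the
# final FC layer of a torchvision ResNet-18 (assumption: CIFAR-10, 10 classes).
import torch
import torchvision

model = torchvision.models.resnet18(num_classes=10)

for p in model.parameters():
    p.requires_grad = False
for module in (model.conv1, model.bn1, model.layer1, model.fc):
    for p in module.parameters():
        p.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable fraction: {trainable / total:.2%}")

# Only the unfrozen parameters go to the optimizer; the rest stay at their
# random initialization ("random features").
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=0.1, momentum=0.9
)
```
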
@lukemcdermotttt
Luke McDermott
4 months
First time in Vancouver & first spotlight presentation @ AI2ASE #AAAI2024
[4 images]
1 · 1 · 10
@lukemcdermotttt
Luke McDermott
6 months
Thanks to everyone who came by #NeurIPS2023 & @unireps
[image]
(quoting the #NeurIPS2023 announcement tweet below)
0 · 2 · 10
@lukemcdermotttt
Luke McDermott
6 months
I’ll be at #NeurIPS2023 presenting two works at UniReps! I’ll be there all week, so I’m looking to meet up to find collaborators, PhD positions, or just talk about new research. Feel free to reach out. Links to the papers below ⬇️
(quoting the "Linear Mode Connectivity in Sparse Neural Networks" acceptance tweet below)
1 · 0 · 7
@lukemcdermotttt
Luke McDermott
8 months
Happy to announce that my paper “Linear Mode Connectivity in Sparse Neural Networks” was accepted at NeurIPS 2023’s UniReps workshop. link: 1/n
[image]
1 · 1 · 6
@lukemcdermotttt
Luke McDermott
2 months
And for a bit of a biased reason…
(quoting the pinned UCSD PhD announcement above)
0 · 1 · 6
@lukemcdermotttt
Luke McDermott
6 months
My hot take at #NeurIPS2023: beignets are overrated…
1 · 0 · 5
@lukemcdermotttt
Luke McDermott
11 months
@EvMill Why not just zero/mask off the output if a head is deemed “unconfident”? This would encourage expert attention heads that only activate when needed, similar to how ReLU creates modularity in MLPs. Softmax would still be needed for confident heads in this case, though.
0 · 0 · 5
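
A rough sketch of the gating idea floated in this reply; the confidence measure (peak attention weight averaged over queries) and the threshold are assumptions for illustration, not anything stated in the thread:

```python
# Toy gate (illustration only): drop a head's contribution when its attention
# distribution is flat / "unconfident".
import torch

def gated_head_outputs(attn_probs, head_outputs, threshold=0.5):
    """attn_probs: (batch, heads, query, key) softmax weights.
    head_outputs: (batch, heads, query, dim) per-head outputs.
    A head whose peak attention weight, averaged over queries, falls below
    `threshold` is zeroed out, a ReLU-style on/off gate per head."""
    confidence = attn_probs.max(dim=-1).values.mean(dim=-1)   # (batch, heads)
    gate = (confidence >= threshold).to(head_outputs.dtype)   # 0/1 per head
    return head_outputs * gate[:, :, None, None]

# Toy usage with random tensors.
probs = torch.softmax(torch.randn(2, 8, 16, 16), dim=-1)
outs = torch.randn(2, 8, 16, 64)
print(gated_head_outputs(probs, outs).shape)  # torch.Size([2, 8, 16, 64])
```
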
@lukemcdermotttt
Luke McDermott
1 month
@joshmobleymusic @elevenlabsio Just wait for video games with auto-generated music depending on your environment & actions
1 · 0 · 4
@lukemcdermotttt
Luke McDermott
9 months
Great time at the FastML for Science workshop at Imperial College London. Presented work on NAS / pruning for anomaly detectors on the L1 trigger at CERN.
[3 images]
0 · 0 · 3
@lukemcdermotttt
Luke McDermott
2 months
app idea: Overleaf for the phone. someone pls make it happen
1 · 0 · 4
@lukemcdermotttt
Luke McDermott
9 months
Final day at #Automl2023 @automl_conf, presenting work on Neural Network Pruning & Dataset Distillation
[image]
0 · 0 · 3
@lukemcdermotttt
Luke McDermott
8 months
@roydanroy "~ N(0,1)" enjoyers meet "(Omega, F, P)" purists
0 · 0 · 3
@lukemcdermotttt
Luke McDermott
7 months
@finbarrtimbers Always wondered this. A 14B-param model still has lots of activations. In the pruning literature, removing individual weights is significantly better than removing neurons, because a large representation matters more than the transformation from the weights.
1 · 0 · 3
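
A toy illustration of the distinction being drawn here, with assumed shapes: unstructured pruning zeroes individual weights and keeps the representation size, while structured (neuron) pruning removes whole rows and shrinks the layer's output:

```python
# Assumed toy shapes: a linear layer with 4 output neurons and 8 inputs.
import torch

w = torch.randn(4, 8)

# Unstructured: zero the ~50% smallest-magnitude weights; the layer's output
# size (the representation) is unchanged.
threshold = w.abs().flatten().median()
unstructured = w * (w.abs() > threshold)

# Structured: drop the 2 output neurons whose weight rows have the smallest
# L2 norm; the representation itself shrinks.
keep = w.norm(dim=1).topk(2).indices
structured = w[keep]

print(unstructured.shape, structured.shape)  # torch.Size([4, 8]) torch.Size([2, 8])
```
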
@lukemcdermotttt
Luke McDermott
7 months
@kohjingyu What residencies today do you recommend before committing to a PhD program?
1 · 0 · 2
@lukemcdermotttt
Luke McDermott
15 days
@0xcd16 @vsbuffalo Entirely research focused + 10 min walk to the beach + pretty big school + lack of rowdy students. All makes for a great research school. Only lacks the name recognition / selectivity that the other schools have.
0 · 0 · 2
@lukemcdermotttt
Luke McDermott
11 months
@elan_learns @EvMill Yes, attention heads need modularity / to become experts in different features. Maybe there needs to be some ReLU-style component paired with softmax to disable non-confident activations. Dynamic sparsity is essential.
0 · 0 · 2
@lukemcdermotttt
Luke McDermott
8 months
Thanks @HaoliYin, @EmilyLiJiayao, and Eva!
Quoting:
@gm8xx8
gm8xx8
8 months
Gradual Fusion Transformer (GraFT) advances ReID in computer vision with fusion tokens. Captures features efficiently and surpasses benchmarks. Optimized for size-performance balance using neural pruning.
0 · 0 · 2
@lukemcdermotttt
Luke McDermott
6 months
First authored: "Linear Mode Connectivity in Sparse Neural Networks" (Arxiv link:)
Co-authored: "UniCat: Crafting a Stronger Fusion Baseline for Multimodal Re-Identification" (Arxiv link:)
1 · 0 · 2
@lukemcdermotttt
Luke McDermott
3 months
@HaoliYin @EmilyLiJiayao you guys have no clue how many models I had to delete at Modern haha
0 · 0 · 2
@lukemcdermotttt
Luke McDermott
7 months
Just waiting for someone to open source a foundation-model-sized supernet for LLMs. Massive pretraining costs, yet academics can cheaply sample the search space for their use cases.
Quoting:
@ylecun
Yann LeCun
8 months
Which is why open research communities wins.
0 · 0 · 2
@lukemcdermotttt
Luke McDermott
8 months
@kellerjordan0 Gradient sparsity at each layer: 0.07 0.30 0.03 0.01 0.00 0.01 0.06 0.00 0.19 0.32 0.30 0.71 0.03 0.82 0.97 0.995 0.9995 0.94 0.998 0.999992 0.02
1 · 0 · 2
@lukemcdermotttt
Luke McDermott
4 months
@BlackHC @arankomatsuzaki If I had a dollar for every time an ML paper is posted and the comments say “Isn’t this just x”…
1 · 0 · 1
@lukemcdermotttt
Luke McDermott
8 months
@GCazenavette @GzyAftermath @VictorKaiWang1 Awesome work, what are the main limitations with cross-architectural generalization to ViTs?
0 · 0 · 1
@lukemcdermotttt
Luke McDermott
7 months
@finbarrtimbers @cosminnegruseri Look for N:M, mixed, or semi-structured sparsity; those are all names I’ve seen for this kind of sparsity. A100s can do 2:4 sparsity well, but most GPUs don’t, so people tend to stick to vanilla structured pruning instead.
0 · 0 · 1
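
For concreteness, a toy sketch of what a 2:4 (N:M) pattern means; real speedups require hardware support such as A100 sparse tensor cores, and this snippet only constructs the mask:

```python
# Build a 2:4 mask: in every contiguous group of four weights, keep the two with
# the largest magnitude and zero the other two.
import torch

def two_four_mask(weight):
    flat = weight.reshape(-1, 4)                 # assumes weight.numel() % 4 == 0
    idx = flat.abs().topk(2, dim=1).indices      # the 2 largest-magnitude per group
    mask = torch.zeros_like(flat).scatter_(1, idx, 1.0)
    return mask.reshape(weight.shape)

w = torch.randn(8, 16)
mask = two_four_mask(w)
print(mask.reshape(-1, 4).sum(dim=1))            # every group keeps exactly 2 weights
```
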
@lukemcdermotttt
Luke McDermott
8 months
@kellerjordan0 If you keep track of the gradients during training by keeping a running sum of magnitudes, then with that post-training info you can train the same initialization with far fewer gradients (freeze the ones that had the lowest movement).
1 · 0 · 1
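
A loose sketch of the bookkeeping described above, with hypothetical helper names: accumulate per-parameter gradient magnitudes in one run, then mask the lowest-movement gradients when re-training from the same initialization:

```python
# Hypothetical helpers for the two-phase idea: (1) accumulate |grad| per parameter
# during a normal training run, (2) re-train the same initialization while zeroing
# the gradients of the parameters that moved the least.
import torch

def init_movement(model):
    return {n: torch.zeros_like(p) for n, p in model.named_parameters()}

def accumulate_movement(model, movement):
    # Call after each loss.backward() in the first run.
    for n, p in model.named_parameters():
        if p.grad is not None:
            movement[n] += p.grad.abs()

def freeze_masks(movement, keep_fraction=0.10):
    # Keep only the top `keep_fraction` of weights by accumulated gradient movement.
    # (For very large models, sample `scores` instead of concatenating everything.)
    scores = torch.cat([m.flatten() for m in movement.values()])
    threshold = torch.quantile(scores, 1.0 - keep_fraction)
    return {n: (m >= threshold).float() for n, m in movement.items()}

def mask_gradients(model, masks):
    # Call between loss.backward() and optimizer.step() in the re-run.
    for n, p in model.named_parameters():
        if p.grad is not None:
            p.grad *= masks[n]
```
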
@lukemcdermotttt
Luke McDermott
7 months
@finbarrtimbers In this setting it’s 50% weight sparsity vs. quantizing in half. I’m willing to bet quantizing hurts the model less.
0 · 0 · 1
@lukemcdermotttt
Luke McDermott
6 months
Special thanks to Jenni Crawford and @HaoliYin for leading UniCat and to Daniel Cummings for advising both papers!
0 · 0 · 1
@lukemcdermotttt
Luke McDermott
8 months
@kellerjordan0 In practice obviously you can’t do this, but you see a general trend of which gradients are important (early layers & the FC layer).
0 · 0 · 1
@lukemcdermotttt
Luke McDermott
8 months
To find these subnetworks, we leverage distilled data during the retraining stage of IMP to take advantage of the compressed representations. 5/n thanks!
0 · 0 · 1
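
A generic outline of iterative magnitude pruning (IMP) with the retraining stage run on a small distilled dataset, as the tweet describes; `train_steps`, `distilled_loader`, and `init_state` are placeholders, and this is not the paper's exact procedure:

```python
# Generic IMP loop with distilled-data retraining (placeholders, not the paper's code).
import torch

def imp_with_distilled_retraining(model, init_state, distilled_loader, train_steps,
                                  rounds=5, prune_per_round=0.2):
    # One boolean mask per weight tensor; True = weight is still alive.
    masks = {n: torch.ones_like(p, dtype=torch.bool)
             for n, p in model.named_parameters() if "weight" in n}
    for _ in range(rounds):
        # 1) Prune: drop the lowest-magnitude weights among those still alive.
        for n, p in model.named_parameters():
            if n not in masks:
                continue
            alive = p.detach().abs()[masks[n]]
            k = int(prune_per_round * alive.numel())
            if k > 0:
                threshold = alive.kthvalue(k).values
                masks[n] &= p.detach().abs() > threshold
        # 2) Rewind surviving weights to the original initialization (lottery-ticket style).
        with torch.no_grad():
            for n, p in model.named_parameters():
                p.copy_(init_state[n])
                if n in masks:
                    p *= masks[n]
        # 3) Retrain the sparse network on the small distilled dataset
        #    instead of the full training set.
        train_steps(model, distilled_loader, masks)
    return model, masks
```
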
@lukemcdermotttt
Luke McDermott
1 month
@Miles_Brundage Or a beer for the workplace that looks like nitro cold brew
0 · 0 · 1
@lukemcdermotttt
Luke McDermott
1 month
@hadesinwinter “I am sorry to inform you that after significant consideration, I have to reject this rejection. Best of luck with your other applicants, see you in fall”
1 · 0 · 1
@lukemcdermotttt
Luke McDermott
8 months
@kellerjordan0 The final blocks (besides FC) have the lowest gradient movement, whereas the middle ones can still be important. I can send checkpoints later today. For now, I have a high-performing model with 90% of the weights frozen, and here’s the distribution of frozen weights 👇
1 · 0 · 1