Roger Waleffe Profile

Roger Waleffe (@RWaleffe)

Followers: 77 · Following: 23 · Media: 2 · Statuses: 26

Computer Sciences PhD student at the University of Wisconsin-Madison

Joined June 2020
Roger Waleffe (@RWaleffe) · 5 months
RT @ctnzr: Nemotron-H: A family of Hybrid Mamba-Transformer LLMs. * Hybrid architecture means up to 3X faster at the same accuracy. * Traine…
Roger Waleffe (@RWaleffe) · 1 year
RT @ctnzr: An 8B-3.5T hybrid SSM model gets better accuracy than an 8B-3.5T transformer trained on the same dataset: * 7% attention, the res…
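For intuition, a minimal sketch of the layer layout such a hybrid implies. This is illustrative only: the function name and the even-spacing heuristic are assumptions, and the actual Nemotron-H layer pattern is given in the paper. The idea is a small fraction of self-attention layers spread through an otherwise all-Mamba stack.

```python
def hybrid_layer_schedule(n_layers: int, attn_frac: float = 0.07) -> list:
    """Sketch: place a few attention layers, evenly spaced, in an
    otherwise all-SSM (Mamba-style) stack. attn_frac=0.07 mirrors the
    "7% attention" figure quoted above; the spacing rule is a guess."""
    n_attn = max(1, round(attn_frac * n_layers))
    stride = n_layers / n_attn
    attn_at = {round(stride / 2 + i * stride) for i in range(n_attn)}
    return ["attention" if i in attn_at else "mamba" for i in range(n_layers)]

# e.g., a 48-layer model gets 3 attention layers and 45 Mamba layers
print(hybrid_layer_schedule(48))
```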
Roger Waleffe (@RWaleffe) · 1 year
RT @thodrek: Data pruning to reduce pretraining costs is hot, but fancy pruning can take just as long to select data as to train on all of i…
Roger Waleffe (@RWaleffe) · 2 years
RT @DisseminatePod: 🚨 "MariusGNN: Resource-Efficient Out-of-Core Training of Graph Neural Networks" with @RWaleffe is available now! 🎧…
Roger Waleffe (@RWaleffe) · 2 years
Joint work with Patrik Okanovic, @vmageirakos, Kostis Nikolakakis, @aminkarbasi, @DKalogerias, @nmervegurel, @thodrek.
Roger Waleffe (@RWaleffe) · 2 years
See the preprint here for extensive evaluations, together with the convergence analysis and a discussion of its generalization.
Roger Waleffe (@RWaleffe) · 2 years
Performance? *The* reduction in time-to-accuracy! Take ImageNet as an example and use less than 10% of the data each epoch: accuracy improves by up to 29% over competing pruning methods while offering a 7x runtime reduction!
Roger Waleffe (@RWaleffe) · 2 years
Not convinced about using random sampling for data pruning? Think twice! In our recent work, we introduce Repeated Sampling of Random Subsets, where we sample a subset of the data at each epoch of training instead of only once at the beginning!
Roger Waleffe (@RWaleffe) · 3 years
RT @ImmanuelTrummer: Roger Waleffe (@RWaleffe) from @wiscdb introduces the Marius++ system! Check out the talk: @W…
Roger Waleffe (@RWaleffe) · 3 years
RT @ImmanuelTrummer: Roger Waleffe (@RWaleffe) shows how to train over billion-scale graphs on a single machine! Join us at 1 PM ET via Zo…
Roger Waleffe (@RWaleffe) · 3 years
RT @thodrek: Scalability is a key factor limiting the use of Graph Neural Networks (GNNs) over large graphs; w/ @RWaleffe, @JasonMohoney, …
Roger Waleffe (@RWaleffe) · 4 years
RT @thodrek: Accepted to #OSDI21: @JasonMohoney & @RWaleffe show how to train massive graph embeddings on a 𝘀𝗶𝗻𝗴𝗹𝗲 𝗺𝗮𝗰𝗵𝗶𝗻𝗲; don't burn $$$$…
Roger Waleffe (@RWaleffe) · 5 years
RT @StatMLPapers: Principal Component Networks: Parameter Reduction Early in Training (arXiv:2006.13347v1 [cs.LG])
Roger Waleffe (@RWaleffe) · 5 years
RT @thodrek: 3/3 We term these networks Principal Component Networks (PCNs). Practical results: We show that converting wide networks to…
Roger Waleffe (@RWaleffe) · 5 years
RT @thodrek: 2/3 The secret sauce: Hidden layer activations in wide networks live in small subspaces! Train your wide-net for a few epochs,…
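A minimal sketch of that subspace trick for a single linear layer. The function name and the weight-folding details are illustrative assumptions; the actual PCN construction is in the paper (arXiv:2006.13347). The idea: estimate the principal subspace of the layer's input activations after a few epochs of training, then fold the projection into a smaller layer.

```python
import torch

def pcn_compress_linear(layer: torch.nn.Linear, acts: torch.Tensor, k: int):
    """Illustrative PCA-based compression of one linear layer.
    acts: (N, in_features) inputs to the layer, collected after a few
    epochs of training the wide network. Uses the approximation
    x ≈ mean + U Uᵀ (x − mean) with U the top-k principal directions,
    so that y = W x + b ≈ (W U) z + b' with z = Uᵀ x."""
    mean = acts.mean(dim=0)
    U = torch.pca_lowrank(acts - mean, q=k)[2]       # (in_features, k)
    small = torch.nn.Linear(k, layer.out_features)
    with torch.no_grad():
        small.weight.copy_(layer.weight @ U)         # fold projection into W
        small.bias.copy_(layer.bias + layer.weight @ (mean - U @ (U.T @ mean)))
    return U, small

# Usage: for inputs x of shape (batch, in_features), small(x @ U) ≈ layer(x)
```

Applying this layer by layer yields the smaller network the thread describes; per tweet 2/3, training then continues in the compressed parameterization.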
Roger Waleffe (@RWaleffe) · 5 years
RT @thodrek: 1/3 Super exciting new result by Roger (@RWaleffe) on how to find small networks that exhibit the same performance as overpara…