Gregor Bachmann

@GregorBachmann1

375 Followers · 498 Following · 7 Media · 119 Statuses

I am a PhD student @ETH Zürich working on deep learning. MLP-pilled 💊. https://t.co/yWdDEV6Z15

Joined May 2022
@GregorBachmann1
Gregor Bachmann
2 years
Very thrilled to announce that our work "Scaling MLPs" has been accepted at NeurIPS 🥳 Check out our new arXiv version https://t.co/AhbYBhZfqH! @SAnagnostidis and I managed to push performance even further 🔥
arxiv.org
In this work we revisit the most fundamental building block in deep learning, the multi-layer perceptron (MLP), and study the limits of its performance on vision tasks. Empirical insights into...
@_akhaliq
AK
2 years
Scaling MLPs: A Tale of Inductive Bias paper page: https://t.co/1M3ERzInpZ In this work we revisit the most fundamental building block in deep learning, the multi-layer perceptron (MLP), and study the limits of its performance on vision tasks. Empirical insights into MLPs are
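For intuition, here is a minimal sketch (assuming PyTorch; not the authors' code) of the kind of vision MLP the paper studies: images are flattened to vectors and passed through plain fully connected blocks, with no convolutions or attention. Width, depth, and image size below are illustrative.

```python
# Minimal vision-MLP sketch in the spirit of "Scaling MLPs" (illustrative sizes).
import torch
import torch.nn as nn

class VisionMLP(nn.Module):
    def __init__(self, image_size=64, channels=3, width=1024, depth=6, num_classes=100):
        super().__init__()
        in_dim = channels * image_size * image_size
        layers = [nn.Flatten(), nn.Linear(in_dim, width), nn.GELU()]
        for _ in range(depth - 1):
            layers += [nn.Linear(width, width), nn.GELU()]
        layers += [nn.Linear(width, num_classes)]
        self.net = nn.Sequential(*layers)

    def forward(self, x):  # x: (batch, channels, H, W)
        return self.net(x)

model = VisionMLP()
logits = model(torch.randn(8, 3, 64, 64))  # -> (8, 100)
```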
@tpimentelms
Tiago Pimentel
4 months
Honoured to receive two (!!) Senior Area Chair awards at #ACL2025 😁 (Conveniently placed on the same slide!) With the amazing Philip Whittington, @GregorBachmann1 and @weGotlieb, @CuiDing_CL, Giovanni Acampa, @a_stadt, @tamaregev
@_vaishnavh
Vaishnavh Nagarajan
4 months
Today @ChenHenryWu and I will be presenting our #ICML work on creativity in the Oral 3A Reasoning session (West Exhibition Hall C) 10 - 11 am PT Or please stop by our poster right after @ East Exhibition Hall A-B #E-2505 11am-1:30pm. (Hope you enjoy some silly human drawings!)
@aycatakmaz
Ayça Takmaz
4 months
Can we learn to complete anything in Lidar without any manual supervision? Excited to share our #ICML2025 paper “Towards Learning to Complete Anything in Lidar” from my time at @nvidia with @CristianoSalto @NeeharPeri @meinhardt_tim @RdeLutio @AljosaOsep @lealtaixe! Thread🧵👇
@edward_milsom
Edward Milsom
5 months
What's some "must read" literature on generalisation in neural networks? I keep thinking about this paper and it really makes me want to understand better the link between optimisation and generalisation. https://t.co/6UMJqhMVCO
arxiv.org
In this work, we investigate the implicit regularization induced by teacher-student learning dynamics in self-distillation. To isolate its effect, we describe a simple experiment where we consider...
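As a concrete reference point for the linked paper, here is a hedged sketch of a self-distillation training step: a student with the same architecture as an already-trained teacher is fit to a mix of the hard labels and the teacher's softened outputs. The temperature and loss weighting are illustrative, not the paper's setup.

```python
import torch
import torch.nn.functional as F

def self_distillation_step(student, teacher, x, y, optimizer, T=2.0, alpha=0.5):
    """One step mixing hard labels with the teacher's soft targets."""
    teacher.eval()
    with torch.no_grad():
        soft_targets = F.softmax(teacher(x) / T, dim=-1)
    logits = student(x)
    loss_hard = F.cross_entropy(logits, y)
    # KL between student and teacher distributions at temperature T
    loss_soft = F.kl_div(F.log_softmax(logits / T, dim=-1), soft_targets,
                         reduction="batchmean") * T * T
    loss = alpha * loss_soft + (1 - alpha) * loss_hard
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```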
@aycatakmaz
Ayça Takmaz
5 months
Our workshop on open-world 3D scene understanding OpenSUN3D is taking place this afternoon at @CVPR!
@efedele16
Elisabetta Fedele
5 months
Join us at OpenSUN3D☀️ workshop this afternoon @CVPR 🚀 📍: Room 105 A 🕰️: 2:00-6:00 pm 🌍: https://t.co/4nSXaJGNpR @afshin_dn @leto__jean @lealtaixe
@_vaishnavh
Vaishnavh Nagarajan
6 months
📢 New paper on creativity & multi-token prediction! We design minimal open-ended tasks to argue: → LLMs are limited in creativity since they learn to predict the next token → creativity can be improved via multi-token learning & injecting noise ("seed-conditioning" 🌱) 1/ 🧵
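A rough sketch of the "seed-conditioning" idea as the tweet describes it: instead of relying on temperature sampling for diversity, generation is conditioned on a random noise prefix and decoding can stay greedy. The prefix construction below is a guess at the mechanism, assuming a Hugging Face-style model/tokenizer interface, not the paper's implementation.

```python
import torch

def seed_conditioned_generate(model, tokenizer, prompt, seed_len=8, max_new_tokens=64):
    """Greedy decoding; diversity comes from a random 'seed' prefix of token ids
    prepended to the prompt (illustrative sketch)."""
    ids = tokenizer.encode(prompt, return_tensors="pt")
    seed = torch.randint(0, tokenizer.vocab_size, (1, seed_len))  # noise tokens
    ids = torch.cat([seed, ids], dim=1)
    for _ in range(max_new_tokens):
        logits = model(ids).logits[:, -1, :]
        next_id = logits.argmax(dim=-1, keepdim=True)  # greedy step
        ids = torch.cat([ids, next_id], dim=1)
    return tokenizer.decode(ids[0, seed_len:])
```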
@SpyrosGidaris
Spyros Gidaris
6 months
Better LLM training? @GregorBachmann1 & @_vaishnavh showed next-token prediction causes shortcut learning. A fix? Multi-token prediction training (thanks @FabianGloeckle) We use register tokens: minimal architecture changes & scalable prediction horizons https://t.co/wfC88hMdPd
@NasosGer
Anastasios Gerontopoulos
6 months
1/n Multi-token prediction boosts LLMs (DeepSeek-V3), tackling key limitations of the next-token setup: • Short-term focus • Struggles with long-range decisions • Weaker supervision Prior methods add complexity (extra layers) 🔑 Our fix? Register tokens—elegant and powerful
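A hedged sketch of the register-token idea as this thread describes it: a few learned register tokens are appended to the input, and the hidden state at each register position is trained to predict one of the next k future tokens, leaving the backbone essentially unchanged. The module names and exact wiring here are assumptions; the backbone is any callable mapping (batch, seq, d_model) to hidden states of the same shape.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RegisterMTP(nn.Module):
    """Multi-token prediction via appended register tokens (illustrative sketch)."""
    def __init__(self, backbone, d_model, vocab_size, horizon=4):
        super().__init__()
        self.backbone = backbone  # e.g. a causal transformer over embeddings
        self.registers = nn.Parameter(torch.randn(horizon, d_model) * 0.02)
        self.head = nn.Linear(d_model, vocab_size)
        self.horizon = horizon

    def forward(self, token_embeds):  # (batch, seq, d_model)
        b = token_embeds.size(0)
        regs = self.registers.unsqueeze(0).expand(b, -1, -1)
        h = self.backbone(torch.cat([token_embeds, regs], dim=1))
        reg_states = h[:, -self.horizon:, :]  # states at register positions
        return self.head(reg_states)          # (batch, horizon, vocab)

def mtp_loss(logits, future_tokens):  # future_tokens: (batch, horizon)
    return F.cross_entropy(logits.transpose(1, 2), future_tokens)
```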
@_vaishnavh
Vaishnavh Nagarajan
6 months
@francoisfleuret Hey @francoisfleuret, we formalized this very intuition in this late-2023 work, which you may find interesting :-)
@aycatakmaz
Ayça Takmaz
7 months
Thanks @_akhaliq for sharing! During my internship at @NVIDIAAI, we explored zero-shot panoptic completion of Lidar scans — together with @CristianoSalto @NeeharPeri @meinhardt_tim @RdeLutio @lealtaixe @AljosaOsep!
@_akhaliq
AK
7 months
Nvidia just announced Towards Learning to Complete Anything in Lidar
@dvruette
Dimitri von Rütte
8 months
🚨 NEW PAPER DROP! Wouldn't it be nice if LLMs could spot and correct their own mistakes? And what if we could do so directly from pre-training, without any SFT or RL? We present a new class of discrete diffusion models, called GIDD, that are able to do just that: 🧵1/12
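For intuition, a toy sketch of the masked discrete diffusion training step that models like GIDD build on: corrupt a token sequence at a random noise level, then train the model to recover the original tokens at the corrupted positions. GIDD itself generalises this (interpolating between masking and uniform noise); the version below is the plain masking baseline, with assumed shapes and a reserved mask id.

```python
import torch
import torch.nn.functional as F

MASK_ID = 0  # assumed reserved mask token id

def masked_diffusion_step(model, tokens, optimizer):
    """tokens: (batch, seq) ids. Train the model to denoise a randomly masked sequence."""
    b, s = tokens.shape
    t = torch.rand(b, 1)                    # per-example noise level in (0, 1)
    corrupt = torch.rand(b, s) < t          # mask each position with prob t
    noisy = torch.where(corrupt, torch.full_like(tokens, MASK_ID), tokens)
    logits = model(noisy)                   # (batch, seq, vocab)
    # loss only on corrupted positions: predict the original tokens
    loss = F.cross_entropy(logits[corrupt], tokens[corrupt])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```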
@aycatakmaz
Ayça Takmaz
10 months
I will be giving a talk on open-vocabulary 3D scene understanding at the next ZurichCV meetup! 🗓️ Date: Thursday, January 23rd 18:00 📍Location: @ETH_AI_Center, please see https://t.co/fCYT2YkTAq for additional details!
zurichai.ch
Ayca Takmaz on open-vocabulary 3D scene understanding.
@aycatakmaz
Ayça Takmaz
11 months
Join us for the 4th edition of ☀️OpenSUN3D🌎 workshop on open-world 3D scene understanding at #CVPR2025! We will explore emerging trends in 3D scene understanding, and applications of language models in 3D vision. We're also hosting a challenge! 📚
@FrancisEngelman
Francis Engelmann
11 months
Get ready for the next @CVPR workshop on OpenWorld 3D Scene Understanding ➡️ https://t.co/XqA2dyAp2Q We will be hosting: - a prize challenge 🏆 (see https://t.co/URaRTqmkx5) - a paper track 🗞️ - exciting keynote speakers 👩‍🏫 #CVPR2025
@tpimentelms
Tiago Pimentel
11 months
BPE is a greedy method to find a tokeniser which maximises compression! Why don't we try to find properly optimal tokenisers instead? Well, it seems this is a very difficult—in fact, NP-complete—problem!🤯 New paper + P. Whittington, @GregorBachmann1 :)
arxiv.org
In this work, we prove the NP-completeness of two variants of tokenisation, defined as the problem of compressing a dataset to at most $\delta$ symbols by either finding a vocabulary directly...
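For context, a compact sketch of the greedy BPE loop the tweet refers to: repeatedly merge the most frequent adjacent symbol pair across the corpus. Greedy pair selection is what makes BPE tractable; the paper shows that finding a compression-optimal tokeniser instead is NP-complete.

```python
from collections import Counter

def bpe_train(corpus, num_merges):
    """Greedy BPE: corpus is a list of strings, treated as symbol sequences."""
    merges = []
    seqs = [list(seq) for seq in corpus]
    for _ in range(num_merges):
        pairs = Counter()
        for seq in seqs:
            pairs.update(zip(seq, seq[1:]))
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]  # greedy: most frequent pair
        merges.append((a, b))
        new_seqs = []
        for seq in seqs:
            out, i = [], 0
            while i < len(seq):
                if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                    out.append(a + b)  # merge the pair into one symbol
                    i += 2
                else:
                    out.append(seq[i])
                    i += 1
            new_seqs.append(out)
        seqs = new_seqs
    return merges, seqs

merges, seqs = bpe_train(["hello", "hell", "help"], num_merges=3)
```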
@enisimsar
Enis Simsar
11 months
🚀 Excited to share our preprint LoRACLR! TL;DR: LoRACLR merges multiple LoRA models into a unified diffusion model for seamless, high-fidelity multi-concept image synthesis with minimal interference. Thanks to @THofmann2017, @fedassa, and @PINguAR! 🙌
@bobby_he
Bobby
11 months
Come by poster #2402 East hall at NeurIPS from 11am-2pm Friday to chat about why outlier features emerge during training and how we can prevent them!
@bobby_he
Bobby
1 year
Updated camera ready https://t.co/dnMOQryvgJ. New results include: - non-diagonal preconditioners (SOAP/Shampoo) minimise OFs compared to diagonal ones (Adam/AdaFactor) - scaling to 7B params - showing that our methods for reducing OFs make int8 post-training quantisation (PTQ) easier. Check it out!
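A small sketch of one standard diagnostic for outlier features (OFs) in this line of work: measure the kurtosis of each hidden feature's activations across tokens; heavy-tailed features with very large kurtosis are the ones that make int8 quantisation hard. The threshold and shapes below are illustrative assumptions.

```python
import torch

def feature_kurtosis(acts, eps=1e-6):
    """acts: (tokens, features). Per-feature kurtosis: ~3 for Gaussian,
    much larger for heavy-tailed outlier features."""
    mu = acts.mean(dim=0, keepdim=True)
    var = acts.var(dim=0, unbiased=False, keepdim=True)
    z4 = ((acts - mu) ** 4).mean(dim=0)
    return z4 / (var.squeeze(0) ** 2 + eps)

acts = torch.randn(4096, 768)
acts[::64, 7] += 40                       # a few tokens spike in feature 7
k = feature_kurtosis(acts)
outliers = (k > 10).nonzero().flatten()   # illustrative threshold; flags feature 7
```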
@aycatakmaz
Ayça Takmaz
1 year
We have an exciting line-up of keynote speakers at our workshop for open-vocabulary 3D scene understanding, OpenSUN3D☀️ at #ECCV2024! 🗓️Sept 29, Sunday 14:00-17:30 ✍️ https://t.co/lbzmGFxyrN @meinhardt_tim @orlitany @AlexBewleyAI @_krishna_murthy
@FrancisEngelman
Francis Engelmann
1 year
Introducing our Keynote Speakers at this edition of the OpenSUN3D workshop #ECCV2024 (Sept 29, Sunday 14:00-15:30, Room: Amber 4) in Milano🇮🇹 Full schedule: https://t.co/XqA2dyAp2Q 🚀 @eccvconf @ETH_en @ETH_AI_Center @Stanford
@unsorsodicorda
andrea panizza
1 year
@irfnali1 @srush_nlp @cshalizi This is really nice! But the proof is very general and thus complicated. A simpler proof, together with a proof of what can go wrong when learning these next-token predictors with MLE, is given in this (IMHO underrated) paper https://t.co/4UiN0QTjsJ @GregorBachmann1 @_vaishnavh
@haeggee
Alex Hägele
1 year
come to the poster session at 12pm and our spotlight presentation at 3pm, both in Straus 3!
@haeggee
Alex Hägele
1 year
I'm also at ICML -- excited to present our paper on training + LR schedules as a spotlight (!) at the workshop on the next gen of seq. models as well as ES-FOMO on Fri🤙 Reach out to discuss methods for training open models, scaling, efficiency, or the future of architectures :)
@dvruette
Dimitri von Rütte
1 year
We’re presenting our work on concept guidance today at the 13:30 ICML poster session (#706). Come by and say hi! #ICML #ICML2024
@dvruette
Dimitri von Rütte
2 years
🚨📜 Announcing our latest work on LLM interpretability: We are able to control a model's humor, creativity, quality, truthfulness, and compliance by applying concept vectors to its hidden neural activations. 🧵 https://t.co/fSdxxMjIUe
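To make the mechanism concrete, a hedged sketch of concept-vector steering as the tweet describes it: add a scaled concept direction to a chosen layer's hidden activations at inference time via a PyTorch forward hook. The layer choice, scale, and model attribute names are assumptions, not the paper's exact method.

```python
import torch

def add_concept_hook(layer, concept_vec, scale=5.0):
    """Register a forward hook that shifts the layer's output hidden states
    along a concept direction (e.g. a 'humor' vector)."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + scale * concept_vec.to(hidden.dtype)
        return (steered, *output[1:]) if isinstance(output, tuple) else steered
    return layer.register_forward_hook(hook)

# Usage sketch (attribute names assumed for a decoder-only transformer):
# handle = add_concept_hook(model.transformer.h[12], concept_vec)
# ... generate with the steered model ...
# handle.remove()
```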