Gregor Bachmann

@GregorBachmann1

375 Followers · 498 Following · 7 Media · 119 Statuses

I am a PhD student @ETH Zürich working on deep learning. MLP-pilled 💊. https://t.co/yWdDEV6Z15

Joined May 2022
@GregorBachmann1
Gregor Bachmann
2 years
Very thrilled to announce that our work "Scaling MLPs" has been accepted at NeurIPS 🥳 Check out our new arXiv version https://t.co/AhbYBhZfqH! @SAnagnostidis and I managed to push performance even further 🔥
arxiv.org
In this work we revisit the most fundamental building block in deep learning, the multi-layer perceptron (MLP), and study the limits of its performance on vision tasks. Empirical insights into...
@_akhaliq
AK
2 years
Scaling MLPs: A Tale of Inductive Bias paper page: https://t.co/1M3ERzInpZ In this work we revisit the most fundamental building block in deep learning, the multi-layer perceptron (MLP), and study the limits of its performance on vision tasks. Empirical insights into MLPs are
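For intuition, here is a minimal sketch (assuming PyTorch; not the authors' code) of the kind of vision MLP the paper studies: images are flattened to vectors and passed through plain fully connected blocks, with no convolutions or attention. Width, depth, and image size below are illustrative.

```python
# Minimal vision-MLP sketch in the spirit of "Scaling MLPs" (illustrative sizes).
import torch
import torch.nn as nn

class VisionMLP(nn.Module):
    def __init__(self, image_size=64, channels=3, width=1024, depth=6, num_classes=100):
        super().__init__()
        in_dim = channels * image_size * image_size
        layers = [nn.Flatten(), nn.Linear(in_dim, width), nn.GELU()]
        for _ in range(depth - 1):
            layers += [nn.Linear(width, width), nn.GELU()]
        layers += [nn.Linear(width, num_classes)]
        self.net = nn.Sequential(*layers)

    def forward(self, x):  # x: (batch, channels, H, W)
        return self.net(x)

model = VisionMLP()
logits = model(torch.randn(8, 3, 64, 64))  # -> (8, 100)
```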
@tpimentelms
Tiago Pimentel
4 months
Honoured to receive two (!!) Senior Area Chair awards at #ACL2025 😁 (Conveniently placed on the same slide!) With the amazing Philip Whittington, @GregorBachmann1 and @weGotlieb, @CuiDing_CL, Giovanni Acampa, @a_stadt, @tamaregev
@_vaishnavh
Vaishnavh Nagarajan
4 months
Today @ChenHenryWu and I will be presenting our #ICML work on creativity in the Oral 3A Reasoning session (West Exhibition Hall C) 10 - 11 am PT Or please stop by our poster right after @ East Exhibition Hall A-B #E-2505 11am-1:30pm. (Hope you enjoy some silly human drawings!)
@aycatakmaz
Ayça Takmaz
4 months
Can we learn to complete anything in Lidar without any manual supervision? Excited to share our #ICML2025 paper “Towards Learning to Complete Anything in Lidar” from my time at @nvidia with @CristianoSalto @NeeharPeri @meinhardt_tim @RdeLutio @AljosaOsep @lealtaixe! Thread🧵👇
@edward_milsom
Edward Milsom
5 months
What's some "must read" literature on generalisation in neural networks? I keep thinking about this paper and it really makes me want to understand better the link between optimisation and generalisation. https://t.co/6UMJqhMVCO
arxiv.org
In this work, we investigate the implicit regularization induced by teacher-student learning dynamics in self-distillation. To isolate its effect, we describe a simple experiment where we consider...
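As a concrete reference point for the linked paper, here is a hedged sketch of a self-distillation training step: a student with the same architecture as an already-trained teacher is fit to a mix of the hard labels and the teacher's softened outputs. The temperature and loss weighting are illustrative, not the paper's setup.

```python
import torch
import torch.nn.functional as F

def self_distillation_step(student, teacher, x, y, optimizer, T=2.0, alpha=0.5):
    """One step mixing hard labels with the teacher's soft targets."""
    teacher.eval()
    with torch.no_grad():
        soft_targets = F.softmax(teacher(x) / T, dim=-1)
    logits = student(x)
    loss_hard = F.cross_entropy(logits, y)
    # KL between student and teacher distributions at temperature T
    loss_soft = F.kl_div(F.log_softmax(logits / T, dim=-1), soft_targets,
                         reduction="batchmean") * T * T
    loss = alpha * loss_soft + (1 - alpha) * loss_hard
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```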
@aycatakmaz
Ayça Takmaz
5 months
Our workshop on open-world 3D scene understanding OpenSUN3D is taking place this afternoon at @CVPR!
@efedele16
Elisabetta Fedele
5 months
Join us at OpenSUN3D☀️ workshop this afternoon @CVPR 🚀 📍: Room 105 A 🕰️: 2:00-6:00 pm 🌍: https://t.co/4nSXaJGNpR @afshin_dn @leto__jean @lealtaixe
@_vaishnavh
Vaishnavh Nagarajan
6 months
📢 New paper on creativity & multi-token prediction! We design minimal open-ended tasks to argue: → LLMs are limited in creativity since they learn to predict the next token → creativity can be improved via multi-token learning & injecting noise ("seed-conditioning" 🌱) 1/ 🧵
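A rough sketch of the "seed-conditioning" idea as the tweet describes it: instead of relying on temperature sampling for diversity, generation is conditioned on a random noise prefix and decoding can stay greedy. The prefix construction below is a guess at the mechanism, assuming a Hugging Face-style model/tokenizer interface, not the paper's implementation.

```python
import torch

def seed_conditioned_generate(model, tokenizer, prompt, seed_len=8, max_new_tokens=64):
    """Greedy decoding; diversity comes from a random 'seed' prefix of token ids
    prepended to the prompt (illustrative sketch)."""
    ids = tokenizer.encode(prompt, return_tensors="pt")
    seed = torch.randint(0, tokenizer.vocab_size, (1, seed_len))  # noise tokens
    ids = torch.cat([seed, ids], dim=1)
    for _ in range(max_new_tokens):
        logits = model(ids).logits[:, -1, :]
        next_id = logits.argmax(dim=-1, keepdim=True)  # greedy step
        ids = torch.cat([ids, next_id], dim=1)
    return tokenizer.decode(ids[0, seed_len:])
```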
@SpyrosGidaris
Spyros Gidaris
6 months
Better LLM training? @GregorBachmann1 & @_vaishnavh showed next-token prediction causes shortcut learning. A fix? Multi-token prediction training (thanks @FabianGloeckle) We use register tokens: minimal architecture changes & scalable prediction horizons https://t.co/wfC88hMdPd
@NasosGer
Anastasios Gerontopoulos
6 months
1/n Multi-token prediction boosts LLMs (DeepSeek-V3), tackling key limitations of the next-token setup: • Short-term focus • Struggles with long-range decisions • Weaker supervision Prior methods add complexity (extra layers) 🔑 Our fix? Register tokens—elegant and powerful
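A hedged sketch of the register-token idea as this thread describes it: a few learned register tokens are appended to the input, and the hidden state at each register position is trained to predict one of the next k future tokens, leaving the backbone essentially unchanged. The module names and exact wiring here are assumptions; the backbone is any callable mapping (batch, seq, d_model) to hidden states of the same shape.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RegisterMTP(nn.Module):
    """Multi-token prediction via appended register tokens (illustrative sketch)."""
    def __init__(self, backbone, d_model, vocab_size, horizon=4):
        super().__init__()
        self.backbone = backbone  # e.g. a causal transformer over embeddings
        self.registers = nn.Parameter(torch.randn(horizon, d_model) * 0.02)
        self.head = nn.Linear(d_model, vocab_size)
        self.horizon = horizon

    def forward(self, token_embeds):  # (batch, seq, d_model)
        b = token_embeds.size(0)
        regs = self.registers.unsqueeze(0).expand(b, -1, -1)
        h = self.backbone(torch.cat([token_embeds, regs], dim=1))
        reg_states = h[:, -self.horizon:, :]  # states at register positions
        return self.head(reg_states)          # (batch, horizon, vocab)

def mtp_loss(logits, future_tokens):  # future_tokens: (batch, horizon)
    return F.cross_entropy(logits.transpose(1, 2), future_tokens)
```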
@_vaishnavh
Vaishnavh Nagarajan
6 months
@francoisfleuret Hey @francoisfleuret, we formalized this very intuition in this late-2023 work, which you may find interesting :-)
@aycatakmaz
Ayça Takmaz
7 months
Thanks @_akhaliq for sharing! During my internship at @NVIDIAAI, we explored zero-shot panoptic completion of Lidar scans — together with @CristianoSalto @NeeharPeri @meinhardt_tim @RdeLutio @lealtaixe @AljosaOsep!
@_akhaliq
AK
7 months
Nvidia just announced Towards Learning to Complete Anything in Lidar
@dvruette
Dimitri von Rütte
8 months
🚨 NEW PAPER DROP! Wouldn't it be nice if LLMs could spot and correct their own mistakes? And what if we could do so directly from pre-training, without any SFT or RL? We present a new class of discrete diffusion models, called GIDD, that are able to do just that: 🧵1/12
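For intuition, a toy sketch of the masked discrete diffusion training step that models like GIDD build on: corrupt a token sequence at a random noise level, then train the model to recover the original tokens at the corrupted positions. GIDD itself generalises this (interpolating between masking and uniform noise); the version below is the plain masking baseline, with assumed shapes and a reserved mask id.

```python
import torch
import torch.nn.functional as F

MASK_ID = 0  # assumed reserved mask token id

def masked_diffusion_step(model, tokens, optimizer):
    """tokens: (batch, seq) ids. Train the model to denoise a randomly masked sequence."""
    b, s = tokens.shape
    t = torch.rand(b, 1)                    # per-example noise level in (0, 1)
    corrupt = torch.rand(b, s) < t          # mask each position with prob t
    noisy = torch.where(corrupt, torch.full_like(tokens, MASK_ID), tokens)
    logits = model(noisy)                   # (batch, seq, vocab)
    # loss only on corrupted positions: predict the original tokens
    loss = F.cross_entropy(logits[corrupt], tokens[corrupt])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```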
@aycatakmaz
Ayça Takmaz
10 months
I will be giving a talk on open-vocabulary 3D scene understanding at the next ZurichCV meetup! 🗓️ Date: Thursday, January 23rd 18:00 📍Location: @ETH_AI_Center, please see https://t.co/fCYT2YkTAq for additional details!
zurichai.ch
Ayca Takmaz on open-vocabulary 3D scene understanding.
@aycatakmaz
Ayça Takmaz
11 months
Join us for the 4th edition of ☀️OpenSUN3D🌎 workshop on open-world 3D scene understanding at #CVPR2025! We will explore emerging trends in 3D scene understanding, and applications of language models in 3D vision. We're also hosting a challenge! 📚
@FrancisEngelman
Francis Engelmann
11 months
Get ready for the next @CVPR workshop on OpenWorld 3D Scene Understanding ➡️ https://t.co/XqA2dyAp2Q We will be hosting: - a prize challenge 🏆 (see https://t.co/URaRTqmkx5) - a paper track 🗞️ - exciting keynote speakers 👩‍🏫 #CVPR2025
@tpimentelms
Tiago Pimentel
11 months
BPE is a greedy method to find a tokeniser which maximises compression! Why don't we try to find properly optimal tokenisers instead? Well, it seems this is a very difficult—in fact, NP-complete—problem!🤯 New paper + P. Whittington, @GregorBachmann1 :)
arxiv.org
In this work, we prove the NP-completeness of two variants of tokenisation, defined as the problem of compressing a dataset to at most $\delta$ symbols by either finding a vocabulary directly...
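For context, a compact sketch of the greedy BPE loop the tweet refers to: repeatedly merge the most frequent adjacent symbol pair across the corpus. Greedy pair selection is what makes BPE tractable; the paper shows that finding a compression-optimal tokeniser instead is NP-complete.

```python
from collections import Counter

def bpe_train(corpus, num_merges):
    """Greedy BPE: corpus is a list of strings, treated as symbol sequences."""
    merges = []
    seqs = [list(seq) for seq in corpus]
    for _ in range(num_merges):
        pairs = Counter()
        for seq in seqs:
            pairs.update(zip(seq, seq[1:]))
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]  # greedy: most frequent pair
        merges.append((a, b))
        new_seqs = []
        for seq in seqs:
            out, i = [], 0
            while i < len(seq):
                if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                    out.append(a + b)  # merge the pair into one symbol
                    i += 2
                else:
                    out.append(seq[i])
                    i += 1
            new_seqs.append(out)
        seqs = new_seqs
    return merges, seqs

merges, seqs = bpe_train(["hello", "hell", "help"], num_merges=3)
```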
@enisimsar
Enis Simsar
11 months
🚀 Excited to share our preprint LoRACLR! TL;DR: LoRACLR merges multiple LoRA models into a unified diffusion model for seamless, high-fidelity multi-concept image synthesis with minimal interference. Thanks to @THofmann2017, @fedassa, and @PINguAR! 🙌
@bobby_he
Bobby
11 months
Come by poster #2402 East hall at NeurIPS from 11am-2pm Friday to chat about why outlier features emerge during training and how we can prevent them!
@bobby_he
Bobby
1 year
Updated camera ready https://t.co/dnMOQryvgJ. New results include: - non-diagonal preconditioners (SOAP/Shampoo) minimise OFs compared to diagonal ones (Adam/AdaFactor) - scaling to 7B params - showing that our methods for reducing OFs make int8 post-training quantisation (PTQ) easier. Check it out!
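A small sketch of one standard diagnostic for outlier features (OFs) in this line of work: measure the kurtosis of each hidden feature's activations across tokens; heavy-tailed features with very large kurtosis are the ones that make int8 quantisation hard. The threshold and shapes below are illustrative assumptions.

```python
import torch

def feature_kurtosis(acts, eps=1e-6):
    """acts: (tokens, features). Per-feature kurtosis: ~3 for Gaussian,
    much larger for heavy-tailed outlier features."""
    mu = acts.mean(dim=0, keepdim=True)
    var = acts.var(dim=0, unbiased=False, keepdim=True)
    z4 = ((acts - mu) ** 4).mean(dim=0)
    return z4 / (var.squeeze(0) ** 2 + eps)

acts = torch.randn(4096, 768)
acts[::64, 7] += 40                       # a few tokens spike in feature 7
k = feature_kurtosis(acts)
outliers = (k > 10).nonzero().flatten()   # illustrative threshold; flags feature 7
```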
@aycatakmaz
Ayça Takmaz
1 year
We have an exciting line-up of keynote speakers at our workshop for open-vocabulary 3D scene understanding, OpenSUN3D☀️ at #ECCV2024! 🗓️Sept 29, Sunday 14:00-17:30 ✍️ https://t.co/lbzmGFxyrN @meinhardt_tim @orlitany @AlexBewleyAI @_krishna_murthy
@FrancisEngelman
Francis Engelmann
1 year
Introducing our Keynote Speakers at this edition of the OpenSUN3D workshop #ECCV2024 (Sept 29, Sunday 14:00-15:30, Room: Amber 4) in Milano🇮🇹 Full schedule: https://t.co/XqA2dyAp2Q 🚀 @eccvconf @ETH_en @ETH_AI_Center @Stanford
@unsorsodicorda
andrea panizza
1 year
@irfnali1 @srush_nlp @cshalizi This is really nice! But the proof is very general and thus complicated. A simpler proof, together with a proof of what can go wrong when learning these next-token predictors with MLE, is given in this (IMHO underrated) paper https://t.co/4UiN0QTjsJ @GregorBachmann1 @_vaishnavh
@haeggee
Alex Hägele
1 year
come to the poster session at 12pm and our spotlight presentation at 3pm, both in Straus 3!
@haeggee
Alex Hägele
1 year
I'm also at ICML -- excited to present our paper on training + LR schedules as a spotlight (!) at the workshop on the next gen of seq. models as well as ES-FOMO on Fri🤙 Reach out to discuss methods for training open models, scaling, efficiency, or the future of architectures :)
@dvruette
Dimitri von Rütte
1 year
We’re presenting our work on concept guidance today at the 13:30 ICML poster session (#706). Come by and say hi! #ICML #ICML2024
@dvruette
Dimitri von Rütte
2 years
🚨📜 Announcing our latest work on LLM interpretability: We are able to control a model's humor, creativity, quality, truthfulness, and compliance by applying concept vectors to its hidden neural activations. 🧵 https://t.co/fSdxxMjIUe
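To make the mechanism concrete, a hedged sketch of concept-vector steering as the tweet describes it: add a scaled concept direction to a chosen layer's hidden activations at inference time via a PyTorch forward hook. The layer choice, scale, and model attribute names are assumptions, not the paper's exact method.

```python
import torch

def add_concept_hook(layer, concept_vec, scale=5.0):
    """Register a forward hook that shifts the layer's output hidden states
    along a concept direction (e.g. a 'humor' vector)."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + scale * concept_vec.to(hidden.dtype)
        return (steered, *output[1:]) if isinstance(output, tuple) else steered
    return layer.register_forward_hook(hook)

# Usage sketch (attribute names assumed for a decoder-only transformer):
# handle = add_concept_hook(model.transformer.h[12], concept_vec)
# ... generate with the steered model ...
# handle.remove()
```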