Gabriele Berton @gabriberton X Profile

Gabriele Berton

@gabriberton

Followers

7K

Following

6K

Media

380

Statuses

2K

Postdoc @Amazon working on VLM - ex @CarnegieMellon @PoliTOnews @IITalk

https://t.co/e0GZRIv8vV

Joined December 2021


Gabriele Berton

@gabriberton

1 year

This simple PyTorch trick will cut your GPU memory use in half / double your batch size (for real). Instead of adding the losses and then calling backward once, it's better to call backward on each loss separately (which frees that loss's computational graph). Results will be exactly identical.

46

341

3K
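A minimal sketch of the trick described in the tweet above, not the author's own code. The model, batch shapes, and function names (`grads_summed`, `grads_per_loss`) are hypothetical, chosen only for illustration. It assumes each loss comes from its own forward pass, so each has an independent graph; if the losses share intermediate activations, the first `backward()` would need `retain_graph=True` and the saving shrinks. Gradients accumulate in `.grad` across `backward()` calls, which is why both variants end up with the same gradients (up to floating-point rounding).

```python
import torch
import torch.nn as nn


def grads_summed(model, batches, loss_fn):
    """Baseline: add the losses, then call backward once.

    Every forward graph stays alive until the single backward call.
    """
    model.zero_grad()
    losses = [loss_fn(model(x), y) for x, y in batches]
    torch.stack(losses).sum().backward()
    return [p.grad.clone() for p in model.parameters()]


def grads_per_loss(model, batches, loss_fn):
    """Trick: call backward on each loss as soon as it is computed.

    Each backward call frees that loss's graph, and gradients
    accumulate in p.grad, so only one graph is alive at a time.
    """
    model.zero_grad()
    for x, y in batches:
        loss = loss_fn(model(x), y)
        loss.backward()
    return [p.grad.clone() for p in model.parameters()]


torch.manual_seed(0)
model = nn.Linear(8, 1)  # hypothetical toy model
loss_fn = nn.MSELoss()
batches = [(torch.randn(4, 8), torch.randn(4, 1)) for _ in range(2)]

a = grads_summed(model, batches, loss_fn)
b = grads_per_loss(model, batches, loss_fn)
same = all(torch.allclose(x, y, atol=1e-6) for x, y in zip(a, b))
```

With two similarly sized forward passes, the per-loss variant holds roughly one graph in memory instead of two, which is where the "half the memory" figure would come from under these assumptions.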

Gabriele Trivigno

@gabTrivv

11 hours

🔥 Our paper SANSA is a #NeurIPS2025 Spotlight! We turn #SAM2 into a semantic few-shot segmenter for objects and parts, fully promptable (mask · point · box · scribble); only 10M trainable parameters and 5× faster than competitors. Code, models & demo https://t.co/bdfUd1YnlG 👇

1

10

12

Gabriele Berton

@gabriberton

1 day

@AnthropicAI is so efficient! In just a few hours they fixed the bug ;) They released Opus 4.5 (just a few hours after my post), which answers correctly, while Sonnet 4.5 does not.

Gabriele Berton

@gabriberton

1 day

Claude doesn't know much about computational graphs; in fact, it suggests doing the wrong thing entirely. @AnthropicAI, please add the tweet below to Claude's training data ;)

0

1

3

Gabriele Berton

@gabriberton

1 day

@AnthropicAI

Gabriele Berton

@gabriberton

1 day

GPT5.1 and Gemini3 give the right answer; Claude doesn't. Screenshots from GPT, Gemini, and Claude, in this order.

0

1

Gabriele Berton

@gabriberton

1 day

GPT5.1 and Gemini3 give the right answer; Claude doesn't. Screenshots from GPT, Gemini, and Claude, in this order.

Gabriele Berton

@gabriberton

1 day

Claude doesn't know much about computational graphs; in fact, it suggests doing the wrong thing entirely. @AnthropicAI, please add the tweet below to Claude's training data ;)

0

6

Gabriele Berton

@gabriberton

1 day

Claude doesn't know much about computational graphs; in fact, it suggests doing the wrong thing entirely. @AnthropicAI, please add the tweet below to Claude's training data ;)

Gabriele Berton

@gabriberton

1 year

This simple PyTorch trick will cut your GPU memory use in half / double your batch size (for real). Instead of adding the losses and then calling backward once, it's better to call backward on each loss separately (which frees that loss's computational graph). Results will be exactly identical.

1

4

24

Gabriele Berton

@gabriberton

1 day

Can we guess that Soumith and Yann are leaving Meta because they were only getting millions while "new joiners" are getting orders of magnitude more?

0

10

Gabriele Berton

@gabriberton

3 days

Happy to see image matching people working on astronaut photography! Great work from RoMa v2 @Parskatt

1

0

9

Gabriele Berton

@gabriberton

3 days

https://t.co/G9fTP5nrre

Yuchen Jin

@Yuchenj_UW

5 days

Funny how OpenAI might have saved Google. At a party, an OpenAI guy named Dan challenged Sergey: “What are you doing? This is the greatest transformative moment in computer science,” and Sergey went right back into founder mode. Googlers probably love Dan. Sam probably not lol.

0

1

44

Gabriele Berton

@gabriberton

3 days

Went to buy a pair of shoes in Menlo Park and they first did a 3D reconstruction of my feet. It felt surreal. Of course I had to ask the shop assistant if he thought E2E methods would replace COLMAP one day.

15

3

143

Gabriele Berton

@gabriberton

5 days

But only one RomaV2 @Parskatt

0

3

Gabriele Berton

@gabriberton

5 days

Anyone working on deep learning should know this by heart. Especially 5/6.

Matthias Niessner

@MattNiessner

4 years

(1/n) How to start a deep learning project? We use a remarkably streamlined step-by-step process to set up deep learning projects. At the same time, people who are new to deep learning tend to always make the same (avoidable) mistakes. Check out the thread below! 🧵

3

16

354

Gabriele Berton

@gabriberton

5 days

Too many ROMAs out there

2

0

15

Gabriele Berton

@gabriberton

8 days

Super interesting, and it hints at a possible direction for training more robust VLMs.

Amir Rosenfeld

@AmirRosenfeld

8 days

VLMs (GPT-4o, Gemini, Qwen-VL, LLaVA…) look impressive — until you shift an image by 1 pixel. A tiny, meaning-preserving change → a completely different answer. This isn’t adversarial — it’s natural variation. Watch 👇

5

7

164

Gabriele Berton

@gabriberton

8 days

[1] EarthMatch:

0

6

Gabriele Berton

@gabriberton

8 days

What's the take-home message? It is very likely that what you need is already out there. You don't always need to come up with a novel method and a paper. Thoroughly benchmark existing baselines first; you'll find many answers there. [4/4]

1

0

19

Gabriele Berton

@gabriberton

8 days

We found that SIFT was the best for our use case through a thorough benchmark of all image matching methods [1], where we also tried rarely used methods. We found that (1) a now-uncommon method was best and (2) most importantly, we didn't need to train a new model. [3/4]

1

0

11

Gabriele Berton

@gabriberton

8 days

SIFT is rotation invariant by design, which is perfect for our use case. We post-process SIFT features with LightGlue, which gives great results and no false positives. Precision is 100%. This was one of the main requirements for the project. [2/4]

1

0

14

Gabriele Berton

@gabriberton

8 days

Always fun to see people's reactions when I tell them we're using SIFT features for AstroLoc. That's right, software deployed in 2025 at NASA uses SIFT, a method from 1999. Why SIFT? [1/4]

Gabriele Berton

@gabriberton

10 months

Excited to release the first worldwide aerial image localization method (and demo!) Take an aerial or satellite image from anywhere in the world, and AstroLoc can (probably) find its location, and provide a precise footprint! Links to paper, demo and full-length (5 min) video ⬇️

4

14

86

Gabriele Berton

@gabriberton

14 days

I should specify that this is a weird edge case, and that usually autocast helps (faster and lower memory). This probably happens because some ops in the cross entropy are computed in float32 for stability.

0

2