Randall Balestriero

@randall_balestr

Followers: 4K · Following: 299 · Media: 185 · Statuses: 571

AI Researcher: from theory to practice (and back). Postdoc @MetaAI with @ylecun. PhD @RiceUniversity with @rbaraniuk. Masters @ENS_Ulm, @Paris_Sorbonne.

USA
Joined April 2020
@randall_balestr
Randall Balestriero
2 months
Impressed by DINOv2 perf. but don't want to spend too much $$$ on compute and wait for days to pretrain on your own data? Say no more! Data augmentation curriculum speeds up SSL pretraining (as it did for generative and supervised learning) -> FastDINOv2!
4
32
190
@randall_balestr
Randall Balestriero
2 hours
In case a full day of splines isn't enough to make you book your trip to JMM26, perhaps a 90-minute session with @ylecun (+ TBD speakers) on self-supervised learning/world models WITH math will close the deal! No matter your background, come in numbers to discuss research together!
@randall_balestr
Randall Balestriero
2 days
Interested in splines for AI theory (generalization, Grokking, explainability, generative modeling, ...)? Wait no more! We are organizing a dedicated session at JMM26 (the largest math conference): consider submitting your abstract! Deadline in 2 weeks!
0
0
5
@randall_balestr
Randall Balestriero
2 days
But this makes me wonder: if different MAE hparams make you learn different features about your input, can there exist a more universal reconstruction-based pretraining solution? Huge congrats to the MVPs @Abisulco @RahulRam3sh and Pratik from UPenn for making us wonder!
0
0
6
@randall_balestr
Randall Balestriero
2 days
While our insights come from theoretical analysis of simplified MAEs, the findings transfer to nonlinear ViT MAEs, opening new ways to select their hyper-parameters if you know a little bit about your task and pretraining data distribution!
1
0
6
@randall_balestr
Randall Balestriero
2 days
Learning by input-space reconstruction is often inefficient and hard to get right (compared to joint-embedding). While previous theory explains why in the linear/kernel setting, we now take a deep dive into MAEs specifically! Now on arXiv + #ICCV2025!
6
15
82
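For readers less familiar with the reconstruction-based pretraining discussed above, here is a minimal sketch of a masked-autoencoder-style objective. The linear encoder/decoder (echoing the simplified setting the theory analyzes), the patch dimension, and the 75% masking ratio are illustrative assumptions, not the architecture or hyper-parameters from the paper.

```python
import torch
import torch.nn as nn

# Minimal masked-autoencoder-style reconstruction objective (illustrative).
# The linear encoder/decoder, patch dimension, and masking ratio are
# assumptions for this sketch, not the paper's actual setup.
class TinyMAE(nn.Module):
    def __init__(self, patch_dim=768, latent_dim=128, mask_ratio=0.75):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.encoder = nn.Linear(patch_dim, latent_dim)
        self.decoder = nn.Linear(latent_dim, patch_dim)

    def forward(self, patches):  # patches: (batch, num_patches, patch_dim)
        B, N, _ = patches.shape
        num_masked = int(self.mask_ratio * N)
        # Randomly select patches to hide from the encoder.
        idx = torch.rand(B, N, device=patches.device).argsort(dim=1)
        mask = torch.zeros(B, N, device=patches.device)
        mask.scatter_(1, idx[:, :num_masked], 1.0)
        mask = mask.bool()
        visible = patches * (~mask).unsqueeze(-1).float()
        recon = self.decoder(self.encoder(visible))
        # Reconstruction error is measured only on the masked patches.
        return ((recon - patches) ** 2).mean(dim=-1)[mask].mean()
```

A real ViT MAE would encode only the visible tokens with a transformer; the masking ratio above is exactly the kind of hyper-parameter knob the analysis helps you pick.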
@randall_balestr
Randall Balestriero
15 days
RT @NYUDataScience: Congratulations to CDS PhD Student @vlad_is_ai, Courant PhD Student Kevin Zhang, CDS Faculty Fellow @timrudner, CDS Pro….
0
7
0
@randall_balestr
Randall Balestriero
2 months
RT @rpatrik96: I am heading to @icmlconf to present our position paper with @randall_balestr @klindt_david @wielandbr on what we believe ar….
0
7
0
@randall_balestr
Randall Balestriero
2 months
Huge congrats to the amazing @JiaqiZhang82804, Juntuo Wang, Zhixin Sun and John Zou! If you are interested in follow-ups for other modalities and/or CLIP, please DM me or shoot me an email!
1
0
8
@randall_balestr
Randall Balestriero
2 months
Beyond computational benefits, we strongly believe that data-augmentation curricula are a **totally unexplored area in SSL research** that could lead to breakthroughs in robustness, transfer performance, and reduced learning of spurious correlations!
2
0
10
@randall_balestr
Randall Balestriero
2 months
Low-to-high resolution is the main driver of the pretraining speedup. This type of strategy has been used forever in other settings (GANs, supervised learning, ...) and was also tried on SimCLR. But we find that beyond speedups, it also improves robustness!
1
0
7
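A rough sketch of the low-to-high resolution curriculum described above. The epoch boundaries, resolutions, and the `dataset_factory` / `train_one_epoch` helpers are placeholders for illustration, not the FastDINOv2 recipe.

```python
import torchvision.transforms as T

# Rough sketch of a low-to-high resolution curriculum for SSL pretraining.
# Epoch boundaries, resolutions, and the helper callables are placeholders,
# not the actual FastDINOv2 schedule.
def resolution_for_epoch(epoch, total_epochs):
    # Spend the early phase at low resolution, then ramp up.
    if epoch < 0.5 * total_epochs:
        return 112
    if epoch < 0.8 * total_epochs:
        return 160
    return 224

def make_augmentation(resolution):
    return T.Compose([
        T.RandomResizedCrop(resolution, scale=(0.3, 1.0)),
        T.RandomHorizontalFlip(),
        T.ToTensor(),
    ])

def pretrain(model, dataset_factory, train_one_epoch, total_epochs=100):
    for epoch in range(total_epochs):
        res = resolution_for_epoch(epoch, total_epochs)
        loader = dataset_factory(make_augmentation(res))  # rebuild loader at new res
        train_one_epoch(model, loader)                    # user-supplied SSL step
```

The early low-resolution epochs process far more images per GPU-hour, which is where most of the wall-clock savings come from.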
@randall_balestr
Randall Balestriero
2 months
RT @pszwnzl: Our paper "Beyond [cls]: Exploring the True Potential of Masked Image Modeling Representations" has been accepted to @ICCVConf….
0
23
0
@randall_balestr
Randall Balestriero
2 months
We also have theoretical justifications of CT: in particular, CT performs a Sobolev-space projection of your DN! This is only a first step towards provable methods improving SOTA. Huge congrats to @leonleyanghu and Matteo Gamba!
1
0
2
@randall_balestr
Randall Balestriero
2 months
But smoothing your decision boundary also means better adversarial robustness, which again holds out of the box with CT, without any training/finetuning of the original model weights!
1
0
3
@randall_balestr
Randall Balestriero
2 months
Smoothing the model's curvature improves transfer-learning performance (duh), but we actually compete with LoRA despite having far fewer trainable parameters (even against rank-1 LoRA). This holds across pretty much all datasets/models we tried!
1
0
1
@randall_balestr
Randall Balestriero
2 months
The CT implementation is super simple: just swap the activation function in your deep network with ours, and manually tune or learn the corresponding curvature parameter! This works e.g. on ResNets but also on ViTs! And the overhead is minimal (less than LoRA).
2
0
3
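A hedged sketch of the activation-swap recipe from the post above. The specific smooth activation below (a sharpness-parameterized softplus) is an illustrative stand-in, not the actual CT activation, and the frozen-backbone usage is an assumption about how one might tune only the curvature parameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

# Illustrative stand-in for a curvature-parameterized activation: a softplus
# whose sharpness `beta` controls how closely it approximates ReLU.
# This is NOT the actual CT activation, only a sketch of the swap mechanism.
class SmoothActivation(nn.Module):
    def __init__(self, init_beta=10.0, learnable=True):
        super().__init__()
        beta = torch.tensor(float(init_beta))
        self.beta = nn.Parameter(beta) if learnable else beta

    def forward(self, x):
        # softplus_beta(x) = softplus(beta * x) / beta -> ReLU as beta grows.
        return F.softplus(self.beta * x) / self.beta

def swap_activations(module, init_beta=10.0, learnable=True):
    """Recursively replace every nn.ReLU with the smooth stand-in, in place."""
    for name, child in module.named_children():
        if isinstance(child, nn.ReLU):
            setattr(module, name, SmoothActivation(init_beta, learnable))
        else:
            swap_activations(child, init_beta, learnable)
    return module

# Usage sketch: freeze the pretrained weights, train only the curvature knobs.
model = swap_activations(torchvision.models.resnet18(weights="IMAGENET1K_V1"))
for p in model.parameters():
    p.requires_grad_(False)
for m in model.modules():
    if isinstance(m, SmoothActivation) and isinstance(m.beta, nn.Parameter):
        m.beta.requires_grad_(True)
```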
@randall_balestr
Randall Balestriero
2 months
CurvatureTuning2.0! Provably steer/finetune your model's curvature without changing its parameters, in a way that LoRA provably can't:
- strong theory (Sobolev-space projection)
- strong experiments (dozens of models/datasets, manual/learnable steering)
4
9
56
@randall_balestr
Randall Balestriero
2 months
Huge congrats to MVP @thomas_M_walker, @imtiazprio and @rbaraniuk! Ping us with questions or comments!
0
0
6
@randall_balestr
Randall Balestriero
2 months
Our solution is much faster than alternatives, e.g. weight regularization, adversarial training, or even GrokFast (or rather GrokNotSoFast, apparently), and it is computationally tractable, requiring nearly no change to your existing training pipeline!
1
0
8
@randall_balestr
Randall Balestriero
2 months
Based on that theory, we found a regularizer that speeds up the convergence of the Jacobian matrices, resulting in faster Grokking and better training dynamics in general, and that works across transformer and non-transformer architectures! Play with the code:
1
0
7
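The exact regularizer lives in the linked code; as a generic illustration of how a Jacobian-based penalty plugs into a training step, here is a stochastic (Hutchinson-style) estimate of the input-output Jacobian's Frobenius norm. The probe count and the `lambda_jac` weighting are assumptions, and this is not the paper's regularizer.

```python
import torch

# Generic stochastic Jacobian penalty (Hutchinson-style probing), shown only
# to illustrate how a Jacobian-based regularizer enters a training step.
# It is NOT the specific regularizer proposed in the paper.
def jacobian_penalty(model, x, num_probes=1):
    x = x.detach().requires_grad_(True)
    out = model(x)
    penalty = 0.0
    for _ in range(num_probes):
        u = torch.randn_like(out)
        # u^T J via one backward pass; E||u^T J||^2 = ||J||_F^2 for u ~ N(0, I).
        vjp = torch.autograd.grad((out * u).sum(), x, create_graph=True)[0]
        penalty = penalty + vjp.pow(2).flatten(1).sum(dim=1).mean()
    return penalty / num_probes

# Usage sketch inside a standard step (`lambda_jac` is an assumed weighting):
# loss = criterion(model(x), y) + lambda_jac * jacobian_penalty(model, x)
```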