Avrajit Ghosh
@GhoshAvrajit
Followers
307
Following
10K
Media
25
Statuses
773
Postdoc @SimonsInstitute @berkeley_ai. Generalization, optimization, inverse problems. PhD @MSU_EGR. (No prior) better than (wrong priors).
Berkeley, CA
Joined February 2020
Second-order methods and preconditioner-based methods are **NOT** the same. Please stop using them interchangeably!
6
11
129
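A minimal toy sketch of the distinction drawn above (an illustration, not code from the thread): a Newton step inverts the exact, current Hessian, while a preconditioned gradient step applies a separately chosen matrix (here a fixed diagonal) that need not match the Hessian and still needs a step size.

```python
# Toy quadratic f(x) = 0.5 x^T A x - b^T x, so the Hessian is exactly A.
import numpy as np

A = np.array([[3.0, 0.5],
              [0.5, 1.0]])      # symmetric positive definite Hessian
b = np.array([1.0, -2.0])
grad = lambda x: A @ x - b

x = np.zeros(2)

# Second-order (Newton) step: solve against the exact local Hessian.
newton_step = x - np.linalg.solve(A, grad(x))

# Preconditioned gradient step: P is chosen separately (a fixed diagonal here,
# in the spirit of Jacobi or Adam-style per-coordinate scaling); it is not the
# Hessian, and a step size eta is still required.
P = np.diag(np.diag(A))
eta = 0.5
precond_step = x - eta * np.linalg.solve(P, grad(x))

print("Newton step:        ", newton_step)
print("Preconditioned step:", precond_step)   # generally different updates
```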
Stable Minima of ReLU Neural Networks Suffer from the Curse of Dimensionality: The Neural Shattering Phenomenon
arxiv.org
We study the implicit bias of flatness / low (loss) curvature and its effects on generalization in two-layer overparameterized ReLU networks with multivariate inputs -- a problem well motivated by...
0
2
14
If a problem seems intractable, it's almost always because your specification of it is vague or incomplete. The solution doesn't appear when you "think harder". It appears when you describe the problem in a sufficiently precise and explicit fashion -- until you see its true
59
157
1K
You rarely solve hard problems in a flash of insight. It's more typically a slow, careful process of exploring a branching tree of possibilities. You must pause, backtrack, and weigh every alternative. You can't fully do this in your head, because your working memory is too
93
306
3K
My role at Meta's SAM team (MSL, previously at FAIR Perception) has been impacted within 3 months of joining after PhD. If you work with multimodal LLMs for grounding or complex reasoning, or have a long-term vision of unified understanding and generation, let's talk. I am on
Meta has gone crazy on the squid game! Many new PhD NGs (new grads) were deactivated today (I am also impacted 🥲 happy to chat)
27
27
343
I'm hiring 2 PhD students & 1 postdoc @GeorgiaTech for Fall '26. Motivated students, please consider us, especially those in
* ML+Quantum
* Deep Learning+Optimization
- PhD: see https://t.co/h4anjm6b8j
- Postdoc: see https://t.co/548XVaahx3 & https://t.co/4ahNE7OOwV
Retweet appreciated
9
120
466
Almost a decade ago, I coauthored a paper asking us to rethink our theory of generalization in machine learning. Today, I’m fine putting the theory back on the shelf.
argmin.net
You don't need a theorem to argue more data is better than less data
7
24
192
In machine learning, do you need to know any optimization algorithm other than stochastic gradient descent? A reluctant but best-faith argument for no.
argmin.net
Justifying a laser focus on stochastic gradient methods.
2
5
26
Even with full-batch gradients, DL optimizers defy classical optimization theory, as they operate at the *edge of stability.* With @alex_damian_, we introduce "central flows": a theoretical tool to analyze these dynamics that makes accurate quantitative predictions on real NNs.
19
213
1K
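A toy illustration of the regime the tweet refers to (my sketch of the classical stability threshold, not the central-flows analysis itself): full-batch GD on a quadratic is classically unstable once the sharpness, i.e. the top Hessian eigenvalue, exceeds 2/eta, yet real networks train while hovering near that edge.

```python
# GD on f(x) = 0.5 * sharpness * x^2 with step size eta; the classical stability
# limit is sharpness < 2/eta. Real NNs sit near this edge without blowing up,
# which is the gap a theory like central flows has to explain.
eta = 0.1                                   # 2/eta = 20
for sharpness in (15.0, 19.0, 21.0):
    x = 1.0
    for _ in range(50):
        x -= eta * sharpness * x            # gradient step
    print(f"sharpness={sharpness:5.1f}  final loss={0.5 * sharpness * x**2:.3e}")
# sharpness=15, 19: the loss decays (oscillating in sign for eta*sharpness > 1);
# sharpness=21 (> 2/eta): the iterate oscillates and grows.
```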
@jasondeanlee @SebastienBubeck @tomgoldsteincs @zicokolter @atalwalkar This is the third, last, and best paper from my PhD. By some metrics, an ML PhD student who writes just three conference papers is "unproductive." But I wouldn't have had it any other way 😉 !
11
21
536
🚨New work: Provable Scaling Laws of Feature Emergence from Learning Dynamics of Grokking ( https://t.co/U7e0d3duYq) In this work we propose a mathematical framework, named Li2, that explains the dynamics of grokking (i.e., delayed generalization) in 2-layer nonlinear networks.
arxiv.org
While the phenomenon of grokking, i.e., delayed generalization, has been studied extensively, it remains an open problem whether there is a mathematical framework that characterizes what kind of...
8
37
227
Proud of my junior collaborators Kijung Jeon, Yuchen @YuchenZhu_ZYC, Wei @WeiGuo01, Jaemoo @jaemoo51133, Avrajit @GhoshAvrajit, Lianghe Shi, Yinuo @Yinuo_Ren, Haoxuan @haoxuan_steve_c -- 6 joint #NeurIPS2025 main track papers! Lucky to have you. Wanna join us? Will post recruiting info soon.
1
7
76
Sharing a new paper with Peter Bartlett, @jasondeanlee, @ShamKakade6, Bin Yu. People talk about implicit regularization, but how good is it? We show it's surprisingly effective: GD dominates ridge for all linear regression problems, with more cool stuff on GD vs SGD https://t.co/oAVKiVgUUQ
10
32
187
Information Geometry of Variational Bayes
arxiv.org
We highlight a fundamental connection between information geometry and variational Bayes (VB) and discuss its consequences for machine learning. Under certain conditions, a VB solution always...
1
27
162
The most important skill for a researcher is not technical ability. It's taste. The ability to identify interesting and tractable problems, and recognize important ideas when they show up. This can't be taught directly. It's cultivated through curiosity and broad reading.
100
569
4K
Modified Loss of Momentum Gradient Descent: Fine-Grained Analysis
arxiv.org
We analyze gradient descent with Polyak heavy-ball momentum (HB) whose fixed momentum parameter $\beta \in (0, 1)$ provides exponential decay of memory. Building on Kovachki and Stuart (2021), we...
0
1
4
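For reference, the standard Polyak heavy-ball update the abstract refers to (the textbook form, not the paper's fine-grained analysis):

```latex
% Heavy-ball (HB) step: \eta is the step size, \beta \in (0,1) the fixed momentum.
x_{t+1} = x_t - \eta \nabla f(x_t) + \beta \,(x_t - x_{t-1})
% Equivalent buffer form, making the exponential decay of memory explicit:
% m_t = \beta m_{t-1} + \nabla f(x_t) = \sum_{k=0}^{t} \beta^{t-k} \nabla f(x_k),
% \qquad x_{t+1} = x_t - \eta m_t .
```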
Another fantastic benchmark of optimizers. Key takeaways:
1. Variance-reduced Adam variants (e.g., MARS) achieve significant speedups over the AdamW baseline.
2. Matrix-based optimizers (e.g., Muon, SOAP) consistently outperform their scalar-based counterparts (e.g., Lion).
Fantastic Pretraining Optimizers and Where to Find Them "we conduct a systematic study of ten deep learning optimizers across four model scales (0.1B-1.2B parameters) and data-to-model ratios (1–8× the Chinchilla optimum)." "we find that all the fastest optimizers such as Muon
5
22
187
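A sketch of the scalar-vs-matrix distinction in the takeaways above (my illustration, not the benchmark's code): a scalar-based optimizer updates every entry of a weight matrix independently, while a matrix-based one transforms the gradient of the whole matrix at once, e.g. by orthogonalizing it in the spirit of Muon.

```python
import numpy as np

rng = np.random.default_rng(0)
G = rng.standard_normal((4, 3))           # gradient (or momentum) of a 4x3 weight matrix
lr = 0.02

# Scalar-based update (Lion-style sign step): each entry is treated independently.
scalar_update = -lr * np.sign(G)

# Matrix-based update (Muon-inspired): replace G by its orthogonal factor U V^T,
# so every singular direction gets an equal-magnitude step. Muon approximates
# this with Newton-Schulz iterations; an explicit SVD is used here for clarity.
U, _, Vt = np.linalg.svd(G, full_matrices=False)
matrix_update = -lr * (U @ Vt)

print("scalar update norm:", np.linalg.norm(scalar_update))
print("matrix update norm:", np.linalg.norm(matrix_update))
```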
Eigenvalue distribution of the Neural Tangent Kernel in the quadratic scaling
arxiv.org
We compute the asymptotic eigenvalue distribution of the neural tangent kernel of a two-layer neural network under a specific scaling of dimension. Namely, if $X\in\mathbb{R}^{n\times d}$ is an...
0
2
16
Fundamental Limits of Matrix Sensing: Exact Asymptotics, Universality, and Applications
arxiv.org
In the matrix sensing problem, one wishes to reconstruct a matrix from (possibly noisy) observations of its linear projections along given directions. We consider this model in the...
0
1
8
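For reference, the standard matrix sensing observation model the abstract describes (the generic setup, not the paper's specific asymptotic regime):

```latex
% Observe m noisy linear projections of an unknown matrix X^* \in \mathbb{R}^{d_1 \times d_2}
% along given sensing directions A_i:
y_i = \langle A_i, X^* \rangle + \varepsilon_i
    = \operatorname{Tr}\!\left(A_i^{\top} X^*\right) + \varepsilon_i ,
\qquad i = 1, \dots, m .
```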