Avrajit Ghosh

@GhoshAvrajit

Followers: 307 · Following: 10K · Media: 25 · Statuses: 773

Postdoc @SimonsInstitute @berkeley_ai. Generalization, optimization, inverse problems. PhD @MSU_EGR. (No prior) better than (wrong priors).

Berkeley, CA
Joined February 2020
@AryanMokhtari
Aryan Mokhtari
6 days
Second-order methods and preconditioner-based methods are **NOT** the same. Please stop using them interchangeably!
6
11
129
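A minimal sketch of the distinction (my own toy example, not from the tweet): on a quadratic, a second-order (Newton) step inverts the exact local Hessian, while a preconditioner-based step applies a fixed approximation, here a diagonal (Jacobi) preconditioner, and still needs a step size.

```python
import numpy as np

# Toy quadratic f(x) = 0.5 * x^T A x - b^T x with a non-diagonal, ill-conditioned A.
A = np.array([[10.0, 3.0],
              [3.0, 1.0]])
b = np.array([1.0, 1.0])

def grad(x):
    return A @ x - b

x = np.zeros(2)

# Second-order (Newton) step: uses the exact local Hessian A.
newton_step = x - np.linalg.solve(A, grad(x))

# Preconditioner-based step: a fixed diagonal (Jacobi) preconditioner plus a step size.
P_inv = np.diag(1.0 / np.diag(A))
eta = 0.9
precond_step = x - eta * P_inv @ grad(x)

print("Newton step:        ", newton_step)   # hits the minimizer A^{-1} b exactly
print("Preconditioned step:", precond_step)  # only approximates it; depends on P and eta
```

For this quadratic the Newton step lands on the minimizer in one iteration, whereas the preconditioned step only moves toward it, which is the gap the tweet is pointing at.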
@fchollet
François Chollet
14 days
If a problem seems intractable, it's almost always because your specification of it is vague or incomplete. The solution doesn't appear when you "think harder". It appears when you describe the problem in a sufficiently precise and explicit fashion -- until you see its true…
59
157
1K
@fchollet
François Chollet
15 days
You rarely solve hard problems in a flash of insight. It's more typically a slow, careful process of exploring a branching tree of possibilities. You must pause, backtrack, and weigh every alternative. You can't fully do this in your head, because your working memory is too…
93
306
3K
@Shramanpramani2
Shraman Pramanick
19 days
My role at Meta's SAM team (MSL, previously at FAIR Perception) has been impacted within 3 months of joining after PhD. If you work with multimodal LLMs for grounding or complex reasoning, or have a long-term vision of unified understanding and generation, let's talk. I am on…
@cuijiaxun
Jiaxun Cui 🐿️
21 days
Meta has gone crazy on the squid game! Many new PhD NGs are deactivated today (I am also impacted🥲 happy to chat)
27
27
343
@pmddomingos
Pedro Domingos
22 days
Try logistic regression before you try an LLM.
16
11
147
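In that spirit, a minimal baseline sketch (the toy texts and labels below are placeholders, not from the tweet): a TF-IDF + logistic regression pipeline is often worth timing before reaching for an LLM.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Placeholder data; swap in your real text-classification task.
texts = [
    "great product, works as advertised",
    "terrible service, would not buy again",
    "loved it, five stars",
    "broke after one day, awful",
    "fast shipping and solid quality",
    "rude support and a refund nightmare",
    "exceeded my expectations",
    "complete waste of money",
]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=0, stratify=labels
)

# Bag-of-words features + a linear classifier: the cheap baseline to beat.
baseline = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
baseline.fit(X_train, y_train)
print("baseline accuracy:", accuracy_score(y_test, baseline.predict(X_test)))
```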
@MoleiTaoMath
Molei Tao
30 days
I'm hiring 2 PhD students & 1 postdoc @GeorgiaTech for Fall'26. Motivated students plz consider us, especially those in
* ML+Quantum
* DeepLearning+Optimization
- PhD: see https://t.co/h4anjm6b8j
- Postdoc: see https://t.co/548XVaahx3 & https://t.co/4ahNE7OOwV
Retweet appreciated
9
120
466
@beenwrekt
Ben Recht
1 month
Almost a decade ago, I coauthored a paper asking us to rethink our theory of generalization in machine learning. Today, I’m fine putting the theory back on the shelf.
argmin.net
You don't need a theorem to argue more data is better than less data
7
24
192
@beenwrekt
Ben Recht
1 month
In machine learning, do you need to know any optimization algorithm other than stochastic gradient descent? A reluctant but best-faith argument for no.
argmin.net
Justifying a laser focus on stochastic gradient methods.
2
5
26
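For context, a bare-bones sketch of the method the post is defending (my own illustration, not from argmin.net): minibatch SGD on least-squares regression.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 20
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

# Plain minibatch SGD on the least-squares objective 0.5 * mean((Xw - y)^2).
w = np.zeros(d)
lr, batch = 0.05, 32
for step in range(2000):
    idx = rng.integers(0, n, size=batch)           # sample a minibatch
    g = X[idx].T @ (X[idx] @ w - y[idx]) / batch   # stochastic gradient
    w -= lr * g

print("distance to w_true:", np.linalg.norm(w - w_true))
```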
@deepcohen
Jeremy Cohen
1 month
Even with full-batch gradients, DL optimizers defy classical optimization theory, as they operate at the *edge of stability.* With @alex_damian_, we introduce "central flows": a theoretical tool to analyze these dynamics that makes accurate quantitative predictions on real NNs.
19
213
1K
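Background for the "edge of stability" phrase (a classical-theory sketch of mine, not the paper's central-flow analysis): for gradient descent on a quadratic with curvature lam, the iterates shrink only when the step size is below 2/lam; the observation is that deep nets train with the sharpness hovering right at that threshold.

```python
def gd_on_quadratic(lam, eta, steps=50, x0=1.0):
    """Run GD on f(x) = 0.5 * lam * x^2 and return the final iterate."""
    x = x0
    for _ in range(steps):
        x -= eta * lam * x   # gradient of 0.5 * lam * x^2 is lam * x
    return x

lam = 4.0                      # curvature (sharpness); stability threshold is 2/lam = 0.5
for eta in [0.40, 0.49, 0.51]:
    print(f"eta={eta}: final iterate = {gd_on_quadratic(lam, eta):+.3e}")
# Below 2/lam the iterate decays; just above it, it oscillates and diverges.
```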
@deepcohen
Jeremy Cohen
1 month
@jasondeanlee @SebastienBubeck @tomgoldsteincs @zicokolter @atalwalkar This is the third, last, and best paper from my PhD. By some metrics, an ML PhD student who writes just three conference papers is "unproductive." But I wouldn't have had it any other way 😉 !
11
21
536
@tydsh
Yuandong Tian
1 month
🚨New work: Provable Scaling Laws of Feature Emergence from Learning Dynamics of Grokking (https://t.co/U7e0d3duYq) In this work we propose a mathematical framework, named Li2, that explains the dynamics of grokking (i.e., delayed generalization) in 2-layer nonlinear networks.
arxiv.org
While the phenomenon of grokking, i.e., delayed generalization, has been studied extensively, it remains an open problem whether there is a mathematical framework that characterizes what kind of...
8
37
227
@MoleiTaoMath
Molei Tao
1 month
Proud of my junior collaborators Kijung Jeon, Yuchen @YuchenZhu_ZYC, Wei @WeiGuo01, Jaemoo @jaemoo51133, Avrajit @GhoshAvrajit, Lianghe Shi, Yinuo @Yinuo_Ren, Haoxuan @haoxuan_steve_c - 6 joint #NeurIPS2025 main track papers! Lucky to have you. Wanna join us? Will post recruit info soon.
1
7
76
@uuujingfeng
Jingfeng Wu
2 months
sharing a new paper w Peter Bartlett, @jasondeanlee, @ShamKakade6, Bin Yu. ppl talk about implicit regularization, but how good is it? We show it's surprisingly effective: GD dominates ridge for all linear regression, w/ more cool stuff on GD vs SGD https://t.co/oAVKiVgUUQ
10
32
187
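A toy version of the comparison in the tweet (my own sketch, not the paper's experiment): sweep gradient descent's iterate path against the ridge path on a single linear-regression instance and compare the best excess risk each achieves.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 100, 50
X = rng.normal(size=(n, d))
w_star = rng.normal(size=d) / np.sqrt(d)
y = X @ w_star + 0.5 * rng.normal(size=n)

# Gradient descent path on the least-squares loss 0.5/n * ||Xw - y||^2.
w, eta = np.zeros(d), 0.1
gd_risks = []
for t in range(200):
    w -= eta * X.T @ (X @ w - y) / n
    gd_risks.append(np.linalg.norm(w - w_star) ** 2)

# Ridge path over a grid of regularization strengths.
ridge_risks = []
for lam in np.logspace(-3, 2, 30):
    w_ridge = np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)
    ridge_risks.append(np.linalg.norm(w_ridge - w_star) ** 2)

print("best risk along the GD path:   ", min(gd_risks))
print("best risk along the ridge path:", min(ridge_risks))
```

On any single draw either path can come out ahead; the paper's dominance claim is a general statement that this toy run does not prove.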
@fchollet
François Chollet
2 months
The most important skill for a researcher is not technical ability. It's taste. The ability to identify interesting and tractable problems, and recognize important ideas when they show up. This can't be taught directly. It's cultivated through curiosity and broad reading.
100
569
4K
@QuanquanGu
Quanquan Gu
2 months
Another fantastic benchmark of optimizers. Key takeaways:
1. Variance-reduced Adam variants (e.g., MARS) achieve significant speedups over the AdamW baseline.
2. Matrix-based optimizers (e.g., Muon, SOAP) consistently outperform their scalar-based counterparts (e.g., Lion).
@iScienceLuvr
Tanishq Mathew Abraham, Ph.D.
2 months
Fantastic Pretraining Optimizers and Where to Find Them "we conduct a systematic study of ten deep learning optimizers across four model scales (0.1B-1.2B parameters) and data-to-model ratios (1–8× the Chinchilla optimum)." "we find that all the fastest optimizers such as Muon…
5
22
187