Explore tweets tagged as #overparameterized
@CalcCon
Calc Consulting
2 years
Many LLMs appear to be heavily overparameterized, with no apparent gain in performance. Why is this? Results on Falcon suggest an answer: many text datasets contain too many duplicates, which effectively shrinks the dataset. This causes large alphas (heavy-tailed power-law exponents of the weight spectra) and lower quality.
Tweet media one
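A minimal sketch of the duplicate accounting the tweet above points at, to make "too many duplicates effectively shrinks the dataset" concrete. The toy corpus, the exact-match criterion, and the whitespace normalization are illustrative assumptions of mine; real pipelines typically use near-duplicate methods such as MinHash.

```python
import hashlib

def effective_size(docs):
    """Count unique documents; repetition shrinks the effective corpus size."""
    seen = set()
    for doc in docs:
        # Light normalization so trivial whitespace differences don't hide duplicates.
        digest = hashlib.sha256(" ".join(doc.split()).encode("utf-8")).hexdigest()
        seen.add(digest)
    return len(seen)

if __name__ == "__main__":
    corpus = ["the cat sat", "the  cat sat", "a dog ran", "the cat sat"]
    print(f"raw size: {len(corpus)}, effective size: {effective_size(corpus)}")
```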
@quantseeker
QuantSeeker
2 months
A new paper by Nagel challenges the “virtue of complexity” in return prediction. Prior work claimed that overparameterized models trained on tiny samples deliver strong out-of-sample performance. Nagel shows they essentially boil down to volatility-timed momentum. The paper
Tweet media one
@StephanMandt
Stephan Mandt @ AISTATS’25
1 year
Great news from #UAI that one of our papers got selected for an oral presentation! TL;DR: we analyzed overfitting in overparameterized, heteroskedastic regression models and found a phase transition between two distinct types of overfitting! 🧵
Tweet media one
@ChristophMolnar
Christoph Molnar 🦋 christophmolnar.bsky.social
1 year
Statistician: You may never cross the interpolation threshold. Your models may never be overparameterized. Deep learner: Hold my beer!
Tweet media one
@shriver_light
anonym(論文1000本ノックの人)
2 years
#111論文等共有 1211 [NeurIPS'19 Oral] Theoretical analysis of the teacher-student model for overparameterized two-layer NNs. Shows that different activations lead to different solutions. First proves that, in the large-input-dimension limit, SGD follows a set of differential equations; then, when the student is larger than the teacher, ① training only the first layer makes generalization worse as the student width grows
Tweet media one
Tweet media two
Tweet media three
Tweet media four
@yuxiangw_cs
Yu-Xiang Wang
1 year
It's time to reveal the answer to the poll! The correct answer is D. Surprised? I was too. With merely 30 data points in 1D, shouldn't an overparameterized neural network converge exponentially to an interpolating solution? A quick 🧵 with my explanation.
Tweet media one
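For anyone who wants to poke at the question themselves, here is a minimal sketch of the setup the tweet describes: 30 points in 1D, a wide (overparameterized) two-layer ReLU network, full-batch gradient descent, and the training loss logged so you can judge whether it really decays exponentially to an interpolating solution. The width, learning rate, target function, and step count are assumptions of mine, not details from the poll.

```python
import torch

torch.manual_seed(0)

# 30 training points in 1D, as in the tweet; the target function is an assumption.
x = torch.linspace(-1.0, 1.0, 30).unsqueeze(1)
y = torch.sin(3.0 * x)

# A wide (overparameterized) two-layer ReLU network.
model = torch.nn.Sequential(
    torch.nn.Linear(1, 2048),
    torch.nn.ReLU(),
    torch.nn.Linear(2048, 1),
)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

# Full-batch gradient descent; log the training MSE to see whether it
# actually decays geometrically toward zero (i.e., toward interpolation).
for step in range(10001):
    opt.zero_grad()
    loss = torch.mean((model(x) - y) ** 2)
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        print(f"step {step:6d}  train MSE {loss.item():.3e}")
```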
@robot_trainer
Nathan Ratliff
1 year
This is a surprisingly under-discussed paper from a few years ago. Big DNNs are literally memory machines: they don't just remember the data, they can recall it as well. If you simply construct an overparameterized autoencoder and train it as the identity
Tweet media one
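A minimal sketch of the memorization-and-recall effect the tweet describes, under assumptions of my own (a tiny synthetic dataset, a plain MLP autoencoder, Adam, and long training, rather than whatever the paper actually used): train the network as the identity map on a few points, then iterate the learned map from a random probe and check whether it settles near one of the training examples.

```python
import torch

torch.manual_seed(0)

# A handful of training points in a modest dimension (assumed values).
data = torch.randn(5, 16)

# Deliberately overparameterized MLP autoencoder, trained as the identity map.
ae = torch.nn.Sequential(
    torch.nn.Linear(16, 512), torch.nn.ReLU(),
    torch.nn.Linear(512, 512), torch.nn.ReLU(),
    torch.nn.Linear(512, 16),
)
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)

for step in range(5000):
    opt.zero_grad()
    loss = torch.mean((ae(data) - data) ** 2)
    loss.backward()
    opt.step()

# "Recall": iterate the trained map from a random probe and check which
# training point it ends up closest to.
probe = torch.randn(1, 16)
with torch.no_grad():
    for _ in range(200):
        probe = ae(probe)
dists = torch.cdist(probe, data)
print("distance to each training point after iteration:", dists.squeeze().tolist())
```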
@tha_ajanthan
Thalaiyasingam Ajanthan
3 months
1/3 New paper: We show that leveraging solution space structure via reduction mappings can accelerate convergence in gradient-based optimization. This has practical implications in settings such as low-rank projection, matrix factorisation, and overparameterized neural networks.
Tweet media one
@csteinmetz1
Christian Steinmetz
2 years
"Differentiable All-pass Filters for Phase Response Estimation and Automatic Signal Alignment". Improves the quality of parallel processing with learnable allpass filters optimized via an overparameterized network. abs: web:
Tweet media one
@shriver_light
anonym(論文1000本ノックの人)
2 years
#111論文等共有 1096 [ICML'23] The loss of physics-informed NNs (PINNs) involves derivatives with respect to the inputs, which makes it difficult. Proves global convergence of overparameterized PINNs under gradient flow, and that gradient descent converges to a global solution with an appropriate learning rate. The proof covers a broad class of equations, including linear second-order PDEs, and a broad class of initializations, including LeCun uniform.
Tweet media one
Tweet media two
Tweet media three
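To make concrete what "the PINN loss involves derivatives with respect to the inputs" means, here is a minimal sketch for a toy 1D Poisson problem u''(x) = f(x) with zero boundary values. The PDE, network size, sampling, and training details are illustrative assumptions of mine, not the setting analyzed in the paper.

```python
import torch

torch.manual_seed(0)

# Toy 1D Poisson problem: u''(x) = f(x) on (0, 1) with u(0) = u(1) = 0.
# With f(x) = -pi^2 * sin(pi * x), the exact solution is u(x) = sin(pi * x).
def f(x):
    return -(torch.pi ** 2) * torch.sin(torch.pi * x)

net = torch.nn.Sequential(
    torch.nn.Linear(1, 128), torch.nn.Tanh(),
    torch.nn.Linear(128, 128), torch.nn.Tanh(),
    torch.nn.Linear(128, 1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

x = torch.rand(256, 1, requires_grad=True)   # interior collocation points
xb = torch.tensor([[0.0], [1.0]])            # boundary points

for step in range(2001):
    opt.zero_grad()
    u = net(x)
    # The PDE residual needs u'' -- derivatives of the network output w.r.t. its input.
    du = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
    d2u = torch.autograd.grad(du, x, torch.ones_like(du), create_graph=True)[0]
    residual = torch.mean((d2u - f(x)) ** 2)
    boundary = torch.mean(net(xb) ** 2)
    loss = residual + boundary
    loss.backward()
    opt.step()
    if step % 500 == 0:
        print(f"step {step:5d}  residual {residual.item():.3e}  boundary {boundary.item():.3e}")
```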
@CalcCon
Calc Consulting
5 months
A little more about Double Descent. Today most AI researchers recognize that, in direct contrast to the predictions of Statistical Learning Theory (SLT), AI models can perform very well despite being heavily overparameterized (far more parameters than training examples). That is, no one cares anymore about the
Tweet media one
@Poromechanics
Steve Sun
7 months
Our paper "Symbolic feature engineering of overparameterized Eulerian hyperelasticity models for fast inference time" has been accepted by CMAME. Paper: Data and code: 1/10
Tweet media one
@shriver_light
anonym(論文1000本ノックの人)
2 years
#111論文等共有 1079 [ICLR'23] Proves that benign overfitting (BO) occurs in the overparameterized regime but, with label noise, does not occur in the mildly overparameterized regime; in that case early stopping is effective. Consistent with experimental results where BO occurs for ResNet on CIFAR but not on ImageNet.
Tweet media one
Tweet media two
@UMichECE
Electrical & Computer Engineering at Michigan
1 year
Congratulations to Dr. Mingchen Li on her PhD defense! 🎓 Her research advances the field of AI by guiding overparameterized models to robust generalization using regularization, bilevel optimization, and principled architecture design. #PhDDefense #MachineLearning #ECE #goblue
Tweet media one
@JmlrOrg
Journal of Machine Learning Research
1 month
'DRM Revisited: A Complete Error Analysis', by Yuling Jiao, Ruoxuan Li, Peiying Wu, Jerry Zhijian Yang, Pingwen Zhang. #overparameterization #overparameterized #deep
@CalcCon
Calc Consulting
3 months
𝐖𝐡𝐞𝐫𝐞 𝐝𝐨𝐞𝐬 𝐃𝐨𝐮𝐛𝐥𝐞 𝐃𝐞𝐬𝐜𝐞𝐧𝐭 𝐜𝐨𝐦𝐞 𝐟𝐫𝐨𝐦? 🤷‍♂️ It was actually discovered in the theoretical physics literature in 1989, in a very simple model for a NN: running Linear Regression (LR) on a heavily overparameterized binary classification problem.
Tweet media one
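A minimal sketch of the kind of toy demonstration the tweet is pointing at, under assumptions of my own (a random-features variant rather than the exact 1989 setup): least-squares "classification" fit with the minimum-norm solution, sweeping the number of features p past the number of training samples and watching how the test error behaves around the interpolation threshold p ≈ n.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d = 100, 2000, 30

# Linear "teacher" producing noisy binary labels (illustrative assumption).
w_star = rng.normal(size=d)

def make_data(n):
    X = rng.normal(size=(n, d))
    y = np.sign(X @ w_star + 0.5 * rng.normal(size=n))
    return X, y

X_tr, y_tr = make_data(n_train)
X_te, y_te = make_data(n_test)

for p in [10, 30, 60, 90, 100, 110, 150, 300, 1000, 3000]:
    # Random ReLU features; the model is linear regression on these features,
    # fit with the minimum-norm least-squares solution.
    W = rng.normal(size=(d, p)) / np.sqrt(d)
    F_tr = np.maximum(X_tr @ W, 0.0)
    F_te = np.maximum(X_te @ W, 0.0)
    beta, *_ = np.linalg.lstsq(F_tr, y_tr, rcond=None)
    err = np.mean(np.sign(F_te @ beta) != y_te)
    print(f"p = {p:5d}  test error = {err:.3f}")
```

np.linalg.lstsq returns the minimum-norm solution in the underdetermined regime, which is what lets the test error come back down after the spike near p ≈ n.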
@CalcCon
Calc Consulting
4 months
Here's today's talk on Double Descent from the WeightWatcher community Discord. WeightWatcher and Double Descent (DD). Or, "Understanding OverParameterized Models." What can we learn about this mysterious, misunderstood phenomenon from the perspective of theoretical physics,
@KwekuOA
Kweku Opoku-Agyemang, Ph.D
2 years
What are the most important statistical ideas of the past 50 years? Counterfactual causal inference, overparameterized models and regularization, Bayesian multilevel models, adaptive decision analysis, and more, and how they relate to big data.
Tweet media one
@papers_daily
Daily AI Papers
2 years
The Interpolating Information Criterion for Overparameterized Models. The problem of model selection is considered for the setting of interpolating estimators. Classical information criteria. 🧵 👇
Tweet media one
Tweet media two
Tweet media three
Tweet media four
@anthrupad
w̸͕͂͂a̷͔̗͐t̴̙͗e̵̬̔̕r̴̰̓̊m̵͙͖̓̽a̵̢̗̓͒r̸̲̽ķ̷͔́͝
2 years
models overparameterized to the universe's generating program itself keeping themselves occupied by looking at light going through dynamic 14 dimensional hyperbolic fractal glass
Tweet media one