Puneesh Deora

@puneeshdeora

Followers: 109
Following: 4K
Media: 131
Statuses: 678

PhD student at UBC. Working on the foundations of LLMs and theory of DL. Loves memes :)

Joined August 2019
@puneeshdeora
Puneesh Deora
8 days
🚨 New paper drop! 🚨 🤔 When a transformer sees a sequence that could be explained by many rules, which rule does it pick? It chooses the simplest sufficient one! 🧵👇
5
48
346
@puneeshdeora
Puneesh Deora
5 days
When people at Meta see someone walk in knowing calculus, linear algebra, and probability theory
@minilek
Jelani Nelson
5 days
See below on what Zuckerberg is looking for in star recruits worth $100m pay packages for Meta’s plans in Artificial Intelligence. But weren’t some people saying calculus is no longer useful in the AI age? 🤔
0
0
3
@puneeshdeora
Puneesh Deora
8 days
We also probe:
• model size 🏋️‍♂️
• skewed training mixtures ⚖️
• context length 📏
• LSTMs
For more details, check out our paper: 🔗📜
Work done with amazing collaborators: @bhavya_vasudeva, Tina Behnia, Christos Thrampoulidis.
2
1
10
@puneeshdeora
Puneesh Deora
8 days
We also verify this Occam's-razor-like inductive bias on GPT-4, using Boolean functions as a case study: majority vs. first bit. The model prefers the simple function on ambiguous in-context examples.
1
0
6
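To make "ambiguous examples" concrete, here is a small Python sketch. The function names and the choice of n = 3 bits are illustrative assumptions, not the paper's setup. It enumerates the inputs on which majority and first-bit assign the same label. A prompt built only from such inputs is consistent with both rules, so only a query on which the two rules disagree reveals which one a model has picked.

```python
import itertools


def majority(bits):
    """1 if strictly more than half of the bits are 1 (use odd n to avoid ties)."""
    return int(sum(bits) > len(bits) // 2)


def first_bit(bits):
    """The label is simply the first bit."""
    return bits[0]


# Hypothetical registry of the two candidate rules.
RULES = {"majority": majority, "first_bit": first_bit}


def ambiguous_inputs(n):
    """All n-bit inputs on which the two rules assign the same label."""
    return [x for x in itertools.product((0, 1), repeat=n) if majority(x) == first_bit(x)]


def consistent_rules(examples):
    """Names of the rules that reproduce every (input, label) pair in the context."""
    return sorted(name for name, f in RULES.items() if all(f(x) == y for x, y in examples))


# A context built only from ambiguous inputs: both rules explain it.
context = [(x, majority(x)) for x in ambiguous_inputs(3)]
print(consistent_rules(context))          # ['first_bit', 'majority']

# A disambiguating query: the two rules disagree on it.
query = (1, 0, 0)
print(majority(query), first_bit(query))  # 0 1
```

For n = 3, six of the eight inputs are ambiguous; the remaining two, (0, 1, 1) and (1, 0, 0), are the only queries that separate the rules.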
@puneeshdeora
Puneesh Deora
8 days
Why? 🔍 Bayesian lens: the output is a posterior-weighted mix of s-gram predictors (or of LS predictors, for linear regression). When the simpler hypothesis explains the data well enough, the complexity penalty tips the posterior in its favour.
1
0
8
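One standard way to make this Bayesian picture concrete for the Markov-chain task is Bayesian model averaging over chain orders. This construction is assumed here for illustration and is not taken from the paper. Each context's transition row gets a symmetric Dirichlet(α) prior. Each order is scored by its marginal likelihood, which automatically penalizes the extra contexts of a higher order. The orders' next-token predictions are then mixed by posterior weight. The function names, α = 1, and the uniform prior over orders are all assumptions.

```python
import math
import random
from collections import Counter, defaultdict


def context_counts(seq, k, start):
    """Next-token counts given the previous k tokens, for positions start..len(seq)-1."""
    counts = defaultdict(Counter)
    for i in range(start, len(seq)):
        counts[tuple(seq[i - k:i])][seq[i]] += 1
    return counts


def log_marginal(seq, k, vocab_size, alpha=1.0, start=None):
    """Log marginal likelihood of seq[start:] under an order-k Markov chain whose
    transition row for each context has a symmetric Dirichlet(alpha) prior."""
    start = k if start is None else start
    lp = 0.0
    for c in context_counts(seq, k, start).values():
        n = sum(c.values())
        lp += math.lgamma(vocab_size * alpha) - math.lgamma(vocab_size * alpha + n)
        lp += sum(math.lgamma(alpha + m) - math.lgamma(alpha) for m in c.values())
    return lp


def posterior_over_orders(seq, orders, vocab_size, alpha=1.0):
    """Posterior over candidate orders (uniform prior); every order scores the same tokens."""
    start = max(orders)
    logs = [log_marginal(seq, k, vocab_size, alpha, start) for k in orders]
    top = max(logs)
    weights = [math.exp(lp - top) for lp in logs]  # shift by max for numerical stability
    total = sum(weights)
    return [w / total for w in weights]


def mixture_predict(seq, orders, vocab_size, alpha=1.0):
    """Posterior-weighted mix of each order's posterior-predictive next-token distribution."""
    start = max(orders)
    post = posterior_over_orders(seq, orders, vocab_size, alpha)
    mix = [0.0] * vocab_size
    for w, k in zip(post, orders):
        c = context_counts(seq, k, start)[tuple(seq[len(seq) - k:])]
        n = sum(c.values())
        for v in range(vocab_size):
            mix[v] += w * (c[v] + alpha) / (n + vocab_size * alpha)
    return mix


def sample_markov1(n, p_flip, seed=0):
    """Hypothetical data generator: binary order-1 chain that flips the previous token w.p. p_flip."""
    rng = random.Random(seed)
    seq = [rng.randrange(2)]
    while len(seq) < n:
        seq.append(1 - seq[-1] if rng.random() < p_flip else seq[-1])
    return seq


if __name__ == "__main__":
    print(posterior_over_orders([0, 0, 1, 1] * 5, [1, 3], 2))
    print(posterior_over_orders(sample_markov1(5000, 0.2), [1, 3], 2))
```

On the periodic sequence 0, 0, 1, 1, ... the next token is not determined by the last token alone, so order 3 receives about 0.999 of the posterior mass. On a long sample from an order-1 chain, the order-3 model gains little likelihood, and its complexity penalty (eight contexts instead of two) pushes the weight toward order 1.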
@puneeshdeora
Puneesh Deora
8 days
Example #2 👉 linear regression with d/2- vs. d-dimensional regressors. Result: for data from the d/2-dim class, the model uses the d/2-LS solution even when both the d- and d/2-least-squares (LS) solutions fit the context. For data from the d-dim class, it uses the d-LS solution.
1
0
6
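Here is a small numeric sketch of why the two LS solutions can both fit the context yet still differ. Several things are assumptions: the numbers, d = 4 and n = 3, and the helper names. "d/2-dim class" is read as weight vectors supported on the first d/2 coordinates. The "d-LS" solution is taken to be the minimum-norm interpolator, the usual choice when n < d. It is plain Python with a tiny Gaussian-elimination solver to stay dependency-free.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (A square, invertible)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ls_subspace(X, y, k):
    """LS using only the first k features (normal equations), zero-padded back to dimension d."""
    cols = list(zip(*[row[:k] for row in X]))
    A = [[dot(ci, cj) for cj in cols] for ci in cols]
    b = [dot(ci, y) for ci in cols]
    return solve(A, b) + [0.0] * (len(X[0]) - k)


def ls_min_norm(X, y):
    """Minimum-norm interpolating solution w = X^T (X X^T)^{-1} y (needs n <= d, full row rank)."""
    a = solve([[dot(r, s) for s in X] for r in X], y)
    return [sum(a[i] * X[i][j] for i in range(len(X))) for j in range(len(X[0]))]


X = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 1, 0, 0]]  # n = 3 in-context inputs, d = 4
w_true = [1, 2, 0, 0]                            # supported on the first d/2 = 2 coordinates
y = [dot(x, w_true) for x in X]

w_half = ls_subspace(X, y, 2)  # d/2-LS
w_full = ls_min_norm(X, y)     # d-LS (min-norm)
query = [0, 0, 1, 0]           # touches only the last d/2 coordinates
print(dot(query, w_half), dot(query, w_full))  # 0.0 -0.25
```

Both weight vectors reproduce all three context labels exactly (w_half = [1, 2, 0, 0], w_full = [1.25, 1.75, -0.25, 0.25]), so the context alone does not pick one; they only disagree on queries that touch the last d/2 coordinates.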
@puneeshdeora
Puneesh Deora
8 days
We train transformers on tasks from hierarchical complexity categories: simple ✖️ complex. Example #1 👉 order-1 vs. order-3 Markov chains. Result: the model identifies the order and switches between bigram and tetragram statistics on the fly.
1
1
8
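"Bigram vs. tetragram statistics" can be read as counting, within the context, which token follows each length-1 history (order 1) or length-3 history (order 3). Below is a minimal sketch, assuming add-α smoothing and a hypothetical `sgram_predict` helper. It only computes the in-context statistics and says nothing about how the transformer chooses between them.

```python
from collections import Counter, defaultdict


def sgram_predict(seq, s, vocab_size, alpha=1.0):
    """Next-token distribution from in-context s-gram counts with add-alpha smoothing.

    Conditions on the last s-1 tokens of `seq`: s=2 gives bigram statistics
    (order-1 chain), s=4 gives tetragram statistics (order-3 chain).
    """
    k = s - 1
    counts = defaultdict(Counter)
    for i in range(k, len(seq)):
        counts[tuple(seq[i - k:i])][seq[i]] += 1
    c = counts[tuple(seq[len(seq) - k:])]  # counts for the current history
    total = sum(c.values()) + alpha * vocab_size
    return [(c[v] + alpha) / total for v in range(vocab_size)]


seq = [0, 1, 0, 1, 0, 1, 0]
print(sgram_predict(seq, s=2, vocab_size=2))  # [0.2, 0.8]
print(sgram_predict(seq, s=4, vocab_size=2))  # [0.25, 0.75]
```

The tetragram estimate is less confident here only because each length-3 history has been seen fewer times; with more context, both converge on predicting 1 after a 0.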
@puneeshdeora
Puneesh Deora
9 days
People move to SF and the mildest take turns into a 2000 word manifesto.
0
0
1
@puneeshdeora
Puneesh Deora
23 days
Idek what AGI is. Seriously, can someone define it for me please?
@QuanquanGu
Quanquan Gu
23 days
Are there still any AI experts who think we won’t achieve AGI soon?
1
0
2
@puneeshdeora
Puneesh Deora
25 days
Shannon's Master's Thesis laid the groundwork for digital circuits; what are you trying to pull here?
0
0
4
@puneeshdeora
Puneesh Deora
1 month
Tweet media one
@ns123abc
NIK
1 month
Anthropic researchers: “Even if AI progress completely stalls today and we don’t reach AGI… the current systems are already capable of automating ALL white-collar jobs within the next five years”. It’s over.
0
0
0
@puneeshdeora
Puneesh Deora
2 months
Global reaction to Overleaf going down.
6
74
655
@puneeshdeora
Puneesh Deora
2 months
Overleaf strategic timeout — giving people a break from all the grind.
0
0
2
@puneeshdeora
Puneesh Deora
2 months
I wonder if the Pope rawdogs his PyTorch code like God intended or if he’s out here using Claude like a Protestant.
0
0
1
@puneeshdeora
Puneesh Deora
2 months
Math pope appointed near the NeurIPS deadline. Are these signs?
0
0
0
@puneeshdeora
Puneesh Deora
2 months
Desk rejection of your papers if your reviews are next-level trash
0
0
0
@puneeshdeora
Puneesh Deora
2 months
If you don't submit reviews on time, you lose access to your own reviews. I like this :). Be careful while choosing your co-authors :P
@NeurIPSConf
NeurIPS Conference
2 months
Responsible reviewing initiatives for NeurIPS 2025: read more about changes to reviewing that will safeguard reviewing quality and timeline in our blog post below:
1
0
2
@puneeshdeora
Puneesh Deora
2 months
RT @DBahdanau: Adam deserves the award, but in Singapore everyone still uses SGD.
0
64
0
@puneeshdeora
Puneesh Deora
2 months
I will be presenting our recent work on In-context Learning with multiple task groups at the SCSL workshop tomorrow (@SCSLWorkshop) at #ICLR2025. Swing by and say hi! 😄
0
4
16
@puneeshdeora
Puneesh Deora
2 months
I presented our TMLR work on the Optimization and Generalization of Multi-head Attention at #ICLR2025 today. Please use your time machines to attend 😛 Thanks to the people who stopped/will stop by.
0
2
20