Puneesh Deora @puneeshdeora X Profile

Puneesh Deora

@puneeshdeora

Followers

109

Following

4K

Media

131

Statuses

678

PhD student at UBC. Working on the foundations of LLMs and theory of DL. Loves memes :)

Joined August 2019

Puneesh Deora

@puneeshdeora

8 days

🚨 New paper drop! 🚨 🤔 When a transformer sees a sequence that could be explained by many rules, which rule does it pick? It chooses the simplest sufficient one! 🧵👇

5

48

346

Puneesh Deora

@puneeshdeora

5 days

When people at Meta see someone walk in knowing calculus, linear algebra, and probability theory

Jelani Nelson

@minilek

5 days

See below on what Zuckerberg is looking for in star recruits worth $100m pay packages for Meta’s plans in Artificial Intelligence. But weren’t some people saying calculus is no longer useful in the AI age? 🤔

0

3

Puneesh Deora

@puneeshdeora

8 days

We also probe:
• model size 🏋️‍♂️
• skewed training mixtures ⚖️
• context length 📏
• LSTMs
For more details, check out our paper: 🔗📜
Work done with amazing collaborators: @bhavya_vasudeva, Tina Behnia, Christos Thrampoulidis.

2

1

10

Puneesh Deora

@puneeshdeora

8 days

We also verify this Occam's razor-like inductive bias on GPT-4, using Boolean functions as a case study: majority vs. first-bit. On ambiguous in-context examples, the model prefers the simpler function.

1

0

6
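To make "ambiguous examples" concrete, here is a minimal sketch that enumerates 3-bit inputs and separates those where majority and first-bit agree (and so cannot tell the two rules apart) from those where they differ. The input length `n = 3` and the helper names are assumptions chosen for illustration; the post does not describe the actual prompt format or input lengths.

```python
from itertools import product

def majority(bits):
    """Return 1 if strictly more than half of the bits are 1, else 0."""
    return int(sum(bits) * 2 > len(bits))

def first_bit(bits):
    """Return the first bit of the input."""
    return bits[0]

n = 3  # illustrative input length (an assumption, not from the post)
inputs = list(product((0, 1), repeat=n))

# Inputs on which both rules give the same label: a context made only of
# these is consistent with either hypothesis.
ambiguous = [x for x in inputs if majority(x) == first_bit(x)]

# Inputs on which the rules disagree: a query here reveals which rule
# a learner has actually picked.
disambiguating = [x for x in inputs if majority(x) != first_bit(x)]

print("ambiguous:", ambiguous)
print("disambiguating:", disambiguating)
```

A context drawn only from `ambiguous` followed by a query from `disambiguating` is one way to read off which rule a learner has picked.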

Puneesh Deora

@puneeshdeora

8 days

Why? 🔍 Bayesian lens: The output is a posterior-weighted mix of s-gram (or LS for linear regression) predictors. When the simpler hypothesis explains the data well enough, the complexity penalty tips the posterior in its favour.

1

0

8
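The Bayesian reading can be written out as a short sketch. The symbols here ($\mathcal{H}$ for the set of hypotheses, $C$ for the context, $x$ for the query, $\theta$ for a hypothesis's parameters) are notation introduced for illustration, not taken from the paper.

```latex
\hat{y}(x \mid C) \;=\; \sum_{h \in \mathcal{H}} p(h \mid C)\, \hat{y}_h(x),
\qquad
p(h \mid C) \;=\; \frac{p(C \mid h)\, p(h)}{\sum_{h' \in \mathcal{H}} p(C \mid h')\, p(h')},
\qquad
p(C \mid h) \;=\; \int p(C \mid \theta, h)\, p(\theta \mid h)\, d\theta .
```

The marginal likelihood $p(C \mid h)$ averages over all parameter settings of $h$. A more flexible hypothesis spreads its prior over more possible contexts, so when a simpler hypothesis already fits $C$, it typically receives the larger posterior weight. This is the standard "Bayesian Occam's razor" argument.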

Puneesh Deora

@puneeshdeora

8 days

Example #2 👉 linear regression with d/2 vs. d-dimensional regressors. Result: For d/2-dim class data, even when both d and d/2-least squares (LS) solutions fit the context, the model uses d/2-LS solution. For d-dim class data, it uses d-LS solution.

1

0

6
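A toy illustration of how both least-squares solutions can fit the same context yet predict differently. It assumes d = 2 (so d/2 = 1), a single noise-free context point, and that "d-LS" in the underdetermined regime means the minimum-norm interpolating solution. All of these are assumptions made for this sketch, not details confirmed by the post.

```python
def predict(w, x):
    """Linear prediction <w, x> for 2-dimensional w and x."""
    return w[0] * x[0] + w[1] * x[1]

def ls_half(x, y):
    """d/2-LS for d = 2: fit y using only the first coordinate."""
    return (y / x[0], 0.0)

def ls_full_min_norm(x, y):
    """d-LS with one context point: minimum-norm w satisfying <w, x> = y."""
    sq_norm = x[0] ** 2 + x[1] ** 2
    return (x[0] * y / sq_norm, x[1] * y / sq_norm)

# Ground truth from the d/2-dimensional class: second coordinate is unused.
w_true = (2.0, 0.0)

x_ctx = (1.0, 1.0)
y_ctx = predict(w_true, x_ctx)

w_half = ls_half(x_ctx, y_ctx)
w_full = ls_full_min_norm(x_ctx, y_ctx)

# Both solutions reproduce the context label exactly ...
fit_half = predict(w_half, x_ctx)
fit_full = predict(w_full, x_ctx)

# ... but they disagree on a new query, which is where the choice of
# hypothesis becomes visible.
x_query = (1.0, -1.0)
pred_half = predict(w_half, x_query)
pred_full = predict(w_full, x_query)

print("context fit:", fit_half, fit_full)
print("query predictions:", pred_half, pred_full, "truth:", predict(w_true, x_query))
```

In this toy setup only the d/2-LS prediction matches the true label on the query, mirroring the behaviour the post reports for data from the d/2-dimensional class.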

Puneesh Deora

@puneeshdeora

8 days

We train transformers on tasks from hierarchical complexity categories—simple ✖️ complex. Example #1 👉 order-1 vs. order-3 Markov chains. Result: The model identifies the order and switches between bigram and tetragram stats on the fly.

1

8
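A rough sketch of what "identifying the order" could look like for an explicit Bayesian learner, not a description of the transformer's internals. It counts bigram (order-1) and tetragram (order-3) statistics on a binary sequence and compares the two orders by marginal likelihood. The binary alphabet, the uniform Beta(1, 1) prior, the 0.8 copy probability, and all function names are assumptions made for illustration.

```python
import math
import random
from collections import defaultdict

def context_counts(seq, order, start):
    """Count next symbols after each length-`order` context, for positions >= start."""
    counts = defaultdict(lambda: [0, 0])
    for t in range(start, len(seq)):
        ctx = tuple(seq[t - order:t])
        counts[ctx][seq[t]] += 1
    return counts

def log_marginal(seq, order, start):
    """Log marginal likelihood under an order-`order` chain, Beta(1, 1) prior per context."""
    total = 0.0
    for n0, n1 in context_counts(seq, order, start).values():
        total += math.lgamma(n0 + 1) + math.lgamma(n1 + 1) - math.lgamma(n0 + n1 + 2)
    return total

def posterior_over_orders(seq, orders=(1, 3)):
    """Posterior over candidate orders with a uniform prior, scoring the same positions."""
    start = max(orders)
    logs = {k: log_marginal(seq, k, start) for k in orders}
    top = max(logs.values())
    unnorm = {k: math.exp(v - top) for k, v in logs.items()}
    z = sum(unnorm.values())
    return {k: v / z for k, v in unnorm.items()}

def sample_chain(order, length, rng, copy_prob=0.8):
    """Binary chain: next symbol copies the one `order` steps back with prob. `copy_prob`."""
    seq = [rng.randint(0, 1) for _ in range(order)]
    while len(seq) < length:
        prev = seq[-order]
        seq.append(prev if rng.random() < copy_prob else 1 - prev)
    return seq

rng = random.Random(0)
post_simple = posterior_over_orders(sample_chain(1, 3000, rng))
post_complex = posterior_over_orders(sample_chain(3, 3000, rng))

print("order-1 data, posterior over orders:", post_simple)
print("order-3 data, posterior over orders:", post_complex)
```

On order-1 data both orders can fit, but the order-3 model pays for its extra contexts, so the posterior is expected to favour order 1. On order-3 data the bigram statistics cannot explain the sequence and the posterior is expected to move to order 3.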

Puneesh Deora

@puneeshdeora

9 days

People move to SF and the mildest take turns into a 2000 word manifesto.

0

1

Puneesh Deora

@puneeshdeora

23 days

Idek what AGI is. Seriously, can someone define it for me please?

Quanquan Gu

@QuanquanGu

23 days

Are there still any AI experts who think we won’t achieve AGI soon?

1

0

2

Puneesh Deora

@puneeshdeora

25 days

Shannon's Master's Thesis laid the groundwork for digital circuits; what are you trying to pull here.

0

4

Puneesh Deora

@puneeshdeora

1 month

NIK

@ns123abc

1 month

Anthropic researchers: “Even if AI progress completely stalls today and we don’t reach AGI… the current systems are already capable of automating ALL white-collar jobs within the next 5 years.” It’s over.

0

Puneesh Deora

@puneeshdeora

2 months

Global reaction to Overleaf going down.

6

74

655

Puneesh Deora

@puneeshdeora

2 months

Overleaf strategic timeout — giving people a break from all the grind.

0

2

Puneesh Deora

@puneeshdeora

2 months

I wonder if the Pope rawdogs his PyTorch code like God intended or if he’s out here using Claude like a Protestant.

0

1

Puneesh Deora

@puneeshdeora

2 months

Math pope appointed near the NeurIPS deadline. Are these signs?

0

Puneesh Deora

@puneeshdeora

2 months

Desk rejection of your papers if your reviews are next level trash

0

Puneesh Deora

@puneeshdeora

2 months

If you don't submit reviews on time, you lose access to your own reviews. I like this :). Be careful while choosing your co-authors :P

NeurIPS Conference

@NeurIPSConf

2 months

Responsible reviewing initiatives for NeurIPS 2025 - read more about changes to reviewing that will safeguard reviewing quality and timeline in our blog post below:

1

0

2

Puneesh Deora

@puneeshdeora

2 months

RT @DBahdanau: Adam deserves the award, but in Singapore everyone still uses SGD.

0

64

0

Puneesh Deora

@puneeshdeora

2 months

I will be presenting our recent work on In-context Learning with multiple task groups at the SCSL workshop tomorrow (@SCSLWorkshop) at #ICLR2025. Swing by and say hi! 😄

0

4

16

Puneesh Deora

@puneeshdeora

2 months

I presented our TMLR work on Optimization and Generation of Multi-head Attention at #ICLR2025 today. Please use your time machines to attend 😛 Thanks to the people who stopped/will stop by.

0

2

20