Abhishek Panigrahi
@Abhishek_034
Followers: 793 · Following: 794 · Media: 20 · Statuses: 113
Ph.D. @PrincetonCS Previously Research Fellow @IndiaMSR and undergrad @iitkgp
Joined January 2020
**New paper https://t.co/sJD3tnJcFA** In-context learning has been explained as simulating and training simple models at inference time. We show a 2B model can run gradient descent on an internal 125M model. Surprising implications for simulation and AI safety! 1/5 w/ @SadhikaMalladi, @xiamengzhou, @prfsanjeevarora
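The "simulate + train simple models" view can be made concrete with a toy sketch (everything here is illustrative, not the paper's 2B/125M setup): an idealized in-context learner fits a simple linear model to the prompt's demonstrations by gradient descent, then applies it to the query.

```python
# Toy illustration of "in-context learning as fitting a simple model":
# given (x, y) demonstrations in the prompt, recover the linear rule
# by gradient descent and apply it to the query point.

def icl_as_gradient_descent(demos, query, lr=0.05, steps=500):
    """Fit y = w * x on in-context (x, y) pairs via GD, predict the query."""
    w = 0.0
    n = len(demos)
    for _ in range(steps):
        # gradient of the mean squared error 0.5 * mean((w*x - y)^2)
        grad = sum((w * x - y) * x for x, y in demos) / n
        w -= lr * grad
    return w * query

# Demonstrations drawn from y = 3x; the query answer should be ~30.
demos = [(1.0, 3.0), (2.0, 6.0), (-1.0, -3.0), (4.0, 12.0)]
print(icl_as_gradient_descent(demos, 10.0))  # close to 30.0
```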
Even “saturated” models have more to give! Check out our work on skill-targeted training that squeezes out extra gains on MATH — and transfers them to AMC & AIME 🔥 Skill-targeted SFT breaks through the saturation plateau and shows true cross-task skill generalization!
Claude Skills shows performance benefits from leveraging LLM skill catalogs at inference time. Our previous work (linked under thread 5/5) showed the same 6 months ago! 🌟Our new work, STAT, shows that leveraging skills during training can greatly help too‼️, e.g., Qwen can
Wonderful to see these concepts take center stage in AI. The idea of using LLM metacognition to extract skills and **self-improve** was introduced in our 2024 paper https://t.co/NnejLyUUid. Using skills to generate very strong synthetic data: https://t.co/LwQx5r1arU Using skills
arxiv.org
We introduce Instruct-SkillMix, an automated approach for creating diverse, high quality SFT data for instruction-following. The pipeline involves two stages, each leveraging an existing powerful...
Today we're introducing Skills in claude dot ai, Claude Code, and the API. Skills let you package specialized knowledge into reusable capabilities that Claude loads on demand as agents tackle more complex tasks. Here's how they work and why they matter for the future of agents:
Join Sadhika's group at UCSD to do cool research on the theory of deep learning. She has been a great mentor to me throughout my Ph.D., and she will support you through all the ups and downs of yours!!
Excited to share that I will be starting as an Assistant Professor in CSE at UCSD (@ucsd_cse) in Fall 2026! I am currently recruiting PhD students who want to bridge theory and practice in deep learning - see here:
Come to our workshop tomorrow at West Ballroom Hall B, Vancouver center. We have an amazing series of talks, spanning benchmark evaluation, efficient inference, jailbreaking, and reasoning in small LMs by @AdtRaghunathan, @tri_dao, @RICEric22 and Yejin Choi. Also, we have an
MOSS is happening this Saturday (7/19) at West Ballroom B, Vancouver Center! We are excited to have an amazing set of talks, posters, and panel discussions on the insights from and potential of small-scale analyses. Hope to see a lot of you there! 💡
Do VLMs perform as well as the LLMs they build upon? We say no! But how can we close the gap? Come learn more at our poster tomorrow (7/15), 4:30-7 pm. #icml25
@parksimon0808, @Abhishek_034 & I will present this work tmr (7/15) 4:30p (East Hall 2707) @ #ICML2025‼️ Chat w/ us on: transferring LLM's strong text reasoning to visual reasoning of VLMs via image2text conversion w/o aligning representation (can be internalized at test time)
Come learn how helpful in-context information can improve optimization in LLMs at our spotlight poster tomorrow (11 am-1:30 pm)! #icml25
@Abhishek_034 and I are presenting our #ICML2025 🚨Spotlight Poster "On the Power of Context-Enhanced Learning in LLMs" at 11:00AM-1:30PM tomorrow (7/15) at East Hall E-2107. Come chat with us and learn how and why the ICL capabilities of LLMs can benefit their in-weight learning.
📢 New paper on creativity & multi-token prediction! We design minimal open-ended tasks to argue: → LLMs are limited in creativity since they learn to predict the next token → creativity can be improved via multi-token learning & injecting noise ("seed-conditioning" 🌱) 1/ 🧵
We are extending the deadline to May 26th 4:59pm PDT (11:59pm UTC). Thank you everyone for your interest & inquiries; we look forward to learning about your results! 🪄
Announcing the 1st Workshop on Methods and Opportunities at Small Scale (MOSS) at @icmlconf 2025! 🔗Website: https://t.co/lZdKPrw4Pt 📝 We welcome submissions! 📅 Paper & jupyter notebook deadline: May 22, 2025 Topics: – Inductive biases & generalization – Training
Join us at the MOSS workshop at @icmlconf 2025 to explore how small-scale experimentation can unlock deep insights into deep learning phenomena. We welcome submissions across a wide range of topics. See you all in Vancouver! #ICML2025 #MOSSWorkshop
🔬 Also presenting 2 papers at #ICLR2025 workshops! 🧩 Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs? 📌 Posters at: • FM-Wild (Sun) • Re-Align (Mon) • Data-FM (Mon) 🧠 On the Power of Context-Enhanced Learning in
Thrilled and honored to be a recipient of the 2025 Apple Scholars in AI/ML PhD fellowship! I'm extremely grateful to my advisor, mentors, and collaborators for their invaluable support throughout my PhD journey. https://t.co/z9EtXvEhB0
machinelearning.apple.com
Apple is proud to announce the 2025 recipients of the Apple Scholars in AIML PhD fellowship.
Congrats to @Abhishek_034 on receiving an Apple Scholars in AIML fellowship! 🎉🍎 The fellowship supports grad students doing innovative research in machine learning and artificial intelligence. Panigrahi is a PhD student advised by @prfsanjeevarora. https://t.co/r3ssH0aMOY
Excited to share our work on Context-Enhanced Learning (CEL) for LLMs! Inspired by how humans learn with books, CEL accelerates training with helpful information in context while avoiding verbatim memorization. Backed by extensive theory & mechanistic experiments on Llama models!
Kids use open textbooks for homework. Can LLM training benefit from "helpful textbooks" in context with no gradients computed on these tokens? We call this Context-Enhanced Learning – it can exponentially accelerate training while avoiding verbatim memorization of “textbooks”!
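A minimal sketch of the training-objective idea behind "no gradients on these tokens" (assuming the conventional -100 ignore-index trick; not the authors' implementation): the model conditions on the in-context "textbook" tokens, but the loss skips them, so they contribute no gradients.

```python
import math

IGNORE = -100  # conventional ignore index: no loss (hence no gradient) here

def masked_nll(logits, labels):
    """Mean negative log-likelihood, skipping positions labeled IGNORE.

    In context-enhanced learning, the helpful 'textbook' tokens sit in
    the context with labels set to IGNORE: the model reads them, but
    never trains on them.
    """
    total, count = 0.0, 0
    for row, y in zip(logits, labels):
        if y == IGNORE:
            continue
        z = max(row)  # stabilized log-sum-exp
        lse = z + math.log(sum(math.exp(v - z) for v in row))
        total += lse - row[y]
        count += 1
    return total / count

# Positions 0-1 are in-context 'textbook' tokens (masked); 2-3 are targets.
logits = [[2.0, 0.0], [0.0, 2.0], [3.0, 0.0], [0.0, 3.0]]
labels = [IGNORE, IGNORE, 0, 1]
print(masked_nll(logits, labels))
```

With the mask in place, the loss equals the loss computed over the target positions alone.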
Check out our new paper on modality imbalance in VLMs! We propose a framework to quantify text vs. image learning differences & strategies to bridge the gap, backed by interesting gradient alignment studies. Simon is applying for PhD positions—an exceptional candidate to hire!
Does all LLM reasoning transfer to VLMs? In the context of Simple-to-Hard generalization, we show: NO! We also give ways to reduce this modality imbalance. Paper https://t.co/S0HhYN7cvz Code https://t.co/GJsgZof2k7
@Abhishek_034 @chengyun01 @dingli_yu @anirudhg9119 @prfsanjeevarora
(8/8) We show similar results for natural language (Wikipedia+Books), where the n-gram curriculum arises during the period of sharp loss drop, and progressive distillation helps accelerate training.
(7/8) For PCFG, the model exhibits 3 phases. The 2nd resembles the phase transition in parity: the teacher provides an implicit curriculum via increased dependence on easy-to-learn short n-grams, and the checkpoint from the phase transition is the most helpful.
(6/8) Similar observations extend to PCFG and natural languages, where models are trained for masked prediction. Progressive distillation helps the student learn structures of the data, as measured by dependencies on n-grams and accuracy of PCFG non-terminal prediction.
(5/8) The implicit curriculum provably reduces the sample complexity and empirically allows the student to learn as fast as a larger teacher (wider MLP, or Transformer with more attention heads). Moreover, 1 checkpoint from the phase transition suffices to accelerate the student.
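The distillation step can be sketched in a few lines (hypothetical logit values; not the paper's actual models): the student trains against the softened outputs of a teacher checkpoint, and a mid-training checkpoint's less-peaked distribution is what carries the implicit curriculum described above.

```python
import math

def softmax(logits, temperature=1.0):
    exps = [math.exp(l / temperature) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy of the student against the teacher's soft targets."""
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student distribution
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

# A mid-training teacher is less confident than a converged one, so its
# soft targets spread mass over easy-to-learn alternatives.
mid_ckpt_logits = [1.0, 0.6, 0.1]     # hypothetical intermediate checkpoint
final_ckpt_logits = [5.0, 0.0, -2.0]  # hypothetical converged teacher
student_logits = [0.5, 0.4, 0.1]
print(distillation_loss(student_logits, mid_ckpt_logits))
print(distillation_loss(student_logits, final_ckpt_logits))
```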
(4/8) For sparse parity, the implicit curriculum manifests as a spike in the correlations of the teacher's predictions to low-degree monomials of in-support variables, which occurs during a phase transition period of the accuracy.
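The correlation spike can be illustrated with a hand-crafted toy (the "teacher" below is a stand-in, not a trained model): pure sparse parity has zero correlation with every degree-1 monomial, while a mid-transition teacher that has picked up a linear component on an in-support variable shows a nonzero degree-1 correlation there, and none on off-support variables.

```python
from itertools import product

def correlation(f, monomial, n):
    """E_x[f(x) * prod_{i in monomial} x_i] over the uniform hypercube {-1,+1}^n."""
    total = 0.0
    for x in product((-1, 1), repeat=n):
        m = 1
        for i in monomial:
            m *= x[i]
        total += f(x) * m
    return total / 2 ** n

n, support = 5, (0, 1, 2)  # sparse parity on x0 * x1 * x2

def parity(x):
    p = 1
    for i in support:
        p *= x[i]
    return p

# Hand-crafted stand-in for a teacher mid phase transition: it has
# picked up a degree-1 component on an in-support variable.
def teacher(x):
    return 0.6 * parity(x) + 0.4 * x[0]

print(correlation(parity, (0,), n))   # 0.0: pure parity hides from degree-1 probes
print(correlation(teacher, (0,), n))  # 0.4: in-support degree-1 signal appears
print(correlation(teacher, (3,), n))  # 0.0: nothing leaks onto off-support variables
```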