Abhishek Panigrahi
@Abhishek_034
Followers: 793 · Following: 794 · Media: 20 · Statuses: 113
Ph.D. @PrincetonCS Previously Research Fellow @IndiaMSR and undergrad @iitkgp
Joined January 2020
**New paper https://t.co/sJD3tnJcFA** In-context learning has been explained as simulating and training simple models at inference time. We show a 2B model can run gradient descent on an internal 125M model. Surprising implications for simulation and AI safety! 1/5 w/ @SadhikaMalladi, @xiamengzhou, @prfsanjeevarora
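The "simulate + train simple models" view can be made concrete with a toy sketch (everything here is illustrative, not the paper's 2B/125M setup): an idealized in-context learner fits a simple linear model to the prompt's demonstrations by gradient descent, then applies it to the query.

```python
# Toy illustration of "in-context learning as fitting a simple model":
# given (x, y) demonstrations in the prompt, recover the linear rule
# by gradient descent and apply it to the query point.

def icl_as_gradient_descent(demos, query, lr=0.05, steps=500):
    """Fit y = w * x on in-context (x, y) pairs via GD, predict the query."""
    w = 0.0
    n = len(demos)
    for _ in range(steps):
        # gradient of the mean squared error 0.5 * mean((w*x - y)^2)
        grad = sum((w * x - y) * x for x, y in demos) / n
        w -= lr * grad
    return w * query

# Demonstrations drawn from y = 3x; the query answer should be ~30.
demos = [(1.0, 3.0), (2.0, 6.0), (-1.0, -3.0), (4.0, 12.0)]
print(icl_as_gradient_descent(demos, 10.0))  # close to 30.0
```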
Even “saturated” models have more to give! Check out our work on skill-targeted training that squeezes out extra gains on MATH — and transfers them to AMC & AIME 🔥 Skill-targeted SFT breaks through the saturation plateau and shows true cross-task skill generalization!
Claude Skills shows performance benefits from leveraging LLM skill catalogs at inference time. Our previous work (linked under thread 5/5) showed the same 6 months ago! 🌟Our new work, STAT, shows that leveraging skills during training can greatly help too‼️, e.g., Qwen can
Wonderful to see these concepts take center stage in AI. The idea of using LLM metacognition to extract skills and **self-improve** was introduced in our 2024 paper https://t.co/NnejLyUUid. Using skills to generate very strong synthetic data: https://t.co/LwQx5r1arU Using skills
arxiv.org
We introduce Instruct-SkillMix, an automated approach for creating diverse, high quality SFT data for instruction-following. The pipeline involves two stages, each leveraging an existing powerful...
Today we're introducing Skills in claude dot ai, Claude Code, and the API. Skills let you package specialized knowledge into reusable capabilities that Claude loads on demand as agents tackle more complex tasks. Here's how they work and why they matter for the future of agents:
Join Sadhika's group at UCSD to do cool research on the theory of deep learning. She has been a great mentor to me throughout my Ph.D., and she will support you through all the ups and downs of yours!!
Excited to share that I will be starting as an Assistant Professor in CSE at UCSD (@ucsd_cse) in Fall 2026! I am currently recruiting PhD students who want to bridge theory and practice in deep learning - see here:
Come to our workshop tomorrow at West Ballroom Hall B, Vancouver center. We have an amazing series of talks, spanning benchmark evaluation, efficient inference, jailbreaking, and reasoning in small LMs by @AdtRaghunathan, @tri_dao, @RICEric22 and Yejin Choi. Also, we have an
MOSS is happening this Saturday (7/19) at West Ballroom B, Vancouver Center! We are excited to have an amazing set of talks, posters, and panel discussions on the insights from and potential of small-scale analyses. Hope to see a lot of you there! 💡
Do VLMs perform as well as the LLMs they build upon? We say no! But how can we close the gap? Come learn more at our poster tomorrow (7/15), 4:30-7 pm. #icml25
@parksimon0808, @Abhishek_034 & I will present this work tmr (7/15) 4:30p (East Hall 2707) @ #ICML2025‼️ Chat w/ us on: transferring LLM's strong text reasoning to visual reasoning of VLMs via image2text conversion w/o aligning representation (can be internalized at test time)
Come learn how helpful in-context information can improve optimization in LLMs at our spotlight poster tomorrow (11 am-1:30 pm)! #icml25
@Abhishek_034 and I are presenting our #ICML2025 🚨Spotlight Poster "On the Power of Context-Enhanced Learning in LLMs" at 11:00AM-1:30PM tomorrow (7/15) at East Hall E-2107. Come chat with us and learn how and why the ICL capabilities of LLMs can benefit their in-weight learning.
📢 New paper on creativity & multi-token prediction! We design minimal open-ended tasks to argue: → LLMs are limited in creativity since they learn to predict the next token → creativity can be improved via multi-token learning & injecting noise ("seed-conditioning" 🌱) 1/ 🧵
We are extending the deadline to May 26th 4:59pm PDT (11:59pm UTC). Thank you everyone for your interest & inquiries; we look forward to learning about your results! 🪄
Announcing the 1st Workshop on Methods and Opportunities at Small Scale (MOSS) at @icmlconf 2025! 🔗Website: https://t.co/lZdKPrw4Pt 📝 We welcome submissions! 📅 Paper & jupyter notebook deadline: May 22, 2025 Topics: – Inductive biases & generalization – Training
Join us at the MOSS workshop at @icmlconf 2025 to explore how small-scale experimentation can unlock deep insights into deep learning phenomena. We welcome submissions across a wide range of topics. See you all in Vancouver! #ICML2025 #MOSSWorkshop
🔬 Also presenting 2 papers at #ICLR2025 workshops! 🧩 Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs? 📌 Posters at: • FM-Wild (Sun) • Re-Align (Mon) • Data-FM (Mon) 🧠 On the Power of Context-Enhanced Learning in
Thrilled and honored to be a recipient of the 2025 Apple Scholars in AI/ML PhD fellowship! I'm extremely grateful to my advisor, mentors, and collaborators for their invaluable support throughout my PhD journey. https://t.co/z9EtXvEhB0
machinelearning.apple.com
Apple is proud to announce the 2025 recipients of the Apple Scholars in AIML PhD fellowship.
Congrats to @Abhishek_034 on receiving an Apple Scholars in AIML fellowship! 🎉🍎 The fellowship supports grad students doing innovative research in machine learning and artificial intelligence. Panigrahi is a PhD student advised by @prfsanjeevarora. https://t.co/r3ssH0aMOY
Excited to share our work on Context-Enhanced Learning (CEL) for LLMs! Inspired by how humans learn with books, CEL accelerates training with helpful information in context while avoiding verbatim memorization. Backed by extensive theory & mechanistic experiments on Llama models!
Kids use open textbooks for homework. Can LLM training benefit from "helpful textbooks" in context with no gradients computed on these tokens? We call this Context-Enhanced Learning – it can exponentially accelerate training while avoiding verbatim memorization of “textbooks”!
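A minimal sketch of the training-objective idea behind "no gradients on these tokens" (assuming the conventional -100 ignore-index trick; not the authors' implementation): the model conditions on the in-context "textbook" tokens, but the loss skips them, so they contribute no gradients.

```python
import math

IGNORE = -100  # conventional ignore index: no loss (hence no gradient) here

def masked_nll(logits, labels):
    """Mean negative log-likelihood, skipping positions labeled IGNORE.

    In context-enhanced learning, the helpful 'textbook' tokens sit in
    the context with labels set to IGNORE: the model reads them, but
    never trains on them.
    """
    total, count = 0.0, 0
    for row, y in zip(logits, labels):
        if y == IGNORE:
            continue
        z = max(row)  # stabilized log-sum-exp
        lse = z + math.log(sum(math.exp(v - z) for v in row))
        total += lse - row[y]
        count += 1
    return total / count

# Positions 0-1 are in-context 'textbook' tokens (masked); 2-3 are targets.
logits = [[2.0, 0.0], [0.0, 2.0], [3.0, 0.0], [0.0, 3.0]]
labels = [IGNORE, IGNORE, 0, 1]
print(masked_nll(logits, labels))
```

With the mask in place, the loss equals the loss computed over the target positions alone.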
Check out our new paper on modality imbalance in VLMs! We propose a framework to quantify text vs. image learning differences & strategies to bridge the gap, backed by interesting gradient alignment studies. Simon is applying for PhD positions—an exceptional candidate to hire!
Does all LLM reasoning transfer to VLMs? In the context of Simple-to-Hard generalization, we show: NO! We also give ways to reduce this modality imbalance. Paper https://t.co/S0HhYN7cvz Code https://t.co/GJsgZof2k7
@Abhishek_034 @chengyun01 @dingli_yu @anirudhg9119 @prfsanjeevarora
(8/8) We show similar results for natural language (Wikipedia+Books), where the n-gram curriculum arises during the period of sharp loss drop, and progressive distillation helps accelerate training.
(7/8) For PCFG, the model exhibits 3 phases. The 2nd resembles the phase transition in parity: the teacher provides an implicit curriculum via increased dependence on easy-to-learn short n-grams, and the checkpoint from the phase transition is the most helpful.
(6/8) Similar observations extend to PCFG and natural languages, where models are trained for masked prediction. Progressive distillation helps the student learn structures of the data, as measured by dependencies on n-grams and accuracy of PCFG non-terminal prediction.
(5/8) The implicit curriculum provably reduces the sample complexity and empirically allows the student to learn as fast as a larger teacher (wider MLP, or Transformer with more attention heads). Moreover, 1 checkpoint from the phase transition suffices to accelerate the student.
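The distillation step can be sketched in a few lines (hypothetical logit values; not the paper's actual models): the student trains against the softened outputs of a teacher checkpoint, and a mid-training checkpoint's less-peaked distribution is what carries the implicit curriculum described above.

```python
import math

def softmax(logits, temperature=1.0):
    exps = [math.exp(l / temperature) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy of the student against the teacher's soft targets."""
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student distribution
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

# A mid-training teacher is less confident than a converged one, so its
# soft targets spread mass over easy-to-learn alternatives.
mid_ckpt_logits = [1.0, 0.6, 0.1]     # hypothetical intermediate checkpoint
final_ckpt_logits = [5.0, 0.0, -2.0]  # hypothetical converged teacher
student_logits = [0.5, 0.4, 0.1]
print(distillation_loss(student_logits, mid_ckpt_logits))
print(distillation_loss(student_logits, final_ckpt_logits))
```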
(4/8) For sparse parity, the implicit curriculum manifests as a spike in the correlations of the teacher's predictions to low-degree monomials of in-support variables, which occurs during a phase transition period of the accuracy.
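The correlation spike can be illustrated with a hand-crafted toy (the "teacher" below is a stand-in, not a trained model): pure sparse parity has zero correlation with every degree-1 monomial, while a mid-transition teacher that has picked up a linear component on an in-support variable shows a nonzero degree-1 correlation there, and none on off-support variables.

```python
from itertools import product

def correlation(f, monomial, n):
    """E_x[f(x) * prod_{i in monomial} x_i] over the uniform hypercube {-1,+1}^n."""
    total = 0.0
    for x in product((-1, 1), repeat=n):
        m = 1
        for i in monomial:
            m *= x[i]
        total += f(x) * m
    return total / 2 ** n

n, support = 5, (0, 1, 2)  # sparse parity on x0 * x1 * x2

def parity(x):
    p = 1
    for i in support:
        p *= x[i]
    return p

# Hand-crafted stand-in for a teacher mid phase transition: it has
# picked up a degree-1 component on an in-support variable.
def teacher(x):
    return 0.6 * parity(x) + 0.4 * x[0]

print(correlation(parity, (0,), n))   # 0.0: pure parity hides from degree-1 probes
print(correlation(teacher, (0,), n))  # 0.4: in-support degree-1 signal appears
print(correlation(teacher, (3,), n))  # 0.0: nothing leaks onto off-support variables
```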