Shivam Duggal Profile
Shivam Duggal

@ShivamDuggal4

Followers: 1K
Following: 2K
Media: 17
Statuses: 153

PhD Student @MIT | Prev: Carnegie Mellon University @SCSatCMU | Research Scientist @UberATG

Joined June 2017
@ShivamDuggal4
Shivam Duggal
1 month
Compression is the heart of intelligence. From Occam to Kolmogorov—shorter programs = smarter representations. Meet KARL: Kolmogorov-Approximating Representation Learning. Given an image, token budget T, and target quality 𝜖, KARL finds the smallest t ≤ T to reconstruct it within 𝜖 🧵
Tweet media one
13
63
348
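As a rough sketch of the quantity described above (my own illustration, not the released KARL code): KARL targets the smallest token count t ≤ T whose reconstruction meets the quality target 𝜖. A brute-force version of that definition looks like this; KARL itself avoids this search and is single-pass, as the thread explains. `encode`, `decode`, and `rec_loss` are hypothetical stand-ins.

```python
# Illustrative only: the quantity an adaptive tokenizer like KARL approximates.
# encode/decode/rec_loss are hypothetical stand-ins, not the released KARL API.

def smallest_sufficient_tokens(image, T, eps, encode, decode, rec_loss):
    tokens = encode(image, max_tokens=T)          # up to T candidate tokens
    for t in range(1, T + 1):                     # smallest t <= T that suffices
        if rec_loss(image, decode(tokens[:t])) <= eps:
            return tokens[:t], t
    return tokens, T                              # budget exhausted: use all T
```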
@ShivamDuggal4
Shivam Duggal
2 days
Talking about KARL today — our recent work on a Kolmogorov Complexity–inspired adaptive tokenizer. Details about the paper here:
More broadly, quite excited about representation learning — and understanding large models — through the lens of compression.
@ceciletamura
Cecile Tamura
2 days
@ShivamDuggal4 of @MIT in a deep dive @ploutosai, w/ @ceciletamura, Head of Community. Don't miss it!
Tweet media one
0
2
21
@ShivamDuggal4
Shivam Duggal
6 days
The strongest compressors might not be the best decoders for your task. RL can adapt pre-trained models into more "sophisticated" decoders, tuned to the task’s specific demands. Exciting thread & research! Question: is next-token prediction really the final chapter in pretraining?
0
2
9
@ShivamDuggal4
Shivam Duggal
11 days
RT @mihirp98: We ran more experiments to better understand “why” diffusion models do better in data-constrained settings than autoregressiv….
0
60
0
@ShivamDuggal4
Shivam Duggal
19 days
One "Skild brain" powers all embodiments—amazing work! Huge congratulations to entire team. Excited to see what’s next. Miss you all <3 !.
@SkildAI
Skild AI
19 days
Modern AI is confined to the digital world. At Skild AI, we are building towards AGI for the real world, unconstrained by robot type or task — a single, omni-bodied brain. Today, we are sharing our journey, starting with early milestones, with more to come in the weeks ahead.
0
2
18
@ShivamDuggal4
Shivam Duggal
23 days
For @NeurIPSConf, we can't update the main PDF or upload a separate rebuttal PDF — so there's no way to include any new images or visual results? What if reviewers ask for more vision experiments? 🥲 Any suggestions or workarounds?
5
0
11
@ShivamDuggal4
Shivam Duggal
26 days
Great work from great people! @mihirp98 @pathak2206
AR aligns w/ compression theory (KC, MDL, arithmetic coding), but diffusion is MLE too. Can we interpret diffusion similarly? Curious how compression explains AR vs. diffusion scaling laws. (Ilya’s talk touches on this too.)
@mihirp98
Mihir Prabhudesai
26 days
🚨 The era of infinite internet data is ending, so we ask:
👉 What’s the right generative modelling objective when data—not compute—is the bottleneck?
TL;DR:
▶️ Compute-constrained? Train Autoregressive models.
▶️ Data-constrained? Train Diffusion models.
Get ready for 🤿 1/n
Tweet media one
1
2
12
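For readers unfamiliar with the AR-compression link mentioned above: an autoregressive model's negative log-likelihood is, up to a small constant, the code length an arithmetic coder achieves when driven by that model. A minimal sketch, with a made-up list of per-token probabilities:

```python
import math

def code_length_bits(token_probs):
    """Ideal code length in bits for a sequence under an AR model:
    sum of -log2 p(x_i | x_<i), which arithmetic coding nearly attains."""
    return sum(-math.log2(p) for p in token_probs)

# Hypothetical per-token probabilities from some pretrained AR model.
probs = [0.9, 0.7, 0.95, 0.6]
print(f"code length ~ {code_length_bits(probs):.2f} bits")  # lower = better compression
```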
@ShivamDuggal4
Shivam Duggal
1 month
Indeed! I find H-Net to be closely related to KARL — and even our earlier work ALIT (the recurrent tokenizer in the figure below) shares strong connections. Loved reading H-Net, like all of @_albertgu’s work. Congrats to @sukjun_hwang and team!
@gm8xx8
𝚐𝔪𝟾𝚡𝚡𝟾
1 month
Single-pass Adaptive Image Tokenization for Minimum Program Search. KARL is a single-pass adaptive image tokenizer that predicts how many tokens are needed based on Kolmogorov Complexity, without test-time search. It halts once enough information is captured, using token count as
Tweet media one
1
3
31
@ShivamDuggal4
Shivam Duggal
1 month
RT @phillip_isola: Our new work on adaptive image tokenization:. Image —> T tokens. * variable T, based on image complexity.* single forwar….
0
30
0
@ShivamDuggal4
Shivam Duggal
1 month
Excited to share this work on studying representation learning from a compression perspective!
Grateful to my amazing advisors—Professors Bill Freeman, Antonio Torralba, @phillip_isola @MITCSAIL
📄 Paper: 💻 Code: AIT meets AIT!
github.com
Single-pass Adaptive Image Tokenization for Minimum Program Search | What's the Kolmogorov Complexity of an Image? - ShivamDuggal4/karl
0
1
14
@ShivamDuggal4
Shivam Duggal
1 month
Hint at modeling interestingness! 👀
Adaptive image tokenizers may go beyond KC—capturing sophistication or logical depth?
Measure Δ in reconstruction as tokens increase:
Big Δ → structure. Small Δ → trivial/noise. Mid Δ → maybe… interesting?
Future work awaits! (13/n)
Tweet media one
1
0
4
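A hedged sketch of the heuristic above (not from the paper's code; the thresholds and helper functions are invented for illustration): track how much the reconstruction improves as the token budget grows, then bucket images by that improvement.

```python
# Illustrative only; reconstruct/rec_loss are hypothetical stand-ins and the
# thresholds are arbitrary placeholders.

def delta_profile(image, token_counts, reconstruct, rec_loss):
    """Per-step drop in reconstruction loss as more tokens are allotted."""
    losses = [rec_loss(image, reconstruct(image, t)) for t in token_counts]
    return [a - b for a, b in zip(losses, losses[1:])]

def interestingness_bucket(deltas, lo=0.01, hi=0.10):
    avg = sum(deltas) / len(deltas)
    if avg > hi:
        return "structured: big gain per extra token"
    if avg < lo:
        return "trivial or noise: little gain per extra token"
    return "middle ground: maybe interesting"
```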
@ShivamDuggal4
Shivam Duggal
1 month
KC isn’t everything! There is more to AIT. Pure noise & rich structure can both have high KC—but only one is interesting. What’s truly interesting often lies in the middle: patterns that are partly predictable, partly surprising. (See @pbloemesquire’s excellent slide deck!)
Tweet media one
1
2
6
@ShivamDuggal4
Shivam Duggal
1 month
Kolmogorov Complexity perspective on ImageNet! 👇
We hope bringing Algorithmic Info. Theory into vision will spark deeper studies—like whether learning generalizes better when sampling images of optimal complexity, not just random ones. The next scaling law might be information-theory-based?
Tweet media one
5
1
6
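If one wanted to test the sampling idea floated above, a speculative sketch might look like this; `estimate_tokens` (e.g., a predicted token count used as a complexity proxy) and the band limits are my assumptions, not anything from the paper.

```python
import random

def complexity_band_sample(dataset, estimate_tokens, lo=32, hi=128, k=256):
    """Prefer images whose estimated complexity lies in [lo, hi] tokens."""
    in_band = [x for x in dataset if lo <= estimate_tokens(x) <= hi]
    pool = in_band if len(in_band) >= k else dataset  # fall back if the band is thin
    return random.sample(pool, k)
```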
@ShivamDuggal4
Shivam Duggal
1 month
Scaling laws via KARL's automatic variable token allocation:
Variable token counts per image at test time enable more faithful scaling laws—avoiding the trap of under-fitting complex images by assigning too few tokens. More scaling laws in the appendix. (10/n)
Tweet media one
1
1
5
@ShivamDuggal4
Shivam Duggal
1 month
KARL vs recent Adaptive Tokenizers. KARL performs on par with prior methods across all single-image reconstruction metrics—L1, SSIM, PSNR, DreamSIM. It delivers better visual quality than One-D-Piece (Matryoshka-style), and matches ALiT (RNN-based), while being single-pass. 👇
Tweet media one
1
2
4
@ShivamDuggal4
Shivam Duggal
1 month
Secret sauce? Loss-conditioned training—tasks sampled via failed compression. Complex images can’t meet low error w/ small budgets, so they're never paired w/ such targets in phase 2. Each image trains at its own complexity. Inspired by @ShaneLegg @mhutter42’s Universal Intelligence. (8/n)
Tweet media one
1
1
6
@ShivamDuggal4
Shivam Duggal
1 month
Our formulation is grounded in KC 📏
KC_ε(x, T) = min{ t ≤ T | L_rec(x, x̂_t) ≤ ε } → the fewest tokens to reconstruct x within ε.
Adding input tokens (+ΔT) shouldn't change KC: KC_ε(x, T) = KC_ε(x, T + ΔT). Echoes the Kolmogorov Structure Function for lossy compression (2000s).
Tweet media one
1
2
5
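For readability, the definition above in display form (same notation as the tweet; x̂_t denotes the reconstruction from t tokens):

```latex
% Fewest tokens that reconstruct x within error epsilon, given budget T.
\[
  \mathrm{KC}_{\varepsilon}(x, T) \;=\; \min\bigl\{\, t \le T \;\big|\; \mathcal{L}_{\mathrm{rec}}(x, \hat{x}_t) \le \varepsilon \,\bigr\}
\]
% Budget invariance: extra input tokens should not change the estimate.
\[
  \mathrm{KC}_{\varepsilon}(x, T) \;=\; \mathrm{KC}_{\varepsilon}(x, T + \Delta T)
\]
```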
@ShivamDuggal4
Shivam Duggal
1 month
Our algorithm resembles @SchmidhuberAI's Upside-Down RL. Phase 1 attempts lossless compression at budget T, defining the task as {image, T, loss 𝜖}. Phase 2 learns to tokenize the task—matching 𝜖 using only T out of T + ΔT tokens, masking the extras, now supervised w/ BCE. (6/n)
Tweet media one
1
1
9
@ShivamDuggal4
Shivam Duggal
1 month
We propose a two-phase, loss-conditioned training strategy:
(1) Estimate Image Complexity: attempt (& fail at) lossless image compression w/ token budget T, yielding loss 𝜖.
(2) Learning to Tokenize Complexity: conditioned on 𝜖, learn to match it using T out of T + ΔT tokens.
Tweet media one
1
0
5
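A rough sketch of the two-phase recipe described above, under my own reading of the thread; every helper (`encode`, `decode`, `rec_loss`, `predict_keep_probs`, `bce`) is a hypothetical stand-in, not the actual KARL training code.

```python
def two_phase_step(image, T, dT, encode, decode, rec_loss, predict_keep_probs, bce):
    # Phase 1: estimate image complexity by attempting compression at budget T
    # and recording the reconstruction loss eps this image actually achieves.
    tokens_T = encode(image, budget=T)
    eps = rec_loss(image, decode(tokens_T))

    # Phase 2: condition on eps, provide T + dT tokens, and train the model to
    # match eps while keeping only T of them; the per-token keep decision is
    # supervised with binary cross-entropy (BCE).
    tokens_all = encode(image, budget=T + dT, target_loss=eps)
    keep_target = [1.0] * T + [0.0] * dT              # keep T, mask the extra dT
    keep_probs = predict_keep_probs(tokens_all, target_loss=eps)
    kept = [tok for tok, keep in zip(tokens_all, keep_target) if keep > 0.5]
    return rec_loss(image, decode(kept)) + bce(keep_probs, keep_target)
```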
@ShivamDuggal4
Shivam Duggal
1 month
Given {image, token budget, recon quality}, KARL learns to approximate KC—by using only a subset of input tokens and masking the rest. But how to self-supervise this? There's no ground truth for KC or token masks, and STE/Reinforcement Learning are hard to optimize! 😓 (4/n)
1
0
6
@ShivamDuggal4
Shivam Duggal
1 month
KARL is an Adaptive Tokenizer—simple images → fewer tokens, complex images → more tokens. Unlike prior methods needing iterative test-time search, KARL is single-pass! Given a task spec (reconstruction quality), it approximates KC in one go—generating minimal representations. (3/n)
Tweet media one
1
1
11
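To make the single-pass claim above concrete, here is a minimal contrast with the brute-force search sketched earlier in this page; `encode_and_predict_t` is a hypothetical stand-in for one forward pass that returns the tokens plus a predicted sufficient count, rather than re-checking reconstructions at every candidate t.

```python
def single_pass_tokenize(image, T, eps, encode_and_predict_t):
    """One forward pass predicts how many of the T tokens are enough to hit eps."""
    tokens, t = encode_and_predict_t(image, budget=T, target_loss=eps)
    return tokens[:t]
```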