
Shivam Duggal
@ShivamDuggal4
Followers: 1K · Following: 2K · Media: 17 · Statuses: 153
PhD Student @MIT | Prev: Carnegie Mellon University @SCSatCMU | Research Scientist @UberATG
Joined June 2017
Compression is the heart of intelligence. From Occam to Kolmogorov: shorter programs = smarter representations. Meet KARL: Kolmogorov-Approximating Representation Learning. Given an image, a token budget T, and a target quality 𝜖, KARL finds the smallest t ≤ T that reconstructs the image within 𝜖 🧵
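For intuition, the objective described above can be written as a minimum-program-style search; the notation below is my own paraphrase of the tweet, not the paper's:

```latex
% Sketch of the adaptive-tokenization objective as described in the tweet
% (notation is mine, purely illustrative).
% Given image x, token budget T, and target quality epsilon:
\[
  t^{*}(x,\epsilon,T) \;=\; \min\left\{\, t \le T \;:\; d\!\left(x,\ \mathrm{Dec}\!\left(\mathrm{Enc}(x)_{1:t}\right)\right) \le \epsilon \,\right\}
\]
% The selected token count t^{*} acts as a computable, model-relative
% surrogate for the Kolmogorov Complexity K(x).
```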
Talking about KARL today, our recent work on a Kolmogorov Complexity-inspired adaptive tokenizer. Details about the paper here: More broadly, I'm quite excited about representation learning, and about understanding large models, through the lens of compression.
@ShivamDuggal4 of @MIT in a deep dive @ploutosai, w/ @ceciletamura, Head of Community. Don't miss it!
RT @mihirp98: We ran more experiments to better understand “why” diffusion models do better in data-constrained settings than autoregressive…
One "Skild brain" powers all embodiments—amazing work! Huge congratulations to entire team. Excited to see what’s next. Miss you all <3 !.
Modern AI is confined to the digital world. At Skild AI, we are building towards AGI for the real world, unconstrained by robot type or task — a single, omni-bodied brain. Today, we are sharing our journey, starting with early milestones, with more to come in the weeks ahead.
For @NeurIPSConf, we can't update the main PDF or upload a separate rebuttal PDF, so there's no way to include any new images or visual results? What if reviewers ask for more vision experiments? 🥲 Any suggestions or workarounds?
Great work from great people! @mihirp98 @pathak2206. AR aligns w/ compression theory (KC, MDL, arithmetic coding), but diffusion is MLE too. Can we interpret diffusion similarly? Curious how compression explains AR vs. diffusion scaling laws. (Ilya's talk touches on this too.)
🚨 The era of infinite internet data is ending. So we ask: 👉 What's the right generative modelling objective when data, not compute, is the bottleneck? TL;DR: ▶️ Compute-constrained? Train autoregressive models. ▶️ Data-constrained? Train diffusion models. Get ready for 🤿 1/n
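A quick illustration of the compression view referenced above: under arithmetic coding, a sequence can be encoded in roughly the model's total negative log2-likelihood in bits, which is why lower AR log-loss means better compression. The probabilities below are toy values, not from any model:

```python
import math

def code_length_bits(token_probs):
    """Arithmetic-coding view: if the model assigns probabilities
    p_1..p_n to the observed tokens, the sequence can be encoded in
    about sum_i -log2(p_i) bits (plus O(1) overhead)."""
    return sum(-math.log2(p) for p in token_probs)

# Toy example: per-token probabilities an AR model might assign.
probs = [0.5, 0.9, 0.25, 0.8]
print(f"code length ≈ {code_length_bits(probs):.2f} bits")  # ≈ 3.47 bits
```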
Indeed! I find H-Net to be closely related to KARL, and even our earlier work ALIT (the recurrent tokenizer in the figure below) shares strong connections. Loved reading H-Net, like all of @_albertgu's work. Congrats to @sukjun_hwang and team!
Single-pass Adaptive Image Tokenization for Minimum Program Search. KARL is a single-pass adaptive image tokenizer that predicts how many tokens an image needs based on its Kolmogorov Complexity, without test-time search. It halts once enough information is captured, using token count as a proxy for the image's Kolmogorov Complexity.
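A minimal, hypothetical sketch of what such a single-pass interface could look like: encode up to a maximum number of latent tokens, predict per-token "keep" probabilities in the same forward pass, and mask everything past the predicted count. Module and method names are illustrative, not the KARL codebase:

```python
import torch
import torch.nn as nn

class AdaptiveTokenizerSketch(nn.Module):
    """Illustrative stand-in for a single-pass adaptive tokenizer:
    encode up to max_tokens latents, predict per-token 'keep' probabilities
    in the same forward pass, and mask everything after the predicted count.
    This is a hypothetical sketch, not the actual KARL architecture."""

    def __init__(self, encoder: nn.Module, dim: int, max_tokens: int):
        super().__init__()
        self.encoder = encoder               # image -> (B, max_tokens, dim)
        self.keep_head = nn.Linear(dim, 1)   # per-token halting / keep logit
        self.max_tokens = max_tokens

    def forward(self, images: torch.Tensor):
        tokens = self.encoder(images)                      # (B, T_max, D)
        keep_logits = self.keep_head(tokens).squeeze(-1)   # (B, T_max)
        keep_probs = keep_logits.sigmoid()
        # Predicted token count = how many tokens the model deems necessary.
        pred_count = (keep_probs > 0.5).sum(dim=1)         # (B,)
        # Zero out tokens beyond the predicted count (token count ~ proxy for K(x)).
        idx = torch.arange(self.max_tokens, device=tokens.device)
        mask = (idx.unsqueeze(0) < pred_count.unsqueeze(1)).float()
        return tokens * mask.unsqueeze(-1), pred_count
```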
RT @phillip_isola: Our new work on adaptive image tokenization: Image → T tokens. * variable T, based on image complexity. * single forward…
Excited to share this work on studying representation learning from a compression perspective! Grateful to my amazing advisors, Professors Bill Freeman, Antonio Torralba, and @phillip_isola @MITCSAIL. 📄 Paper: 💻 Code: AIT meets AIT!
github.com
Single-pass Adaptive Image Tokenization for Minimum Program Search | What's the Kolmogorov Complexity of an Image? - ShivamDuggal4/karl
KC isn't everything! There is more to AIT. Pure noise and rich structure can both have high KC, but only one is interesting. What's truly interesting often lies in the middle: patterns that are partly predictable, partly surprising. (See @pbloemesquire's excellent slide deck!)
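The noise-vs-structure point can be made concrete with an off-the-shelf compressor as a crude stand-in for KC: pure noise is nearly incompressible (high KC) yet carries no interesting structure. A toy check with zlib, purely illustrative:

```python
import zlib
import numpy as np

rng = np.random.default_rng(0)

flat = np.zeros((64, 64), dtype=np.uint8)                  # trivial: low KC
noise = rng.integers(0, 256, (64, 64), dtype=np.uint8)     # incompressible: high KC
# In-between: a smooth gradient with a little noise on top.
grad = (np.outer(np.arange(64), np.ones(64)) * 4).astype(np.uint8)
mixed = np.clip(grad + rng.integers(0, 8, (64, 64)), 0, 255).astype(np.uint8)

for name, img in [("flat", flat), ("mixed", mixed), ("noise", noise)]:
    # Compressed size is a rough, computable upper bound on description length.
    print(name, len(zlib.compress(img.tobytes())))
# Expected ordering: flat << mixed << noise, yet only 'mixed' looks "interesting".
```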
Secret sauce? Loss-conditioned training: tasks are sampled via failed compression. Complex images can't meet low error w/ small budgets, so they are never paired w/ such targets in phase 2. Each image trains at its own complexity. Inspired by @ShaneLegg and @mhutter42's Universal Intelligence. (8/n)
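A hedged sketch of what "tasks sampled via failed compression" could look like: the quality target paired with each image is whatever error the model actually achieved at a sampled budget, so complex images are never asked to hit errors they cannot reach at that budget. All names below (`tokenizer.reconstruct`, `tokenizer.loss`) are hypothetical, not the paper's code:

```python
import random

def sample_task(image, tokenizer, max_tokens):
    """Loss-conditioned task sampling (illustrative sketch).
    Try to compress at a random budget T and record the reconstruction
    error actually achieved; the triple (image, T, achieved error) becomes
    the training task, so each image trains at its own complexity level."""
    T = random.randint(1, max_tokens)
    recon = tokenizer.reconstruct(image, num_tokens=T)  # hypothetical API
    eps = tokenizer.loss(image, recon)                  # achieved error, not an arbitrary target
    return {"image": image, "budget": T, "target_eps": eps}
```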
Our algorithm resembles @SchmidhuberAI's Upside-Down RL. Phase 1 attempts lossless compression at budget T, defining the task as {image, T, loss 𝜖}. Phase 2 learns to tokenize the task, matching 𝜖 using only T out of T + ΔT tokens, masking the extras, now supervised w/ BCE. (6/n)
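And a rough sketch of how phase 2 could consume such a task, under my reading of the tweet: encode with T + ΔT tokens, mask the extra ΔT, try to match the phase-1 error 𝜖 with only the first T, and supervise a per-token "needed or not" head with BCE. Every name here is hypothetical; this is not the released training code:

```python
import torch
import torch.nn.functional as F

def phase2_step(model, task, delta_T):
    """Phase-2 training step (illustrative sketch of the tweet's description).
    `task` is the {"image", "budget" (T), "target_eps" (phase-1 loss)} dict
    produced by the sampling sketch above."""
    x, T, eps = task["image"], task["budget"], task["target_eps"]
    tokens = model.encode(x, num_tokens=T + delta_T)    # hypothetical API: (T+dT, D)
    keep_logits = model.keep_head(tokens).squeeze(-1)   # per-token "needed" logits

    # Reconstruction may only use the first T tokens; the extra dT are masked out.
    mask = torch.zeros(T + delta_T)
    mask[:T] = 1.0
    recon = model.decode(tokens * mask.unsqueeze(-1), target_eps=eps)

    recon_loss = F.mse_loss(recon, x)  # push the T-token reconstruction toward quality eps
    # BCE: the first T tokens are labeled "needed", the masked extras are not.
    halt_loss = F.binary_cross_entropy_with_logits(keep_logits, mask)
    return recon_loss + halt_loss
```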