Daniel Israel (@danielmisrael)
1K Followers · 196 Following · 17 Media · 63 Statuses
PhD student @UCLA working on faster LLM inference algorithms.
Joined October 2011
Adaptive Parallel Decoding (APD) has been accepted as a spotlight paper at @NeurIPSConf! I thank my collaborators, reviewers, and program organizers for this honor. A thread for those interested 🧵 (1/n)
"An hour of planning can save you 10 hours of doing." β¨π Planned Diffusion π β¨ makes a plan before parallel dLLM generation. Planned Diffusion runs 1.2-1.8Γ faster than autoregressive and an order of magnitude faster than diffusion, while staying within 0.9β5% AR quality.
7
45
308
Special thanks to all of my collaborators! @tjingrant @ellieyhc @guyvdb @adityagrover_ @suvinay @mcarbin Website: https://t.co/P78FEsWaSl Paper: https://t.co/J8EfYNKseY Github: github.com/planned-diffusion/planned-diffusion
Adjusting the number of diffusion steps w.r.t. the planned output length produces a smooth trade-off between quality and speed. More steps → higher quality, more latency; fewer steps → faster, lower quality. Planned Diffusion provides inference-time knobs to easily configure the tradeoff. (5/6)
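A minimal sketch of that knob (my own illustration, not code from the paper): with a planned span of N masked tokens and S denoising steps, roughly N/S tokens are committed per forward pass, so fewer steps buy more parallelism per step at the cost of quality.

```python
# Illustrative only: the "diffusion steps" knob for one planned span.
# span_len and num_steps are hypothetical names, not the paper's API.
def tokens_per_step(span_len: int, num_steps: int) -> float:
    """Average number of tokens committed per forward pass."""
    return span_len / num_steps

for steps in (4, 8, 16, 32):
    print(f"{steps:>2} steps -> {tokens_per_step(64, steps):5.1f} tokens/step")
```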
We evaluated Planned Diffusion against decoding with AR/diffusion models on AlpacaEval. Planned Diffusion is faster than AR and diffusion, even with aggressively parallel dLLM configurations (Fast-dLLM), while maintaining comparable quality. (4/6)
We trained on an annotated dataset with custom control tags that describe the plan for each example and used a custom attention mask for AR/diffusion hybrid models. During inference, our interpreter parses these control tags and alternates between AR and diffusion modes. (3/6)
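A rough sketch of the interpreter idea under an assumed tag syntax (the paper's actual control tags may differ): the AR model first emits a short plan announcing the spans to generate, and the interpreter parses it so each span can then be denoised in parallel by the diffusion mode.

```python
import re

# Hypothetical control-tag format, e.g. a plan emitted autoregressively as:
#   "<plan><span len=12/><span len=30/></plan>"
PLAN_RE = re.compile(r"<span len=(\d+)/>")

def parse_plan(plan_text: str) -> list[int]:
    """Extract planned span lengths so each span can be filled in
    parallel by the diffusion half of the hybrid model."""
    return [int(n) for n in PLAN_RE.findall(plan_text)]

print(parse_plan("<plan><span len=12/><span len=30/></plan>"))  # [12, 30]
```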
✨ Planned Diffusion ✨ is a single hybrid model alternating between autoregressive planning and diffusion denoising for faster text generation. On AlpacaEval, planned diffusion achieved 1.27-1.81x speedup over autoregressive generation with only 0.87-5.4% quality drop!
"An hour of planning can save you 10 hours of doing." β¨π Planned Diffusion π β¨ makes a plan before parallel dLLM generation. Planned Diffusion runs 1.2-1.8Γ faster than autoregressive and an order of magnitude faster than diffusion, while staying within 0.9β5% AR quality.
If you are interested in KV cache compression or LLM efficiency, I think this is a must-read. Your KV eviction policy is not as robust as you think, but it can be fixed in a simple way. @itisalex3 was great to work with, and I highly recommend him to any potential PhD advisor.
What happens when we compress the KV cache of prompts with multiple instructions? Existing compression methods can lead to some instructions being ignored. We propose simple changes to KV cache eviction that fix this problem, alongside other pitfalls to be aware of.
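To make the failure mode concrete, here is a generic score-based eviction sketch (my own toy, not the method from the paper): keeping only the keys with the highest accumulated attention can starve earlier instructions whenever recent tokens dominate the attention mass.

```python
import numpy as np

def evict_topk(attn: np.ndarray, keep: int) -> np.ndarray:
    """attn: (num_queries, num_keys) attention weights.
    Return indices of the `keep` keys with the highest total attention;
    everything else would be dropped from the KV cache."""
    totals = attn.sum(axis=0)
    return np.sort(np.argsort(totals)[-keep:])

rng = np.random.default_rng(0)
attn = rng.random((8, 16))
attn[:, 12:] += 1.0              # queries attend mostly to the most recent keys
print(evict_topk(attn, keep=6))  # mostly recent indices survive eviction
```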
(1/n) We are excited to announce LaViDa-O, a state-of-the-art unified diffusion LM for image understanding, generation, and editing. Building on our NeurIPS Spotlight submission LaViDa, LaViDa-O offers up to a 6.8x speedup compared with AR models, with high output quality.
Special thanks to my advisors @guyvdb and @adityagrover_. Also huge thanks to @OliverBroadrick, who played a crucial role in brainstorming. (7/n) Github: https://t.co/AFZ1GNTXIH Paper: arxiv.org
We see massive speedups with APD on benchmarks and very little performance degradation. APD offers three parameters (R, W, M) to trade off and capture the Pareto frontier. We can get as much as a 10x increase in throughput over vanilla diffusion generation. (6/n)
APD uses a small autoregressive model to capture the joint dependencies that a diffusion model cannot in a single forward pass. The algorithm bears some similarities to drafting and verifying in speculative decoding. Please see the full paper for more details. (5/n)
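For intuition, here is a simplified draft-and-verify sketch (greedy agreement with the small AR model; this is not the exact APD acceptance rule, which is in the paper): the diffusion model proposes a block of tokens in parallel, and the AR model keeps the longest prefix it agrees with.

```python
from typing import Callable, List

def accept_prefix(proposed: List[int],
                  ar_next: Callable[[List[int]], int],
                  context: List[int]) -> List[int]:
    """Keep the longest prefix of the parallel proposal that the small AR
    model would also have produced token by token; the rest is discarded
    and regenerated in the next round."""
    accepted: List[int] = []
    for tok in proposed:
        if ar_next(context + accepted) != tok:
            break
        accepted.append(tok)
    return accepted

# Stand-in "AR model" that always predicts the previous token plus one.
ar_next = lambda seq: seq[-1] + 1
print(accept_prefix([4, 5, 9, 7], ar_next, context=[1, 2, 3]))  # [4, 5]
```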
Empirically, increasing the number of tokens sampled in parallel causes performance on downstream tasks to go down. This can be seen in the open-source dLLMs Dream and LLaDA. While the tradeoff between speed and quality will always exist, it doesn't need to be so clear-cut. (4/n)
dLLMs can sample in parallel, but sampling from the conditional marginals independently will cause fundamental issues when trying to model the joint distribution. See the below example (from the paper Discrete Copula Diffusion). (3/n)
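A toy numeric illustration of the issue (my own example, not the figure from Discrete Copula Diffusion): the joint puts all its mass on two coherent token pairs, the per-position marginals are uniform, and sampling the marginals independently produces an incoherent pair about half the time.

```python
import random

# Only these two-token continuations are coherent under the joint.
joint = {("New", "York"): 0.5, ("Los", "Angeles"): 0.5}

# Per-position marginals implied by that joint.
marginal_1 = {"New": 0.5, "Los": 0.5}
marginal_2 = {"York": 0.5, "Angeles": 0.5}

def sample(dist):
    return random.choices(list(dist), weights=list(dist.values()))[0]

# Unmasking both positions in one step samples each marginal independently,
# so "New Angeles" / "Los York" appear roughly half the time.
trials = 10_000
coherent = sum((sample(marginal_1), sample(marginal_2)) in joint for _ in range(trials))
print(f"fraction of coherent pairs: {coherent / trials:.2f}")  # ~0.50
```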
This is a paper about speeding up diffusion language models (dLLMs). Here's a quick summary of the differences between diffusion and autoregressive LLMs. (2/n)
Thanks AK for sharing our work! Unlike autoregressive LLMs, diffusion LLMs can be conditioned on future reasoning hints during generation through inpainting 🧩, enabling guided exploration toward correct solutions. We show that applying inpainting-guided exploration in RL
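A bare-bones sketch of the inpainting idea (toy stand-ins for the model; not the paper's implementation): hint tokens are clamped at chosen future positions and the denoiser only ever fills the remaining masked positions, so every sample stays consistent with the hint.

```python
import random

MASK = "<mask>"

def inpaint(length, hints, steps=4, vocab=("a", "b", "c")):
    """Toy masked-diffusion loop: `hints` maps positions to clamped tokens.
    Random choices stand in for the model's predictions; only masked,
    non-hint positions ever get filled."""
    seq = [hints.get(i, MASK) for i in range(length)]
    for step in range(steps):
        for i, tok in enumerate(seq):
            if tok == MASK and random.random() < (step + 1) / steps:
                seq[i] = random.choice(vocab)
    # Fill any still-masked positions in a final pass.
    return [tok if tok != MASK else random.choice(vocab) for tok in seq]

print(inpaint(8, hints={5: "HINT"}))  # position 5 always carries the hint
```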
Introducing PhysiX: One of the first large-scale foundation models for physics simulations! PhysiX is a 4.5B-parameter model that unifies a wide range of physical systems, from fluid dynamics to reaction-diffusion, outperforming specialized, state-of-the-art models.
(1/6) Our work Reflect-DiT was accepted to #ICCV2025! Reflect-DiT allows the model to reflect on its past generations and textual feedback to self-correct and improve, extending reasoning to text-to-image generation.
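A high-level sketch of that reflect-and-retry loop with hypothetical callables (`generate` and `critique` are placeholders, not the Reflect-DiT API): each round conditions the next generation on earlier attempts plus textual feedback about them.

```python
def reflect_generate(prompt, generate, critique, rounds=3):
    """generate(prompt, history) -> image; critique(prompt, image) -> feedback.
    Both are hypothetical stand-ins for the model and the feedback source."""
    history = []                       # list of (image, feedback) pairs
    image = generate(prompt, history)
    for _ in range(rounds - 1):
        feedback = critique(prompt, image)
        history.append((image, feedback))
        image = generate(prompt, history)
    return image

# Trivial stubs just to show the control flow.
print(reflect_generate("a red cube",
                       generate=lambda p, h: f"image#{len(h)}",
                       critique=lambda p, img: "make it redder"))
```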