Daniel Israel Profile
Daniel Israel

@danielmisrael

Followers: 1K
Following: 196
Media: 17
Statuses: 63

PhD student @UCLA working on faster LLM inference algorithms.

Joined October 2011
@danielmisrael
Daniel Israel
1 month
🔦 Adaptive Parallel Decoding (APD) has been accepted as a spotlight paper at @NeurIPSConf! I thank my collaborators, reviewers, and program organizers for this honor. A thread for those interested 🧵 (1/n)
11
22
168
@danielmisrael
Daniel Israel
5 days
Adjusting the number of diffusion steps w.r.t. the planned output length produces a smooth trade-off between quality and speed. More steps → higher quality, more latency; fewer steps → faster, lower quality. Planned Diffusion provides inference-time knobs to easily configure the tradeoff. (5/6)
1
0
6
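A rough way to picture this knob (a minimal sketch; the function name and steps-per-token ratio below are illustrative assumptions, not the paper's actual schedule):

```python
import math

# Illustrative knob: scale denoising steps with the planned span length.
# steps_per_token, min_steps, and max_steps are made-up defaults.
def num_diffusion_steps(planned_length, steps_per_token=0.25, min_steps=1, max_steps=64):
    """More steps -> higher quality / more latency; fewer steps -> faster / lower quality."""
    steps = math.ceil(planned_length * steps_per_token)
    return max(min_steps, min(max_steps, steps))

print(num_diffusion_steps(40))  # a planned 40-token span gets 10 denoising steps here
```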
@danielmisrael
Daniel Israel
5 days
We evaluated Planned Diffusion against autoregressive and diffusion decoding on AlpacaEval. Planned Diffusion is faster than both AR and diffusion, even against aggressively parallel dLLM configurations (Fast-dLLM), while maintaining comparable quality. (4/6)
1
0
7
@danielmisrael
Daniel Israel
5 days
We trained on an annotated dataset with custom control tags that describe the plan for each example, and used a custom attention mask for the AR/diffusion hybrid model. During inference, our interpreter parses these control tags and alternates between AR and diffusion modes. (3/6)
1
0
9
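To make the interpreter idea concrete, here is a minimal sketch of the control flow it describes; the `<chunk len=N>` tag format and the two decode callbacks are hypothetical placeholders, not the actual tags or API from the paper:

```python
import re

# Hypothetical control tag; the real tag format is defined by the training data.
CHUNK_TAG = re.compile(r"<chunk len=(\d+)>")

def generate(prompt, ar_decode, diffusion_denoise):
    """Sketch of plan-then-fill decoding: autoregressively emit a plan containing
    control tags, then fill each planned chunk with parallel diffusion denoising."""
    plan = ar_decode(prompt)                                    # AR mode: produce plan + tags
    chunks = []
    for match in CHUNK_TAG.finditer(plan):
        length = int(match.group(1))
        chunks.append(diffusion_denoise(prompt, plan, length))  # diffusion mode: fill chunk
    return "".join(chunks)
```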
@danielmisrael
Daniel Israel
5 days
βœ¨πŸ“ Planned Diffusion πŸ“ ✨ is a single hybrid model alternating between autoregressive planning and diffusion denoising for faster text generation. On AlpacaEval, planned diffusion achieved 1.27-1.81x speedup over autoregressive generation with only 0.87-5.4% quality drop!
1
0
9
@danielmisrael
Daniel Israel
5 days
"An hour of planning can save you 10 hours of doing." βœ¨πŸ“ Planned Diffusion πŸ“ ✨ makes a plan before parallel dLLM generation. Planned Diffusion runs 1.2-1.8Γ— faster than autoregressive and an order of magnitude faster than diffusion, while staying within 0.9–5% AR quality.
7
45
308
@danielmisrael
Daniel Israel
11 days
If you are interested in KV cache compression or LLM efficiency, I think this is a must-read. Your KV eviction policy is not as robust as you think, but it can be fixed in a simple way. @itisalex3 was great to work with, and I highly recommend him to any potential PhD advisor.
@itisalex3
Alex Chen
11 days
What happens when we compress the KV cache of prompts with multiple instructions? 🤔 Existing compression methods can lead to some instructions being ignored. 🙀 We propose simple changes to KV cache eviction that fix this problem, alongside other pitfalls to be aware of. 💯
0
0
7
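To make the pitfall concrete: score-based eviction (e.g., keeping the top-k cached tokens by accumulated attention) can silently evict the KV entries of an earlier instruction. The sketch below only illustrates that class of fix, pinning instruction-span entries before ranking the rest; it is an assumption for illustration, not the paper's actual method.

```python
import numpy as np

def evict_keep_topk(attn_scores, budget, protected=None):
    """Choose which KV-cache entries to keep under a size budget (illustration only).

    attn_scores : 1-D array, accumulated attention mass per cached token.
    budget      : maximum number of entries to keep.
    protected   : optional boolean mask of tokens that must never be evicted
                  (e.g., instruction spans) -- the kind of simple change hinted at above.
    """
    keep = np.zeros(len(attn_scores), dtype=bool)
    if protected is not None:
        keep |= protected                         # pin instruction tokens first
    remaining = budget - int(keep.sum())
    candidates = np.where(~keep)[0]
    if remaining > 0 and candidates.size:
        top = candidates[np.argsort(attn_scores[candidates])[::-1][:remaining]]
        keep[top] = True                          # then fill the rest by attention score
    return np.where(keep)[0]
```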
@li78658171
Shufan (Jack) Li
1 month
(1/n) We are excited to announce LaViDa-O, a state-of-the-art unified diffusion LM for image understanding, generation, and editing. Building on our NeurIPS Spotlight submission LaViDa, LaViDa-O offers up to a 6.8x speedup compared with AR models, with high output quality.
2
14
49
@danielmisrael
Daniel Israel
1 month
We see massive speedups with APD on benchmarks with very little performance degradation. APD offers three parameters (R, W, M) to trade off speed and quality and capture the Pareto frontier. We can get as much as a 10x increase in throughput over vanilla diffusion generation. (6/n)
1
0
8
@danielmisrael
Daniel Israel
1 month
APD uses a small autoregressive model to capture the joint dependencies that a diffusion model cannot capture in a single forward pass. The algorithm bears some similarities to drafting and verifying in speculative decoding. Please see the full paper for more details. (5/n)
1
0
7
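The analogy to speculative decoding can be sketched roughly as follows: the diffusion model drafts a block of tokens in parallel, and the small AR model decides how long a prefix to accept. The greedy-agreement rule below is a simplification for illustration, not APD's actual acceptance criterion.

```python
import torch

@torch.no_grad()
def accept_prefix(proposed, ar_logits):
    """proposed : (T,) token ids drafted in parallel by the diffusion model.
    ar_logits: (T, V) logits from a small AR model at those positions.
    Accept the longest prefix where the AR model's greedy choice agrees with
    the draft (illustrative rule, simpler than the real algorithm)."""
    agree = proposed == ar_logits.argmax(dim=-1)
    n = int(torch.cumprod(agree.int(), dim=0).sum().item())  # length of agreeing prefix
    return proposed[:n]
```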
@danielmisrael
Daniel Israel
1 month
Empirically, increasing the number of tokens sampled in parallel causes performance on downstream tasks to go down. This can be seen in the open-source dLLMs Dream and LLaDA. While the tradeoff between speed and quality will always exist, it doesn't need to be this steep. (4/n)
1
0
7
@danielmisrael
Daniel Israel
1 month
dLLMs can sample in parallel, but sampling independently from the conditional marginals causes fundamental issues when trying to model the joint distribution. See the example below (from the paper Discrete Copula Diffusion). (3/n)
1
0
6
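A tiny worked example of the issue (the distribution is made up for illustration): if the joint puts all its mass on "New York" and "Los Angeles", the per-position marginals are uniform, so sampling each position independently produces incoherent pairs like "New Angeles" half the time.

```python
# Made-up two-token joint to show independent marginal sampling going wrong.
joint = {("New", "York"): 0.5, ("Los", "Angeles"): 0.5}

# Per-position marginals implied by the joint.
p_first = {"New": 0.5, "Los": 0.5}
p_second = {"York": 0.5, "Angeles": 0.5}

# Probability of a pair the joint never produces, under independent sampling:
p_incoherent = p_first["New"] * p_second["Angeles"] + p_first["Los"] * p_second["York"]
print(p_incoherent)  # 0.5 -- half of the parallel samples are incoherent
```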
@danielmisrael
Daniel Israel
1 month
This is a paper about speeding up diffusion language models (dLLMs). Here’s a quick summary of the differences between diffusion and autoregressive LLMs. (2/n)
1
0
6
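In schematic terms (pseudocode with hypothetical `next_token` / `denoise` methods, not either model family's real API): an AR model emits one token per forward pass, left to right, while a dLLM starts from a fully masked block and reveals several positions per denoising step.

```python
def ar_generate(model, prompt, n):
    """Autoregressive: n new tokens cost n sequential forward passes."""
    seq = list(prompt)
    for _ in range(n):
        seq.append(model.next_token(seq))       # one token per pass
    return seq

def diffusion_generate(model, prompt, n, steps):
    """Diffusion: start fully masked, unmask several tokens per denoising step."""
    block = ["<mask>"] * n
    for _ in range(steps):                      # steps << n is where the speedup comes from
        block = model.denoise(prompt, block)    # may fill many positions at once
    return list(prompt) + block
```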
@siyan_zhao
Siyan Zhao
1 month
Thanks AK for sharing our work! Unlike autoregressive LLMs, diffusion LLMs can be conditioned on future reasoning hints during generation through inpainting 🧩, enabling guided exploration toward correct solutions. We show that applying inpainting-guided exploration in RL
@_akhaliq
AK
1 month
Inpainting-Guided Policy Optimization for Diffusion Large Language Models
4
28
190
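Mechanically, inpainting here means clamping a few future positions to hint tokens and denoising only the rest. The helper below is a hypothetical sketch of that conditioning, not the IGPO implementation.

```python
def inpaint_generate(model, prompt, length, hints, steps):
    """hints: {position: token} reasoning hints held fixed during generation (illustrative).
    Only the masked (non-hint) positions get re-predicted at each denoising step."""
    block = ["<mask>"] * length
    for pos, tok in hints.items():
        block[pos] = tok                        # condition on future tokens via inpainting
    for _ in range(steps):
        block = model.denoise(prompt, block)    # hypothetical denoise call
        for pos, tok in hints.items():
            block[pos] = tok                    # keep hint positions clamped
    return block
```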
@abeirami
Ahmad Beirami
3 months
Had the pleasure of learning about TRACE by Gwen Yidou-Weng, Benjie Wang, and @guyvdb at ICML! It views alignment/controlled decoding through a Bayesian lens and derives a simple, principled, and effective new method. I highly recommend reading this paper!
1
12
96
@tungnd_13
Tung Nguyen
4 months
🚀 Introducing PhysiX: One of the first large-scale foundation models for physics simulations! PhysiX is a 4.5B parameter model that unifies a wide range of physical systems, from fluid dynamics to reaction-diffusion, outperforming specialized, state-of-the-art models.
21
251
2K
@li78658171
Shufan (Jack) Li
4 months
(1/6) Our work Reflect-DiT was accepted to #ICCV2025! Reflect-DiT allows the model to reflect on its past generations and textual feedback to self-correct and improve, extending reasoning to text-to-image generation.
1
23
93