Jian Chen Profile
Jian Chen

@jianchen1799

Followers
130
Following
56
Media
1
Statuses
19

Ph.D. Student @HDSIUCSD; Efficient ML systems and algorithms

San Diego, CA
Joined September 2024
@jianchen1799
Jian Chen
14 days
🚀 Thrilled to share my first work as a PhD student! We propose a new home for Diffusion LLMs: not as competitors to AR models, but as ultra-fast drafters. DFlash is lightweight, cheap to run, and very effective (up to 6x speedup). It's super easy to set up — give it a try!
@zhijianliu_
Zhijian Liu
14 days
Holiday cooking finally ready to serve! 🥳 Introducing DFlash — speculative decoding with block diffusion. 🚀 6.2× lossless speedup on Qwen3-8B ⚡ 2.5× faster than EAGLE-3. Diffusion vs AR doesn't have to be a fight. At today's stage: • dLLMs = fast, highly parallel, but lossy
3
5
19
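The draft-and-verify idea behind DFlash (a small drafter proposes a block of tokens, the large target model checks them in one parallel pass) can be sketched in toy form. The two "models" below are stand-in functions, not the DFlash implementation — a minimal sketch of greedy speculative decoding:

```python
# Toy sketch of draft-and-verify speculative decoding (greedy case).
# `draft_next` and `target_next` are hypothetical stand-ins; in DFlash
# the drafter is a block-diffusion model and the target is an AR LLM.

def target_next(ctx):
    # Hypothetical "large" target model: next token = sum of context mod 7.
    return sum(ctx) % 7

def draft_next(ctx):
    # Hypothetical cheap drafter: agrees with the target most of the time.
    t = sum(ctx) % 7
    return t if len(ctx) % 5 else (t + 1) % 7  # occasional disagreement

def speculative_decode(ctx, n_new, block=4):
    ctx = list(ctx)
    produced = 0
    while produced < n_new:
        # 1) Drafter proposes a block of tokens autoregressively (cheap).
        draft = []
        for _ in range(min(block, n_new - produced)):
            draft.append(draft_next(ctx + draft))
        # 2) Target verifies the block: accept the longest prefix where the
        #    draft matches the target's own prediction; on the first
        #    mismatch, emit the target's token instead and stop.
        accepted = []
        for d in draft:
            t = target_next(ctx + accepted)
            if d == t:
                accepted.append(d)
            else:
                accepted.append(t)
                break
        ctx += accepted
        produced += len(accepted)
    return ctx

out = speculative_decode([1, 2, 3], n_new=8)
```

The output is token-for-token identical to plain greedy decoding with the target model (that is the "lossless" part); the speedup comes from the target verifying a whole block per forward pass instead of producing one token per pass.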
@tyao923
Yao Tang
4 days
๐—ง๐—ต๐—ถ๐—ป๐—ธ ๐˜„๐—ถ๐—ฑ๐—ฒ๐—ฟ. ๐—ง๐—ต๐—ถ๐—ป๐—ธ ๐˜€๐—ต๐—ผ๐—ฟ๐˜๐—ฒ๐—ฟ. ๐Ÿš€ ๐—œ๐—ป๐˜๐—ฟ๐—ผ๐—ฑ๐˜‚๐—ฐ๐—ถ๐—ป๐—ด ๐— ๐˜‚๐—น๐˜๐—ถ๐—ฝ๐—น๐—ฒ๐˜… ๐—ง๐—ต๐—ถ๐—ป๐—ธ๐—ถ๐—ป๐—ด: token-wise branch-and-merge reasoning for LLMs. ๐Ÿ’ธ Discrete CoT is costly. ๐ŸŽ›๏ธ Existing continuous tokens often clash with ๐—ผ๐—ป-๐—ฝ๐—ผ๐—น๐—ถ๐—ฐ๐˜† ๐—ฅ๐—Ÿ
22
91
672
@jianchen1799
Jian Chen
11 days
โšก๏ธ Exciting news: DFlash is now on SGLang! We are unlocking new possibilities to accelerate LLM inference. ๐Ÿš€ Stay tunedโ€”more draft models and optimizations are dropping soon that will seriously speed up your workflows!
@zhijianliu_
Zhijian Liu
11 days
⚡ Speed of flash. Just 2 days after launch, DFlash is already running in SGLang (@sgl_project). With serving-engine support, we can now unlock speedups at higher concurrency, and we've quickly built a new demo on top of it. More releases coming in the next few weeks. We're
0
1
3
@JiaZhihao
Zhihao Jia
7 months
One of the best ways to reduce LLM latency is by fusing all computation and communication into a single GPU megakernel. But writing megakernels by hand is extremely hard. 🚀 Introducing Mirage Persistent Kernel (MPK), a compiler that automatically transforms LLMs into optimized
14
125
774
@InfiniAILab
Infini-AI-Lab
7 months
🔥 We introduce Multiverse, a new generative modeling framework for adaptive and lossless parallel generation. 🚀 Multiverse is the first open-source non-AR model to achieve AIME24 and AIME25 scores of 54% and 46%. 🌐 Website: https://t.co/J9osByhWUf 🧵 1/n
6
84
222
@InfiniAILab
Infini-AI-Lab
8 months
🥳 Happy to share our new work – Kinetics: Rethinking Test-Time Scaling Laws. 🤔 How can we effectively build a powerful reasoning agent? Existing compute-optimal scaling laws suggest 64K thinking tokens + a 1.7B model > a 32B model. But that shows only half of the picture! 🚨 The O(N²)
7
69
247
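The O(N²) term the Kinetics tweet points at is the attention cost: each new token attends over the whole growing context, so generating N tokens costs roughly linear-in-N parameter FLOPs plus quadratic-in-N attention work. A back-of-the-envelope sketch, with constants `a` and `b` that are purely illustrative and not values from the paper:

```python
# Toy cost model for decoding N "thinking tokens":
#   cost(N) ~ a * N       (parameter/MLP work, linear in N)
#           + b * N**2    (attention over the growing KV cache, quadratic)
# Constants a and b are illustrative placeholders, not Kinetics numbers.

def decode_cost(n_tokens, a=1.0, b=1e-4):
    return a * n_tokens + b * n_tokens ** 2

short = decode_cost(8_000)    # a short reasoning trace
long = decode_cost(64_000)    # 8x more thinking tokens

# 8x more tokens costs far more than 8x once the quadratic term dominates,
# which is why counting only tokens (or only parameters) misleads.
ratio = long / short
```

At these toy constants the 8x-longer trace is about 33x more expensive, illustrating why a token-count-only scaling law "shows half of the picture."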
@melissapan
Melissa Pan
9 months
🚨 Why Do Multi-Agent LLM Systems Fail? ⁉️ 🔥 Introducing MAST: the first multi-agent failure taxonomy — 14 failure modes in 3 categories, generalizing across diverse multi-agent systems and tasks! Paper: https://t.co/BC5YHS8ZRZ Code: https://t.co/Ea1FvGcaLs 🧵 1/n
7
59
217
@chenzhuoming911
chen zhuoming
9 months
🚨 Thrilled to present our Spotlight at #ICLR2025: "MagicPIG: LSH Sampling for Efficient LLM Generation" by @RJ_Sadhukhan 🎉 💡 MagicPIG enables KV compression for long-context LLMs — where top-k falls short, sampling shines. ⚙️ Introduces CPU-GPU heterogeneous serving to boost
1
5
10
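The "where top-k falls short, sampling shines" line refers to approximating attention by sampling keys likely to matter, rather than keeping only the highest-scoring ones; locality-sensitive hashing makes that sampling cheap. A minimal SimHash-style sketch of the hashing step (dimensions, bit counts, and the collision rule are arbitrary choices for illustration, not MagicPIG's actual configuration):

```python
import numpy as np

rng = np.random.default_rng(0)

def simhash(vecs, planes):
    # Sign pattern of projections onto random hyperplanes, packed into an
    # integer code. Vectors with high cosine similarity tend to collide.
    bits = (vecs @ planes.T) > 0
    return bits @ (1 << np.arange(planes.shape[0]))

d, n_keys, n_bits = 64, 10_000, 8
planes = rng.standard_normal((n_bits, d))

keys = rng.standard_normal((n_keys, d))          # stand-in KV-cache keys
query = keys[123] + 0.1 * rng.standard_normal(d)  # a query near one key

key_codes = simhash(keys, planes)                 # hash once, reuse per query
q_code = simhash(query[None, :], planes)[0]

# Restrict attention to keys whose hash collides with the query's hash —
# a cheap, similarity-biased sample instead of an exact top-k scan.
candidates = np.flatnonzero(key_codes == q_code)
```

Key codes are computed once per key and reused across queries, so selecting candidates costs an integer comparison per key rather than a full dot product — that is what makes CPU-side filtering of a large KV cache practical.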
@jianchen1799
Jian Chen
9 months
🚀 Low latency, high throughput, and lossless LLM inference — all at once. 🎂 Who says you can't have your cake and eat it too? #MagicDec proves you can. ⚡ MagicDec is the first to show that speculative decoding can boost throughput for moderate to long contexts — and the speedup grows
@BeidiChen
Beidi Chen
1 year
🥳 Promised blogpost + tweet about MagicDec-1.0 🪄🪄🪄 (2.0 coming soon 😉): How can we achieve lossless, high-throughput, and low-latency LLM inference all at once? Seems too good to be true? Introducing MagicDec-1.0 🪄, a Speculative Decoding (SD) based technique that can improve
1
3
7
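MagicDec's throughput claim rests on a memory-bandwidth argument: at long context, decoding is dominated by re-reading the KV cache for every generated token, and verifying a block of drafted tokens amortizes that read while the drafter keeps only a small cache. A toy traffic model — all constants are illustrative, and it assumes every drafted token is accepted:

```python
# Toy KV-cache memory-traffic model for long-context decoding.
# All numbers are illustrative placeholders, not MagicDec measurements,
# and acceptance of every drafted token is a simplifying assumption.

def autoregressive_traffic(context_len, n_tokens):
    # Plain decoding: each generated token re-reads the whole KV cache.
    return n_tokens * context_len

def speculative_traffic(context_len, n_tokens, block=4, draft_frac=0.05):
    # Target verifies `block` tokens per full KV-cache read; the drafter
    # keeps only a small fraction of the cache (e.g., a sparse cache).
    target = (n_tokens / block) * context_len
    draft = n_tokens * context_len * draft_frac
    return target + draft

ar = autoregressive_traffic(100_000, 1_000)
sp = speculative_traffic(100_000, 1_000)
```

Under these toy constants the speculative pipeline moves roughly a third of the bytes, and the gap widens as context grows — which matches the tweet's point that the speedup grows with context length.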
@InfiniAILab
Infini-AI-Lab
11 months
🚀 RAG vs. Long-Context LLMs: The Real Battle ⚔️ 🤯 Turns out, simple-to-build RAG can match million-dollar long-context LLMs (LC LLMs) on most existing benchmarks. 🤡 So, do we even need long-context models? YES. Because today's benchmarks are flawed: ⛳ Too Simple –
6
39
188