Jian Chen
@jianchen1799
Followers: 130 · Following: 56 · Media: 1 · Statuses: 19
Ph.D. Student @HDSIUCSD; Efficient ML systems and algorithms
San Diego, CA
Joined September 2024
Thrilled to share my first work as a PhD student! We propose a new home for Diffusion LLMs: not as competitors to AR models, but as ultra-fast drafters. DFlash is lightweight, cheap to run, and very effective (up to 6x speedup). It's super easy to set up, so give it a try!
Holiday cooking finally ready to serve! Introducing DFlash: speculative decoding with block diffusion. 6.2× lossless speedup on Qwen3-8B, 2.5× faster than EAGLE-3. Diffusion vs AR doesn't have to be a fight. At today's stage: • dLLMs = fast, highly parallel, but lossy
3 replies · 5 reposts · 19 likes
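For readers curious how a diffusion drafter plugs into speculative decoding, here is a minimal sketch of the general recipe the thread describes: a cheap drafter proposes a block of tokens and the target model verifies them in one forward pass. The names (draft_block, the HF-style .logits access) are illustrative placeholders, not DFlash's actual API.

```python
# Minimal sketch of block-wise speculative decoding with a fast drafter.
# All names below are hypothetical placeholders, not DFlash's real interface.
import torch

def speculative_step(target_model, drafter, prefix_ids, block_size=8):
    """Propose a block with the cheap drafter, verify it with one target pass."""
    # 1) Drafter proposes `block_size` tokens in parallel (e.g. via block diffusion).
    draft_ids = drafter.draft_block(prefix_ids, block_size)          # (block_size,)

    # 2) Target model scores prefix + draft in a single forward pass.
    candidate = torch.cat([prefix_ids, draft_ids])
    logits = target_model(candidate.unsqueeze(0)).logits[0]          # (len, vocab)

    # 3) Greedy verification: accept the longest prefix of the draft that the
    #    target itself would have produced, which keeps decoding lossless.
    accepted = []
    for i, tok in enumerate(draft_ids):
        target_tok = logits[len(prefix_ids) + i - 1].argmax()
        if target_tok.item() != tok.item():
            accepted.append(target_tok)       # take the target's correction and stop
            break
        accepted.append(tok)
    return torch.cat([prefix_ids, torch.stack(accepted)])
```

Because decoding is typically memory-bound, scoring the whole drafted block costs about as much as one ordinary decode step, which is where a lossless speedup of this kind comes from.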
Think Wider. Think Shorter. Introducing Multiplex Thinking: token-wise branch-and-merge reasoning for LLMs. Discrete CoT is costly. Existing continuous tokens often clash with on-policy RL
22 replies · 91 reposts · 672 likes
Exciting news: DFlash is now on SGLang! We are unlocking new possibilities to accelerate LLM inference. Stay tuned: more draft models and optimizations are dropping soon that will seriously speed up your workflows!
Speed of flash. Just 2 days after launch, DFlash is already running in SGLang (@sgl_project). With serving-engine support, we can now unlock speedup at higher concurrency, and we've quickly built a new demo on top of it. More releases coming in the next few weeks. We're
0 replies · 1 repost · 3 likes
One of the best ways to reduce LLM latency is by fusing all computation and communication into a single GPU megakernel. But writing megakernels by hand is extremely hard. Introducing Mirage Persistent Kernel (MPK), a compiler that automatically transforms LLMs into optimized
14 replies · 125 reposts · 774 likes
We introduce Multiverse, a new generative modeling framework for adaptive and lossless parallel generation. Multiverse is the first open-source non-AR model to achieve AIME24 and AIME25 scores of 54% and 46%. Website: https://t.co/J9osByhWUf 1/n
6 replies · 84 reposts · 222 likes
Happy to share our new work, Kinetics: Rethinking Test-Time Scaling Laws. How to effectively build a powerful reasoning agent? Existing compute-optimal scaling laws suggest 64K thinking tokens + a 1.7B model > a 32B model. But that only shows half of the picture! The O(N²)
7 replies · 69 reposts · 247 likes
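A rough back-of-the-envelope for the O(N²) point above: with very long chains of thought, attention over the growing KV cache can rival the cost of the weights themselves. The layer and width numbers below are assumed placeholders for a roughly 1.7B-parameter model, not the paper's actual accounting.

```python
def decode_flops(n_params, n_layers, d_model, n_tokens):
    param_term = 2 * n_params * n_tokens                # O(N): weight multiplies
    attn_term = sum(4 * n_layers * d_model * t          # O(N^2): QK^T and attn*V per step
                    for t in range(n_tokens))
    return param_term, attn_term

# Assumed shape for a ~1.7B model: 28 layers, hidden size 2048 (placeholders).
param_term, attn_term = decode_flops(1.7e9, 28, 2048, 64_000)
print(f"parameter FLOPs ~{param_term:.1e}, attention FLOPs ~{attn_term:.1e}")
```

For these placeholder numbers, at 64K generated tokens the quadratic attention term already exceeds the parameter term, which is the "other half of the picture" the tweet refers to.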
Why Do Multi-Agent LLM Systems Fail? Introducing MAST: the first multi-agent failure taxonomy. It consists of 14 failure modes in 3 categories and generalizes across diverse multi-agent systems and tasks! Paper: https://t.co/BC5YHS8ZRZ Code: https://t.co/Ea1FvGcaLs 1/n
7 replies · 59 reposts · 217 likes
Thrilled to present our Spotlight at #ICLR2025: "MagicPIG: LSH Sampling for Efficient LLM Generation" by @RJ_Sadhukhan. MagicPIG enables KV compression for long-context LLMs: where top-k falls short, sampling shines. Introduces CPU-GPU heterogeneous serving to boost
1 reply · 5 reposts · 10 likes
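A toy illustration of the sampling idea: hash keys with random hyperplanes (SimHash) and attend only over keys whose signatures collide with the query's, instead of exact top-k. This is a simplified candidate-selection sketch, not MagicPIG's actual estimator or its CPU-GPU serving design.

```python
# Toy LSH-sampled attention over a KV cache (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
d, n_keys, n_bits = 64, 10_000, 12
keys = rng.standard_normal((n_keys, d)).astype(np.float32)
values = rng.standard_normal((n_keys, d)).astype(np.float32)
planes = rng.standard_normal((n_bits, d)).astype(np.float32)   # SimHash hyperplanes

def simhash(x):
    return x @ planes.T > 0          # boolean signature, one bit per hyperplane

key_sig = simhash(keys)              # (n_keys, n_bits)

def sampled_attention(q, max_candidates=256):
    # Keys whose signatures agree with the query on many bits are likely to
    # have large q·k, so restrict attention to those candidates.
    matches = (simhash(q[None, :]) == key_sig).sum(axis=1)
    cand = np.argsort(-matches)[:max_candidates]
    scores = keys[cand] @ q / np.sqrt(d)
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return probs @ values[cand]      # approximate attention output

out = sampled_attention(rng.standard_normal(d).astype(np.float32))
```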
Low latency, high throughput, and lossless LLM inference, all at once. Who says you can't have your cake and eat it too? #MagicDec proves you can. MagicDec first shows that speculative decoding can boost throughput for moderate to long contexts, and the speedup grows
Promised blogpost + tweet about MagicDec-1.0 (2.0 coming soon): How can we achieve lossless, high-throughput, and low-latency LLM inference all at once? Seems too good to be true? Introducing MagicDec-1.0, a Speculative Decoding (SD) based technique that can improve
1 reply · 3 reposts · 7 likes
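One way to see the throughput claim: at long context, each decode step is dominated by reading the KV cache, and a single verification pass over several drafted tokens pays that read once while yielding multiple accepted tokens. The numbers below are made-up placeholders under a simple bandwidth-bound cost model, not measurements from the paper.

```python
def sec_per_accepted_token(weight_bytes, kv_bytes_per_ctx_token, ctx,
                           bw=2e12, accepted_per_step=1.0, draft_bytes_per_step=0.0):
    # Memory-bound model: time per step ~ bytes read / bandwidth.
    target_read = weight_bytes + kv_bytes_per_ctx_token * ctx   # weights + KV cache
    return (target_read + draft_bytes_per_step) / bw / accepted_per_step

for ctx in (8_000, 32_000, 128_000):
    ar = sec_per_accepted_token(16e9, 1.0e5, ctx)
    sd = sec_per_accepted_token(16e9, 1.0e5, ctx,
                                accepted_per_step=3.0,      # avg tokens accepted per verify
                                draft_bytes_per_step=4e9)   # drafter's own memory traffic
    print(f"context {ctx}: ~{ar / sd:.1f}x throughput from speculation")
```

In this toy model the ratio climbs toward the average acceptance length as the context grows, consistent with the tweet's point that the speedup grows with sequence length.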
RAG vs. Long-Context LLMs: The Real Battle. Turns out, simple-to-build RAG can match million-dollar long-context LLMs (LC LLMs) on most existing benchmarks. So, do we even need long-context models? YES. Because today's benchmarks are flawed: Too Simple
6 replies · 39 reposts · 188 likes