Zihao Ye @ye_combinator X Profile

Zihao Ye

@ye_combinator

Followers

2K

Following

2K

Media

16

Statuses

134

Proud to be an engineer. I'm building flashinfer (https://t.co/PabCM3l09l)

Seattle

Joined October 2017

Zihao Ye

@ye_combinator

10 days

RT @JokerEph: I’ve been starting to collaborate with the folks who are building FlashInfer: nice project and pretty amazing set of people!….

0

3

0

Zihao Ye

@ye_combinator

19 days

RT @zhyncs42: SGLang is an early user of FlashInfer and witnessed its rise as the de facto LLM inference kernel library. It won best paper….

0

14

0

Zihao Ye

@ye_combinator

19 days

RT @NVIDIAAIDev: 🔍 Our Deep Dive Blog Covering our Winning MLSys Paper on FlashInfer Is now live ➡️ Accelerate LLM….

0

27

0

Zihao Ye

@ye_combinator

19 days

RT @InfiniAILab: 🔥 We introduce Multiverse, a new generative modeling framework for adaptive and lossless parallel generation. 🚀 Multivers….

0

78

0

Zihao Ye

@ye_combinator

28 days

RT @GPU_MODE: Been excited about this talk for a while, @SonglinYang4 on efficient architecture! Just started!

0

27

0

Zihao Ye

@ye_combinator

28 days

RT @__tensorcore__: Another 🔥 blog about CUTLASS from @colfaxintl, this time focusing on the gory details of block-scaled MXFP and NVFP dat….

0

35

0

Zihao Ye

@ye_combinator

30 days

RT @HanGuo97: We know Attention and its linear-time variants, such as linear attention and State Space Models. But what lies in between?. I….

0

191

0

Zihao Ye

@ye_combinator

1 month

RT @xieenze_jr: 🚀 Fast-dLLM: 27.6× Faster Diffusion LLMs with KV Cache & Parallel Decoding 💥 . Key Features🌟 .- Block-Wise KV Cache . R….

0

34

0

Zihao Ye

@ye_combinator

2 months

RT @tri_dao: I love Cutlass, and this new Python DSL looks very well-designed. Will for sure accelerate kernel dev + exploring new ideas in….

0

25

0

Zihao Ye

@ye_combinator

2 months

RT @NVIDIAHPCDev: 🎉CUTLASS 4.0 is here-bringing native #Python support for device-side kernel design, for ops like GEMM, Flash Attention, a….

0

36

0

Zihao Ye

@ye_combinator

2 months

RT @__tensorcore__: 🚨🔥 CUTLASS 4.0 is released 🔥🚨. pip install nvidia-cutlass-dsl. 4.0 marks a major shift for CUTLASS: towards native GPU….

0

82

0

Zihao Ye

@ye_combinator

2 months

We’re thrilled that FlashInfer won a Best Paper Award at MLSys 2025! 🎉 This wouldn’t have been possible without the community: huge thanks to @lmsysorg’s sglang for deep co-design (which is critical for inference kernel evolution) and stress-testing over the years, and to.

NVIDIA AI Developer

@NVIDIAAIDev

2 months

🎉 Congratulations to the FlashInfer team – their technical paper, "FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving," just won best paper at #MLSys2025. 🏆 🙌 We are excited to share that we are now backing FlashInfer – a supporter and.

15

37

230

Zihao Ye

@ye_combinator

2 months

RT @JoyChew_d: Super excited to release FlexAttention for Inference with a decoding backend, GQA, PagedAttention, trainable bias and more!….

0

7

0

Zihao Ye

@ye_combinator

2 months

RT @tqchenml: If you are around in the Bay Area, make sure to attend the #MLSys2025 keynote tomorrow by @soumithchintala at the Santa Clara….

0

8

0

Zihao Ye

@ye_combinator

2 months

RT @yi_xin_dong: We are hosting a happy hour with @lmsysorg at #mlsys2025! Join us for engaging talks on SGLang, the structured generation….

0

12

0

Zihao Ye

@ye_combinator

2 months

RT @NovaSkyAI: 1/N Introducing SkyRL-v0, our RL training pipeline enabling efficient RL training for long-horizon, real-environment tasks l….

0

70

0

Zihao Ye

@ye_combinator

2 months

RT @DeeplyIgnorant: 🚀 We released Triton-distributed! 🌟.Build compute-comm. overlapping kernels for GPUs—performance rivals optimized libra….

0

10

0

Zihao Ye

@ye_combinator

3 months

RT @hyhieu226: Their content always comes out in great quantity and quality ❤️.

0

19

0

Zihao Ye

@ye_combinator

3 months

RT @abcdabcd987: Lower latency and Higher throughput -- Get both with multi-node deployment for MoE models like DeepSeek-V3/R1.

0

8

0

Zihao Ye

@ye_combinator

3 months

RT @Tim_Dettmers: Happy to announce that I joined the CMU Catalyst with three of my incoming students. Our research will bring the best m….

0

57

0