
Junru Shao (@junrushao)
2K Followers · 672 Following · 8 Media · 558 Statuses
opinions are my own
California, USA · Joined October 2012
RT @InfiniAILab: 🔥 We introduce Multiverse, a new generative modeling framework for adaptive and lossless parallel generation. 🚀 Multivers….
RT @Lei_Wang_1999: The DeepSeek team is so audacious as they tried writing tilelang kernels🥰, and luckily it's fast. Huge thanks for giving….
RT @abcdabcd987: Lower latency and Higher throughput -- Get both with multi-node deployment for MoE models like DeepSeek-V3/R1.
RT @tqchenml: Happy to share our latest work at @ASPLOSConf 2025! LLMs are dynamic, both in sequence and batches. Relax brings an ML compil….
arxiv.org — Dynamic shape computations have become critical in modern machine learning workloads, especially in emerging large language models. The success of these models has driven the demand for their...
RT @Lei_Wang_1999: Happy to announce tilelang v0.1.3 🚀 Love to see it, and huge thanks to the contributors bringing enhancements, optimizations, a….
RT @ye_combinator: LLMs are not all about tensor cores. Categorical sampling under filters (top-p/top-k/min-p) are critical operators in LLMs….
RT @shiyi_c98: Thanks @_akhaliq for sharing our new work (a great effort led by @DachengLi177) in the coding domain @NovaSkyAI! S* extend….
RT @Lei_Wang_1999: Building on top of tvm is powerful! 🙌 I was able to adapt WGSL (WebGPU codegen) from TVM to Tile language in just a few….
RT @Lei_Wang_1999: Excited to release tilelang v0.1.0, another Pythonic DSL for writing AI kernels with optional layout/pipeline annotation….
RT @charlie_ruan: @deepseek_ai R1 Distilled models now on #WebLLM — locally accelerated by @WebGPU and counting "r"s in 🍓. Reasoning models….
RT @HongyiJin258: 🚀Making cross-engine LLM serving programmable. Introducing LLM Microserving: a new RISC-style approach to design LLM ser….
RT @ye_combinator: We are excited to announce FlashInfer v0.2! Core contributions of this release include: - Block/Vector Sparse (Paged) A….
RT @tqchenml: 🚀Future LLM agents speak JSON, Python, and other structures. Excited to announce XGrammar, a structured generation library….
RT @yi_xin_dong: 🚀✨Introducing XGrammar: a fast, flexible, and portable engine for structured generation! 🤖Accurate JSON/grammar generatio….
Always enjoy reading @Yuchenj_UW’s thread and thanks for the transparency from @hyperbolic_labs.
Here’s my story about hosting Reflection 70B on @hyperbolic_labs: On Sep 3, Matt Shumer reached out to us, saying he wanted to release a 70B LLM that should be the top OSS model (far ahead of 405B), and he asked if we were interested in hosting it. At that time, I thought it was
Novelty considered harmful in this case. PyTorch/NumPy syntax is a proven de facto standard for general users, so there’s literally no reason to reinvent the wheel.
In 2020, like 7 JAX NN libraries came out from different teams at Google. It was the pandemic, so I had nothing to do. I used to just go into their GitHub repos and post "Make it PyTorch!" and they would get increasingly mad at me. It was a real eye-opener on Google culture.
RT @hyhieu226: 📚🧑‍🎓New tutorial on WGMMA (WarpGroup Matrix Multiplication and Accumulation). If you have run PyTorc….
RT @boson_ai: Excited to share Higgs-V2, improved both general and roleplaying abilities. The performance boost comes from the in-house bui….