Yu Zhang πŸˆπŸ™ Profile
Yu Zhang πŸˆπŸ™

@yzhang_cs

Followers 1K Β· Following 5K Β· Media 8 Β· Statuses 730

@Kimi_Moonshot; PhD Student @ Soochow University; working on efficient methods for LLMs; disciple of parallel programming; INTP

Joined February 2023
@yzhang_cs
Yu Zhang πŸˆπŸ™
2 months
We’re also shipping fla-core in lock-step with flash-linear-attention: a minimal, forever-in-sync companion package that carries nothing except triton+torch. Need only the fused Norm, CausalConv, and linear-attention kernels, without worrying about transformers dependencies? fla-core is enough. https://t.co/uspgYtZ4t0
pypi.org
Core operations for flash-linear-attention
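A hypothetical usage sketch of the slimmed-down package: fla-core ships only the Triton+torch kernels, so the imports below assume the same `fla` namespace as flash-linear-attention. Module paths, class names, and call signatures here are assumptions for illustration, not a confirmed API.

```python
# Install (per the PyPI card above):  pip install fla-core
import torch
from fla.modules import RMSNorm     # fused norm kernel (assumed path)
from fla.ops.gla import chunk_gla   # chunked gated-linear-attention kernel (assumed path)

B, T, H, D = 2, 1024, 4, 64
q, k, v = (torch.randn(B, T, H, D, device="cuda", dtype=torch.bfloat16) for _ in range(3))
# log-space forget gates, one per key channel (illustrative shape/convention)
g = torch.nn.functional.logsigmoid(torch.randn(B, T, H, D, device="cuda", dtype=torch.bfloat16))

o, _ = chunk_gla(q, k, v, g)                               # linear-attn kernel, no transformers import
o = RMSNorm(D).to(device="cuda", dtype=torch.bfloat16)(o)  # fused RMSNorm over the head dim
```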
@SonglinYang4
Songlin Yang
2 months
Excited to see Gated DeltaNet being adopted in the @Alibaba_Qwen series! It has previously demonstrated strong effectiveness in @nvidia's Jet-Nemotron.
1
8
65
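For reference, the recurrence that gives Gated DeltaNet its name, sketched in the spirit of the original paper (notation mine and possibly differing in detail): a per-step gated decay $\alpha_t \in (0,1)$ combined with a delta-rule rank-one correction of strength $\beta_t$.

```latex
% S_t: matrix-valued state, k_t, v_t: key/value, q_t: query
S_t = S_{t-1}\,\bigl(\alpha_t\,(I - \beta_t\, k_t k_t^{\top})\bigr) + \beta_t\, v_t k_t^{\top},
\qquad o_t = S_t\, q_t
```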
@zhang_benita
弡小珺 Xiaojùn
7 hours
If you are interested in Kimi K2 thinking, you can check out this interview with Yang Zhilin, founder of Kimi (with Chinese and English bilingual subtitles):
3
6
51
@SonglinYang4
Songlin Yang
6 months
πŸ“’ (1/16) Introducing PaTH πŸ›£οΈ β€” a RoPE-free contextualized position encoding scheme, built for stronger state tracking, better extrapolation, and hardware-efficient training. PaTH outperforms RoPE across short and long language modeling benchmarks https://t.co/nJItUuYKWZ
arxiv.org
The attention mechanism is a core primitive in modern large language models (LLMs) and AI more broadly. Since attention by itself is permutation-invariant, position encoding is essential for...
9
88
551
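As a rough sketch of the mechanism (my paraphrase of the paper's framing; the exact parameterization and product ordering may differ): instead of RoPE's fixed rotations, PaTH inserts a cumulative product of data-dependent, Householder-like transforms between each query and key, so position information is contextualized by the tokens in between.

```latex
% H_k = I - \beta_k\, w_k w_k^{\top} is a data-dependent Householder-like transform;
% the attention logit between query i and key j (j < i) becomes, roughly,
A_{ij} \;=\; q_i^{\top} \Bigl(\prod_{k=j+1}^{i} \bigl(I - \beta_k\, w_k w_k^{\top}\bigr)\Bigr) k_j
```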
@gowerrobert
Robert M. Gower πŸ‡ΊπŸ‡¦
3 days
We've just finished some work on improving the sensitivity of Muon to the learning rate, and exploring a lot of design choices. If you want to see how we did this, follow me ... 1/x (Work led by the amazing @CrichaelMawshaw)
5
22
178
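For context on what Muon does at each step: it orthogonalizes the momentum-averaged gradient of each 2D weight matrix with a few Newton-Schulz iterations before applying it. A minimal sketch below, using the widely circulated quintic coefficients from the reference implementation; this is not the modified variant the thread studies, and the momentum/update details are simplified.

```python
import torch

def newton_schulz_orthogonalize(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximately map G to the nearest semi-orthogonal matrix (Muon's core step)."""
    a, b, c = 3.4445, -4.7750, 2.0315      # quintic iteration coefficients (reference values)
    X = G.bfloat16()
    X = X / (X.norm() + 1e-7)              # spectral norm <= 1 so the iteration converges
    transposed = X.size(0) > X.size(1)
    if transposed:
        X = X.T                            # keep the Gram matrix X @ X.T small
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return (X.T if transposed else X).to(G.dtype)

def muon_step(W, grad, M, lr=0.02, momentum=0.95):
    """One simplified Muon update for a weight matrix W with momentum buffer M."""
    M.mul_(momentum).add_(grad)                           # heavy-ball momentum (simplified)
    W.add_(newton_schulz_orthogonalize(M), alpha=-lr)     # apply the orthogonalized update
```

The learning-rate sensitivity the thread targets lives largely in how `lr` interacts with the fixed-norm orthogonalized update.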
@ZhihuFrontier
Zhihu Frontier
1 day
πŸš€ "Quantization is not a compromise β€” it's the next paradigm." After K2-Thinking's release, many developers have been curious about its native INT4 quantization format. εˆ˜ε°‘δΌŸ, infra engineer at @Kimi_Moonshot and Zhihu contributor, shares an insider's view on why this choice
13
90
531
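As background to the native-INT4 discussion, a generic sketch of symmetric per-group INT4 weight quantization; the group size and packing here are illustrative assumptions, not Kimi K2's actual quantization recipe.

```python
import torch

def quantize_int4_per_group(w: torch.Tensor, group_size: int = 32):
    """Symmetric per-group INT4: each group of `group_size` weights shares one fp scale;
    values are rounded to integers in [-8, 7]."""
    out_features, in_features = w.shape
    wg = w.reshape(out_features, in_features // group_size, group_size)
    scale = wg.abs().amax(dim=-1, keepdim=True) / 7.0                # symmetric 4-bit range
    q = torch.clamp(torch.round(wg / scale), -8, 7).to(torch.int8)   # packed 2-per-byte in practice
    return q, scale

def dequantize(q, scale, shape):
    return (q.float() * scale).reshape(shape)

w = torch.randn(128, 256)
q, s = quantize_int4_per_group(w)
err = (dequantize(q, s, w.shape) - w).abs().mean()
print(f"mean abs quantization error: {err:.4f}")
```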
@trikcode
Wise
2 days
πŸ˜†πŸ‘
81
67
2K
@GithubProjects
GitHub Projects Community
3 days
(ASCII art) Just dropped down to say: Don't Push To Production On Friday
106
337
3K
@teortaxesTex
Teortaxes▢️ (DeepSeek Twitter πŸ‹ diehard fan, 2023 – ∞)
3 days
> All benchmark results are reported under INT4 precision. Do you understand what a flex this was. They go toe to toe with GPT-5 on the heaviest, longest-range tasks, with hundreds of tool calls. ALL IN INT4. Β«Convert to fp8 if you needΒ» Frontier lab.
@Kimi_Moonshot
Kimi.ai
3 days
πŸš€ Hello, Kimi K2 Thinking! The Open-Source Thinking Agent Model is here.
πŸ”Ή SOTA on HLE (44.9%) and BrowseComp (60.2%)
πŸ”Ή Executes up to 200 – 300 sequential tool calls without human interference
πŸ”Ή Excels in reasoning, agentic search, and coding
πŸ”Ή 256K context window
Built
21
84
763
@ArtificialAnlys
Artificial Analysis
3 days
MoonshotAI has released Kimi K2 Thinking, a new reasoning variant of Kimi K2 that achieves #1 in the Tau2 Bench Telecom agentic benchmark and is potentially the new leading open weights model. Kimi K2 Thinking is one of the largest open weights models ever, at 1T total parameters
81
286
2K
@soumithchintala
Soumith Chintala
3 days
Leaving Meta and PyTorch
I'm stepping down from PyTorch and leaving Meta on November 17th. tl;dr: Didn't want to be doing PyTorch forever, seemed like the perfect time to transition right after I got back from a long leave and the project built itself around me. Eleven years
496
554
11K
@lmsysorg
LMSYS Org
3 days
Day-0 support for Kimi K2 Thinking on SGLang ⚑ The new open-source thinking-agent model pushes reasoning, coding, and multi-step tool use to new heights. Proud to collaborate with @Kimi_Moonshot to make it run seamlessly: python -m sglang.launch_server \ --model-path
@Kimi_Moonshot
Kimi.ai
3 days
πŸš€ Hello, Kimi K2 Thinking! The Open-Source Thinking Agent Model is here.
πŸ”Ή SOTA on HLE (44.9%) and BrowseComp (60.2%)
πŸ”Ή Executes up to 200 – 300 sequential tool calls without human interference
πŸ”Ή Excels in reasoning, agentic search, and coding
πŸ”Ή 256K context window
Built
0
6
26
@Zai_org
Z.ai
3 days
@Kimi_Moonshot Awesome!
33
15
1K
@Kimi_Moonshot
Kimi.ai
3 days
πŸš€ Hello, Kimi K2 Thinking! The Open-Source Thinking Agent Model is here.
πŸ”Ή SOTA on HLE (44.9%) and BrowseComp (60.2%)
πŸ”Ή Executes up to 200 – 300 sequential tool calls without human interference
πŸ”Ή Excels in reasoning, agentic search, and coding
πŸ”Ή 256K context window
Built
555
1K
9K
@Jianlin_S
jianlin.su
3 days
Expectation of the maximum of gaussian random variables https://t.co/UuPsm7LH4w
1
4
54
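For reference, the standard statement of the result the linked note concerns (a textbook bound, recorded here independently of the note's own derivation): for n i.i.d. standard Gaussians, the expected maximum is at most, and asymptotically equal to, sqrt(2 ln n).

```latex
% X_1,\dots,X_n \overset{iid}{\sim} \mathcal{N}(0,1)
\mathbb{E}\Bigl[\max_{1\le i\le n} X_i\Bigr] \;\le\; \sqrt{2\ln n},
\qquad
\mathbb{E}\Bigl[\max_{1\le i\le n} X_i\Bigr] \;\sim\; \sqrt{2\ln n}\quad (n\to\infty)
```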
@kimmonismus
Chubby♨️
3 days
Holy moly, Kimi was cooking. Kimi-K2 thinking evals are very promising!
@synthwavedd
leo 🐾
3 days
Kimi K2 Thinking benchmarks are here and it's competitive with (and in some cases beats!) GPT-5 πŸ”₯πŸ”₯
12
28
503
@Yuchenj_UW
Yuchen Jin
4 days
I just love Hugging Face. Their new 200+ page Training Playbook covers everything: training frameworks, model architecture, data curation, pre/mid/post-training, eval, how GPUs work, latest research, and ablations. Packed with practical wisdom. I read it like a novel.
15
63
587
@synthwavedd
leo 🐾
3 days
Kimi K2 Thinking benchmarks are here and it's competitive with (and in some cases beats!) GPT-5 πŸ”₯πŸ”₯
8
10
278
@PyTorch
PyTorch
4 days
Hybrid models like Qwen3-Next, Nemotron Nano 2 and Granite 4.0 are now fully supported in vLLM! Check out our latest blog from the vLLM team at IBM to learn how the vLLM community has elevated hybrid models from experimental hacks in V0 to first-class citizens in V1. πŸ”—
1
35
139
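A minimal offline-inference sketch against vLLM's Python API; the hybrid-model ID and parallelism setting below are illustrative assumptions, not a tested configuration.

```python
from vllm import LLM, SamplingParams

# Hybrid (attention + linear-attention/SSM) checkpoint; the model ID is illustrative.
llm = LLM(model="Qwen/Qwen3-Next-80B-A3B-Instruct", tensor_parallel_size=4)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain why hybrid models reduce KV-cache memory."], params)
print(outputs[0].outputs[0].text)
```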
@_albertgu
Albert Gu
4 days
love to see it - ongoing community effort makes deploying recurrent models (mamba, deltanet, other linear attention hybrids) easier than ever to realize their inference throughput wins
@PyTorch
PyTorch
4 days
Hybrid models like Qwen3-Next, Nemotron Nano 2 and Granite 4.0 are now fully supported in vLLM! Check out our latest blog from the vLLM team at IBM to learn how the vLLM community has elevated hybrid models from experimental hacks in V0 to first-class citizens in V1. πŸ”—
3
14
97
@SonglinYang4
Songlin Yang
4 days
Hybrid Models as First-Class Citizens in vLLM 😍
@PyTorch
PyTorch
4 days
Hybrid models like Qwen3-Next, Nemotron Nano 2 and Granite 4.0 are now fully supported in vLLM! Check out our latest blog from the vLLM team at IBM to learn how the vLLM community has elevated hybrid models from experimental hacks in V0 to first-class citizens in V1. πŸ”—
1
6
144
@thoefler
Torsten Hoefler πŸ‡¨πŸ‡­
5 days
Collaborator and friend Dan Alistarh talks at ETH about using the new NvFP4 and MXFP4 block formats for inference. Some models go from "terrible" accuracy to acceptable using micro rotations to smooth outliers within blocks. https://t.co/4samDQeuGj Great collaboration and cool stuff
1
1
24
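A toy illustration of the rotation idea (a generic rotate-then-quantize sketch, not the specific method from the talk): multiplying a block by an orthogonal Hadamard matrix spreads a lone outlier across the block, shrinking the max-abs value that would otherwise force a coarse quantization scale.

```python
import numpy as np

def hadamard(n: int) -> np.ndarray:
    """Normalized Hadamard matrix of size n (n must be a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

rng = np.random.default_rng(0)
block = rng.normal(scale=0.1, size=32)
block[3] = 4.0                          # one large outlier dominates the block's scale

H = hadamard(32)
rotated = H @ block                     # orthogonal rotation: same energy, outlier spread out

print("max |x| before rotation:", np.abs(block).max())    # ~4.0 -> coarse quantization steps
print("max |x| after  rotation:", np.abs(rotated).max())  # much smaller -> finer steps
# At inference time the rotation is undone by folding H^T into the adjacent layer.
```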