Wu Haoning
@HaoningTimothy
Followers 2K · Following 543 · Media 53 · Statuses 317
PhD Nanyang Technological University🇸🇬, BS @PKU1898, cooking VLMs in @Kimi_Moonshot. Opinions are personal.
Singapore
Joined December 2020
Kopi o kosong?
Introducing Kosong, the LLM abstraction layer powering Kimi CLI. It unifies message structures, asynchronous tool orchestration, and pluggable chat providers so you can build agents with ease and avoid vendor lock-in. GitHub: https://t.co/ZYorixix0C Docs:
0
1
6
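Roughly, an abstraction layer like this sits between the agent code and whatever chat backend is in use. The sketch below is a hypothetical, minimal version of that idea; the names (Message, ChatProvider, run_agent) are illustrative assumptions and do not reflect Kosong's actual API — see the linked repo and docs for the real interface.

```python
# Illustrative sketch only: hypothetical names, NOT Kosong's real API.
import asyncio
from dataclasses import dataclass, field
from typing import Any, Callable, Protocol


@dataclass
class Message:
    """Provider-agnostic message: same structure regardless of backend."""
    role: str                      # "system" | "user" | "assistant" | "tool"
    content: str = ""
    tool_calls: list[dict] = field(default_factory=list)


class ChatProvider(Protocol):
    """Pluggable backend; swapping providers should not change agent code."""
    async def chat(self, messages: list[Message], tools: list[dict]) -> Message: ...


async def run_agent(provider: ChatProvider,
                    messages: list[Message],
                    tools: dict[str, Callable[..., Any]],
                    tool_specs: list[dict],
                    max_steps: int = 10) -> Message:
    """Ask the model, execute any requested tools concurrently, feed results back."""
    for _ in range(max_steps):
        reply = await provider.chat(messages, tool_specs)
        messages.append(reply)
        if not reply.tool_calls:
            return reply  # model produced a final answer
        # asynchronous tool orchestration: run all requested tools in parallel
        results = await asyncio.gather(
            *(asyncio.to_thread(tools[c["name"]], **c["arguments"])
              for c in reply.tool_calls)
        )
        for result in results:
            messages.append(Message(role="tool", content=str(result)))
    return messages[-1]
```

The design point is that the agent loop never imports a vendor-specific client: anything satisfying ChatProvider can be plugged in, which is how vendor lock-in is avoided.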
Unlike the world of code, which relies on things that have existed for only ~80 years, or the world of language, with its thousands of years, the visual world has been there since the universe began and since we started perceiving it with our eyes. That is why we always find a VLM “defective”. 哀吾生之须臾、羡长江之无穷 (I lament that my life lasts but an instant, and envy the endlessness of the Yangtze.)
2
6
47
A solid thinking model.
🚀 Hello, Kimi K2 Thinking! The Open-Source Thinking Agent Model is here. 🔹 SOTA on HLE (44.9%) and BrowseComp (60.2%) 🔹 Executes up to 200 – 300 sequential tool calls without human interference 🔹 Excels in reasoning, agentic search, and coding 🔹 256K context window Built
1
1
14
I'm grateful to be part of this moonshot spirit: Not because it's easy, but because it's hard.
You see:
- a new arch that is better and faster than full attention, verified with Kimi-style solidness.
I see:
- Starting with inferior performance even on short contexts. Nothing works and nobody knows why.
- Tweaking every possible hyper-parameter to grasp what is wrong.
-
0
2
16
Thinking has never been so…
Kimi-K2 Reasoning is coming very soon just got merged into VLLM LETS FUCKING GOOOO im so hyped im so hyped im so hyped https://t.co/tmZHPpCw3H
0
1
12
We’ve released an early preview of Qwen3-Max-Thinking—an intermediate checkpoint still in training. Even at this stage, when augmented with tool use and scaled test-time compute, it achieves 100% on challenging reasoning benchmarks like AIME 2025 and HMMT. You can try the
61
124
1K
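For context, "scaled test-time compute" usually means spending more inference on each problem; one common recipe is self-consistency: sample many reasoning traces and majority-vote the final answer. The snippet below is a generic sketch of that recipe, not Qwen's actual setup; generate and extract_answer are hypothetical stand-ins for a model call and an answer parser.

```python
# Generic sketch of scaled test-time compute via self-consistency (majority voting).
# `generate` and `extract_answer` are hypothetical stand-ins, not a real model API.
from collections import Counter
from typing import Callable


def self_consistency(generate: Callable[[str], str],
                     extract_answer: Callable[[str], str],
                     problem: str,
                     n_samples: int = 32) -> str:
    """Sample n reasoning traces at nonzero temperature; return the most common answer."""
    answers = [extract_answer(generate(problem)) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]
```

Tool augmentation fits on top of this: each sampled trace may call, say, a code interpreter before committing to its answer, and the vote happens over the final answers.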
This is why/what we are still working on…
0
0
1
Being able to decode all text from an image doesn’t mean the vision tokens captured every bit of textual information. The DeepSeek-3B-MoE decoder plays a big role — visual tokens likely encode high-entropy cues, while the decoder leverages its language prior to reconstruct text.
6
11
158
It’s been one year since we released Aria-25B-A3B. Looking back — Aria quietly set many firsts in open-source:
- The first multimodal model with both strong VL and text understanding — now an industry standard.
- The first fine-grained MoE multimodal model, proudly following
2
7
63
Plz open-source GPT-4o! (Nope just a joke but that was definitely the best CHATTY model on “not so difficult” problems)
0
0
2
Token crisis: solved. ✅ We pre-trained diffusion language models (DLMs) vs. autoregressive (AR) models from scratch — up to 8B params, 480B tokens, 480 epochs. Findings:
> DLMs beat AR when tokens are limited, with >3× data potential.
> A 1B DLM trained on just 1B tokens
42
247
2K
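A rough intuition for the data-efficiency claim: an AR model always predicts token t+1 from the same left context, while a masked-diffusion LM re-corrupts the sequence with a fresh random mask ratio on every pass, so repeated epochs over the same tokens present genuinely different prediction problems. The toy snippet below contrasts the two objectives under that assumption; it is a didactic sketch, not the authors' training code, and the tiny stand-in "model" has no causal attention or proper diffusion loss reweighting.

```python
# Toy contrast of AR vs. masked-diffusion pretraining objectives (didactic sketch only).
import torch
import torch.nn.functional as F

VOCAB, MASK_ID = 1000, 0
# Stand-in "model": real pretraining uses a full transformer
# (causal attention for AR, bidirectional for the DLM).
model = torch.nn.Sequential(torch.nn.Embedding(VOCAB, 64), torch.nn.Linear(64, VOCAB))
tokens = torch.randint(1, VOCAB, (8, 128))  # one batch of token ids

# Autoregressive objective: predict token t+1 from tokens <= t; the target for each
# position is fixed, so a second epoch over the same data repeats the same problem.
ar_logits = model(tokens[:, :-1])
ar_loss = F.cross_entropy(ar_logits.reshape(-1, VOCAB), tokens[:, 1:].reshape(-1))

# Masked-diffusion objective: draw a fresh mask ratio, hide those positions, and
# predict only the hidden tokens. Re-masking differs every epoch, which is one
# intuition for the reported multi-epoch data potential. (Real DLM losses also
# reweight by the mask ratio; omitted here for brevity.)
ratio = torch.empty(()).uniform_(0.1, 1.0)
mask = torch.rand(tokens.shape) < ratio
corrupted = tokens.masked_fill(mask, MASK_ID)
dlm_logits = model(corrupted)
dlm_loss = F.cross_entropy(dlm_logits[mask], tokens[mask])
```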
Guess we are the 80% lol
0
0
1