Wu Haoning Profile
Wu Haoning

@HaoningTimothy

Followers: 2K · Following: 543 · Media: 53 · Statuses: 317

PhD Nanyang Technological University🇸🇬, BS @PKU1898, cooking VLMs in @Kimi_Moonshot. Opinions are personal.

Singapore
Joined December 2020
@HaoningTimothy
Wu Haoning
7 months
https://t.co/HMBnndhTUe Kimi-VL's technical report is out on arXiv!
3
1
43
@HaoningTimothy
Wu Haoning
4 days
open vlms don’t really chat well sad
0
0
11
@HaoningTimothy
Wu Haoning
8 days
Kopi o kosong?
@istdrc
stdrc
9 days
Introducing Kosong, the LLM abstraction layer powering Kimi CLI. It unifies message structures, asynchronous tool orchestration, and pluggable chat providers so you can build agents with ease and avoid vendor lock-in. GitHub: https://t.co/ZYorixix0C Docs:
0
1
6
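(For context, a minimal sketch of the pattern the tweet describes: a provider-agnostic message structure, pluggable chat providers, and asynchronous tool orchestration. Every name below, such as Message, ChatProvider, EchoProvider, and run_tools, is an illustrative assumption, not Kosong's actual API; see the linked GitHub and docs for the real interfaces.)

import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Protocol


@dataclass
class Message:
    # Provider-agnostic message structure shared by every backend.
    role: str                                             # "user", "assistant", or "tool"
    content: str
    tool_calls: list[str] = field(default_factory=list)   # tools the model asked to run


class ChatProvider(Protocol):
    # Any chat backend plugs in by implementing this single coroutine,
    # which is what keeps agent code free of vendor lock-in.
    async def chat(self, messages: list[Message]) -> Message: ...


class EchoProvider:
    # Toy provider that always requests one tool call, for demonstration only.
    async def chat(self, messages: list[Message]) -> Message:
        return Message(role="assistant", content="Let me check the clock.", tool_calls=["clock"])


async def run_tools(msg: Message, tools: dict[str, Callable[[], Awaitable[str]]]) -> list[Message]:
    # Asynchronous tool orchestration: run every requested tool concurrently.
    results = await asyncio.gather(*(tools[name]() for name in msg.tool_calls))
    return [Message(role="tool", content=r) for r in results]


async def main() -> None:
    provider: ChatProvider = EchoProvider()                # swap in any other provider here
    tools = {"clock": lambda: asyncio.sleep(0, result="12:00")}
    history = [Message(role="user", content="What time is it?")]
    reply = await provider.chat(history)
    history.append(reply)
    history.extend(await run_tools(reply, tools))
    print([(m.role, m.content) for m in history])


asyncio.run(main())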
@HaoningTimothy
Wu Haoning
8 days
Unlike the world of code, which rests on things that have existed for only about 80 years, or the world of language, with its thousands of years of history, the visual world has been there since the universe began and since we first perceived it with our eyes. That is why we always find a VLM "defective". 哀吾生之须臾、羡长江之无穷 ("I lament the brevity of my life and envy the endlessness of the Yangtze.")
2
6
47
@HaoningTimothy
Wu Haoning
11 days
A solid thinking model.
@Kimi_Moonshot
Kimi.ai
11 days
🚀 Hello, Kimi K2 Thinking! The Open-Source Thinking Agent Model is here. 🔹 SOTA on HLE (44.9%) and BrowseComp (60.2%) 🔹 Executes up to 200 – 300 sequential tool calls without human interference 🔹 Excels in reasoning, agentic search, and coding 🔹 256K context window Built
1
1
14
@HaoningTimothy
Wu Haoning
11 days
Steps vs Turns, good lesson from 🐻 orz…
@zxytim
Xinyu Zhou
11 days
May god cherish your hard work...
0
0
1
@HaoningTimothy
Wu Haoning
11 days
K2, K2-Thinking, and …
@bigeagle_xd
🐻熊狸
11 days
what did you say? what are you thinking? what will you see?
11
1
81
@StuartYao22139
Xingcheng Yao
17 days
I'm grateful to be part of this moonshot spirit: Not because it's easy, but because it's hard.
@zxytim
Xinyu Zhou
18 days
You see: - a new arch that is better and faster than full attention verified with Kimi-style solidness. I see: - Starting with inferior performance even on short contexts. Nothing works and nobody knows why. - Tweaking every possible hyper-parameter to grasp what is wrong. -
0
2
16
@HaoningTimothy
Wu Haoning
12 days
Thinking has never been so…
@scaling01
Lisan al Gaib
12 days
Kimi-K2 Reasoning is coming very soon just got merged into VLLM LETS FUCKING GOOOO im so hyped im so hyped im so hyped https://t.co/tmZHPpCw3H
0
1
12
@Alibaba_Qwen
Qwen
14 days
We’ve released an early preview of Qwen3-Max-Thinking—an intermediate checkpoint still in training. Even at this stage, when augmented with tool use and scaled test-time compute, it achieves 100% on challenging reasoning benchmarks like AIME 2025 and HMMT. You can try the
61
124
1K
@HaoningTimothy
Wu Haoning
21 days
This is why/what we are still working…
0
0
1
@LiJunnan0409
Li Junnan
28 days
Being able to decode all text from an image doesn’t mean the vision tokens captured every bit of textual information. The DeepSeek-3B-MoE decoder plays a big role — visual tokens likely encode high-entropy cues, while the decoder leverages its language prior to reconstruct text.
6
11
158
@HaoningTimothy
Wu Haoning
1 month
A saviour for GPU-poor PhD students🤯
@shuai_bai_
Shuai Bai
1 month
In just six months, 8B-scale VL small models are already matching the performance of last-gen 72B models on many benchmarks—absolutely amazing! Can't wait to see how you use them and hear your feedback! 🚀
0
0
4
@LiJunnan0409
Li Junnan
1 month
It’s been one year since we released Aria-25B-A3B. Looking back — Aria quietly set many firsts in open-source: - The first multimodal model with both strong VL and text understanding — now an industry standard. - The first fine-grained MoE multimodal model, proudly following
2
7
63
@HaoningTimothy
Wu Haoning
2 months
The XX-R1 family is having a get-together tonight.
0
0
0
@HaoningTimothy
Wu Haoning
3 months
Plz open-source GPT-4o! (Nope, just a joke, but that was definitely the best CHATTY model on "not so difficult" problems)
@WealthEquation
WEQ.🌎⫸≬⫷ ⏩
3 months
🥹
0
0
2
@NiJinjie
Jinjie Ni
3 months
Token crisis: solved. ✅ We pre-trained diffusion language models (DLMs) vs. autoregressive (AR) models from scratch — up to 8B params, 480B tokens, 480 epochs. Findings: > DLMs beat AR when tokens are limited, with >3× data potential. > A 1B DLM trained on just 1B tokens
42
247
2K
@HaoningTimothy
Wu Haoning
3 months
Guess we are the 80% lol
0
0
1
@HaoningTimothy
Wu Haoning
3 months
We have always been evaluating VideoMMMU for video understanding.
@BoLi68567011
Brian Bo Li
3 months
Good work will be recognized, even if it's not accepted by CVPR or ICCV🤣
0
3
19
@bigeagle_xd
🐻熊狸
4 months
This media outlet combines clickbait headlines, factual errors, twisting black into white, and refusing to name names unless it gets paid; I can't even find the right words to criticize it.
5
2
36