Wu Haoning
@HaoningTimothy
Followers 2K · Following 543 · Media 53 · Statuses 317
PhD Nanyang Technological University🇸🇬, BS @PKU1898, cooking VLMs in @Kimi_Moonshot. Opinions are personal.
Singapore
Joined December 2020
Kopi o kosong?
Introducing Kosong, the LLM abstraction layer powering Kimi CLI. It unifies message structures, asynchronous tool orchestration, and pluggable chat providers so you can build agents with ease and avoid vendor lock-in. GitHub: https://t.co/ZYorixix0C Docs:
0
1
6
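Roughly, an abstraction layer like this sits between the agent code and whatever chat backend is in use. The sketch below is a hypothetical, minimal version of that idea; the names (Message, ChatProvider, run_agent) are illustrative assumptions and do not reflect Kosong's actual API — see the linked repo and docs for the real interface.

```python
# Illustrative sketch only: hypothetical names, NOT Kosong's real API.
import asyncio
from dataclasses import dataclass, field
from typing import Any, Callable, Protocol


@dataclass
class Message:
    """Provider-agnostic message: same structure regardless of backend."""
    role: str                      # "system" | "user" | "assistant" | "tool"
    content: str = ""
    tool_calls: list[dict] = field(default_factory=list)


class ChatProvider(Protocol):
    """Pluggable backend; swapping providers should not change agent code."""
    async def chat(self, messages: list[Message], tools: list[dict]) -> Message: ...


async def run_agent(provider: ChatProvider,
                    messages: list[Message],
                    tools: dict[str, Callable[..., Any]],
                    tool_specs: list[dict],
                    max_steps: int = 10) -> Message:
    """Ask the model, execute any requested tools concurrently, feed results back."""
    for _ in range(max_steps):
        reply = await provider.chat(messages, tool_specs)
        messages.append(reply)
        if not reply.tool_calls:
            return reply  # model produced a final answer
        # asynchronous tool orchestration: run all requested tools in parallel
        results = await asyncio.gather(
            *(asyncio.to_thread(tools[c["name"]], **c["arguments"])
              for c in reply.tool_calls)
        )
        for result in results:
            messages.append(Message(role="tool", content=str(result)))
    return messages[-1]
```

The design point is that the agent loop never imports a vendor-specific client: anything satisfying ChatProvider can be plugged in, which is how vendor lock-in is avoided.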
Unlike the world of code, which relies on things that have existed for only ~80 years, or the world of language, with its thousands of years, the visual world has been there since the universe began and since we started perceiving it with our eyes. That is why we always find a VLM “defective”. 哀吾生之须臾、羡长江之无穷 (I lament that my life lasts but an instant, and envy the endlessness of the Yangtze.)
2
6
47
A solid thinking model.
🚀 Hello, Kimi K2 Thinking! The Open-Source Thinking Agent Model is here. 🔹 SOTA on HLE (44.9%) and BrowseComp (60.2%) 🔹 Executes up to 200 – 300 sequential tool calls without human interference 🔹 Excels in reasoning, agentic search, and coding 🔹 256K context window Built
1
1
14
I'm grateful to be part of this moonshot spirit: Not because it's easy, but because it's hard.
You see:
- a new arch that is better and faster than full attention, verified with Kimi-style solidness.
I see:
- Starting with inferior performance even on short contexts. Nothing works and nobody knows why.
- Tweaking every possible hyper-parameter to grasp what is wrong.
-
0
2
16
Thinking has never been so…
Kimi-K2 Reasoning is coming very soon just got merged into VLLM LETS FUCKING GOOOO im so hyped im so hyped im so hyped https://t.co/tmZHPpCw3H
0
1
12
We’ve released an early preview of Qwen3-Max-Thinking—an intermediate checkpoint still in training. Even at this stage, when augmented with tool use and scaled test-time compute, it achieves 100% on challenging reasoning benchmarks like AIME 2025 and HMMT. You can try the
61
124
1K
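For context, "scaled test-time compute" usually means spending more inference on each problem; one common recipe is self-consistency: sample many reasoning traces and majority-vote the final answer. The snippet below is a generic sketch of that recipe, not Qwen's actual setup; generate and extract_answer are hypothetical stand-ins for a model call and an answer parser.

```python
# Generic sketch of scaled test-time compute via self-consistency (majority voting).
# `generate` and `extract_answer` are hypothetical stand-ins, not a real model API.
from collections import Counter
from typing import Callable


def self_consistency(generate: Callable[[str], str],
                     extract_answer: Callable[[str], str],
                     problem: str,
                     n_samples: int = 32) -> str:
    """Sample n reasoning traces at nonzero temperature; return the most common answer."""
    answers = [extract_answer(generate(problem)) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]
```

Tool augmentation fits on top of this: each sampled trace may call, say, a code interpreter before committing to its answer, and the vote happens over the final answers.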
This is why/what we are still working on…
0
0
1
Being able to decode all text from an image doesn’t mean the vision tokens captured every bit of textual information. The DeepSeek-3B-MoE decoder plays a big role — visual tokens likely encode high-entropy cues, while the decoder leverages its language prior to reconstruct text.
6
11
158
It’s been one year since we released Aria-25B-A3B. Looking back — Aria quietly set many firsts in open-source:
- The first multimodal model with both strong VL and text understanding — now an industry standard.
- The first fine-grained MoE multimodal model, proudly following
2
7
63
Plz open-source GPT-4o! (Nope just a joke but that was definitely the best CHATTY model on “not so difficult” problems)
0
0
2
Token crisis: solved. ✅ We pre-trained diffusion language models (DLMs) vs. autoregressive (AR) models from scratch — up to 8B params, 480B tokens, 480 epochs. Findings:
> DLMs beat AR when tokens are limited, with >3× data potential.
> A 1B DLM trained on just 1B tokens
42
247
2K
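A rough intuition for the data-efficiency claim: an AR model always predicts token t+1 from the same left context, while a masked-diffusion LM re-corrupts the sequence with a fresh random mask ratio on every pass, so repeated epochs over the same tokens present genuinely different prediction problems. The toy snippet below contrasts the two objectives under that assumption; it is a didactic sketch, not the authors' training code, and the tiny stand-in "model" has no causal attention or proper diffusion loss reweighting.

```python
# Toy contrast of AR vs. masked-diffusion pretraining objectives (didactic sketch only).
import torch
import torch.nn.functional as F

VOCAB, MASK_ID = 1000, 0
# Stand-in "model": real pretraining uses a full transformer
# (causal attention for AR, bidirectional for the DLM).
model = torch.nn.Sequential(torch.nn.Embedding(VOCAB, 64), torch.nn.Linear(64, VOCAB))
tokens = torch.randint(1, VOCAB, (8, 128))  # one batch of token ids

# Autoregressive objective: predict token t+1 from tokens <= t; the target for each
# position is fixed, so a second epoch over the same data repeats the same problem.
ar_logits = model(tokens[:, :-1])
ar_loss = F.cross_entropy(ar_logits.reshape(-1, VOCAB), tokens[:, 1:].reshape(-1))

# Masked-diffusion objective: draw a fresh mask ratio, hide those positions, and
# predict only the hidden tokens. Re-masking differs every epoch, which is one
# intuition for the reported multi-epoch data potential. (Real DLM losses also
# reweight by the mask ratio; omitted here for brevity.)
ratio = torch.empty(()).uniform_(0.1, 1.0)
mask = torch.rand(tokens.shape) < ratio
corrupted = tokens.masked_fill(mask, MASK_ID)
dlm_logits = model(corrupted)
dlm_loss = F.cross_entropy(dlm_logits[mask], tokens[mask])
```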
Guess we are the 80% lol
0
0
1