Xiaoliu.x

@xiaolGo

Followers
89
Following
43
Media
13
Statuses
96

Exploring possibilities in large language model architectures; researcher @RWKV. WIP: https://t.co/QAxbgOpPX2

Joined January 2010
@xiaolGo
Xiaoliu.x
14 days
It's easy to say that inductive bias gives hybrid models more expressive power. But we always face the bitter lesson, since test-time scaling can fit anything. Fortunately, we can still see and explore the internal world model across hybrids on the way towards AGI.
@_m0se_
OpenMOSE
14 days
Reka-Flash-3 Hybrid Preview is here 🪿+🤖. 21.4B parameters, 1/7 GQA hybrid RWKV, 32k ctx NIAH (64k target). For light agentic tasks and translation. It's finally starting to work properly 😅.
0
0
1
@xiaolGo
Xiaoliu.x
18 days
We will see more and more powerful small models coming.
@BlinkDL_AI
BlinkDL
19 days
RWKV-8 "Heron" preview (2) - DeepEmbedAttention (DEA), particularly suitable for hybrid models (1/9 KV cache size of MLA). The goal of RWKV-8 is to achieve longctx with 0 KV cache, and I have some progress too🙂
Tweet media one
0
0
0
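A rough back-of-the-envelope on why attention-light hybrids shrink the KV cache. This is my own illustrative sketch, not DEA's actual layout; the 1/9-of-MLA figure and the 0-KV-cache goal are BlinkDL's claims above, not derived here, and every layer count and dimension below is made up.

```python
# Illustrative KV-cache sizing only; layer counts, head counts and dims are
# hypothetical, not DEA / MLA internals.
def kv_cache_bytes(n_attn_layers, n_kv_heads, head_dim, seq_len, dtype_bytes=2):
    return 2 * n_attn_layers * n_kv_heads * head_dim * seq_len * dtype_bytes  # K and V

# Hypothetical dense baseline: every one of 32 layers caches 8 KV heads of dim 128.
full = kv_cache_bytes(32, 8, 128, seq_len=65536)
# Hypothetical hybrid: only 4 attention layers keep a cache; the RNN layers keep none.
hybrid = kv_cache_bytes(4, 8, 128, seq_len=65536)

print(f"dense : {full / 2**30:.1f} GiB")
print(f"hybrid: {hybrid / 2**30:.1f} GiB ({full // hybrid}x smaller)")
```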
@xiaolGo
Xiaoliu.x
23 days
RT @_m0se_: HRWKV7-hxa079-Qwen3-14B. It's an early version, but it can now converse quite well, so I've released it. With L40D5120, only 6 layers use GQA attention. Interestingly, the GQA part is more stable without RoPE…
0
1
0
@xiaolGo
Xiaoliu.x
1 month
This is how I read more than 500 papers in half a year.
Tweet media one
0
0
4
@xiaolGo
Xiaoliu.x
1 month
RT @BlinkDL_AI: @nymfree Agree. Please use as reference, and the community is building
0
1
0
@xiaolGo
Xiaoliu.x
1 month
RT @BlinkDL_AI: 🧵On Baselines in LLM Architecture Research, a Tale of DeltaNet and RWKV-7 (1). (full essay at http….
0
15
0
@xiaolGo
Xiaoliu.x
2 months
More insight from hallucination?
0
0
1
@xiaolGo
Xiaoliu.x
2 months
RT @BlinkDL_AI: RWKV-8 "Heron" preview (1) - DeepEmbed. Seems Gemma3n is trying similar tricks (Per-Layer Embedding), so I will discuss it….
0
47
0
@xiaolGo
Xiaoliu.x
3 months
RWKV + MoBA = RWKV-X: A Linear Complexity Hybrid Language Model.
0
0
0
@xiaolGo
Xiaoliu.x
3 months
Why not?
0
0
0
@xiaolGo
Xiaoliu.x
3 months
Feedback welcome.
0
0
0
@xiaolGo
Xiaoliu.x
3 months
RT @xiaolGo: @TencentHunyuan Fun fact: RNN architectures have many alternative designs, but stacking layers in hybrid models often diminish….
0
1
0
@xiaolGo
Xiaoliu.x
3 months
0
0
0
@xiaolGo
Xiaoliu.x
3 months
Why not use Hymba's architecture? Nemotron-H's paper introduces a layer-stack method for a hybrid transformer-state model. IMO, it misses the expressive power of 3rd-gen state-based models. See Titans & RWKV-7 papers for more. Thoughts? #AI
Tweet media one
Tweet media two
1
0
0
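To make the layer-stack idea in the tweet above concrete, here is a minimal PyTorch sketch of an interleaved hybrid: most blocks are a stateful token mixer, a few are attention. Every name here (`RecurrentBlock`, `AttentionBlock`, `HybridStack`) and the GRU stand-in are my own placeholders, not Nemotron-H, Hymba, or RWKV-7 code.

```python
# Minimal layer-stack hybrid sketch (hypothetical block choices; structure only).
import torch
import torch.nn as nn

class RecurrentBlock(nn.Module):
    """Stand-in for a state-based mixer (e.g. an RWKV-style block); a GRU here."""
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)

    def forward(self, x):
        h, _ = self.rnn(self.norm(x))
        return x + h                      # residual

class AttentionBlock(nn.Module):
    """Stand-in for a (G)QA attention layer (non-causal here; structure only)."""
    def __init__(self, dim, n_heads=8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, x):
        h = self.norm(x)
        out, _ = self.attn(h, h, h, need_weights=False)
        return x + out                    # residual

class HybridStack(nn.Module):
    """Mostly recurrent blocks, with an attention block every `attn_every` layers."""
    def __init__(self, dim=256, n_layers=12, attn_every=6):
        super().__init__()
        self.layers = nn.ModuleList(
            AttentionBlock(dim) if (i + 1) % attn_every == 0 else RecurrentBlock(dim)
            for i in range(n_layers)
        )

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

x = torch.randn(2, 128, 256)              # (batch, seq, dim)
print(HybridStack()(x).shape)              # torch.Size([2, 128, 256])
```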
@xiaolGo
Xiaoliu.x
4 months
RT @susumuota: Top 30 most popular arXiv papers in the last 30 days
Tweet media one
0
2
0
@xiaolGo
Xiaoliu.x
4 months
Thanks to @huggingface for supporting us. Here you can test the latest RWKV-7 G1 model (0.1B-2.9B xx%), which is the only RNN-based reasoning large language model in the world.
0
0
0
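For local testing (as opposed to the Hugging Face Space mentioned above), a minimal sketch assuming BlinkDL's `rwkv` pip package with RWKV-7 support and an already-downloaded checkpoint; the path, strategy string, prompt, and sampling settings are placeholders.

```python
import os
os.environ["RWKV_JIT_ON"] = "1"
os.environ["RWKV_V7_ON"] = "1"   # enable the RWKV-7 ("x070") code path, if your rwkv version has it

from rwkv.model import RWKV
from rwkv.utils import PIPELINE, PIPELINE_ARGS

# Placeholder path: point this at a downloaded RWKV-7 G1 checkpoint.
model = RWKV(model="/path/to/rwkv7-g1.pth", strategy="cuda fp16")
pipeline = PIPELINE(model, "rwkv_vocab_v20230424")

out = pipeline.generate(
    "Question: why do RNN language models need no KV cache?\n\nAnswer:",
    token_count=200,
    args=PIPELINE_ARGS(temperature=1.0, top_p=0.3),
)
print(out)
```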
@xiaolGo
Xiaoliu.x
4 months
0
0
0
@xiaolGo
Xiaoliu.x
4 months
WIP. BlackGoose Rimer: Harnessing RWKV-7 as a Simple yet Superior Replacement for Transformers in Large-Scale Time Series Modeling. Up to a 43.xx performance boost over the previous Timer (ICML 2025).
Tweet media one
1
0
0
@xiaolGo
Xiaoliu.x
5 months
Large Language Diffusion Models. It's delightful to read this paper. I've always been thinking about how to make diffusion adapt to language; aha, this mask is a masterpiece! Looking forward to the code release.
1
0
1
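The "mask" being admired is, as I read the paper, a masked-diffusion forward process: sample a noise level t, mask each token with probability t, and train the model to recover the masked tokens. A simplified sketch under that reading; `mask_tokens`, `mask_id`, and the toy shapes are mine, and the 1/t loss reweighting is only noted in a comment.

```python
# Sketch of a masked-diffusion forward process (simplified; my reading of the paper).
import torch

def mask_tokens(tokens: torch.Tensor, mask_id: int):
    """Sample a masking level t ~ U(0, 1) per sequence, then mask each token with prob t."""
    t = torch.rand(tokens.shape[0], 1)                        # one noise level per sequence
    masked = torch.rand_like(tokens, dtype=torch.float) < t   # independent Bernoulli masks
    noisy = torch.where(masked, torch.full_like(tokens, mask_id), tokens)
    return noisy, masked, t

tokens = torch.randint(0, 1000, (4, 16))                      # toy batch of token ids
noisy, masked, t = mask_tokens(tokens, mask_id=1000)
# Training: predict the original ids at `masked` positions; the cross-entropy is
# typically reweighted by 1/t so all noise levels contribute comparably.
```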
@xiaolGo
Xiaoliu.x
5 months
RT @leloykun: (Linear) Attention Mechanisms as Test-Time Regression. v1.1. I've added @BlinkDL_AI's RWKV-7 and fixed the update rule for Va….
0
32
0
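My paraphrase of the test-time-regression framing in this retweet (not the thread's own code): treat the recurrent state S as a linear map from keys to values and take one gradient step per token on ||S k_t - v_t||^2. That step is the DeltaNet-style delta rule; RWKV-7 layers add per-channel decay and further generalizations on top, which are not reproduced exactly here.

```python
# One-step online regression view of linear attention (hypothetical helper names).
import numpy as np

def delta_rule_step(S, k, v, beta):
    # d/dS 0.5 * ||S k - v||^2 = (S k - v) k^T, so one SGD step with rate beta
    # gives the delta rule: S <- S - beta * (S k - v) k^T.
    # An RWKV-7-like variant would also decay S with a per-channel w before this step.
    return S - beta * np.outer(S @ k - v, k)

d = 8
S = np.zeros((d, d))
rng = np.random.default_rng(0)
for _ in range(100):
    k = rng.standard_normal(d); k /= np.linalg.norm(k)   # unit-norm key
    v = rng.standard_normal(d)
    S = delta_rule_step(S, k, v, beta=1.0)

# With beta = 1 and a unit-norm key, the state now maps the latest k exactly to v.
print(np.allclose(S @ k, v))
```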