Xiaoliu.x

@xiaolGo

Followers
89
Following
43
Media
13
Statuses
96

Exploring possibilities in large language model architectures; researcher @RWKV. WIP: https://t.co/QAxbgOpPX2

Joined January 2010
@xiaolGo
Xiaoliu.x
14 days
It's easy to say that inductive bias gives hybrid models more expressive power. But we always face the bitter lesson, since test-time scaling can fit anything. Fortunately, we can still see and explore the internal world model across hybrids on the way towards AGI.
@_m0se_
OpenMOSE
14 days
Reka-Flash-3 Hybrid Preview is here 🪿+🤖. 21.4B parameters, 1/7 GQA hybrid RWKV, 32k ctx NIAH (64k target). For light agentic tasks and translation. It's finally starting to work properly 😅.
0
0
1
@xiaolGo
Xiaoliu.x
18 days
We will see more and more powerful small models coming.
@BlinkDL_AI
BlinkDL
19 days
RWKV-8 "Heron" preview (2) - DeepEmbedAttention (DEA), particularly suitable for hybrid models (1/9 KV cache size of MLA). The goal of RWKV-8 is to achieve longctx with 0 KV cache, and I have some progress too🙂
Tweet media one
0
0
0
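A rough back-of-the-envelope on why attention-light hybrids shrink the KV cache. This is my own illustrative sketch, not DEA's actual layout; the 1/9-of-MLA figure and the 0-KV-cache goal are BlinkDL's claims above, not derived here, and every layer count and dimension below is made up.

```python
# Illustrative KV-cache sizing only; layer counts, head counts and dims are
# hypothetical, not DEA / MLA internals.
def kv_cache_bytes(n_attn_layers, n_kv_heads, head_dim, seq_len, dtype_bytes=2):
    return 2 * n_attn_layers * n_kv_heads * head_dim * seq_len * dtype_bytes  # K and V

# Hypothetical dense baseline: every one of 32 layers caches 8 KV heads of dim 128.
full = kv_cache_bytes(32, 8, 128, seq_len=65536)
# Hypothetical hybrid: only 4 attention layers keep a cache; the RNN layers keep none.
hybrid = kv_cache_bytes(4, 8, 128, seq_len=65536)

print(f"dense : {full / 2**30:.1f} GiB")
print(f"hybrid: {hybrid / 2**30:.1f} GiB ({full // hybrid}x smaller)")
```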
@xiaolGo
Xiaoliu.x
23 days
RT @_m0se_: HRWKV7-hxa079-Qwen3-14B. It's an early version, but it can now converse quite well, so I've released it. With L40D5120, only 6 layers use GQA attention. Interestingly, the GQA part is more stable without RoPE…
0
1
0
@xiaolGo
Xiaoliu.x
1 month
This is how I read more than 500 papers in half a year.
Tweet media one
0
0
4
@xiaolGo
Xiaoliu.x
1 month
RT @BlinkDL_AI: @nymfree Agree. Please use as reference, and the community is building
0
1
0
@xiaolGo
Xiaoliu.x
1 month
RT @BlinkDL_AI: 🧵On Baselines in LLM Architecture Research, a Tale of DeltaNet and RWKV-7 (1). (full essay at http….
0
15
0
@xiaolGo
Xiaoliu.x
2 months
More insight from hallucination?
0
0
1
@xiaolGo
Xiaoliu.x
2 months
RT @BlinkDL_AI: RWKV-8 "Heron" preview (1) - DeepEmbed. Seems Gemma3n is trying similar tricks (Per-Layer Embedding), so I will discuss it….
0
47
0
@xiaolGo
Xiaoliu.x
3 months
RWKV + MoBA = RWKV-X: A Linear Complexity Hybrid Language Model.
0
0
0
@xiaolGo
Xiaoliu.x
3 months
Why not?
0
0
0
@xiaolGo
Xiaoliu.x
3 months
Feedback welcome.
0
0
0
@xiaolGo
Xiaoliu.x
3 months
RT @xiaolGo: @TencentHunyuan Fun fact: RNN architectures have many alternative designs, but stacking layers in hybrid models often diminish….
0
1
0
@xiaolGo
Xiaoliu.x
3 months
0
0
0
@xiaolGo
Xiaoliu.x
3 months
Why not use Hymba's architecture? Nemotron-H's paper introduces a layer-stack method for a hybrid transformer-state model. IMO, it misses the expressive power of 3rd-gen state-based models. See Titans & RWKV-7 papers for more. Thoughts? #AI
Tweet media one
Tweet media two
1
0
0
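To make the layer-stack idea in the tweet above concrete, here is a minimal PyTorch sketch of an interleaved hybrid: most blocks are a stateful token mixer, a few are attention. Every name here (`RecurrentBlock`, `AttentionBlock`, `HybridStack`) and the GRU stand-in are my own placeholders, not Nemotron-H, Hymba, or RWKV-7 code.

```python
# Minimal layer-stack hybrid sketch (hypothetical block choices; structure only).
import torch
import torch.nn as nn

class RecurrentBlock(nn.Module):
    """Stand-in for a state-based mixer (e.g. an RWKV-style block); a GRU here."""
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)

    def forward(self, x):
        h, _ = self.rnn(self.norm(x))
        return x + h                      # residual

class AttentionBlock(nn.Module):
    """Stand-in for a (G)QA attention layer (non-causal here; structure only)."""
    def __init__(self, dim, n_heads=8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, x):
        h = self.norm(x)
        out, _ = self.attn(h, h, h, need_weights=False)
        return x + out                    # residual

class HybridStack(nn.Module):
    """Mostly recurrent blocks, with an attention block every `attn_every` layers."""
    def __init__(self, dim=256, n_layers=12, attn_every=6):
        super().__init__()
        self.layers = nn.ModuleList(
            AttentionBlock(dim) if (i + 1) % attn_every == 0 else RecurrentBlock(dim)
            for i in range(n_layers)
        )

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

x = torch.randn(2, 128, 256)              # (batch, seq, dim)
print(HybridStack()(x).shape)              # torch.Size([2, 128, 256])
```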
@xiaolGo
Xiaoliu.x
4 months
RT @susumuota: Top 30 most popular arXiv papers in the last 30 days
Tweet media one
0
2
0
@xiaolGo
Xiaoliu.x
4 months
Thanks to @huggingface for supporting us. Here you can test the latest RWKV-7 G1 model (0.1B-2.9B xx%), which is the only RNN-based reasoning large language model in the world.
0
0
0
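For local testing (as opposed to the Hugging Face Space mentioned above), a minimal sketch assuming BlinkDL's `rwkv` pip package with RWKV-7 support and an already-downloaded checkpoint; the path, strategy string, prompt, and sampling settings are placeholders.

```python
import os
os.environ["RWKV_JIT_ON"] = "1"
os.environ["RWKV_V7_ON"] = "1"   # enable the RWKV-7 ("x070") code path, if your rwkv version has it

from rwkv.model import RWKV
from rwkv.utils import PIPELINE, PIPELINE_ARGS

# Placeholder path: point this at a downloaded RWKV-7 G1 checkpoint.
model = RWKV(model="/path/to/rwkv7-g1.pth", strategy="cuda fp16")
pipeline = PIPELINE(model, "rwkv_vocab_v20230424")

out = pipeline.generate(
    "Question: why do RNN language models need no KV cache?\n\nAnswer:",
    token_count=200,
    args=PIPELINE_ARGS(temperature=1.0, top_p=0.3),
)
print(out)
```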
@xiaolGo
Xiaoliu.x
4 months
0
0
0
@xiaolGo
Xiaoliu.x
4 months
WIP. BlackGoose Rimer: Harnessing RWKV-7 as a Simple yet Superior Replacement for Transformers in Large-Scale Time Series Modeling. Up to a 43.xx performance boost over the previous Timer (ICML 2025).
Tweet media one
1
0
0
@xiaolGo
Xiaoliu.x
5 months
Large Language Diffusion Models. It's delightful to read this paper. I've always been thinking about how to make diffusion adapt to language; aha, this mask is a masterpiece! Looking forward to the code release.
1
0
1
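The "mask" being admired is, as I read the paper, a masked-diffusion forward process: sample a noise level t, mask each token with probability t, and train the model to recover the masked tokens. A simplified sketch under that reading; `mask_tokens`, `mask_id`, and the toy shapes are mine, and the 1/t loss reweighting is only noted in a comment.

```python
# Sketch of a masked-diffusion forward process (simplified; my reading of the paper).
import torch

def mask_tokens(tokens: torch.Tensor, mask_id: int):
    """Sample a masking level t ~ U(0, 1) per sequence, then mask each token with prob t."""
    t = torch.rand(tokens.shape[0], 1)                        # one noise level per sequence
    masked = torch.rand_like(tokens, dtype=torch.float) < t   # independent Bernoulli masks
    noisy = torch.where(masked, torch.full_like(tokens, mask_id), tokens)
    return noisy, masked, t

tokens = torch.randint(0, 1000, (4, 16))                      # toy batch of token ids
noisy, masked, t = mask_tokens(tokens, mask_id=1000)
# Training: predict the original ids at `masked` positions; the cross-entropy is
# typically reweighted by 1/t so all noise levels contribute comparably.
```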
@xiaolGo
Xiaoliu.x
5 months
RT @leloykun: (Linear) Attention Mechanisms as Test-Time Regression. v1.1. I've added @BlinkDL_AI's RWKV-7 and fixed the update rule for Va….
0
32
0
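My paraphrase of the test-time-regression framing in this retweet (not the thread's own code): treat the recurrent state S as a linear map from keys to values and take one gradient step per token on ||S k_t - v_t||^2. That step is the DeltaNet-style delta rule; RWKV-7 layers add per-channel decay and further generalizations on top, which are not reproduced exactly here.

```python
# One-step online regression view of linear attention (hypothetical helper names).
import numpy as np

def delta_rule_step(S, k, v, beta):
    # d/dS 0.5 * ||S k - v||^2 = (S k - v) k^T, so one SGD step with rate beta
    # gives the delta rule: S <- S - beta * (S k - v) k^T.
    # An RWKV-7-like variant would also decay S with a per-channel w before this step.
    return S - beta * np.outer(S @ k - v, k)

d = 8
S = np.zeros((d, d))
rng = np.random.default_rng(0)
for _ in range(100):
    k = rng.standard_normal(d); k /= np.linalg.norm(k)   # unit-norm key
    v = rng.standard_normal(d)
    S = delta_rule_step(S, k, v, beta=1.0)

# With beta = 1 and a unit-norm key, the state now maps the latest k exactly to v.
print(np.allclose(S @ k, v))
```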