
Xiaoliu.x
@xiaolGo
Followers 89 · Following 43 · Media 13 · Statuses 96
Exploring possibilities in large language model architectures. Researcher @RWKV. WIP https://t.co/QAxbgOpPX2
Joined January 2010
It's easy to say that inductive bias gives hybrid models more expressive power, but we always face the bitter lesson: test-time scaling can fit anything. Fortunately, we can still observe and explore the internal world model across hybrids on the way toward AGI.
Reka-Flash-3 Hybrid Preview is here 🪿+🤖. 21.4B parameters, 1/7 GQA hybrid RWKV. 32k ctx NIAH (64k target). For light agentic tasks and translation. It's finally starting to work properly 😅.
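A rough reading of the "1/7 GQA hybrid RWKV" spec: roughly one grouped-query-attention block for every seven blocks, with the rest being RWKV blocks. The sketch below only illustrates that interleaving idea; the period, depth, and block names are my assumptions, not the released Reka-Flash-3 configuration.

```python
# Illustrative only: a possible 1-in-7 attention/RWKV interleave.
# The ratio, depth, and block names are assumptions, not Reka-Flash-3's real layout.

def layer_plan(n_layers: int, attn_period: int = 7) -> list[str]:
    """Assign a block type per layer: GQA attention every `attn_period` layers, RWKV otherwise."""
    return [
        "gqa_attention" if (i + 1) % attn_period == 0 else "rwkv"
        for i in range(n_layers)
    ]

if __name__ == "__main__":
    plan = layer_plan(28)
    print(plan)  # mostly 'rwkv', with 'gqa_attention' at layers 7, 14, 21, 28
    print(f"attention fraction: {plan.count('gqa_attention') / len(plan):.3f}")  # ~0.143 ≈ 1/7
```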
RT @BlinkDL_AI: 🧵On Baselines in LLM Architecture Research, a Tale of DeltaNet and RWKV-7 (1). (full essay at http….
RT @BlinkDL_AI: RWKV-8 "Heron" preview (1) - DeepEmbed. Seems Gemma3n is trying similar tricks (Per-Layer Embedding), so I will discuss it….
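For readers who have not met "per-layer embedding" before: the generic idea is that each layer keeps its own token-indexed embedding table and uses the looked-up vector to modulate that layer's computation. The sketch below is only one plausible reading of that idea; whether RWKV-8's DeepEmbed or Gemma3n's per-layer embeddings work exactly this way is an assumption, and all names here are hypothetical.

```python
# One generic reading of a per-layer token embedding: gate the FFN output
# channel-wise with a learned, token-indexed vector owned by this layer.
# Illustration only, not the actual DeepEmbed / Gemma3n mechanism.
import torch
import torch.nn as nn

class PerLayerEmbedFFN(nn.Module):
    def __init__(self, vocab_size: int, d_model: int, d_ffn: int):
        super().__init__()
        self.up = nn.Linear(d_model, d_ffn)
        self.down = nn.Linear(d_ffn, d_model)
        # Per-layer, per-token vector; only rows for tokens in the current batch
        # are touched, so the table could in principle be kept off-GPU.
        self.token_scale = nn.Embedding(vocab_size, d_model)

    def forward(self, x: torch.Tensor, token_ids: torch.Tensor) -> torch.Tensor:
        h = self.down(torch.relu(self.up(x)))
        return h * self.token_scale(token_ids)  # channel-wise gate per token

if __name__ == "__main__":
    layer = PerLayerEmbedFFN(vocab_size=65536, d_model=512, d_ffn=2048)
    ids = torch.randint(0, 65536, (2, 8))
    x = torch.randn(2, 8, 512)
    print(layer(x, ids).shape)  # torch.Size([2, 8, 512])
```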
RT @xiaolGo: @TencentHunyuan Fun fact: RNN architectures have many alternative designs, but stacking layers in hybrid models often diminish….
Thanks to @huggingface for supporting us. Here you can test the latest RWKV-7 G1 model (0.1B-2.9B xx%), which is the only RNN-based reasoning large language model in the world.
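If you would rather run it locally than in the hosted demo, below is a minimal sketch using the `rwkv` pip package; the checkpoint filename/path and the RWKV_V7_ON flag are assumptions about your setup (and a recent package version), not an official recipe.

```python
# Minimal local-inference sketch with the `rwkv` pip package.
# The checkpoint path is hypothetical; download an RWKV-7 G1 .pth file first.
import os
os.environ["RWKV_V7_ON"] = "1"   # enable RWKV-7 support (recent rwkv package versions)
os.environ["RWKV_JIT_ON"] = "1"

from rwkv.model import RWKV
from rwkv.utils import PIPELINE, PIPELINE_ARGS

model = RWKV(model="path/to/rwkv7-g1-2.9b.pth", strategy="cuda fp16")
pipeline = PIPELINE(model, "rwkv_vocab_v20230424")

prompt = "User: Why is the sky blue?\n\nAssistant:"
args = PIPELINE_ARGS(temperature=1.0, top_p=0.3)
print(pipeline.generate(prompt, token_count=200, args=args))
```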
RT @leloykun: (Linear) Attention Mechanisms as Test-Time Regression. v1.1. I've added @BlinkDL_AI's RWKV-7 and fixed the update rule for Va….
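For context on the "test-time regression" framing: linear-attention state updates can be read as online least-squares over key-value pairs, with the delta rule (as in DeltaNet) being one gradient step per token. The equations below are a schematic of that standard view, not the exact formulation in the linked write-up; RWKV-7's decay and learned step-size terms are omitted.

```latex
% Schematic only: linear attention as test-time regression.
% Decay / normalization / learned in-context learning-rate terms are omitted.
\[
\text{per-token objective: } \mathcal{L}_t(S) = \tfrac{1}{2}\,\lVert S\,k_t - v_t \rVert^2,
\qquad S \in \mathbb{R}^{d_v \times d_k}
\]
\[
\text{delta-rule update (one gradient step): }
S_t = S_{t-1} - \beta_t\,(S_{t-1}k_t - v_t)\,k_t^{\top}
    = S_{t-1}\bigl(I - \beta_t\,k_t k_t^{\top}\bigr) + \beta_t\,v_t k_t^{\top}
\]
\[
\text{plain linear attention (pure accumulation): } S_t = S_{t-1} + v_t k_t^{\top}
\]
```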