BlinkDL
@BlinkDL_AI
Followers: 9K · Following: 413 · Media: 192 · Statuses: 476
RWKV = 100% RNN with GPT-level performance. https://t.co/TkdxOJSFWX and https://t.co/86DzS6arA0
Joined September 2022
RWKV-7 G0a3 13.3B: strongest pure RNN ever, MMLU 76.0% (+CoT=82.5%), MATH500 76.0%, GSM8K 92.3%, MMLU Pro 49.8% (+CoT=61.6%), no eval-maxxing / mid-training / post-training. Download: https://t.co/oJxpuJoeQD Ollama: https://t.co/bScaOJSoD0
10250+ tokens/s: RWKV-7 7.2B fp16, bsz 960 @ RTX 5090. 123+ tokens/s: RWKV-7 7.2B fp16, bsz 1 @ RTX 5090. https://t.co/YW3XbVuuCP Always constant speed & VRAM, because we are an RNN.
Now 4 community ROSA projects: https://t.co/QwXDJCfRPz
https://t.co/H8lgDYi54T
https://t.co/uTvtyxU6A8
github.com: x-0D/RASP
How RWKV-7 models evolve by adding better data.
RWKV-7 G0a3 7.2B: pure RNN with MMLU 65.0% (+CoT=72.3%), MATH500 67.8%, GSM8K 83.9%, MMLU Pro 35.9% (+CoT=52.1%), and no eval-maxxing, no mid-training, no post-training. Download: https://t.co/oJxpuJoeQD G0a3 13.3B releases very soon.
RWKV7+ROSA, 1M params, solving 40-digit +/- with 99% digit accuracy, without CoT. Demo: https://t.co/j0eFQDISvu
RWKV7 vs RWKV7+ROSA v251020 vs RWKV7+ROSA v251021 (same arch & params as v251020, better training method).
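A minimal sketch of how such an eval could be scored, assuming a plain "a+b=" / "a-b=" prompt format and right-aligned per-digit comparison (the actual data format and scoring script are not shown in the thread; `model.generate` below is a hypothetical call):

```python
import random

def make_problem(n_digits=40):
    # assumed format: "a+b=" or "a-b=", answer as a plain digit string
    a = random.randrange(10 ** (n_digits - 1), 10 ** n_digits)
    b = random.randrange(10 ** (n_digits - 1), 10 ** n_digits)
    op = random.choice(["+", "-"])
    ans = a + b if op == "+" else a - b
    return f"{a}{op}{b}=", str(ans)

def digit_accuracy(pred: str, gold: str) -> float:
    # right-align and compare character by character
    width = max(len(pred), len(gold))
    pred, gold = pred.rjust(width), gold.rjust(width)
    return sum(p == g for p, g in zip(pred, gold)) / width

prompt, gold = make_problem()
# pred = model.generate(prompt)          # hypothetical model call
# print(digit_accuracy(pred, gold))
```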
Training finished. Solid improvements, though unstable (will fix). This is learning to + and - large random numbers (not using a loss mask, so the loss appears higher).
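A minimal sketch of the loss-mask point, with toy tensors: without a mask the average includes the near-incompressible random input digits, so the reported loss reads higher even when the answer tokens are predicted well.

```python
import torch
import torch.nn.functional as F

B, T, V = 1, 16, 256
logits = torch.randn(B, T, V)                 # toy model outputs
targets = torch.randint(0, V, (B, T))
answer_mask = torch.zeros(B, T)
answer_mask[:, 10:] = 1.0                     # suppose only the last 6 tokens are the answer

per_token = F.cross_entropy(logits.transpose(1, 2), targets, reduction="none")
unmasked = per_token.mean()                                   # averages over random digits too
masked = (per_token * answer_mask).sum() / answer_mask.sum()  # answer-only loss
```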
Now I think I should try matching seqA with seqB, to avoid "matching matching" (very complicated behavior), and of course one can match seqQ with seqK to fetch seqV.
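A minimal sketch of one reading of "match seqQ with seqK to fetch seqV" (the tweet does not spell out the actual ROSA algorithm): at each position, find the longest earlier occurrence of the current suffix of seqQ inside seqK, then fetch the seqV entry just after the match, induction-head style. Brute-force O(T^2) for clarity.

```python
def fetch(seq_q, seq_k, seq_v):
    out = []
    for t in range(len(seq_q)):
        best_len, best_pos = 0, None
        for s in range(t):
            # length of the common suffix of seq_q[:t+1] and seq_k[:s+1]
            l = 0
            while l <= s and l <= t and seq_q[t - l] == seq_k[s - l]:
                l += 1
            if l > best_len:
                best_len, best_pos = l, s
        # fetch the value just after the best match, None if no match
        ok = best_pos is not None and best_pos + 1 < len(seq_v)
        out.append(seq_v[best_pos + 1] if ok else None)
    return out

# e.g. fetch("abcab", "abcab", "ABCAB")[-1] == "C":
# the suffix "ab" last occurred at position 1, and "C" followed it.
```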
RWKV8 ROSA simply scales, producing mysterious new languages. Training small LMs soon. Code: https://t.co/j0eFQDJql2
LM inventing inner monologue languages, enabled by RWKV8 multi-layer ROSA via fully end-to-end training (next-token prediction). Code: https://t.co/ktMBpi2kfI
RWKV8 ROSA training demo - the first serious neurosymbolic LM? For a new era in AI. Code: https://t.co/j0eFQDISvu
A working trainable ROSA layer using the 1bit + "local" gradient idea here:
github.com: RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable). We are at RWKV-7 "Goose".
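A minimal sketch of the "1bit + local gradient" idea as described (the working layer lives in the linked repo; this is not its code): a hard 1-bit forward pass, with a straight-through-style surrogate gradient in the backward pass so the layer stays trainable end-to-end.

```python
import torch

class OneBitSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return (x > 0).float() * 2 - 1          # hard {-1, +1} forward

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # "local" gradient: pass gradient only near the threshold (|x| <= 1),
        # a clipped-identity surrogate for the non-differentiable sign
        return grad_out * (x.abs() <= 1).float()

x = torch.randn(8, requires_grad=True)
y = OneBitSTE.apply(x)
y.sum().backward()                              # x.grad holds the surrogate gradient
```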
RWKV-8 ROSA mechanism: a neurosymbolic infinite-range lossless information propagator beyond attention, enabling LLMs to invent their own inner monologue languages. First step towards scalable post-neural methods, for a new era in AI.
The new mechanism in RWKV-8 "Heron" is named ROSA (an acronym; note SA ≠ Self-Attention here). ROSA is compromise-free: we get efficient, scalable, genuine infinite ctx by applying some beautiful algorithms.
By "everything" I mean reasoning/instruction/chat data, not test set ð
RWKV-7 G1a 2.9B more evals: https://t.co/X2R2f6EeRB MMLU Pro 42% (+CoT), GSM8K 77%, MATH 50%. Note this is a base model: no mid-training, no post-training. I just add everything to the pretraining dataset.
RWKV-7 G1a 2.9B: pure RNN surpassing Gemma3 4B and Llama3.2 3B in some areas; supports two reasoning styles and length control. Download: https://t.co/oJxpuJoeQD G1a 1.5/0.4/0.1B & G0a 7B updated; G0 13B release in Oct.