BlinkDL

@BlinkDL_AI

Followers 9K · Following 413 · Media 192 · Statuses 476

RWKV = 100% RNN with GPT-level performance. https://t.co/TkdxOJSFWX and https://t.co/86DzS6arA0

Joined September 2022
@BlinkDL_AI
BlinkDL
3 days
RWKV-7 G0a3 13.3B: strongest pure RNN ever, MMLU 76.0% (+CoT=82.5%), MATH500 76.0%, GSM8K 92.3%, MMLU Pro 49.8% (+CoT=61.6%), no eval-maxxing / mid-training / post-training. Download: https://t.co/oJxpuJoeQD Ollama: https://t.co/bScaOJSoD0
@BlinkDL_AI
BlinkDL
7 days
RWKV-7 G0a3 7.2B: pure RNN with MMLU 65.0% (+CoT=72.3%), MATH500 67.8%, GSM8K 83.9%, MMLU Pro 35.9% (+CoT=52.1%) and no eval-maxxing, no mid-training, no post-training. Download: https://t.co/oJxpuJoeQD and G0a3 13.3B release very soon 🙂
@BlinkDL_AI
BlinkDL
3 days
10250+ token/s: RWKV-7 7.2B fp16, bsz960 @ RTX5090. 123+ token/s: RWKV-7 7.2B fp16, bsz1 @ RTX5090. https://t.co/YW3XbVuuCP Always constant speed & VRAM, because we are RNN 🚀
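A minimal sketch of why an RNN's per-token speed and memory stay constant: it carries a fixed-size state that is updated once per token, so nothing grows with context length (unlike a transformer's KV cache). This is purely illustrative — the state width, update rule, and names here are made up, not the RWKV-7 kernel.

```python
import numpy as np

D = 8  # hypothetical state width

def rnn_step(state, x, W):
    # One recurrent update: the state stays shape (D,) forever.
    return np.tanh(W @ state + x)

rng = np.random.default_rng(0)
W = rng.standard_normal((D, D)) * 0.1
state = np.zeros(D)

for t in range(10_000):            # process 10k tokens...
    x = rng.standard_normal(D)
    state = rnn_step(state, x, W)

print(state.shape)                 # state is still (8,): constant VRAM
```

Per-token work is one fixed matrix-vector product regardless of position, which is the "always const speed" claim in the tweet.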
@BlinkDL_AI
BlinkDL
3 days
[Link card: github.com — x-0D/RASP]
@BlinkDL_AI
BlinkDL
10 days
RWKV7+ROSA 1M params solving 40-digit addition/subtraction with 99% digit accuracy, without CoT 🌹 demo: https://t.co/j0eFQDISvu
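One plausible reading of "digit accuracy" is the fraction of output digit positions that match the true answer, right-aligned. The metric below is my assumption, not necessarily the authors' exact definition, and the operands are made up.

```python
def digit_accuracy(pred: str, target: str) -> float:
    """Fraction of digit positions where prediction matches target,
    right-aligned (a hypothetical reading of 'digit accuracy')."""
    pred, target = pred.zfill(len(target)), target.zfill(len(pred))
    hits = sum(p == t for p, t in zip(pred, target))
    return hits / len(target)

a, b = 10**39 + 12345, 987654321      # a has 40 digits
true = str(a + b)
model_out = true[:-1] + "9"           # pretend the model flips the last digit
print(digit_accuracy(model_out, true))  # → 0.975  (39 of 40 digits correct)
```

Under this metric, one wrong digit in a 40-digit answer still scores 97.5%, which is why per-digit accuracy is a gentler measure than exact-match accuracy.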
@BlinkDL_AI
BlinkDL
6 days
How RWKV-7 models evolve by adding better data 🙂
@BlinkDL_AI
BlinkDL
7 days
Ollama GGUF: https://t.co/bScaOJSoD0 Gradio Demo:
[Link card: huggingface.co]
@BlinkDL_AI
BlinkDL
1 month
RWKV-7 G1a 2.9B: pure RNN surpassing Gemma3 4B and Llama3.2 3B in some areas, supports two reasoning styles and length control. Download: https://t.co/oJxpuJoeQD 🚀 and G1a 1.5/0.4/0.1B & G0a 7B updated, G0 13B release in Oct
@BlinkDL_AI
BlinkDL
16 days
RWKV7 vs RWKV7+ROSAv251020 vs RWKV7+ROSAv251021 (same arch & params as v251020, better training method) 🚀
@BlinkDL_AI
BlinkDL
18 days
Training finished. Solid improvements, though unstable (will fix). This is learning to add and subtract large random numbers (not using a loss mask, so the loss will appear higher).
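The loss-mask remark can be made concrete: if prompt tokens are averaged into the cross-entropy instead of being masked out, the reported mean loss is inflated even when the answer tokens are learned well. All per-token numbers below are made up for illustration.

```python
import numpy as np

# Hypothetical per-token cross-entropy: 6 prompt tokens, 4 answer tokens.
tok_loss = np.array([2.1, 1.9, 2.3, 2.0, 1.8, 2.2,   # prompt ("12345+678=")
                     0.1, 0.2, 0.1, 0.3])            # answer digits
mask     = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])  # 1 = scored token

unmasked = tok_loss.mean()                     # averages everything
masked   = (tok_loss * mask).sum() / mask.sum()
print(unmasked, masked)  # unmasked mean is much higher
```

Training without the mask still works (the prompt tokens just become extra prediction targets), but the curve sits higher, which matches "the loss will appear higher".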
@BlinkDL_AI
BlinkDL
18 days
Confirmed RWKV7+ROSA > RWKV7 and ROSA-grok spotted 🤯
@BlinkDL_AI
BlinkDL
20 days
now i think i should try matching seqA with seqB to avoid "matching matching" (very complicated behavior 😂) and of course one can match seqQ with seqK to fetch seqV
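One toy reading of "match seqQ with seqK to fetch seqV": at each query position, find the longest recent context that also occurs in the key sequence, then fetch the value token aligned just after the match (an induction-head-style copy). This is purely my interpretation of the tweet, not the ROSA algorithm; the function and its matching rule are hypothetical.

```python
def fetch(seq_q: str, seq_k: str, seq_v: str) -> str:
    """For each position i in seq_q, find the longest suffix of
    seq_q[:i+1] occurring in seq_k, and fetch the seq_v token right
    after the match. '?' when nothing matches. Illustrative only."""
    out = []
    for i in range(len(seq_q)):
        best = "?"
        for L in range(i + 1, 0, -1):         # try longest suffix first
            suffix = seq_q[i + 1 - L:i + 1]
            j = seq_k.find(suffix)
            if j != -1 and j + L < len(seq_v):
                best = seq_v[j + L]
                break
        out.append(best)
    return "".join(out)

# With seq_k == seq_v, this copies whatever followed the matched context.
print(fetch("abcab", "xabcy", "xabcy"))  # → "bcybc"
```

Setting seqK = seqV reproduces "predict what came after this context last time"; distinct A/B sequences, as the tweet suggests, would decouple what is matched from what is fetched.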
@BlinkDL_AI
BlinkDL
20 days
RWKV8 ROSA 🌹 simply scales, producing mysterious new languages. Training small LMs soon 🙂 Code: https://t.co/j0eFQDJql2
@BlinkDL_AI
BlinkDL
21 days
LM inventing inner monologue languages ✨ enabled by RWKV8 multi-layer ROSA 🌹 via fully end-to-end training (next-token prediction) 🚀 Code: https://t.co/ktMBpi2kfI
@BlinkDL_AI
BlinkDL
21 days
RWKV-8 ROSA Introduction:
@BlinkDL_AI
BlinkDL
26 days
RWKV-8 ROSA 🌹 mechanism: neurosymbolic infinite-range lossless information propagator beyond attention, enabling LLMs to invent their own inner monologue languages. First step towards scalable post-neural methods, for a new era in AI 🌌
@BlinkDL_AI
BlinkDL
22 days
RWKV8 ROSA training demo - the first serious neurosymbolic LM? for a new era in AI 🌌 Code: https://t.co/j0eFQDISvu
@BlinkDL_AI
BlinkDL
22 days
A working trainable ROSA layer using the 1bit + "local" gradient idea here 🙂
[Link card: github.com — RWKV repository]
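The "1bit + local gradient" phrase plausibly describes something like a straight-through estimator: binarize in the forward pass, and in the backward pass substitute a surrogate gradient that is nonzero only near zero (the "local" part), so the layer stays trainable. This generic STE sketch is my assumption about the idea, not the actual ROSA layer; the window width and names are invented.

```python
import numpy as np

def binarize_fwd(x):
    # 1-bit forward pass: collapse activations to ±1.
    return np.where(x >= 0, 1.0, -1.0)

def binarize_bwd(x, grad_out):
    # "Local" surrogate gradient: pretend the binarizer was the
    # identity where |x| <= 1, and block the gradient elsewhere.
    return grad_out * (np.abs(x) <= 1.0)

x = np.array([-2.0, -0.5, 0.3, 1.7])
y = binarize_fwd(x)
g = binarize_bwd(x, np.ones_like(x))
print(y)   # [-1. -1.  1.  1.]
print(g)   # [0. 1. 1. 0.]
```

The forward output is non-differentiable, so without a surrogate the gradient would be zero almost everywhere; the local window is what lets training signals pass through the 1-bit bottleneck.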
@BlinkDL_AI
BlinkDL
24 days
RWKV-8 ROSA: How to Train It? (Oct 13, 2025)
@BlinkDL_AI
BlinkDL
29 days
The new mechanism in RWKV-8 "Heron" 🪶 is named ROSA (acronym, note SA ≠ Self-Attention here) 🌹 ROSA is compromise-free: we get efficient, scalable, genuine infinite ctx, by applying some beautiful algorithms.
@BlinkDL_AI
BlinkDL
28 days
By "everything" I mean reasoning/instruction/chat data, not test set 😂
@BlinkDL_AI
BlinkDL
28 days
RWKV-7 G1a 2.9B more evals: https://t.co/X2R2f6EeRB MMLU Pro 42% (+CoT), GSM8K 77%, MATH 50%. Note this is a base model, no mid-training, no post-training. I just add everything to pretraining dataset.