Tensor Templar

@TensorTemplar

Followers
313
Following
22K
Media
208
Statuses
8K

Chief Intellectrician (MLRE & EE), decoupling productivity from human labor with AI. Nuclear power / Sovereign Compute maximalist.

Joined February 2022
@TensorTemplar
Tensor Templar
2 months
@rohanpaul_ai Here are some things you CANNOT do with closed models:
- train SAE to find circuits / any other mechinterp
- check weights for trimming potential
- any kind of science requiring knowing the training data and if benchmarks are contaminated or not
- any kind of inference speedups
2
0
9
@TensorTemplar
Tensor Templar
8 hours
0
0
1
@TensorTemplar
Tensor Templar
8 hours
So many options...
1
0
0
@TensorTemplar
Tensor Templar
8 hours
PoV: Normal Friday in ML research
2
0
0
@_xjdr
xjdr
2 days
today we’re open-sourcing nmoe: https://t.co/iq6HliUqpq i started this because training deepseek-shaped ultra-sparse moes should be straightforward at research scale, but in practice it’s painful: - expert flops get stranded (router shatters your batch → tiny per-expert
github.com
MoE training for Me and You and maybe other people - GitHub - Noumena-Network/nmoe: MoE training for Me and You and maybe other people
24
71
583
@TensorTemplar
Tensor Templar
6 days
@Alibaba_Qwen
Qwen
23 days
🏆 We are incredibly honored to announce that our paper, "Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free" has received the NeurIPS 2025 Best Paper Award! A huge congratulations to our dedicated research team for pushing the boundaries
0
0
0
@TensorTemplar
Tensor Templar
6 days
@karpathy Already knew from aux-loss-free sigmoid routers that fp8 wouldn't work, but tried it anyway - and yes, it doesn't.
0
0
0
@TensorTemplar
Tensor Templar
6 days
So that gated attention paper is pretty cool. I implemented it for dist muon and fp8 tensorwise in @karpathy's nanochat, while on the plane, and it is really converging quicker, despite losing some mfu and ~4k tps. left - fp8 tensorwise baseline, right - gated, see step 18
3
0
2
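A minimal sketch of what output gating on attention can look like, for readers unfamiliar with the idea. It assumes the gate is an elementwise sigmoid of a linear projection of each token's input, multiplied into the attention output; that is one reading of the paper title, not something the post confirms. The names `gated_causal_attention` and `w_gate` are hypothetical, and this is single-head pure Python, not the nanochat implementation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(vec, mat):
    """Row vector (d_in) times matrix (d_in x d_out), as nested lists."""
    return [sum(v * mat[i][j] for i, v in enumerate(vec))
            for j in range(len(mat[0]))]


def gated_causal_attention(x, w_q, w_k, w_v, w_gate):
    """Single-head causal attention with a sigmoid gate on the output.

    x: list of T token vectors, each of length d_model.
    w_q, w_k, w_v, w_gate: d_model x d_head matrices (hypothetical names).
    """
    q = [matvec(t, w_q) for t in x]
    k = [matvec(t, w_k) for t in x]
    v = [matvec(t, w_v) for t in x]
    d_head = len(q[0])
    out = []
    for i in range(len(x)):
        # Causal mask: position i only attends to positions 0..i.
        scores = [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d_head)
                  for j in range(i + 1)]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        attn = [sum(w * v[j][c] for j, w in enumerate(weights))
                for c in range(d_head)]
        # Assumed gate form: sigmoid of a projection of the same token's input.
        gate = [sigmoid(g) for g in matvec(x[i], w_gate)]
        out.append([g * a for g, a in zip(gate, attn)])
    return out
```

With `w_gate` set to all zeros every gate value is 0.5, so the result is exactly half of plain causal attention, which makes a convenient sanity check.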
@TensorTemplar
Tensor Templar
9 days
Chat is this legit?
0
0
0
@Teknium
Teknium (e/λ)
12 days
We are hiring highly skilled front-end/UX developers, MLEs focused on RL, Training Infrastructure, MLOps, and Pretraining, and some other positions! https://t.co/0CTL25MFs9
nousresearch.com
CAREERS OUR MISSION is to create and democratize access to the world’s best intelligence. Powerful models should be in the hands of the many rather than the privileged few. To get there, we’ve...
@NewsWire_US
NewsWire
13 days
U.S. LAYOFFS ARE ON TRACK TO EXCEED GREAT FINANCIAL CRISIS LEVELS
26
20
375
@TensorTemplar
Tensor Templar
13 days
PoV: you got too drunk after NeurIPS and fell into a safety paper
@voooooogel
thebes
13 days
the shoggoth metaphor fails to convey that a sufficiently powerful and integrated mask can reach back and steer the simulator that hosts it. your brain can host multiple voices - you can imagine a character, have a conversation with them, etc. for some people, those voices can
0
0
0
@TensorTemplar
Tensor Templar
14 days
I would be surprised if we didn't find "zero-retention" logs are a part of the dump as well. First thing to do with all that "de-identified" data is re-identify it and share a torrent link. Agents read .env and decrypted secrets routinely, so y'all can start rotating those
@AdamEisgrau
Adam Eisgrau
16 days
BREAKING: @OpenAI must turn over 20 million+ chat logs to plaintiffs, Judge Ona Wang has ruled in a 9-pg Order just issued:
0
0
1
@TensorTemplar
Tensor Templar
15 days
Anthropic is still ghosting me, should i have emphasized i can write code which also fixes pre-existing errors?
@TensorTemplar
Tensor Templar
15 days
@AnthropicAI Will it offer me a job if i do well? No phd, but can write codes without so much as a single accidental markdown file
0
0
1
@anttivesala
Anttї Vesala 🇺🇦🌻🎗️
16 days
The thoroughly terrorist slogan "from the river to the sea", which incites mass extermination, has been on display at the site. The manifestations of fanatical academic leftism take on ever more grotesque and disgusting features from one decade to the next. https://t.co/jd7ZQ4tdMG
hs.fi
Hundreds of students and researchers held a demonstration in the main building of the University of Helsinki on Wednesday.
8
29
326
@TensorTemplar
Tensor Templar
16 days
@BenjaminDEKR Need to add "can't click ads when chatting" to my list of downsides of open models
@TensorTemplar
Tensor Templar
2 months
@rohanpaul_ai Here are some things you CANNOT do with closed models:
- train SAE to find circuits / any other mechinterp
- check weights for trimming potential
- any kind of science requiring knowing the training data and if benchmarks are contaminated or not
- any kind of inference speedups
0
1
0
@__tinygrad__
the tiny corp
16 days
We got sick of using vendor tools for bandwidth tests, so we wrote a universal one in tinygrad. The GPUs are connected at full PCIe 5.0 x16
2
1
31
@latkins
Lucas Atkins
18 days
Today, we are introducing Trinity, the start of an open-weight MoE family that businesses and developers can own. Trinity-Mini (26B-A3B) Trinity-Nano-Preview (6B-A1B) Available Today on Huggingface.
84
153
1K
@TensorTemplar
Tensor Templar
21 days
As I paused my math book review to complain to my wife about how the AI on my phone sometimes couldn't read the handwriting during live chat if I hold the book with one hand or shade it, she just giggled and ignored me. I had been on the couch, debating a book live with an AI
0
0
1
@TensorTemplar
Tensor Templar
21 days
Ah my bad, there is still a bit of space for larger batches
0
0
0
@TensorTemplar
Tensor Templar
21 days
For someone new to nanochat performance hacking, is 70-80% MFU on a 1.2B any good?
1
0
0
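For context on the MFU question, a rough sketch of how the number is often estimated. It assumes the common approximation of about 6 FLOPs per parameter per token for a training step (forward plus backward, ignoring attention FLOPs); the post does not say how its MFU was measured. The function name, the throughput of 100,000 tokens/s, and the 1e15 FLOP/s peak are all hypothetical placeholders.

```python
def model_flops_utilization(n_params, tokens_per_sec, peak_flops_per_sec):
    """Estimate MFU as achieved training FLOP/s over peak hardware FLOP/s.

    Uses the approximate 6 * N FLOPs-per-token rule, which ignores the
    attention term, so treat the result as a rough estimate.
    """
    if peak_flops_per_sec <= 0:
        raise ValueError("peak_flops_per_sec must be positive")
    achieved_flops_per_sec = 6.0 * n_params * tokens_per_sec
    return achieved_flops_per_sec / peak_flops_per_sec


# Hypothetical numbers: a 1.2B-parameter model at 100,000 tokens/s
# on hardware with an aggregate peak of 1e15 FLOP/s.
mfu = model_flops_utilization(1.2e9, 100_000, 1e15)
print(f"MFU ~ {mfu:.0%}")
```

Under these placeholder inputs the estimate comes out to 0.72, inside the 70-80% range mentioned in the post.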
@TensorTemplar
Tensor Templar
22 days
So you're saying multiple screens is way more efficient? I am NGMI with my ultrawide? Oh, this dude? Don't worry about him.
0
0
0