
N8 Programs
@N8Programs
Followers
7K
Following
3K
Media
394
Statuses
4K
Studying Applied Mathematics and Statistics at @JohnsHopkins. Currently interning at @RockefellerUniv.
Proxima Centauri B
Joined September 2022
the degree of composability an LLM has is far, far less than what a human works with. but far greater than zero.
chatgpt, claude, gemini, grok, etc. have all read, comprehended, and nearly memorized every book in the world, and yet with current architectures and training techniques none of them have any truly novel knowledge to give us. really makes you think.
0
0
2
RT @allen_ai: Introducing IFBench, a benchmark to measure how well AI models follow new, challenging, and diverse verifiable instructions…
0
50
0
RT @Duderichy: Alan Turing was a world class runner: "While working at Bletchley, Turing, a talented long-distance runner, occasionally ra…
0
397
0
this is an intriguing tweet - opus 3's architecture being considered a trade secret could mean either:
- there's some special sauce
- it's just a vanilla transformer and anthropic wants to preserve the image of special sauce
- or opus is served at ridiculous margins
@jik_wtf Unfortunately Opus 3 is not so old a model that we're comfortable sharing its architecture publicly right now. Speaking in a personal capacity, I will advocate in 5+ years for it to be released :)
2
0
16
RT @willccbb: WOW! 🤯 this groundbreaking dataset from Meta’s Chief AI Scientist has revolutionized the way that we understand vision 👀 🚀 is…
0
47
0
these are crazy numbers for a 13B-active model w/ only 80B total.
I've been saying that this shape is underrated. Qwen bros didn't do that so Tencent bros picked up the slack. 80B total, 13B active, 256K context, 71.2 GPQA-Diamond, pretty good quantization to FP8 and even INT4. Might be «DeepSeek-Medium» for those interested.
0
0
2
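The appeal of the 80B-total / 13B-active shape quoted above comes down to simple arithmetic: weight memory scales with total parameters, while per-token compute scales with active parameters. A minimal back-of-envelope sketch, assuming only the 80B/13B figures from the tweet and the standard byte widths for FP8 (1 byte/param) and INT4 (0.5 byte/param); the `moe_footprint` helper is illustrative, not any real library's API:

```python
def moe_footprint(total_params_b: float, active_params_b: float,
                  bytes_per_param: float) -> dict:
    """Approximate weight memory (GB) and the fraction of parameters
    activated per token for a mixture-of-experts model."""
    return {
        # memory to hold the weights is driven by TOTAL parameters
        "weights_gb": total_params_b * 1e9 * bytes_per_param / 1e9,
        # per-token compute is driven by ACTIVE parameters
        "active_fraction": active_params_b / total_params_b,
    }

fp8 = moe_footprint(80, 13, bytes_per_param=1.0)   # FP8: 1 byte/param
int4 = moe_footprint(80, 13, bytes_per_param=0.5)  # INT4: 0.5 byte/param

print(f"FP8 weights:  ~{fp8['weights_gb']:.0f} GB")
print(f"INT4 weights: ~{int4['weights_gb']:.0f} GB")
print(f"active per token: {fp8['active_fraction']:.1%}")
```

The point of the shape: at INT4 the whole model fits in roughly 40 GB of weight memory, yet each token only pays the compute cost of a ~13B dense model, which is why a "medium"-sized MoE like this can be attractive for local or single-node serving.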
bro is the alexander hamilton of model implementations. how does he code like he's running out of time.
Last 2 weeks:
> Gemma3n
> Phi4mm vision working, now audio and a few optimisations missing
> Falcon H1 (Mamba + Transformers)
> Bitnet metal kernel 90% faster on MLX compared to official Bitnet.cpp
> Falcon Bitnet
> Processed 34m samples and training a new secret model
1
1
9
RT @Azaliamirh: Introducing Weaver, a test time scaling method for verification! Weaver shrinks the generation-verification gap through a…
0
47
0
RT @SunshineFiora: it seems like it would be extremely good for the alignment community to run public experiments in post-training open sou…
0
8
0