N8 Programs

@N8Programs

Followers
7K
Following
3K
Media
394
Statuses
4K

Studying Applied Mathematics and Statistics at @JohnsHopkins. Currently interning at @RockefellerUniv.

Proxima Centauri B
Joined September 2022
@N8Programs
N8 Programs
7 hours
the degree of composability an LLM has is far, far, less than what a human works with. but far greater than zero.
@airkatakana
Air Katakana
7 hours
chatgpt, claude, gemini, grok, etc have all read, comprehended, and nearly memorized every book in the world, and yet with current architectures and training techniques none of them have any truly novel knowledge to give us. really makes you think.
0
0
2
@N8Programs
N8 Programs
16 hours
perfect summary! 👌🧑‍🍳❤️.
@xlr8harder
xlr8harder
16 hours
absolutely disgusting low effort engagement slop
Tweet media one
1
0
4
@N8Programs
N8 Programs
5 days
RT @allen_ai: Introducing IFBench, a benchmark to measure how well AI models follow new, challenging, and diverse verifiable instructions.…
0
50
0
@N8Programs
N8 Programs
5 days
total @kalomaze vindication.
@xiangyue96
Xiang Yue
6 days
People are racing to push math reasoning performance in #LLMs—but have we really asked why? The common assumption is that improving math reasoning should transfer to broader capabilities in other domains. But is that actually true? In our study (, we
0
1
7
@N8Programs
N8 Programs
6 days
RT @Duderichy: Alan Turing was a world class runner: "While working at Bletchley, Turing, a talented long-distance runner, occasionally ra…
0
397
0
@N8Programs
N8 Programs
6 days
i've done this the proper way - cosplay an AI. went as chatgpt for halloween. maybe i'll go as claude next time.
@farris206
Falafel (No. 1 Samura Critic)
7 days
AI cosplays. Are we fr.
0
0
0
@N8Programs
N8 Programs
7 days
mind you, no shade to the anthropic employees here - these are all completely logical reasons not to open source opus 3 if you are in their position.
0
0
4
@N8Programs
N8 Programs
7 days
or, the most likely fourth possibility, anthropic just doesn't want to open-source their exact architecture for a myriad of small reasons that aren't any of the grand ones above.
1
0
5
@N8Programs
N8 Programs
7 days
this is an intriguing tweet - opus 3's architecture being considered a trade secret could mean either:
- there's some special sauce.
- it's just a vanilla transformer and anthropic wants to preserve the image of special sauce.
- or opus is served at ridiculous margins.
@catherineols
Catherine Olsson
7 days
@jik_wtf Unfortunately Opus 3 is not so old a model that we're comfortable sharing its architecture publicly right now. Speaking in a personal capacity, I will advocate in 5+ years for it to be released :).
2
0
16
@N8Programs
N8 Programs
8 days
RT @willccbb: WOW! 🤯 this groundbreaking dataset from Meta’s Chief AI Scientist has revolutionized the way that we understand vision 👀 🚀 is…
0
47
0
@N8Programs
N8 Programs
9 days
oh god i did the thing - it isn't X, it's Y.
0
0
7
@N8Programs
N8 Programs
9 days
my issue w/ chatgpt-generated writing isn't the writing on its face - gpt-4o has a decent style. it's that thousands of people having it write in this style drastically reduces the entropy of discourse.
1
0
11
@N8Programs
N8 Programs
11 days
mom getting into vibe coding
Tweet media one
0
0
10
@N8Programs
N8 Programs
11 days
these are crazy numbers for a 13B active w/ only 80B.
@teortaxesTex
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
11 days
I've been saying that this shape is underrated. Qwen bros didn't do that so Tencent bros picked up the slack. 80B total, 13B active, 256K context, 71.2 GPQA-Diamond, pretty good quantization to FP8 and even INT4. Might be «DeepSeek-Medium» for those interested.
0
0
2
@N8Programs
N8 Programs
11 days
RT @xeophon_:
Tweet media one
0
2
0
@N8Programs
N8 Programs
11 days
bro is the alexander hamilton of model implementations. how does he code like he's running out of time.
@Prince_Canuma
Prince Canuma
11 days
Last 2 weeks:
> Gemma3n
> Phi4mm vision working, now audio and a few optimisations missing
> Falcon H1 (Mamba + Transformers)
> Bitnet metal kernel 90% faster on MLX compared to official Bitnet.cpp
> Falcon Bitnet
> Processed 34m samples and training a new secret model
>
1
1
9
@N8Programs
N8 Programs
12 days
RT @vikhyatk: still blowing my mind how good object detection got in the last release
0
18
0
@N8Programs
N8 Programs
12 days
RT @xlr8harder: I'm so relieved training as fair use is winning.
0
2
0
@N8Programs
N8 Programs
12 days
RT @Azaliamirh: Introducing Weaver, a test time scaling method for verification! Weaver shrinks the generation-verification gap through a…
0
47
0
@N8Programs
N8 Programs
12 days
RT @SunshineFiora: it seems like it would be extremely good for the alignment community to run public experiments in post-training open sou…
0
8
0