apaz

@apaz_cli

Followers: 556 · Following: 3K · Media: 65 · Statuses: 657

https://t.co/EYtS07MR7w Making GPUs go brrr

Hiding in your wifi
Joined July 2019
@apaz_cli
apaz
3 months
I'm writing an IDE specifically for LLM-aided code hyper-optimization. I'm going to write up a blog post about it and open-source it soon. Here's a screenshot of what I've got so far.
Tweet media one
4
0
23
@apaz_cli
apaz
3 days
Terrible license. I didn't expect such onerous terms for a 300M-param model, but here we are. Also, curious wording: they do not compare to Qwen3-Embedding-0.6B, because they do not beat it. With that said, it's probably a good model.
@sundarpichai
Sundar Pichai
3 days
Introducing EmbeddingGemma, our newest open model that can run completely on-device. It's the top model under 500M parameters on the MTEB benchmark and comparable to models nearly 2x its size – enabling state-of-the-art embeddings for search, retrieval + more.
0
0
1
@apaz_cli
apaz
3 days
I've become increasingly convinced that if you aren't training in FP8 or smaller you're a chump. Especially if you're trying to do RL, but also in any setting where you're compute-constrained and not data-constrained. It is just usually better to do quantized training. There's…
@__tinygrad__
the tiny corp
3 days
As dtypes get smaller, what FLOPS do people care the most about?
1
1
6
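
For context on the FP8 point above: the core mechanic of FP8 training is keeping weights and activations in an 8-bit float format such as E4M3 together with a per-tensor scale, and dequantizing around the matmuls. Below is a minimal quantize/dequantize sketch, assuming PyTorch >= 2.1 for the torch.float8_e4m3fn dtype; real FP8 training stacks (Transformer Engine, torchao) add scaled matmuls and delayed scaling on top of this.

import torch

FP8_E4M3_MAX = 448.0  # largest magnitude representable in E4M3

def quantize_fp8(x: torch.Tensor):
    # Scale so the tensor's largest magnitude lands near the FP8 max.
    amax = x.abs().max().clamp(min=1e-12)
    scale = FP8_E4M3_MAX / amax
    return (x * scale).to(torch.float8_e4m3fn), scale

def dequantize_fp8(x_fp8: torch.Tensor, scale: torch.Tensor):
    return x_fp8.to(torch.float32) / scale

w = torch.randn(4096, 4096)
w_fp8, scale = quantize_fp8(w)
w_hat = dequantize_fp8(w_fp8, scale)
print("bytes per element:", w_fp8.element_size())        # 1 byte, vs 4 for fp32
print("max abs error:", (w - w_hat).abs().max().item())
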
@apaz_cli
apaz
3 days
The number of LLM tools that seize up when they encounter <|endoftext|> in a string literal astonishes me. I thought everyone knew to be careful about this.
0
0
3
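
For anyone who hasn't hit this: tokenizer libraries special-case control tokens, and tiktoken in particular refuses to encode text that contains one unless you opt out. A small repro sketch, assuming the tiktoken package and its cl100k_base encoding:

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
text = 'print("<|endoftext|>")'  # a string literal containing the special token

# By default tiktoken raises on text containing special tokens:
try:
    enc.encode(text)
except ValueError as e:
    print("raised:", e)

# Opting out of the special-token check treats it as ordinary text:
ids = enc.encode(text, disallowed_special=())
print(enc.decode(ids) == text)  # True: round-trips as plain text
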
@apaz_cli
apaz
6 days
They are starting to become self aware.
Tweet media one
0
0
4
@apaz_cli
apaz
7 days
I'm implementing a tokenizer for a project, and it astonishes me that tiktoken is generally considered "fast". Agony.
Tweet media one
1
0
5
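
One rough way to put a number on that: measure single-threaded encode throughput on plain English text. A quick, unscientific sketch assuming the tiktoken package; results vary a lot with hardware and text mix.

import time
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
doc = "The quick brown fox jumps over the lazy dog. " * 2000  # ~90 KB of text

reps = 20
t0 = time.perf_counter()
for _ in range(reps):
    enc.encode_ordinary(doc)  # pure BPE path, no special-token scanning
dt = time.perf_counter() - t0

print(f"{reps * len(doc) / 1e6 / dt:.1f} MB/s single-threaded")
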
@apaz_cli
apaz
12 days
The SF health cult has convinced me: I'm getting into supplements. Starting out with the basics, plus a multivitamin. The D,L-Phenylalanine is a personal headcanon. But how does anybody swallow these things? They're huge. Can't split them, they're full of foul-tasting liquid.
Tweet media one
Tweet media two
0
0
2
@apaz_cli
apaz
13 days
I'm 'boutta crash out, but I'mma crash into bed and go eep instead. nini ❤️
0
0
0
@apaz_cli
apaz
13 days
The two techniques this paper introduces, JetBlock and PostNAS, are LITERALLY NOT EVEN DEFINED. There are a bunch of red flags in this paper. First, Nvidia did not release the code. The paper does not contain much information, certainly not enough to be reproducible. It's mostly…
@JacksonAtkinsX
Jackson Atkins
13 days
NVIDIA research just made LLMs 53x faster 🤯. Imagine slashing your AI inference budget by 98%. This breakthrough doesn't require training a new model from scratch; it upgrades your existing ones for hyper-speed while matching or beating SOTA accuracy. Here's how it works:
Tweet media one
2
0
7
@apaz_cli
apaz
14 days
What confuses me is that if you look at the scaling curves, it's clearly better. If you're not doing MoE + quantized training you're a schmuck, especially if you're doing RL, which you should be doing for human preference posttraining anyway, even if you don't believe in…
@teortaxesTex
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
15 days
I knew it. As of the Grok-2 generation at least, xAI people genuinely believed fine-grained MoE to be a communist psyop. It's on the Mixtral level of sophistication. Just throwing GPUs at the wall, fascinating.
Tweet media one
2
0
6
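
Since "fine-grained MoE" comes up in the quoted tweet: the idea is routing each token to a few of many small experts instead of one or two big ones. A toy top-k router sketch (sizes and names invented; load balancing and capacity limits ignored):

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    # Fine-grained MoE: many small experts, only top_k active per token.
    def __init__(self, d_model=64, n_experts=16, top_k=2, d_ff=128):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                       # x: [tokens, d_model]
        logits = self.router(x)                 # [tokens, n_experts]
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)    # renormalize over chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, k] == e           # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * self.experts[e](x[mask])
        return out

x = torch.randn(8, 64)
print(TinyMoE()(x).shape)  # torch.Size([8, 64])
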
@apaz_cli
apaz
15 days
Tweet media one
@HuggingPapers
DailyPapers
15 days
xAI just released Grok 2 on Hugging Face. This massive 500GB model, a core part of xAI's 2024 work, is now openly available to push the boundaries of AI research.
0
0
4
@apaz_cli
apaz
16 days
God this must be so embarrassing.
@alexandr_wang
Alexandr Wang
16 days
1/ Today we’re proud to announce a partnership with @midjourney, to license their aesthetic technology for our future models and products, bringing beauty to billions.
1
0
1
@apaz_cli
apaz
16 days
Please for the love of all that is holy tell me they were already doing this.
@AnthropicAI
Anthropic
16 days
New Anthropic research: filtering out dangerous information at pretraining. We’re experimenting with ways to remove information about chemical, biological, radiological and nuclear (CBRN) weapons from our models’ training data without affecting performance on harmless tasks.
Tweet media one
2
0
6
@apaz_cli
apaz
17 days
Jinja has import statements; I'm 'boutta crash out. I'm not sure why everyone is standardizing on it for tool-call prompting.
Tweet media one
1
0
2
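
For the curious: Jinja templates really can import macros from other templates, which is how some chat templates factor out their tool-call formatting. A minimal, self-contained sketch (the template names and the macro are made up for illustration):

from jinja2 import Environment, DictLoader

templates = {
    "macros.jinja": (
        "{% macro tool(name, args) %}"
        "<tool>{{ name }}({{ args }})</tool>"
        "{% endmacro %}"
    ),
    "chat.jinja": (
        "{% import 'macros.jinja' as m %}"
        "{{ m.tool('get_weather', 'city=SF') }}"
    ),
}

env = Environment(loader=DictLoader(templates))
print(env.get_template("chat.jinja").render())  # -> <tool>get_weather(city=SF)</tool>
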
@apaz_cli
apaz
19 days
Tonight I was at a hypnosis lecture, and someone asked: "When you imagine yourself on a beach, how do you experience it? Do you see it? Hear it? Feel it?" The thing is, I don't. None of the above. Or I imagine in third person. I think somewhere along the way my brain got fried.
2
0
1
@apaz_cli
apaz
19 days
Same form factor as V3/R1, hoping they revisited the pretraining data with the intent of making it better to do RL on. The more stuff that approximates logical reasoning in the pretraining data, the better.
@_akhaliq
AK
19 days
DeepSeek-V3.1.
0
0
1
@apaz_cli
apaz
20 days
I am faced with the sobering reality that writing an efficient mxfp4 kernel for gpt-oss is not possible in llama.cpp because of the memory layout. Blocks of quantized elements are not stored contiguously, so you cannot issue vector loads across mxfp4 blocks. Sadge.
0
0
6
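
To illustrate the layout problem: in a ggml-style array-of-structs quantization format (assumed here to be one E8M0 scale byte followed by 16 bytes of packed 4-bit values per 32-element block), the per-block scale bytes sit between the packed data of adjacent blocks, so a wide SIMD load starting in one block runs into the next block's scale byte. A small sketch of the byte offsets:

import numpy as np

QK = 32  # elements per mxfp4 block (assumed ggml-style layout)
block = np.dtype([("e", np.uint8), ("qs", np.uint8, QK // 2)])  # 17 bytes/block

for i in range(4):
    start = i * block.itemsize + 1  # +1 skips the per-block scale byte
    print(f"block {i}: packed nibbles at bytes [{start}, {start + QK // 2})")

# A 32-byte vector load starting at block 0's packed data (byte 1) covers
# bytes 1..32, which includes block 1's scale byte at offset 17 -- so one
# load cannot span the quantized elements of adjacent blocks.
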