Carson Poole

@CarsonPoole

Followers
934
Following
1K
Media
131
Statuses
1K

New York City
Joined May 2010
@CarsonPoole
Carson Poole
10 months
this was lots of fun and lots of all nighters over the past few weeks. really happy with what we achieved!
@ArtificialAnlys
Artificial Analysis
10 months
NVIDIA Blackwell can achieve 303 output tokens/s for DeepSeek R1 in FP4 precision, per our benchmarking of an Avian API endpoint
Artificial Analysis benchmarked DeepSeek R1 on an @avian_io private API endpoint. Running DeepSeek R1 in FP4 precision on NVIDIA Blackwell, their
2
2
12
@CarsonPoole
Carson Poole
14 days
GPT-5.2T vs Opus 4.5 shows some major big model smell
0
0
0
@CarsonPoole
Carson Poole
14 days
someone pls make a worldle that has a daily leaderboard for who can make the highest logprob sentence in some range of tokens
0
0
0
@CarsonPoole
Carson Poole
18 days
I have never understood why people need these tools? What is hard about (# billion params) * (2 for 16 bit, 1 for 8bit, etc) * (fudge factor for activations, kv cache) < (vram on your GPU in GB)
@cneuralnetwork
neural nets.
19 days
I made an internal tool for myself to check the VRAM required to run models on GPUs. Open-sourcing it today! "do-i-have-the-vram" checks the amount of VRAM you need to load the model, without loading the model! Use it by running `pip install do-i-have-the-vram`
0
0
1
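The back-of-the-envelope rule in the post above, (billions of params) * (bytes per param) * (fudge factor) < (VRAM in GB), can be sketched in a few lines. This is only an illustration: the function name `fits_in_vram` and the default fudge factor of 1.2 are assumptions made here, not values from the post, and real overhead for activations and KV cache depends on batch size and context length.

```python
def fits_in_vram(params_billion, bits_per_param, vram_gb, fudge_factor=1.2):
    """Rough check of whether a model fits on a GPU.

    fudge_factor is an assumed multiplier for activations and KV cache;
    1.2 is a placeholder, not a measured value.
    """
    # 16-bit weights take 2 bytes per parameter, 8-bit take 1, 4-bit take 0.5.
    bytes_per_param = bits_per_param / 8
    # One billion parameters at 1 byte each is roughly 1 GB.
    weights_gb = params_billion * bytes_per_param
    required_gb = weights_gb * fudge_factor
    return required_gb, required_gb < vram_gb


if __name__ == "__main__":
    for name, params, bits, vram in [
        ("7B @ 16-bit on 24 GB", 7, 16, 24),
        ("70B @ 16-bit on 80 GB", 70, 16, 80),
        ("70B @ 4-bit on 80 GB", 70, 4, 80),
    ]:
        need, ok = fits_in_vram(params, bits, vram)
        print(f"{name}: ~{need:.1f} GB needed, fits={ok}")
```

The estimate only covers a single GPU and ignores framework overhead, so treat the result as a sanity check, not a guarantee.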
@CarsonPoole
Carson Poole
30 days
the comparison is really striking between when Google released Lion versus everybody quietly switching to Muon
@eliebakouch
elie
1 month
lfg, deepseek uses Muon in the ablation setup of their latest paper
0
1
1
@DraftedAI
Drafted
2 months
Hello World 👋 Welcome to Drafted — an AI tool that lets anyone design a home from scratch, tailored to your life. https://t.co/zoa23fKUvV
techcrunch.com
Drafted is now nearly five months old, and it's everything Atmos wasn't.
9
15
102
@CarsonPoole
Carson Poole
2 months
a phenomenon I haven’t seen anybody point out is what happens when you can “few shot” a robot? with sufficient scale this ability emerged with LLMs. instead of training it to perform a specific task, can you show it 2-3 representative examples of itself doing said task?
@physical_int
Physical Intelligence
2 months
We got our robots to wash pans, clean windows, make peanut butter sandwiches, and more! Fine-tuning our latest model enables all of these tasks, and this has interesting implications for robotics, Moravec's paradox, and the future of large models in embodied AI. More below!
0
0
0
@CarsonPoole
Carson Poole
4 months
the momentum is building
@karpathy
Andrej Karpathy
4 months
Nice, short post illustrating how simple text (discrete) diffusion can be. Diffusion (i.e. parallel, iterated denoising, top) is the pervasive generative paradigm in image/video, but autoregression (i.e. go left to right bottom) is the dominant paradigm in text. For audio I've
0
0
0
@CarsonPoole
Carson Poole
4 months
>1000 tps single batch inference. short HBM long FLOPs
@AntLingAGI
Ant Ling
4 months
A new milestone on dLLMs🚀🚀🚀
2
0
2
@CarsonPoole
Carson Poole
4 months
why does this look like nvidia selling gloves
0
0
0
@CarsonPoole
Carson Poole
4 months
TIL the originator of the phrase "embarrassingly parallel" is Cleve Moler, the creator of MATLAB (sorry if that gives you painful flashbacks)
0
0
0
@CarsonPoole
Carson Poole
4 months
make (positive) science fiction a reality
@ProudSocialist
Power to the People ☭🕊
4 months
This is diabolical. There is no future for humanity when AI consumes this much energy. Shut these evil robot corporations down and make science fiction films fiction again!!!
0
0
1
@CarsonPoole
Carson Poole
4 months
0
0
1
@CarsonPoole
Carson Poole
4 months
even more irksome when you plot the exact time range of @MorePerfectUS's post
0
0
1
@CarsonPoole
Carson Poole
4 months
the affordability of a kWh for the median US household over time. feels like this needs a @CommunityNotes for being so egregiously misleading
@MorePerfectUS
More Perfect Union
5 months
1
0
3
@CarsonPoole
Carson Poole
4 months
in 2021 I emailed Philippe Tillet (creator of Triton) about adding 4bit datatypes, and he was (reasonably!) skeptical at the time. not a dunk - Philippe is obviously world-class; just a reminder to update your mental models while you update your language models :)
@arankomatsuzaki
Aran Komatsuzaki
4 months
NVFP4: 4-bit pretraining for LLMs
• New format w/ 2-level scaling + RHT + stochastic rounding
• Trains 12B model on 10T tokens
• Matches FP8 baseline: MMLU-pro 62.58% vs 62.62%
• 6.8× efficiency boost potential → faster, cheaper frontier LLMs
0
0
2
@CarsonPoole
Carson Poole
4 months
another one
@NiJinjie
Jinjie Ni
4 months
🍷Imagine you are the boss of Google DeepMind. To train the best diffusion language model in the world within 1 year, using 800 TPU pods, which model size will you go for? 🐿️ We build Quokka to help you decide: the first-ever large-scale scaling law for DLMs. Interesting facts: 1.
1
0
1
@CarsonPoole
Carson Poole
5 months
has anyone made the “live long enough to see yourself become a fintech company” joke yet?
@Cloudflare
Cloudflare
5 months
Cloudflare introduces NET Dollar, a new U.S. dollar-backed stablecoin that will enable instant, secure transactions for the agentic web.
0
0
2
@CarsonPoole
Carson Poole
5 months
took this on my flight last week lmao
@nypost
New York Post
5 months
$2.2 billion solar plant in California turned off after years of wasted money: ‘Never lived up to its promises’ https://t.co/TuRZYvDyjX
0
0
1
@CarsonPoole
Carson Poole
5 months
1 matmul: tenth grade math class
100 matmuls: you’ve solved a system of equations
100,000 matmuls: you overfit a linear regression
1 million matmuls: your MacBook’s M4 sounds like a jet engine
1 quintillion matmuls: you have summoned god from silicon
0
0
3
@CarsonPoole
Carson Poole
5 months
the way people are now saying, “I was asking Chat,” or “just ask Chat how to do it” is a phenomenon I haven’t seen since the verbifying of Google
1
0
0