Carson Poole
@CarsonPoole
934 Followers · 1K Following · 131 Media · 1K Statuses
this was lots of fun and lots of all-nighters over the past few weeks. really happy with what we achieved!
NVIDIA Blackwell can achieve 303 output tokens/s for DeepSeek R1 in FP4 precision, per our benchmarking of an Avian API endpoint.

Artificial Analysis benchmarked DeepSeek R1 on an @avian_io private API endpoint. Running DeepSeek R1 in FP4 precision on NVIDIA Blackwell, their
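the output tokens/s figure quoted above is just token count over wall-clock time. a minimal sketch of that measurement, with a plain iterable standing in for a streaming API response (this is not Avian's actual client, and the function name is made up for illustration):

```python
import time

def output_tokens_per_sec(token_stream):
    """Consume a stream of generated tokens and report throughput.

    `token_stream` is any iterable yielding tokens, standing in here
    for a streaming inference API response.
    """
    start = time.perf_counter()
    count = sum(1 for _ in token_stream)  # drain the stream, counting tokens
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")

tps = output_tokens_per_sec(iter(["Deep", "Seek", " R1", " says", " hi"]))
```

a real benchmark would also separate time-to-first-token from steady-state decode throughput, which this sketch ignores.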
someone pls make a wordle that has a daily leaderboard for who can make the highest-logprob sentence in some range of tokens
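the scoring for such a leaderboard is just a sum of per-token logprobs under a fixed model. a toy sketch, with an invented unigram table standing in for a real LM (all names and probabilities here are made up for illustration):

```python
import math

# Invented per-token log-probabilities; a real leaderboard would score
# with an actual LLM's logprobs over its own tokenizer's tokens.
toy_logprobs = {"the": math.log(0.05), "cat": math.log(0.02), "sat": math.log(0.01)}

def sentence_logprob(tokens, table, floor=math.log(1e-9)):
    """Score a token sequence: sum of per-token logprobs, with a floor
    for out-of-vocabulary tokens. Higher (closer to 0) tops the board."""
    return sum(table.get(t, floor) for t in tokens)

score = sentence_logprob(["the", "cat", "sat"], toy_logprobs)
```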
I have never understood why people need these tools. What is hard about (# billion params) * (2 for 16-bit, 1 for 8-bit, etc.) * (fudge factor for activations, KV cache) < (VRAM on your GPU in GB)?
I made an internal tool for myself to check the VRAM required to run models on GPUs. Open-sourcing it today! "do-i-have-the-vram" checks the amount of VRAM you need to load the model, without loading the model! use it by running `pip install do-i-have-the-vram`
the contrast is really striking between Google's release of Lion and everybody quietly switching to Muon
Hello World! Welcome to Drafted, an AI tool that lets anyone design a home from scratch, tailored to your life. https://t.co/zoa23fKUvV
techcrunch.com
Drafted is now nearly five months old, and it's everything Atmos wasn't.
a phenomenon I haven't seen anybody point out: what happens when you can "few-shot" a robot? with sufficient scale this ability emerged in LLMs. instead of training it to perform a specific task, can you show it 2-3 representative examples of itself doing said task?
We got our robots to wash pans, clean windows, make peanut butter sandwiches, and more! Fine-tuning our latest model enables all of these tasks, and this has interesting implications for robotics, Moravec's paradox, and the future of large models in embodied AI. More below!
the momentum is building
Nice, short post illustrating how simple text (discrete) diffusion can be. Diffusion (i.e. parallel, iterated denoising, top) is the pervasive generative paradigm in image/video, but autoregression (i.e. go left to right, bottom) is the dominant paradigm in text. For audio I've
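the parallel, iterated-denoising loop from the quoted post can indeed be shown in a few lines. a toy sketch where the "denoiser" trivially knows the answer — this illustrates only the sampling loop (reveal a few masked tokens per step, in parallel), not a real model:

```python
import random

MASK = "_"
target = list("hello world")  # toy "denoiser": it just knows the answer

def denoise_step(x, k):
    """Reveal up to k masked positions in parallel (a stand-in for a
    model predicting tokens at masked positions each diffusion step)."""
    masked = [i for i, t in enumerate(x) if t == MASK]
    for i in random.sample(masked, min(k, len(masked))):
        x[i] = target[i]
    return x

x = [MASK] * len(target)
for step in range(4):  # iterated denoising: a few tokens per step
    x = denoise_step(x, 3)
```

contrast with autoregression, which would fill the 11 positions strictly left to right, one per step.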
TIL the originator of the phrase "embarrassingly parallel" is Cleve Moler, the creator of MATLAB (sorry if that gives you painful flashbacks)
the affordability of a kWh for the median US household over time. feels like this needs a @CommunityNotes for being so egregiously misleading
in 2021 I emailed Philippe Tillet (creator of Triton) about adding 4bit datatypes, and he was (reasonably!) skeptical at the time. not a dunk - Philippe is obviously world-class; just a reminder to update your mental models while you update your language models :)
NVFP4: 4-bit pretraining for LLMs • New format w/ 2-level scaling + RHT + stochastic rounding • Trains 12B model on 10T tokens • Matches FP8 baseline: MMLU-pro 62.58% vs 62.62% • 6.8× efficiency boost potential → faster, cheaper frontier LLMs
another one
Imagine you are the boss of Google DeepMind. To train the best diffusion language model in the world within 1 year, using 800 TPU pods, which model size will you go for? We built Quokka to help you decide — the first-ever large-scale scaling law for DLMs. Interesting facts: 1.
took this on my flight last week lmao
$2.2 billion solar plant in California turned off after years of wasted money: "Never lived up to its promises" https://t.co/TuRZYvDyjX
1 matmul → tenth grade math class
100 matmuls → you've solved a system of equations
100,000 matmuls → you overfit a linear regression
1 million matmuls → your MacBook's M4 sounds like a jet engine
1 quintillion matmuls → you have summoned god from silicon
the way people are now saying, "I was asking Chat," or "just ask Chat how to do it" is a phenomenon I haven't seen since the verbifying of Google