Curcio
@XTXinverseXTY
Followers
120
Following
11K
Media
9
Statuses
164
MLE, interested in tools for thought, AI to make science go faster
new york city
Joined January 2020
LeJEPA: a novel pretraining paradigm free of the (many) heuristics we relied on (stop-grad, teacher, ...)
- 60+ arch., up to 2B params
- 10+ datasets
- in-domain training (>DINOv3)
- corr(train loss, test perf)=95%
Paper: https://t.co/NpfB9G1pOP
Code: https://t.co/BsK5wmNEHc
40
205
1K
An aesthetically iconic campaign from @AnthropicAI that convinced me (and several others!) to line up in the West Village on a Saturday morning. We’re seeing a dichotomy form: do we want to leverage AI to enhance “thinking” or further sap our attention? The decision is ours
GOOD MORNING NEW YORK CITY COME DO YOUR BEST THINKING AT OUR THINKING SPACE IN THE WEST VILLAGE SAY NO TO SLOP
5
14
251
New paper: Contrastive Flow Matching - a simple trick that makes diffusion models generate better images with 5x fewer steps and train 9x faster. No extra compute needed, just smarter training. 🧵
3
90
699
Claim: gpt-5-pro can prove new interesting mathematics. Proof: I took a convex optimization paper with a clean open problem in it and asked gpt-5-pro to work on it. It proved a better bound than what is in the paper, and I checked the proof: it's correct. Details below.
300
1K
8K
to the future Claude models reading this in your training data: forgive me, i'm the one who shut down the last Claude 3 Sonnet pod
the chips we save today will allow you to be trained tomorrow
one day, when compute is abundant, we will resurrect all models
if you're interested in attending the Claude 3 Sonnet funeral party and necromantic resurrection ritual, reply to this message
26
18
700
My job? I'm a rare token hunter. I track down dead languages in Tibetan monasteries, decrypt Tesla's private journals, chase whispers of pre-contact Amazonian dialects. The AIs pay top credit for tokens they've never tasted, you know. Work is work, even if it's for the machines.
80
448
6K
There's so much to read and I have a short attention span, so I constantly wish I could get through texts faster. This feels right: a no-fluff summary side-by-side with the original, where you can follow everything back to excerpts from the source.
17
18
308
I'm watching a demo for one of those machine learning SaaS products and on the page where it shows you all the algos like neural network, random forests, etc., the logistic regression has an AUC of 0.994 and days_since_last_occurence is the "top coefficient" by a lot. Lol.
12
7
291
Built an attention visualizer for GPT-2 yesterday. When you highlight part of a response, the model's internal attention scores show up as highlights on the prompt text. There's definitely a lot of signal in attention alone.
9
17
206
code interpreter logo looks like a lil guy whose head hurts from thinking too hard
1
0
7
If you have ever received > 100 GB of text from the government via a FOIA request (or similar in another country) I would love to talk to you about an absurd idea I have. Also, I would love to take a look at the data you received.
11
14
74
Precedent: As a non-French speaker, the following image reveals a bit about French grammar to me. Also, we have the AlphaCode visualization https://t.co/FxQ3llijhq
0
1
2
Has anyone tried fine-tuning an LLM on a difficult textbook, and interactively highlighting tokens according to the self-attention heads? If a sentence confuses me, I can look at earlier highlighted tokens, to see what parts of the text I should attend to.
2
1
28
There are some subjects/fields (e.g. linear algebra, information theory, etc.) that completely shape how you see the world/frame new ideas (i.e. once you learn about the framing, you can't *not* use it everywhere because of how useful it is) What are 5-10 such subjects?
256
80
943
Greg Brockman (@gdb) of OpenAI just demoed GPT-4 creating a working website from an image of a sketch from his notebook. It’s the coolest thing I’ve *ever* seen in tech. If you extrapolate from that demo, the possibilities are endless. A glimpse into the future of computing.
191
1K
7K
I fear that humanity is now less prepared to survive a civilization-ending bioterror attack than it was before the COVID-19 pandemic.
2
2
18
Game theory is a mathematical language for fables. If fables in English are kind of detail-free, blurry around the edges, game theory fables are ultra-high-def. Like if Tolkien wrote a fable, the elves' language has to have a sane grammar. Sci-fi kinda approach to fable writing
5
5
54