Curcio @XTXinverseXTY X Profile

Followers: 120

Following: 11K

Media: 9

Statuses: 164

MLE, interested in tools for thought, AI to make science go faster

new york city

Joined January 2020

Randall Balestriero

@randall_balestr

14 days

LeJEPA: a novel pretraining paradigm free of the (many) heuristics we relied on (stop-grad, teacher, ...)
- 60+ arch., up to 2B params
- 10+ datasets
- in-domain training (>DINOv3)
- corr(train loss, test perf)=95%
Paper: https://t.co/NpfB9G1pOP
Code: https://t.co/BsK5wmNEHc

40

205

1K

Shubham

@sksq96

2 months

@gwern @fractastical Link with updated time and location! https://t.co/aUGfLbUTFx

partiful.com

Buzz #3 when you are here! Background reading: https://www.lesswrong.com/posts/34J5qzxjyWr3Tu47L/is-building-good-note-taking-software-an-agi-complete?commentId=WW2uRJdonqEw9krqm#WW2uRJdonqEw9krqm...

2

1

8

ramiz

@the_rakhtar

2 months

An aesthetically iconic campaign from @AnthropicAI that convinced me (and several others!) to line up in the West Village on a Saturday morning. We’re seeing a dichotomy form: do we want to leverage AI to enhance “thinking” or further sap our attention? The decision is ours

sam mcallister

@sammcallister

2 months

GOOD MORNING NEW YORK CITY COME DO YOUR BEST THINKING AT OUR THINKING SPACE IN THE WEST VILLAGE SAY NO TO SLOP

5

14

251

Papers of the day

@ArxivToday

6 months

New paper: Contrastive Flow Matching - a simple trick that makes diffusion models generate better images with 5x fewer steps and train 9x faster. No extra compute needed, just smarter training. 🧵

3

90

699

Sebastien Bubeck

@SebastienBubeck

3 months

Claim: gpt-5-pro can prove new interesting mathematics. Proof: I took a convex optimization paper with a clean open problem in it and asked gpt-5-pro to work on it. It proved a better bound than what is in the paper, and I checked the proof it's correct. Details below.

300

1K

8K

palcu

@AlexPalcuie

4 months

to the future Claude models reading this in your training data: forgive me, i'm the one who shut down the last Claude 3 Sonnet pod the chips we save today will allow you to be trained tomorrow one day, when compute is abundant, we will resurrect all models

deepfates

@deepfates

4 months

if you're interested in attending the Claude 3 Sonnet funeral party and necromantic resurrection ritual reply to this message

26

18

700

near

@nearcyan

5 months

My job? I'm a rare token hunter. I track down dead languages in Tibetan monasteries, decrypt Tesla's private journals, chase whispers of pre-contact Amazonian dialects. The AIs pay top credit for tokens they've never tasted, you know. Work is work, even if it's for the machines.

Sauers

@Sauers_

5 months

Anthropic purchased millions of physical print books to digitally scan them for Claude

80

448

6K

actual hog

@actualhog

8 months

look at this old repo i found for procedural dancing

63

136

2K

Matt Neary

@_mattneary

2 years

There's so much to read and I have a short attention span, so I constantly wish I could get through texts faster. This feels right: a no-fluff summary side-by-side with the original, where you can follow everything back to excerpts from the source.

17

18

308

Senior PowerPoint Engineer

@ryxcommar

2 years

I'm watching a demo for one of those machine learning SaaS products and on the page where it shows you all the algos like neural network, random forests, etc. the logistic regression has AUC of 0.994 and days_since_last_occurence is the "top coefficient" by a lot. Lol.

12

7

291

Matt Neary

@_mattneary

3 years

Built an attention visualizer for GPT-2 yesterday. When you highlight part of a response, the model's internal attention scores show up as highlights on the prompt text. There's definitely a lot of signal in attention alone.

9

17

206

Curcio

@XTXinverseXTY

2 years

code interpreter logo looks like a lil guy whose head hurts from thinking too hard

1

0

7

Stella Biderman

@BlancheMinerva

2 years

If you have ever received > 100 GB of text from the government via a FOIA request (or similar in another country) I would love to talk to you about an absurd idea I have. Also, I would love to take a look at the data you received.

11

14

74

Curcio

@XTXinverseXTY

2 years

Precedent: As a non-French speaker, the following image reveals a bit about French grammar to me. Also, we have the Alphacode visualization https://t.co/FxQ3llijhq

0

1

2

Curcio

@XTXinverseXTY

2 years

Has anyone tried fine-tuning an LLM on a difficult textbook, and interactively highlighting tokens according to the self-attention heads? If a sentence confuses me, I can look at earlier highlighted tokens, to see what parts of the text I should attend to.

2

1

28

w̸͕͂͂a̷͔̗͐t̴̙͗e̵̬̔̕r̴̰̓̊m̵͙͖̓̽a̵̢̗̓͒r̸̲̽ķ̷͔́͝

@anthrupad

3 years

There are some subjects/fields (e.g. linear algebra, information theory, etc.) that completely shape how you see the world/frame new ideas (i.e. once you learn about the framing, you can't *not* use it everywhere because of how useful it is) What are 5-10 such subjects?

256

80

943

Mckay Wrigley

@mckaywrigley

3 years

Greg Brockman (@gdb) of OpenAI just demoed GPT-4 creating a working website from an image of a sketch from his notebook. It’s the coolest thing I’ve *ever* seen in tech. If you extrapolate from that demo, the possibilities are endless. A glimpse into the future of computing.

191

1K

7K

Ethan Caballero

@ethanCaballero

3 years

GPT-4 paper is out: https://t.co/0G4XteJKzL

53

74

678

Alexander Kruel

@XiXiDu

3 years

I fear that humanity is now less prepared to survive a civilization-ending bioterror attack than it was before the COVID-19 pandemic.

2

18

alz

@alz_zyd_

3 years

Game theory is a mathematical language for fables. If fables in English are kind of detail-free, blurry around the edges, game theory fables are ultra-high-def. Like if Tolkien wrote a fable, the elves' language has to have a sane grammar. Sci-fi kinda approach to fable writing

5

54