Alexander Doria

@Dorialexander

Followers
21K
Following
141K
Media
3K
Statuses
43K

Artisanal baker of reasoning models @pleiasfr

Joined April 2011
@Dorialexander
Alexander Doria
16 days
Breaking: we release SYNTH, a fully synthetic generalist dataset for pretraining, and two new SOTA reasoning models trained exclusively on it. Despite having seen only 200 billion tokens, Baguettotron is currently best-in-class in its size range.
80
151
1K
@Dorialexander
Alexander Doria
4 hours
The based alternative.
1
0
16
@Dorialexander
Alexander Doria
4 hours
Not a fan so far of "sovereign" displacing "open" in all things AI/tech in the EU.
3
2
23
@Dorialexander
Alexander Doria
4 hours
Another undersold moondream release.
@EthanReidMorro
Ethan Reid
6 hours
@JulienBlanchon @moondreamai Raw SVG path. Since our tokenizer has tokens for 0-1000, M, C, L, and Z, we can represent each action or position as a single token (a negative sign adds an extra token).
0
0
10
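The quoted tweet describes encoding raw SVG paths so that each command or coordinate becomes a single token. A minimal sketch of that idea, with a hypothetical vocabulary layout (the ids and the `tokenize_path` helper are illustrative assumptions, not the actual moondream tokenizer):

```python
# Hypothetical sketch of the single-token SVG-path encoding described
# above: one token per integer 0-1000 plus M, C, L, Z, and '-' as an
# extra token for negative coordinates. Token ids here are assumptions.
COMMANDS = ["M", "C", "L", "Z"]
NEG = "-"

VOCAB = {str(i): i for i in range(1001)}                     # ids 0..1000 -> numbers
VOCAB.update({c: 1001 + k for k, c in enumerate(COMMANDS)})  # then the commands
VOCAB[NEG] = 1001 + len(COMMANDS)                            # then the negative sign

def tokenize_path(path: str) -> list[int]:
    """Map each SVG path command or coordinate to a single token id."""
    tokens = []
    for part in path.replace(",", " ").split():
        if part in VOCAB:                  # command or non-negative number
            tokens.append(VOCAB[part])
        elif part.startswith("-"):         # negative sign adds one extra token
            tokens.append(VOCAB[NEG])
            tokens.append(VOCAB[part[1:]])
        else:
            raise ValueError(f"out of vocabulary: {part!r}")
    return tokens

print(tokenize_path("M 10 20 L -5 300 Z"))
```

So a path like `M 10 20 L -5 300 Z` costs seven tokens, with the `-5` contributing two of them, exactly the one-extra-token-per-negative-sign behavior the tweet mentions.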
@Dorialexander
Alexander Doria
5 hours
(Though people might be legitimately skeptical after so many fine-tunes disguised as whole new models)
0
0
14
@Dorialexander
Alexander Doria
5 hours
And another social event on repeat:
>What are you doing?
>So we train from scratch.
>Ok but which models are you fine-tuning?
>From **scratch**. Zero, nihil, zilch.
6
2
65
@Dorialexander
Alexander Doria
7 hours
There’s only one way to know.
@_vatsadev
V
7 hours
bet this can be pushed further with lin-attn and looped layers
4
1
13
@Dorialexander
Alexander Doria
17 hours
The threshold for consistent English/query understanding is now 3M parameters.
@mkurman88
Mariusz Kurman
24 hours
3.3M parameters. It's funny; I'm going to train it until the end, roughly 75 hours total on a single RTX 3090 (batch size 256 x sequence length 512).
5
21
290
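For scale, a quick back-of-envelope on the quoted config. The step count and total token budget aren't given in the tweet, so only per-step throughput can be derived; the variable names are illustrative:

```python
# Back-of-envelope on the quoted training config:
# batch size 256 x sequence length 512 gives the tokens seen per step.
batch_size = 256
seq_len = 512
tokens_per_step = batch_size * seq_len
print(tokens_per_step)  # tokens per optimizer step
```

That works out to 131,072 tokens per optimizer step, which is what makes a multi-day single-GPU run plausible at the 3.3M-parameter scale being discussed.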
@Dorialexander
Alexander Doria
1 day
Synthetic environments are just expanding in all directions: more data, better models, more latencies/frictions (the actual "pipeline" part). Which is why I'm growing more preoccupied with compute.
1
0
11
@Dorialexander
Alexander Doria
1 day
Though importantly, focusing more on research does not mean scaling is over at all. We just need to pause, regroup, optimize, and then scale *much* better.
2
1
28
@Dorialexander
Alexander Doria
1 day
Unfortunately, it's not just that investors don't get research; they are negatively polarized against it. So we get the Lovable-delaware-c-corp era instead.
0
0
12
@Dorialexander
Alexander Doria
1 day
Since we're talking about the age of research: so far the best path I see to create something in the EU is private research. We still have good researchers, an actual deep-tech ecosystem and support, and big demand in the years to come.
@ric0seq
Ricardo Sequerra Amram
1 day
The biggest hoax in euro tech right now is that you need to move to the US to make it. Yes, the Bay Area is great and defo a place to learn and, over time, build a team there as you scale. No doubt the 50 years of tech expertise and talent density need to be leveraged. No place like it.
1
1
28
@Dorialexander
Alexander Doria
1 day
You have to read between the lines and tactical absences (s____h), but this was more interesting than the Karpathy one.
dwarkesh.com
“These models somehow just generalize dramatically worse than people. It's a very fundamental thing.”
1
0
9
@Dorialexander
Alexander Doria
1 day
Remains to be seen what he actually builds, but he's really getting it.
2
0
15
@MechanizeWork
Mechanize
6 days
NYU seniors: automate software engineering before someone else does. $250k/yr + competitive equity, SF.
1
4
14
@Dorialexander
Alexander Doria
1 day
Pre-training as we know it will end, but you definitely want pre-training (or is it training?).
2
0
13
@Dorialexander
Alexander Doria
1 day
YES. The main reason classic pretraining dominated for so long is just that you don't have to think so much about the data or what elicits reasoning. It's "here". (Re: the new Sutskever/Patel podcast)
2
2
47
@bclavie
Ben Clavié
1 day
Do you love data? Is the most exciting release of the last 2 weeks @pleiasfr’s SYNTH? Then we should talk. We’re looking for our synthetic data person. Full leeway to build the pipeline of your dreams to generate the data to solve multimodal retrieval.
8
5
58
@Dorialexander
Alexander Doria
3 days
i like tokenizers, in the same way i like pure unmitigated base models on human data. but sometimes you see the direction the sun is setting and know in your heart this won't stay.
2
1
39
@Dorialexander
Alexander Doria
3 days
based
@kalomaze
kalomaze
3 days
THE REVOLUTION WILL NOT BE TOKENIZED
3
0
48
@Dorialexander
Alexander Doria
3 days
Same reason they struggle on ARC-AGI and sudoku, and why you need >200M synth exercises to perform okayish on geometry: sequential models can't into space.
@viditchess
Vidit Gujrathi
3 days
Why are LLMs good at logic but bad at UI?
4
6
119
@Dorialexander
Alexander Doria
3 days
Having Lovable as the leading EU AI co is an apt reminder we are in the bad timeline.
8
4
100