@leavittron
Matthew Leavitt
7 months
Unfortunately A/B testing is tough: it requires lots of subjects and/or well-defined use patterns. In lieu of that, my favorite eval method is "find someone who has spent way too much time using way too many models and ask them to do a vibe check". Reviewers don't love this tho.
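To make "A/B testing requires lots of subjects" concrete, here's a toy sample-size calculation using the standard two-proportion approximation (~5% significance, ~80% power). All numbers are illustrative, not from the thread:

```python
from math import ceil

def subjects_per_arm(p_base, delta, z_alpha=1.96, z_beta=0.84):
    """Rough per-arm sample size to detect a `delta` lift in head-to-head
    win rate over a baseline rate `p_base` (two-proportion approximation;
    illustrative numbers only)."""
    p_bar = p_base + delta / 2  # rough pooled rate
    n = ((z_alpha + z_beta) ** 2) * 2 * p_bar * (1 - p_bar) / delta ** 2
    return ceil(n)

# Detecting a 2-point lift over a 50% head-to-head win rate
# takes roughly ten thousand raters per arm:
print(subjects_per_arm(0.50, 0.02))
```

Small quality deltas between strong models push the required sample size into the thousands, which is why informal vibe checks stay popular.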


@leavittron
Matthew Leavitt
7 months
A few (somewhat data-centric) thoughts on the Gemini whitepaper 🧵: Can't be much more direct than this: "We find that data quality is critical to a highly-performing model". It feels especially true cuz they provide next to no information on the training data.
@leavittron
Matthew Leavitt
7 months
Despite Gemini explicitly acknowledging the importance of data quality, I’m sure ML twitter will keep perseverating on the importance of architecture choices like the “efficient attention mechanisms” that the report also mentions
@leavittron
Matthew Leavitt
7 months
Gemini also continues the trend of training small models for looonger. As deep learning models transition from research artifact to production necessity, inference costs are going to increasingly dominate the economics. Llongboi just keeps getting llonger:
@NaveenGRao
Naveen Rao
1 year
Ok, for those wondering about the origin of our nickname "Llongboi", here it is. ( @jefrankle got mad at me for putting this in the wild. Once it's free, it's free!)
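The economics behind "train small models for llonger" can be sketched with toy arithmetic: amortize a one-time training cost over tokens served, with per-token serving cost scaling with parameter count. Every number below is hypothetical:

```python
def lifetime_cost(train_cost, params_b, tokens_served, cost_per_b_params=1e-7):
    """Total cost = one-time training cost + serving cost.
    `cost_per_b_params` is an illustrative $ per served token per
    billion parameters; all figures here are made up."""
    return train_cost + tokens_served * params_b * cost_per_b_params

# A larger model with a cheaper (compute-optimal-style) training run...
big = lifetime_cost(train_cost=10e6, params_b=70, tokens_served=1e13)
# ...vs a smaller model over-trained on more tokens (pricier training run):
small = lifetime_cost(train_cost=15e6, params_b=13, tokens_served=1e13)

print(f"big: ${big:,.0f}  small: ${small:,.0f}")
```

Once tokens served grows large enough, the smaller model's higher training bill is dwarfed by its cheaper inference, which is the trend the tweet points at.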
@leavittron
Matthew Leavitt
7 months
One very relevant consequence of token budgets increasing is that the need for data curation also increases! The quantity (and possibly even proportion 😱) of redundant, noisy, and misleading examples increases with the size of your dataset!
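A minimal sketch of the kind of curation this implies: exact deduplication by hashing normalized text, so trivially reformatted copies collapse to one document. Real pipelines go much further (MinHash/LSH for near-duplicates, quality filters), and all the sample documents here are invented:

```python
import hashlib
import re

def normalize(text):
    """Lowercase and collapse non-word characters so trivially
    reformatted copies hash identically."""
    return re.sub(r"\W+", " ", text.lower()).strip()

def dedup(docs):
    """Keep only the first occurrence of each normalized document."""
    seen, kept = set(), []
    for doc in docs:
        h = hashlib.sha256(normalize(doc).encode()).hexdigest()
        if h not in seen:
            seen.add(h)
            kept.append(doc)
    return kept

docs = [
    "Data quality is critical.",
    "data  quality is CRITICAL!",   # trivial reformat of the first
    "Architecture matters too.",
]
print(dedup(docs))  # keeps 2 of the 3 documents
```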
@leavittron
Matthew Leavitt
7 months
The Gemini whitepaper also emphasizes the importance of training the tokenizer on a “large sample” of the dataset. IMO tokenizers as a vector for model improvement are vastly underexploited. Data curation and tokenization both suffer because researchers overlook data.
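One cheap way to get a "large sample" that reflects the whole corpus rather than just its head is reservoir sampling over the document stream. A stdlib-only sketch (the corpus and sizes are invented for illustration):

```python
import random

def reservoir_sample(stream, k, seed=0):
    """Uniformly sample k documents from a stream of unknown length,
    so tokenizer training sees the full corpus distribution instead of
    whatever happens to come first."""
    rng = random.Random(seed)
    sample = []
    for i, doc in enumerate(stream):
        if i < k:
            sample.append(doc)
        else:
            j = rng.randrange(i + 1)
            if j < k:
                sample[j] = doc  # each doc kept with probability k/(i+1)
    return sample

corpus = (f"doc-{i}" for i in range(1_000_000))
sample = reservoir_sample(corpus, k=1000)
print(len(sample))
```

The resulting sample would then be fed to whatever tokenizer trainer you use (BPE, unigram, etc.).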
@leavittron
Matthew Leavitt
7 months
Gemini Ultra training was distributed across datacenters! Model parallel within SuperPods (and datacenters) and data parallel across SuperPods (and datacenters)! This is impressive in part because gradients are notoriously shy and reluctant to leave their home datacenter.
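Gemini's actual topology isn't public beyond this description, but the hierarchy the tweet describes can be sketched as simple rank arithmetic: shard the model within a pod, replicate across pods. Pod sizes and counts below are hypothetical:

```python
def placement(rank, pod_size):
    """Map a global accelerator rank to (data-parallel replica, model shard),
    assuming model parallelism inside a pod and data parallelism across pods."""
    return rank // pod_size, rank % pod_size  # (replica id, shard id)

# 8 accelerators in pods of 4: two data-parallel replicas,
# each a 4-way model-parallel shard group.
for rank in range(8):
    replica, shard = placement(rank, pod_size=4)
    print(f"rank {rank}: replica {replica}, shard {shard}")
```

Under this layout, activation/parameter traffic stays inside a pod while only gradient all-reduces cross pod (and, per the paper, datacenter) boundaries.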
@leavittron
Matthew Leavitt
7 months
TFW cosmic rays ruin your training run. To be fair, most SDC (silent data corruption) events probably aren't due to cosmic rays, but it's fun to think about the universe extending a glittering tendril into the delicate gears of your trainer and whispering "nope".
@leavittron
Matthew Leavitt
7 months
The benchmark evals are pretty impressive compared to the baselines, though I'm always skeptical of benchmarks. A/B testing seems like the most direct way to measure model quality IMO...which they do! Though not against GPT-4 ☹️
@leavittron
Matthew Leavitt
7 months
Overall the Gemini work is very technically impressive and highlights the importance of data quality. I'm excited for everyone to vibe check the Ultra model once it's available, and to see the subsequent data sheets/white papers when they're released.
@TheJohnEgan
John E-gen 🔮
6 months
@leavittron @jeremyphoward VIVO: vibe in, vibe out