@leavittron
Matthew Leavitt
10 months
This is why I pushed @MosaicML to create a Data Research Team last year (and @jefrankle recognized the value and made it happen)
@omarsar0
elvis
10 months
From the papers that I've read on LLMs in the past 6 months, one thing is clear: higher data quality will be key to keep pushing progress. Lots of companies and researchers keep innovating and implementing ways to improve data quality in all areas ranging from finetuning LLMs…

Replies

@elmanmansimov
Elman Mansimov
10 months
@leavittron @MosaicML @jefrankle Gathering and filtering are key, but fundamentally we don't understand how each token affects downstream performance beyond trial and error (add this data, remove this data, see what happens). A predictable function F(Loss, token) -> Accuracy(New_Task) is missing.