From the LLM papers I've read over the past 6 months, one thing is clear: higher data quality will be key to continued progress.
Lots of companies and researchers keep finding and implementing ways to improve data quality, in areas ranging from finetuning LLMs…
@leavittron @MosaicML @jefrankle
Gathering and filtering data are key, but fundamentally we don't understand how each token affects downstream performance beyond trial and error (add this data, remove that data, see what happens).
A predictable function F(loss, token) -> Accuracy(new task) is still missing.
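The trial-and-error loop above can be sketched with a toy example. Everything here is hypothetical: the subset names, the unigram "model" (`toy_train`), and the downstream "accuracy" (`toy_eval`) just stand in for a real train/ablate/evaluate cycle.

```python
# A sketch of the trial-and-error loop: ablate one data subset at a
# time, retrain, and measure downstream accuracy. All names and data
# are made up for illustration.
from collections import Counter

def toy_train(corpus):
    """'Train' a unigram model: predict the single most frequent token."""
    counts = Counter(tok for seq in corpus for tok in seq)
    return counts.most_common(1)[0][0] if counts else None

def toy_eval(prediction, eval_targets):
    """Downstream 'accuracy': fraction of eval targets matching the prediction."""
    if not eval_targets:
        return 0.0
    return sum(1 for t in eval_targets if t == prediction) / len(eval_targets)

# Hypothetical data subsets and a downstream evaluation set.
subsets = {
    "web":  [["the", "cat", "the"], ["the", "dog", "the"]],
    "code": [["def", "def"], ["def", "return"]],
    "chat": [["hi", "hi"], ["hi", "ok"]],
}
eval_targets = ["the", "the", "def"]

baseline = toy_eval(
    toy_train([seq for sub in subsets.values() for seq in sub]), eval_targets
)
print(f"baseline accuracy: {baseline:.2f}")

# Trial and error: remove one subset, retrain, see what happens.
for name in subsets:
    corpus = [seq for k, sub in subsets.items() if k != name for seq in sub]
    acc = toy_eval(toy_train(corpus), eval_targets)
    print(f"without {name!r}: accuracy {acc:.2f} (delta {acc - baseline:+.2f})")
```

The point of the sketch: the only way to learn a subset's effect is to run the whole retrain-and-evaluate loop for it, which is exactly the missing F(loss, token) -> accuracy mapping.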