From the papers that I've read on LLMs in the past 6 months, one thing is clear: higher data quality will be key to pushing progress further.
Many companies and researchers keep innovating and implementing ways to improve data quality, in areas ranging from finetuning LLMs…
@omarsar0
Agreed, *but* people don't want to work on data quality. Per a Google Research paper from 2 years ago, even in incredibly high-stakes use cases, most teams aren't cleaning up their data.
“Everyone wants to do the model work, not the data work”: new Google research finds that under-appreciation of data quality, including in high-stakes AI, results in 92% of AI projects experiencing data cascades, i.e. compounding, negative, downstream events.
100% agree. I've spent a lot of time cleaning up corpora in my own language, and I've seen data cleaning make the corpus both better and worse. What would you say is the most important criterion for a good/clean corpus?
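To make the "better and worse" point concrete, here is a minimal sketch of the kind of cleaning pipeline being discussed: whitespace normalization, a length filter, and exact deduplication. The function name, thresholds, and filters are all illustrative assumptions, not anything from the thread, and the inline comments note where each step can also *hurt* the corpus.

```python
import re

def clean_corpus(lines):
    """Illustrative corpus cleaning: normalize whitespace, drop very
    short fragments, and deduplicate exact repeats.
    (Hypothetical thresholds -- real pipelines tune these per language.)"""
    seen = set()
    cleaned = []
    for line in lines:
        text = re.sub(r"\s+", " ", line).strip()  # collapse whitespace
        if len(text) < 10:   # drop tiny fragments (risk: loses real short sentences)
            continue
        if text in seen:     # exact dedup (risk: drops legitimately repeated text)
            continue
        seen.add(text)
        cleaned.append(text)
    return cleaned

corpus = [
    "The  quick brown fox jumps over the lazy dog.",
    "The quick brown fox jumps over the lazy dog.",  # duplicate after normalization
    "ok",                                            # filtered as too short
    "Data cleaning can both help and hurt a corpus.",
]
print(clean_corpus(corpus))
```

Each filter trades recall for precision: the same length cutoff that strips boilerplate can also delete valid short utterances, which is one way cleaning makes a corpus worse.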