v excited to finally announce our new work that formalizes one of the most effective practices for training LLMs—something that many industry leaders have conspicuously avoided discussing
s/o to @danielking36 for the exceptional title. We also considered "Training on the test set is all you need", "The Unreasonable Effectiveness of Training on the Test Set", and "Intriguing Properties of Training on Test Data"
Worried about test data being used in training?
The LLM world is going through a data contamination crisis.
Here's us trying to do something about it:
Paper:
Blog:
w/ @clu_avi @omerNLP @yoavgo @leavittron
Thrilled to hear about this groundbreaking work! It's great to see a formalized approach to training LLMs, especially when it's been a less-discussed topic among industry leaders.