Rylan Schaeffer

@RylanSchaeffer

Followers
5K
Following
8K
Media
359
Statuses
1K

CS PhD Student at Stanford Trustworthy AI Research with @sanmikoyejo. Prev interned/worked @ Meta, Google, MIT, Harvard, Uber, UCL, UC Davis

Mountain View, CA
Joined October 2011
@RylanSchaeffer
Rylan Schaeffer
2 days
New position paper! Machine Learning Conferences Should Establish a “Refutations and Critiques” Track. Joint w/ @sanmikoyejo @JoshuaK92829 @yegordb @bremen79 @koustuvsinha @in4dmatics @JesseDodge @suchenzang @BrandoHablando @MGerstgrasser @is_h_a @ObbadElyas. 1/6
Tweet media one
12
47
383
@RylanSchaeffer
Rylan Schaeffer
15 hours
"I saw the new Jurassic World movie. $10mm is enough for Scarlett to risk her life to get some dinosaur blood samples, but it’s not enough to hire an ML researcher". - @tianxie233.
0
1
7
@RylanSchaeffer
Rylan Schaeffer
1 day
RT @Happylemon56775: Excited to share what I worked on during my time at Meta. - We introduce a Triton-accelerated Transformer with *2-sim….
0
82
0
@RylanSchaeffer
Rylan Schaeffer
2 days
Manuscript: 🙏 for feedback from @BlancheMinerva @random_walker @hugo_larochelle @jpineau1 Nicholas Carlini @BlackHC & others I'm probably sadly forgetting right now. 6/6.
0
0
23
@RylanSchaeffer
Rylan Schaeffer
2 days
We demonstrate what submissions to R&C might look like with parallel-but-standalone work regarding an @iclr_conf #ICLR2025 Oral. 5/6.
@RylanSchaeffer
Rylan Schaeffer
18 days
🚨 New preprint 🚨 Turning Down the Heat: A Critical Analysis of Min-p Sampling in Language Models. We examine min-p sampling (an ICLR 2025 oral) & find significant problems in all 4 lines of evidence: human eval, NLP evals, LLM-as-judge evals, and community adoption claims. 1/8
Tweet media one
1
1
22
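For readers unfamiliar with the method under critique: min-p sampling truncates the token distribution using a threshold that scales with the model's peak confidence. A minimal NumPy sketch (parameter names here are illustrative, not taken from the paper):

```python
import numpy as np

def min_p_sample(logits, p_base=0.1, temperature=1.0, rng=None):
    """Sketch of min-p sampling: keep only tokens whose probability is at
    least p_base times the most likely token's probability, renormalize,
    and sample from the truncated distribution."""
    rng = rng or np.random.default_rng()
    scaled = logits / temperature
    probs = np.exp(scaled - np.max(scaled))
    probs /= probs.sum()
    threshold = p_base * probs.max()   # dynamic cutoff: scales with peak confidence
    trimmed = np.where(probs >= threshold, probs, 0.0)
    trimmed /= trimmed.sum()
    return rng.choice(len(logits), p=trimmed)
```

The intuition is that a confident (peaked) distribution prunes aggressively, while a flat distribution keeps many candidates.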
@RylanSchaeffer
Rylan Schaeffer
2 days
We argue ML conferences should establish a dedicated “Refutations and Critiques” (R&C) track to provide a high-profile, reputable, and rigorously peer-reviewed platform for research that identifies & corrects misleading/incorrect/potentially fraudulent claims presented in…
2
0
21
@RylanSchaeffer
Rylan Schaeffer
2 days
- Reforming the peer review process is likely impractical due to reviewer incentives and institutional comfort w/ the current reviewing process.
- Lack of formal recourse harms the field, imposes costs, & relegates disputes to non-scientific venues w/o impartial adjudication. 3/6
1
0
20
@RylanSchaeffer
Rylan Schaeffer
2 days
The case:
- Fallibility of peer review means ML conferences sometimes accept & even highlight misleading/incorrect/flawed/possibly fraudulent research, causing false information to proliferate.
- ML conferences currently lack official processes for rectifying such errors. 2/6
1
0
27
@RylanSchaeffer
Rylan Schaeffer
3 days
RT @deedydas: AI now beats every single human in the hardest college entrance exam in India, the IIT JEE. Bytedance silently published thi….
0
363
0
@RylanSchaeffer
Rylan Schaeffer
5 days
@JoshuaK92829 Lastly, don't sleep on our NEW position paper: Model Collapse Does Not Mean What You Think. We discuss the state of research on synthetic data & model collapse, and where we feel more effort is necessary. w/ @AlvanArulandu @JoshuaK92829 @sanmikoyejo. 7/7
Tweet media one
0
3
9
@RylanSchaeffer
Rylan Schaeffer
5 days
Give my co-first author @JoshuaK92829 a follow. Paper: Code: 6/7
1
2
9
@RylanSchaeffer
Rylan Schaeffer
5 days
Lastly, we investigate how the value of self-generated data depends on the cardinality or proportion of real data. In SFT'd LMs, we found two regimes:
1. If real data are plentiful -> synthetic data is generally harmful.
2. If real data are scarce -> choosing the right amount of…
Tweet media one
1
2
7
@RylanSchaeffer
Rylan Schaeffer
5 days
Next, we studied a middle ground: data accumulate over time, but each model is trained with fixed data & fixed compute. What happens? Test loss on real data is worse than when training on all accumulated data, but doesn't diverge the way it does when data are deleted en masse (in 5 settings).
Tweet media one
1
2
4
@RylanSchaeffer
Rylan Schaeffer
5 days
I love this result on KDEs: in some settings, fitting a KDE to data, sampling from it, and then refitting on the real and synthetic data together can yield lower loss on real held-out (test) data than the original KDE trained purely on real data 🤯. 3/7
Tweet media one
1
2
5
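The KDE experiment above can be reproduced in miniature with SciPy. A sketch under assumed settings (standard-normal data, arbitrary sample sizes); whether the mixed refit actually wins is setting-dependent, as the thread notes:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
real_train = rng.normal(size=200)    # real data used for fitting
real_test = rng.normal(size=1000)    # held-out real data for evaluation

kde_real = gaussian_kde(real_train)            # KDE fit on real data only
synthetic = kde_real.resample(200)[0]          # synthetic data sampled from that KDE
kde_mixed = gaussian_kde(np.concatenate([real_train, synthetic]))  # refit on real + synthetic

# Negative mean log-likelihood on held-out real data (lower is better).
nll_real = -np.mean(kde_real.logpdf(real_test))
nll_mixed = -np.mean(kde_mixed.logpdf(real_test))
print(nll_real, nll_mixed)
```

Comparing `nll_real` and `nll_mixed` across seeds and bandwidths shows when the synthetic-augmented fit helps or hurts.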
@RylanSchaeffer
Rylan Schaeffer
5 days
Previous papers reached "contradictory" conclusions about the impact of synthetic data, but used different methodologies. We compared head to head and found:
- If data are constantly deleted -> model collapse.
- If synthetic & real data accumulate -> model collapse is avoided.
Tweet media one
1
2
6
@RylanSchaeffer
Rylan Schaeffer
5 days
Third #ICML2025 paper! What effect will web-scale synthetic data have on future deep generative models? Collapse or Thrive? Perils and Promises of Synthetic Data in a Self-Generating World 🔄 @JoshuaK92829 @ApratimDey2 @MGerstgrasser @rm_rafailov @sanmikoyejo. 1/7
Tweet media one
4
20
107
@RylanSchaeffer
Rylan Schaeffer
8 days
Nothing inspires confidence for a hockey team like watching Michael Jordan lace up his skates and grab his lacrosse stick to teach them how to play.
@morqon
morgan —
9 days
zuck reads arxiv like a shopping list
Tweet media one
2
0
29
@RylanSchaeffer
Rylan Schaeffer
9 days
performance depends not just on the language model's scaling behavior on the correct choice, but also on its scaling behavior on a small set of specific incorrect choices. These wrong answers + the transformations from loss to accuracy make predicting hard! 2/3
Tweet media one
1
2
13
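A toy illustration of why the loss-to-accuracy transformation obscures scaling trends: MCQA accuracy is an argmax over choices, so improving the correct choice's log-likelihood changes nothing until it overtakes the best incorrect choice. The numbers below are made up for illustration:

```python
def mcqa_accuracy(correct_logp, incorrect_logps):
    """Accuracy under greedy answering: 1 if the correct choice has the
    highest probability among all choices, else 0."""
    return float(correct_logp > max(incorrect_logps))

# The correct choice's loss improves from 2.0 -> 1.2 nats, yet accuracy is
# unchanged because one incorrect choice still holds more probability mass:
print(mcqa_accuracy(-2.0, [-1.0, -3.0]))  # 0.0
print(mcqa_accuracy(-1.2, [-1.0, -3.0]))  # 0.0
# Accuracy flips only once the correct choice overtakes the best incorrect one:
print(mcqa_accuracy(-0.8, [-1.0, -3.0]))  # 1.0
```

Loss on the correct answer can improve smoothly with scale while accuracy stays flat and then jumps, which is one reason downstream capability curves look unpredictable.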
@RylanSchaeffer
Rylan Schaeffer
9 days
Another #ICML2025 paper! Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive? TLDR: Predicting language model performance with scale on multiple-choice question-answering (MCQA) benchmarks is made difficult b/c… 1/3
Tweet media one
2
15
87