
Christina Baek (@_christinabaek)
PhD student @mldcmu | intern @datologyai @GoogleAI | Robust ML
Joined June 2021
Followers: 2K · Following: 526 · Media: 26 · Statuses: 121
Are current reasoning models optimal for test-time scaling? 🌠 No! Models make the same incorrect guess over and over again. We show that you can fix this problem w/o any crazy tricks 💫 – just do weight ensembling (WiSE-FT) for big gains on math! 1/N
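The WiSE-FT weight ensembling mentioned in the thread above is, at its core, a linear interpolation between two checkpoints of the same architecture. Below is a minimal sketch of that interpolation in PyTorch; the mixing coefficient, checkpoint names, and file paths are illustrative assumptions, not details from the thread.

```python
import torch

def wise_ft_interpolate(base_state, finetuned_state, alpha=0.5):
    """WiSE-FT-style weight ensembling: theta = (1 - alpha) * theta_base + alpha * theta_ft.

    Both inputs are state_dicts of models with identical architectures.
    alpha = 0 recovers the base model, alpha = 1 the fine-tuned one.
    """
    merged = {}
    for name, base_param in base_state.items():
        ft_param = finetuned_state[name]
        if torch.is_floating_point(base_param):
            merged[name] = (1.0 - alpha) * base_param + alpha * ft_param
        else:
            # Integer buffers (e.g. step counters) cannot be interpolated; keep the fine-tuned copy.
            merged[name] = ft_param.clone()
    return merged

# Hypothetical usage: blend a base LLM with its reasoning-tuned counterpart.
# base = torch.load("base_model.pt")
# tuned = torch.load("reasoning_model.pt")
# model.load_state_dict(wise_ft_interpolate(base, tuned, alpha=0.7))
```

Sweeping alpha at evaluation time then trades off between the two checkpoints' behaviors without any retraining.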
RT @gaurav_ghosal: 1/So much of privacy research is designing post-hoc methods to make models mem. free. It’s time we turn that around with….
RT @jen_hsia: 1/6 Retrieval is supposed to improve generation in RAG systems. But in practice, adding more documents can hurt performance,….
RT @AdtRaghunathan: I will be at #ICML2025 🇨🇦 from Wednesday through Saturday. My students have a lot of exciting papers - check them out….
RT @sukjun_hwang: Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical netw….
RT @pratyushmaini: One of the dreams when joining @datologyai was to bring the fruits of data research from labs🔬 to the real world 🌎. Soo g….
RT @anag004: PPO is often frustrating to tune for many continuous control tasks since it keeps getting stuck in local minima. In our SAPG….
RT @AdtRaghunathan: Excited to speak at the CVPR workshop on domain generalization! Estimating model performance in the wild is hard but cr….
RT @_vaishnavh: 📢 New paper on creativity & multi-token prediction! We design minimal open-ended tasks to argue: → LLMs are limited in cre….
RT @ZhengyangGeng: Excited to share our work with my amazing collaborators, @Goodeat258, @SimulatedAnneal, @zicokolter, and Kaiming. In a….
RT @yidingjiang: Data selection and curriculum learning can be formally viewed as a compression protocol via prequential coding. New blog….
yidingjiang.github.io
We describe a unified framework for data selection and curriculum learning via compression.
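The blog post linked above treats data selection and curriculum learning as choosing data (and an ordering) that shortens a prequential code: each example is encoded with the model fit only on the examples seen before it. The sketch below computes that code length for a toy symbol stream; the Laplace-smoothed count model and the example strings are stand-ins of mine, not anything from the post.

```python
import math
from collections import Counter

def prequential_code_length(stream, alphabet_size):
    """Total bits to encode `stream` prequentially.

    Symbol t is encoded under the model fit on symbols 0..t-1, then the model
    is updated, so data that the online learner predicts well compresses well.
    Here the "model" is just Laplace-smoothed unigram counts.
    """
    counts = Counter()
    bits = 0.0
    for t, x in enumerate(stream):
        p = (counts[x] + 1) / (t + alphabet_size)  # predictive probability of x
        bits += -math.log2(p)
        counts[x] += 1  # update the model after encoding the symbol
    return bits

# Predictable data yields a shorter code than high-entropy data.
print(prequential_code_length("aaaaaaaa", alphabet_size=4))  # ~7.4 bits
print(prequential_code_length("abcdabcd", alphabet_size=4))  # ~18.7 bits
```

Note that with an exchangeable count model like this one the total is order-invariant; ordering only starts to matter once the predictor is a capacity-limited learner trained online (e.g. a network updated by SGD), which is where the curriculum-learning view comes in.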
RT @BingbinL: Excited to announce MOSS, our ICML workshop focused on discoveries at small scale! We believe there's tremendous potential &….
RT @aleks_madry: Building AI systems is now a fragmented process spanning multiple organizations & entities. In new work (w/ @aspenkhopkin….
RT @RuntianZhai: Why can foundation models transfer to so many downstream tasks? Will the scaling law end? Will pretraining end like Ilya S….
arxiv.org
This dissertation establishes the contexture theory to mathematically characterize the mechanism of representation learning, or pretraining. Despite the remarkable empirical success of foundation...
RT @pratyushmaini: Join me & @hbxnov at #ICLR2025 for our very purple poster on risks of LLM evals by private companies! 🕒 Today, 10am | 🪧….
RT @rowankwang: SDF has limitations: models might recall their prior knowledge through reasoning or from their environment. Ex: we taught….
RT @m_finzi: Why do larger language models generalize better? In our new ICLR paper, we derive an interpretable generalization bound show….
arxiv.org
Why do larger language models generalize better? To investigate this question, we develop generalization bounds on the pretraining objective of large language models (LLMs) in the compute-optimal...
RT @ashertrockman: Are you a frontier lab investing untold sums in training? Are you trying to stay competitive? Are you finding that your….
RT @james_y_zou: Does RAG solve hallucination? Even w/ RAG, we found that >30% of LLMs’ medical statements are not fully supported by (som….