Yike Wang @Neurips25 (@yikewang_)
Followers: 457 · Following: 226 · Media: 6 · Statuses: 97
PhD student @uwcse @uwnlp | BA, MS @berkeley_ai
Joined June 2022
LLMs are helpful for scientific research — but will they continue to be helpful? Introducing 🔍ScienceMeter: current knowledge update methods enable 86% preservation of prior scientific knowledge, 72% acquisition of new knowledge, and 38%+ projection of future knowledge ( https://t.co/zDjjl5GBaZ).
11 · 57 · 244
Announcing Olmo 3, a leading fully open LM suite built for reasoning, chat, & tool use, and an open model flow—not just the final weights, but the entire training journey. Best fully open 32B reasoning model & best 32B base model. 🧵
51 · 327 · 2K
OpenAI's blog ( https://t.co/Mu05PFfPXg) points out that today’s language models hallucinate because training and evaluation reward guessing instead of admitting uncertainty. This raises a natural question: can we reduce hallucination without hurting utility?🤔 On-policy RL with
25 · 123 · 669
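The incentive argument in the blog post can be made concrete with a toy expected-score calculation. This is a sketch under assumed scoring rules, not the post's benchmark or the announced RL reward:

```python
# Why binary grading rewards guessing: a toy expected-score calculation.
# The scoring schemes below are hypothetical illustrations.

def expected_score(p_correct, r_correct, r_wrong, r_abstain, guess):
    """Expected score of guessing vs. abstaining under a scoring rule."""
    if guess:
        return p_correct * r_correct + (1 - p_correct) * r_wrong
    return r_abstain

p = 0.3  # model's chance of being right if it guesses

# Binary grading (correct=1, wrong=0, abstain=0): guessing always wins,
# so training on this signal encourages confident hallucination.
assert expected_score(p, 1, 0, 0, guess=True) > expected_score(p, 1, 0, 0, guess=False)

# Penalized grading (wrong=-1, abstain=0): abstaining is optimal when p < 0.5,
# so admitting uncertainty becomes the reward-maximizing behavior.
assert expected_score(p, 1, -1, 0, guess=True) < expected_score(p, 1, -1, 0, guess=False)
```

Under binary grading the expected score of guessing is `p`, which is never below the 0 earned by abstaining; adding a wrong-answer penalty flips that ordering whenever the model's confidence is low enough.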
RL is bounded by finite data😣? Introducing RLVE: RL with Adaptive Verifiable Environments. We scale RL with data procedurally generated from 400 environments that dynamically adapt to the trained model. 💡Find supervision signals right at the LM capability frontier and scale them. 🔗in🧵
12 · 113 · 468
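The "dynamically adapting" idea can be sketched as a difficulty controller that keeps task success near a target rate. The controller below is a guess at the general mechanic, not the actual RLVE adaptation rule:

```python
# Minimal sketch of an environment that adapts its difficulty so tasks stay
# near the policy's capability frontier. Hyperparameters are illustrative.

class AdaptiveEnv:
    def __init__(self, target_success=0.5, step=1, window=20):
        self.difficulty = 1
        self.target = target_success
        self.step = step
        self.window = window
        self.history = []

    def record(self, solved: bool):
        self.history.append(solved)
        if len(self.history) >= self.window:   # adapt every `window` episodes
            rate = sum(self.history) / len(self.history)
            if rate > self.target:
                self.difficulty += self.step                     # too easy -> harder
            elif rate < self.target:
                self.difficulty = max(1, self.difficulty - self.step)  # too hard -> easier
            self.history.clear()

env = AdaptiveEnv()
# Simulate a policy that can solve everything up to difficulty 5.
for _ in range(200):
    env.record(env.difficulty <= 5)
print(env.difficulty)  # settles at the frontier (~5-6)
```

The environment ratchets difficulty up while the policy succeeds and backs off when it fails, so the supervision signal stays informative instead of saturating at all-easy or all-impossible tasks.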
Today's AI agents are optimized to complete tasks in one shot. But real-world tasks are iterative, with evolving goals that need collaboration with users. We introduce collaborative effort scaling to evaluate how well agents work with people—not just complete tasks 🧵
6 · 52 · 266
🤖💬AI agents can be easily persuaded (like Anthropic’s Claudius often giving discounts). 🤔Previous studies of persuasion have focused exclusively on the text-only modality. We wonder: are AI agents more susceptible when presented with multimodal content? Introducing MMPersuade, a
11 · 26 · 130
can we finally use natural language to optimize for deeper notions of what users want from their recommender systems?
4 · 13 · 54
“Responses are not monolithic: they switch across diverse skills which favor different model checkpoints in the training pipeline, thus we introduce model-guided collaborative inference to optimally use models with diverse skills for different segments of response generation.”
🔍Aligned LMs are better at reasoning/safety, but lose out on skills like calibration and generation diversity, where pretrained models are better. 🤝How about: Don't Throw Away your Pretrained Model, and use multiple stages of the LLM training pipeline in collaboration!
0 · 3 · 26
🤖➡️📉 Post-training made LLMs better at chat and reasoning—but worse at distributional alignment, diversity, and sometimes even steering(!) We measure this with our new resource (Spectrum Suite) and introduce Spectrum Tuning (method) to bring them back into our models! 🌈 1/🧵
5 · 49 · 193
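One way to see what "worse at distributional alignment" means: compare the model's output distribution over a small option set against a target population distribution. The metric and numbers below are illustrative, not Spectrum Suite's actual protocol:

```python
from collections import Counter

# Sketch of a distributional-alignment measurement via total variation
# distance. The target distribution and samples are made up for illustration.

def total_variation(p, q):
    """Total variation distance between two discrete distributions (dicts)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

target = {"yes": 0.6, "no": 0.4}        # e.g. a survey population's split
samples = ["yes"] * 95 + ["no"] * 5     # a mode-collapsed post-trained model
counts = Counter(samples)
model_dist = {k: v / len(samples) for k, v in counts.items()}

print(total_variation(model_dist, target))  # 0.35 -> far from the population
```

A perfectly aligned sampler would score 0; the collapsed model above concentrates almost all mass on one option, which is exactly the post-training failure mode the thread describes.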
Introducing the Model Collaboration Tour 🤖🤝 Compositional intelligence. Collaborative development. Decentralized AI. By the Many. The methods. The vision. The hot takes. The comedy. LA folks, join us this week! Get your tickets by asking me to give a talk @ your lab/school!
4 · 19 · 34
Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference” We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to
237 · 1K · 8K
check out this great work! 👾
How do we navigate a growing collection of post-trained LLMs? In Delta Activations: A Representation for Finetuned LLMs, we propose a compact embedding that encodes the post-training signal. Try the interactive model navigator 👉 https://t.co/I7mKccXfzr
0 · 1 · 5
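The core idea, an embedding built from how a finetuned model's activations shift relative to its base, can be sketched in a few lines. The probe set, layer stand-in, and pooling here are illustrative guesses, not the paper's recipe:

```python
import numpy as np

# Sketch of a delta-activation embedding: represent a finetuned checkpoint by
# the mean shift of its hidden activations (vs. the base model) on fixed probes.

rng = np.random.default_rng(0)
probe_inputs = rng.normal(size=(8, 16))  # 8 probe prompts, hidden dim 16

def activations(weights, x):
    """Stand-in for 'hidden states at some layer' of a real model."""
    return np.tanh(x @ weights)

d_model = 16
base_w = rng.normal(size=(d_model, d_model))
finetuned_w = base_w + 0.1 * rng.normal(size=(d_model, d_model))  # post-training shift

# One compact vector per finetuned model: mean activation delta over probes.
delta = (activations(finetuned_w, probe_inputs)
         - activations(base_w, probe_inputs)).mean(axis=0)
print(delta.shape)  # (16,)
```

Because every checkpoint is embedded against the same base and probe set, the resulting vectors live in one space and can be compared or clustered to navigate a model collection.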
🧩New blog: From f(x) and g(x) to f(g(x)): LLMs Learn New Skills in RL by Composing Old Ones Do LLMs learn new skills through RL, or just activate existing patterns? Answer: RL teaches the powerful meta-skill of composition when properly incentivized. 🔗: https://t.co/4Ud8qsYrOT
13 · 92 · 428
👀Have you asked an LLM to provide a more detailed answer after inspecting its initial output? Users often provide such implicit feedback during interaction. ✨We study implicit user feedback found in LMSYS and WildChat. #EMNLP2025
2 · 26 · 82
next chapter of particle swarm optimization: 🔖Data Swarms, evaluation data generation through co-evolution between data generators and models. check this out 🥂
👀 How to find more difficult/novel/salient evaluation data? ✨ Let the data generators find it for you! Introducing Data Swarms, multiple data generator LMs collaboratively search in the weight space to optimize quantitative desiderata of evaluation.
0 · 3 · 37
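For readers unfamiliar with the classic algorithm the quote tweet nods to, here is textbook particle swarm optimization on a toy objective. Data Swarms searches in model weight space for evaluation-data desiderata; this sketch only shows the underlying swarm mechanic, minimizing a scalar function:

```python
import random

# Classic PSO: particles track a velocity pulled toward their personal best
# and the swarm's global best. Hyperparameters are standard textbook values.

def pso(f, dim=2, n_particles=10, iters=100, w=0.7, c1=1.5, c2=1.5):
    rng = random.Random(0)
    xs = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in xs]            # each particle's best position so far
    gbest = min(pbest, key=f)[:]          # swarm-wide best position
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vs[i][d] = (w * vs[i][d]
                            + c1 * r1 * (pbest[i][d] - xs[i][d])
                            + c2 * r2 * (gbest[d] - xs[i][d]))
                xs[i][d] += vs[i][d]
            if f(xs[i]) < f(pbest[i]):
                pbest[i] = xs[i][:]
                if f(xs[i]) < f(gbest):
                    gbest = xs[i][:]
    return gbest

sphere = lambda x: sum(v * v for v in x)
best = pso(sphere)
print(sphere(best))  # near 0
```

Swapping the particles for data-generator LM weights and the objective for evaluation desiderata gives the flavor of the "co-evolution in weight space" framing, though the paper's actual search procedure may differ.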
Two caveats with self-alignment: ⚠️ A single model struggles to reliably judge its own generation. ⚠️ A single model struggles to reliably generate diverse responses to learn from. 👉 Introducing Sparta Alignment, where multiple LMs collectively align through ⚔️ combat.
2 · 13 · 35
PhD in Computer Science, University of California San Diego 🎓 My research focused on uncertainty and safety in AI systems, including 🤷‍♀️ letting models say "I don't know" under uncertainty 🔎 understanding and reducing hallucinations 🔁 methods for answering "how much will
30 · 21 · 625
Today we're releasing Community Alignment - the largest open-source dataset of human preferences for LLMs, containing ~200k comparisons from >3000 annotators in 5 countries / languages! There was a lot of research that went into this... 🧵
12 · 70 · 331
🚀 Training an image generation model and picking sides between autoregressive (AR) and diffusion? Why not both? Check out MADFormer with half of the model layers for AR and half for diffusion. AR gives a fast guess for the next patch prediction while diffusion helps refine the
4 · 12 · 40
🤔 How do we train AI models that surpass their teachers? 🚨 In #COLM2025: ✨Delta learning ✨makes LLM post-training cheap and easy – with only weak data, we beat open 8B SOTA 🤯 The secret? Learn from the *differences* in weak data pairs! 📜 https://t.co/dw1QeQackx 🧵 below
7 · 53 · 166
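"Learn from the differences in weak data pairs" suggests a pairwise objective where only the quality gap between two responses matters, not their absolute quality. Below is a DPO-style logistic loss on score differences; it is an illustrative stand-in, not the paper's exact delta-learning objective:

```python
import math

# Toy pairwise 'learn from the delta' loss: even when both responses come
# from weak models, their relative ordering still provides training signal.

def pairwise_loss(score_better, score_worse, beta=1.0):
    """-log sigmoid(beta * (s_better - s_worse)): small when the model
    already ranks the better weak response above the worse one."""
    margin = beta * (score_better - score_worse)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The loss depends only on the *difference*, not on absolute quality,
# which is why pairs of cheap, weak responses can still teach a strong model:
low_pair = pairwise_loss(0.2, -0.3)   # two weak responses, gap of 0.5
high_pair = pairwise_loss(2.2, 1.7)   # two strong responses, same gap
assert abs(low_pair - high_pair) < 1e-9
```

Correctly ranked pairs with a large margin drive the loss toward zero, while inverted pairs are penalized, so all the supervision lives in the delta between the two weak responses.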