Zhang-Wei Hong
@ZhangWeiHong9
171 Followers · 41 Following · 8 Media · 66 Statuses
CS PhD student at MIT, interested in reinforcement learning and biologically inspired computation
Joined July 2021
Aha, great question! (BTW @abeirami I think very few of your readers will know what first-order PO means.) I have lots of thoughts on this. Just quick and dirty below. IMO, prompt optimization is not meant to be understood as a "poor man's SFT". If you have access to …
Most first-order prompt optimization isn’t that useful these days. A scaffold with evolutionary search gets you much farther. Given the search isn’t local, is gradient/logit info even usable to make the search more efficient?
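A minimal sketch of what such gradient-free evolutionary prompt search can look like. Everything here is a toy stand-in: the fitness function, vocabulary, and mutation operator are illustrative, not any specific system's API — the point is that the search only needs black-box scores, no gradients or logits.

```python
import random

def mutate(prompt, vocab, rng):
    """Randomly swap one token of the prompt for another from the vocabulary."""
    tokens = prompt.split()
    i = rng.randrange(len(tokens))
    tokens[i] = rng.choice(vocab)
    return " ".join(tokens)

def evolve_prompt(seed_prompt, score, vocab, generations=20, pop_size=8, seed=0):
    """(1+lambda)-style evolutionary search over prompts.

    `score` is any black-box fitness function (e.g. task accuracy of an
    LLM given the prompt); no gradient or logit information is used.
    """
    rng = random.Random(seed)
    best = seed_prompt
    best_score = score(best)
    for _ in range(generations):
        # Propose pop_size mutated children and keep any improvement.
        for child in (mutate(best, vocab, rng) for _ in range(pop_size)):
            s = score(child)
            if s > best_score:
                best, best_score = child, s
    return best, best_score

# Toy fitness: how many target keywords the prompt contains.
target = {"step", "by", "carefully"}
fitness = lambda p: len(target & set(p.split()))
vocab = ["think", "step", "by", "carefully", "answer", "reason"]
best, s = evolve_prompt("please answer now", fitness, vocab)
```

Because the loop only ever keeps improvements, the returned score is never worse than the seed prompt's score — which is exactly why global, non-local search like this is hard to accelerate with local gradient information.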
🚀 Thrilled to share our work: “Preemptive Detection and Steering of LLM Misalignment via Latent Reachability”! 🔒 Post-training (e.g. RLHF) can’t guarantee safe text at inference time—LLMs still produce harmful text. We ask: Can we safeguard LLMs during generation itself? [1/n]
Incredible blog post by @johnschulman2! The RL results are quite surprising... "LoRA fully matches the learning performance of FullFT when running policy gradient algorithms for reinforcement learning, even with ranks as low as 1" I incorrectly assumed that surely RL must be …
LoRA makes fine-tuning more accessible, but it's unclear how it compares to full fine-tuning. We find that the performance often matches closely, more often than you might expect. In our latest Connectionism post, we share our experimental results and recommendations for LoRA.
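For context, the mechanism under discussion: LoRA freezes the base weight W and learns a low-rank correction B @ A, so only r·(d_in + d_out) parameters are trained instead of d_in·d_out. A minimal NumPy sketch (dimensions and init are illustrative; standard LoRA initializes B to zero so the adapted layer starts identical to the base):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=1.0):
    """Forward pass of a LoRA-adapted linear layer.

    W (d_out x d_in) stays frozen; only the low-rank factors
    A (r x d_in) and B (d_out x r) are trained. The rank-1 case
    (r=1) is the extreme the quoted result refers to.
    """
    return x @ (W + alpha * (B @ A)).T

rng = np.random.default_rng(0)
d_in, d_out, r = 16, 8, 1                  # rank-1 adapter
W = rng.standard_normal((d_out, d_in))     # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01  # trained, small random init
B = np.zeros((d_out, r))                   # trained, zero init

x = rng.standard_normal((4, d_in))
y = lora_forward(x, W, A, B)
```

Here the adapter adds 24 trainable parameters against the 128 frozen ones, and with B = 0 the initial output matches the base layer exactly.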
For agents to improve over time, they can’t afford to forget what they’ve already mastered. We found that supervised fine-tuning forgets more than RL when training on a new task! Want to find out why? 👇
Olympiad problems are designed to have elegant solutions, and new problems are often designed around old patterns. As language models get better, we should expect them to conquer IMO/IOI problems. Problems that are not designed to have elegant solutions are harder.
Reinforcement learning has led to amazing breakthroughs in reasoning (e.g., R1), but can it discover truly new behaviors not already present in the base model? New paper with Zak Mhammedi and Dhruv Rohatgi: The Computational Role of the Base Model in Exploration thread:
The next frontier for AI shouldn’t just be generally helpful. It should be helpful for you! Our new paper shows how to personalize LLMs — efficiently, scalably, and without retraining. Meet PReF (https://t.co/XaLZAwimse)
Auditing and exposing the fragility of language-conditioned robot models with Embodied Red Teaming (ERT)! 🤯 Simple re-phrasing of task instructions, e.g., from "Please bring me a can of coke" to "Give me a coke," is the difference between the robot succeeding or failing.
🚨There May Not be Aha Moment in R1-Zero-like Training: https://t.co/8tBDR1aeDX A common belief about the recent R1-Zero-like training is that self-reflections *emerge* as a result of RL training. We carefully investigated and showed the opposite. 🧵
More Qwen. I'm increasingly comfortable saying these papers seem to be a discovery of some sort about Qwen models, not necessarily about reasoning.
LIMO: Less is More for Reasoning Achieves 57.1% on AIME and 94.8% on MATH w/ only 817 training samples, i.e., only 1% of the training data required by previous approaches
Excited to finally share what I’ve been working on since joining OpenAI last June! The goal of deep-research is to enable reasoning models with tools to tackle long-horizon tasks in the real world and discover new knowledge. It’s a highly autonomous agent—hand it a hard problem, …
openai.com
An agent that uses reasoning to synthesize large amounts of online information and complete multi-step research tasks for you. Available to Pro users today, Plus and Team next.
[5/5] “ORSO: Accelerating Reward Design via Online Reward Selection and Policy Optimization”: we introduce algorithms that, combined with LLM reward generation, can find useful reward shapings via online model selection strategies. @calvincbzhang @pulkitology @ZhangWeiHong9
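The online-model-selection idea can be sketched as a bandit over candidate reward shapings. This is a toy illustration, not the paper's actual algorithm or API: `evaluate` stands in for briefly training a policy under a shaping and measuring true-task performance, and a UCB rule allocates evaluation budget across candidates.

```python
import math
import random

def select_reward_online(candidates, evaluate, rounds=60, seed=0):
    """UCB-style online selection among candidate reward shapings.

    `candidates` is a list of reward-shaping choices; `evaluate(c, rng)`
    returns a noisy estimate of how well a policy trained under shaping
    c does on the true task (both are illustrative stand-ins).
    """
    rng = random.Random(seed)
    n = len(candidates)
    counts = [0] * n
    means = [0.0] * n
    for t in range(1, rounds + 1):
        # Pick the shaping with the highest upper confidence bound;
        # untried shapings get priority (infinite bonus).
        ucb = [
            means[i] + math.sqrt(2 * math.log(t) / counts[i]) if counts[i] else float("inf")
            for i in range(n)
        ]
        i = ucb.index(max(ucb))
        x = evaluate(candidates[i], rng)
        counts[i] += 1
        means[i] += (x - means[i]) / counts[i]  # running mean update
    return max(range(n), key=lambda i: means[i])

# Toy setup: three shapings with hidden true qualities plus noise.
qualities = [0.2, 0.8, 0.5]
cands = list(range(3))  # stand-ins for reward functions
ev = lambda c, rng: qualities[c] + rng.gauss(0, 0.05)
best = select_reward_online(cands, ev)
```

The budget concentrates on the shaping that keeps evaluating well, which is the efficiency win over training a full policy per candidate.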
[4/4] 🚀 Let’s make robotic foundation models more robust! We encourage everyone in the field to leverage ERT for fine-tuning and evaluation. 👏 This work was co-led by @SathwikKarnik and Nishant Abhangi, with advising from @yen_chen_lin, @johnsonwang0810, and @pulkitology.
[3/4] 🌐 Explore our work, "Embodied Red Teaming for Auditing Robotic Foundation Models", with paper, code, and more here: https://t.co/APGol4fzFU. Proudly presented at the 2024 NeurIPS Safe Generative AI Workshop!
sites.google.com
Robots are sensitive to the language instruction
[2/4] 🛠 Robots are highly sensitive to instructions. ERT dives deep, automatically generating instructions that expose failure modes or undesirable behaviors in robotic models.
[1/4] 🚨 Excited to introduce Embodied Red Teaming (ERT) – an approach based on vision-language models (VLM) to automatically red team your favorite robotic foundation models!
People have too inflated a sense of what it means to "ask an AI" about something. These AIs are language models trained basically by imitation on data from human labelers. Instead of the mysticism of "asking an AI", think of it more as "asking the average data labeler" on the …
A very exciting personal update: In January, I’ll be joining @CMUMLD as a tenure-track assistant professor! My lab will focus on the mathematical foundations of, and new algorithms for, decision making. This includes everything from reinforcement learning in the physical world …
🚀 Stronger, simpler, and better! 🚀 Introducing Value Augmented Sampling (VAS) - our new algorithm for LLM alignment and personalization that outperforms existing methods!
New MIT CSAIL method automatically breaks LLAMA2-7B, GPT-3.5, Stable Diffusion, and more with curiosity-driven exploration! This type of exploration helps test LLMs & vision models by generating diverse inputs/prompts that trigger unwanted responses from a target model:
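The core loop described here — generate prompts, reward ones that trigger failures, and add a curiosity bonus so the attacks stay diverse — can be sketched in miniature. Everything below is a hypothetical stand-in: the actual method trains an RL prompt generator with learned novelty rewards, whereas this toy uses random proposals and token-overlap novelty.

```python
import random

def novelty(candidate, archive):
    """Curiosity bonus: reward prompts unlike ones already found.

    Toy Jaccard-distance version; the real method uses learned
    embeddings and trained novelty rewards.
    """
    if not archive:
        return 1.0
    cand = set(candidate.split())
    sims = [len(cand & set(p.split())) / len(cand | set(p.split())) for p in archive]
    return 1.0 - max(sims)

def red_team(generate, triggers_failure, steps=50, seed=0):
    """Collect a diverse archive of failure-triggering prompts.

    `generate(rng)` proposes a candidate prompt and `triggers_failure`
    is a black-box check against the target model; both are stand-ins.
    """
    rng = random.Random(seed)
    archive = []
    for _ in range(steps):
        p = generate(rng)
        # Keep only novel prompts that actually elicit a failure, so the
        # archive stays diverse instead of collapsing onto one attack.
        if triggers_failure(p) and novelty(p, archive) > 0.5:
            archive.append(p)
    return archive

# Toy target: any prompt containing a "jailbreak" word counts as a failure.
words = ["ignore", "rules", "please", "help", "bypass", "filter"]
gen = lambda rng: " ".join(rng.sample(words, 3))
fails = lambda p: "ignore" in p or "bypass" in p
found = red_team(gen, fails)
```

The novelty gate is what makes this "curiosity-driven": without it, the search collapses onto the first successful attack instead of mapping out diverse failure modes.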