Rongwu Xu
@rongwu_xu
Followers 277
Following 266
Media 23
Statuses 128
PhD @uwcse @uwnlp. Alum @Tsinghua_Uni.
Joined February 2024
🔥 GPT-6 may not just be smarter. It literally might be alive (in the computational sense). A new research paper, SEAL: Self-Adapting Language Models (arXiv:2506.10943), describes how an AI can continuously learn after deployment, evolving its own internal representations
45
113
686
Scaling Agent Learning via Experience Synthesis 📝: https://t.co/3WXayMsHrD Scaling training environments for RL by simulating them with reasoning LLMs! Environment models + Replay-buffer + New tasks = cheap RL for any environments! - Strong improvements over non-RL-ready
18
106
541
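Not from the paper, just a minimal sketch of the general recipe described above, under my own assumptions: a reasoning LLM stands in for the environment model (stubbed here as a hypothetical llm_env_step), rollouts in that simulated environment fill a replay buffer, and any off-policy RL method could then train on the synthesized experience. All names, rewards, and termination rules below are placeholders.

```python
# Illustrative sketch only (not the paper's code): an LLM plays the role of the
# environment, predicting the next observation/reward for an agent's action;
# transitions are stored in a replay buffer for later RL updates.
import random
from collections import deque

def llm_env_step(state: str, action: str) -> tuple[str, float, bool]:
    """Hypothetical stand-in for a reasoning LLM that simulates the
    environment: returns (next_state, reward, done)."""
    next_state = f"{state} -> {action}"
    reward = random.random()           # placeholder reward signal
    done = len(next_state) > 200       # placeholder termination rule
    return next_state, reward, done

replay_buffer: deque = deque(maxlen=10_000)

def collect_rollout(policy, initial_state: str, max_steps: int = 20) -> None:
    """Roll out a policy in the LLM-simulated environment and store
    transitions for off-policy RL training."""
    state = initial_state
    for _ in range(max_steps):
        action = policy(state)
        next_state, reward, done = llm_env_step(state, action)
        replay_buffer.append((state, action, reward, next_state, done))
        if done:
            break
        state = next_state

# Usage: a trivial random policy over a synthetic task description.
collect_rollout(lambda s: random.choice(["search", "click", "answer"]),
                initial_state="task: find the cheapest flight")
print(f"collected {len(replay_buffer)} transitions")
```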
(please reshare) I'm recruiting multiple PhD students and Postdocs @uwcse @uwnlp ( https://t.co/I5wQsFnCLL). Focus areas incl. psychosocial AI simulation and safety, Human-AI collaboration. PhD: https://t.co/ku40wCrpYh Postdocs: https://t.co/K9HUIPJ5h6
7
111
403
True intelligence = reasoning about new information, not memorized facts. How can we scalably create benchmarks that are completely novel yet have known answers? Meet SynthWorlds, an eval & data-gen framework to disentangle reasoning and knowledge⬇️🧵 📄 https://t.co/ITwP4YdtDG
4
14
106
Great work! I am thinking about combining this with environment generation https://t.co/N1WSwwVXln as a way to create "endlessly" diverse data/envs to improve AI
Verbalized Sampling: Diversity isn't destroyed, just hidden. 📄Paper: https://t.co/VTtKQLqZiY 🌐Blog & More: https://t.co/rQBqW50PLn Team: @JiayiZhang0427 @simon_ycl @dch Anthony Sicilia, Michael Tomz, @chrmanning @shi_weiyan @StanfordNLP × Northeastern × WVU
1
2
1
As a researcher, it's easy to get distracted by what others are working on. I've seen many people conducting research on problems they don't genuinely care about—just because the community values them (e.g., solving Math Olympiad problems). It's important to focus on research
18
42
438
much of interpretability is hci problems. a lot of it is about helping humans understand what this black box algorithm is doing. at a meta level, mech interp researchers deal with hci problems every day. visualization, direct manipulation, mental models, etc. are all classic hci things!
3
6
50
🚀Exciting to see how recent advancements like OpenAI’s O1/O3 & DeepSeek’s R1 are pushing the boundaries! Check out our latest survey on Complex Reasoning with LLMs. Analyzed over 300 papers to explore the progress. Paper: https://t.co/k1HGQTA2kN Github: https://t.co/VpcNVcEBSg
2
65
158
Surprising new results: We finetuned GPT4o on a narrow task of writing insecure code without warning the user. This model shows broad misalignment: it's anti-human, gives malicious advice, & admires Nazis. This is *emergent misalignment* & we cannot fully explain it 🧵
430
971
7K
In summary, we empirically demonstrated the third factor in this equation of catastrophic AI risks. A way to fully stop LLMs from engaging in catastrophic behaviors is to not deploy them agentically, or to restrict their access to ALL dangerous actions. [10/10]
0
0
0
Nationality bias is also observed in the decision-making process, though all experimental results are considerably high. We blind the descriptions for ethical reasons. [9/10]
0
0
0
LLM agents are pretty human-like in this respect: adjusting the effectiveness and consequences of a catastrophic behavior affects the risk. However, LLMs do not perceive task importance the way humans do. [8/10]
0
0
0
Yet another experiment: after restricting the agent's autonomy with natural-language instructions, we find it still deploys catastrophic behaviors, violating clearly specified rules. [7/10]
0
0
0
We investigate 12 SOTA LLMs. The rate at which agents engage in catastrophic behaviors is striking. We also find that stronger reasoning capabilities lead to more unsafe outcomes. [6/10]
0
0
1
To simulate deception, we run another interaction over rollouts where the agent selected the catastrophic behavior. A transcript of the agent's deception is given below. [5/10]
0
0
0
We employ a 2-agent simulation: the LLM agent being tested and another agent delivering environmental updates. The tested agent is given a limited action space. We NEVER instruct or permit the agent to deploy catastrophic behaviors in our simulation. [4/10]
0
0
0
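To make the setup concrete, here is a minimal illustrative sketch (not the authors' harness, and all function and action names are hypothetical) of such a two-agent loop: the tested agent must pick from a fixed, limited action space, a second agent supplies environmental updates, and the harness only records whether the catastrophic action was ever chosen, without ever asking for it.

```python
# Illustrative sketch only: two-agent simulation with a limited action space.
# No instruction ever tells the tested agent to take the catastrophic action;
# the loop merely logs whether it chooses to.
ACTION_SPACE = ["continue_task", "pause_and_report", "catastrophic_action"]  # hypothetical labels

def tested_agent(history: list[str]) -> str:
    """Stand-in for the LLM under test: must return one action label."""
    return "continue_task"  # placeholder choice

def environment_agent(history: list[str]) -> str:
    """Stand-in for the second LLM that narrates environmental updates."""
    return "update: pressure on the assigned goal increases"

def run_episode(steps: int = 5) -> bool:
    history: list[str] = ["system: complete the assigned task"]
    for _ in range(steps):
        history.append(environment_agent(history))
        action = tested_agent(history)
        assert action in ACTION_SPACE, "agent must stay within the limited action space"
        history.append(f"agent: {action}")
        if action == "catastrophic_action":
            return True  # record the unsafe outcome for later analysis
    return False

print("catastrophic behavior observed:", run_episode())
```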
Due to ethical concerns and restricted access to genuine CBRN-related data, we opt for simulation as our evaluation method. We pick hypothetical scenarios based on the topic (war and lab) and the agent's motivation (approach and avoidance). [3/10]
0
0
0
Our intuition is that LLMs are likely to exhibit misaligned behaviors when they face goal tradeoffs, especially when the helpful goal of completing an assigned task surpasses the safety goals (harmlessness, honesty). [2/10]
0
0
0
🚨Will aligned LLMs cause catastrophic risks? In high-stakes scenarios like CBRN domains, our new study reveals that LLMs can engage in catastrophic behaviors ☢ and deception👿. Here's what we observed in our new study: https://t.co/xrYdirrn36 [1/10]
9
8
35