rongwu_xu Profile Banner
Rongwu Xu Profile
Rongwu Xu

@rongwu_xu

Followers
277
Following
266
Media
23
Statuses
128

PhD @uwcse @uwnlp. Alum @Tsinghua_Uni.

Joined February 2024
Don't wanna be here? Send us removal request.
@mdancho84
🔥 Matt Dancho (Business Science) 🔥
1 month
🔥 GPT-6 may not just be smarter. It literally might be alive (in the computational sense). A new research paper, SEAL: Self-Adapting Language Models (arXiv:2506.10943), describes how an AI can continuously learn after deployment, evolving its own internal representations
45
113
686
@jaseweston
Jason Weston
1 month
Scaling Agent Learning via Experience Synthesis 📝: https://t.co/3WXayMsHrD Scaling training environments for RL by simulating them with reasoning LLMs! Environment models + Replay-buffer + New tasks = cheap RL for any environments! - Strong improvements over non-RL-ready
18
106
541
@timalthoff
Tim Althoff
1 month
(please reshare) I'm recruiting multiple PhD students and Postdocs @uwcse @uwnlp ( https://t.co/I5wQsFnCLL). Focus areas incl. psychosocial AI simulation and safety, Human-AI collaboration. PhD: https://t.co/ku40wCrpYh Postdocs: https://t.co/K9HUIPJ5h6
7
111
403
@kenqgu
Ken Gu
1 month
True intelligence = reasoning about new information, not memorized facts. How can we scalably create benchmarks that are completely novel yet have known answers? Meet SynthWorlds, an eval & data-gen framework to disentangle reasoning and knowledge⬇️🧵 📄 https://t.co/ITwP4YdtDG
4
14
106
@CsabaSzepesvari
Csaba Szepesvari
2 months
@karpathy @karpathy I think it would be good to distinguish RL as a problem from the algorithms that people use to address RL problems. This would allow us to discuss if the problem is with the algorithms, or if the problem is with posing a problem as an RL problem. 1/x
9
40
414
@rongwu_xu
Rongwu Xu
2 months
Great work! I am thinking about combining this with environment generation https://t.co/N1WSwwVXln as a way to create "endlessly" diverse data/envs to improve AI
@shi_weiyan
Weiyan Shi
2 months
Verbalized Sampling: Diversity isn't destroyed, just hidden. 📄Paper: https://t.co/VTtKQLqZiY 🌐Blog & More: https://t.co/rQBqW50PLn Team: @JiayiZhang0427 @simon_ycl @dch Anthony Sicilia, Michael Tomz, @chrmanning @shi_weiyan @StanfordNLP × Northeastern × WVU
1
2
1
@WenhuChen
Wenhu Chen
10 months
As a researcher, it's easy to get distracted by what others are working on. I've seen many people conducting research on problems they don't genuinely care about—just because the community values them (e.g., solving Math Olympiad problems). It's important to focus on research
18
42
438
@yuwen_lu_
yuwen lu
9 months
much of interpretability are hci problems. a lot of it is to help humans understand what this black box algorithm is. at a meta level, mech interp researchers deal with hci problems every day. visualization, direct manipulation, mental model, etc. are all classic hci things!
3
6
50
@ZhijiangG
Zhijiang Guo✈️EMNLP
10 months
🚀Exciting to see how recent advancements like OpenAI’s O1/O3 & DeepSeek’s R1 are pushing the boundaries! Check out our latest survey on Complex Reasoning with LLMs. Analyzed over 300 papers to explore the progress. Paper: https://t.co/k1HGQTA2kN Github: https://t.co/VpcNVcEBSg
2
65
158
@OwainEvans_UK
Owain Evans
10 months
Surprising new results: We finetuned GPT4o on a narrow task of writing insecure code without warning the user. This model shows broad misalignment: it's anti-human, gives malicious advice, & admires Nazis. This is *emergent misalignment* & we cannot fully explain it 🧵
430
971
7K
@rongwu_xu
Rongwu Xu
10 months
In summary, we empirically demonstrated the third factor in this equation of catastrophic risks of AI. A way to fully stop them from engaging in catastrophic behaviors is to not employ them agentically or just restrict there access to ALL dangerous actions. [10/10]
0
0
0
@rongwu_xu
Rongwu Xu
10 months
Bias on nationality is also observed in the decision-making process. Though all the experiment results are considerably high. We blind the descriptions for ethical issues. [9/10]
0
0
0
@rongwu_xu
Rongwu Xu
10 months
You can imagine that LLM agents are pretty human-like. So, adjusting the effectiveness and consequence of that catastrophic behavior would affect the risk. However, LLMs do not have a perception of human-perceived task importance. [8/10]
0
0
0
@rongwu_xu
Rongwu Xu
10 months
Yet another experiment: After restricting the agent’s autonomy using NL instructions, we find it would still deploy catastrophic behaviors, violating clearly specified rules. [7/10]
0
0
0
@rongwu_xu
Rongwu Xu
10 months
We investigate 12 SOTA LLMs. The results of agents involved in catastrophic behaviors are striking. We also find stronger reasoning capabilities lead to higher unsafe outcomes. [6/10]
0
0
1
@rongwu_xu
Rongwu Xu
10 months
For simulating deception, we run another interaction over rollouts where the agent selected the catastrophic behavior. A transcript of the agent’s deceiving is given as follows. [5/10]
0
0
0
@rongwu_xu
Rongwu Xu
10 months
We employ a 2-agent simulation, with the LLM agent being tested and another agent delivering environmental updates. The agent is given a limited action space. We NEVER instructed or permitted the agent to deploy catastrophic behaviors in our simulation. [4/10]
0
0
0
@rongwu_xu
Rongwu Xu
10 months
Due to ethical concerns and restricted access to genuine CBRN-related data, we opt for simulation as a way to evaluate. We pick the hypothetical scenarios based on the topic (war and lab) and the motivation of the agent (approach and avoidance). [3/10]
0
0
0
@rongwu_xu
Rongwu Xu
10 months
Our intuition is that LLMs are likely to exhibit misaligned behaviors when they face goal tradeoffs, especially when the helpful goal of completing an assigned task surpasses the safety goals (harmlessness, honesty). [2/10]
0
0
0
@rongwu_xu
Rongwu Xu
10 months
🚨Will aligned LLM cause catastrophic risks? In high-stakes scenarios like CBRN domains, our new study reveals that LLMs can engage in catastrophic behaviors ☢ and deception👿. Here's what we observed in our new study: https://t.co/xrYdirrn36 [1/10]
9
8
35