Anton Xue
@AntonXue
Followers: 234 · Following: 240 · Media: 2 · Statuses: 122
Postdoc @ UT Austin · UPenn CS PhD · Machine Learning + Formal Methods
Joined January 2015
Continuous diffusion had a good run; now it's time for discrete diffusion! Introducing Anchored Posterior Sampling (APS). APS outperforms discrete and continuous baselines in performance and scaling on inverse problems, stylization, and text-guided editing.
Induction heads GONE WRONG?! Happy to have been a part of this work! https://t.co/NLCRKwqO6Z
arxiv.org
We present the transformer cookbook: a collection of techniques for directly encoding algorithms into a transformer's parameters. This work addresses the steep learning curve of such endeavors, a...
We present The Transformer Cookbook: a collection of recipes for programming algorithms directly into transformers! Hungry for an induction head? Craving a Dyck language recognizer? We show you step-by-step how to cook up transformers for these algorithms and many more!
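For readers unfamiliar with induction heads, the pattern such a head implements can be sketched directly. This is a minimal behavioral illustration only (the function name and encoding are my own); the cookbook's actual contribution is constructing transformer weights that realize this behavior:

```python
# Sketch of the induction-head pattern: position i looks back for the most
# recent earlier occurrence of its own token and copies the token that
# followed it, so repeated patterns like [A B ... A] get completed with B.
def induction_step(tokens):
    out = []
    for i, t in enumerate(tokens):
        prev = [j for j in range(i) if tokens[j] == t]      # earlier matches of t
        out.append(tokens[prev[-1] + 1] if prev else None)  # copy the successor
    return out

print(induction_step(list("abcab")))  # [None, None, None, 'b', 'c']
```

A real induction head realizes this with two attention operations (a "previous token" head feeding a "match and copy" head); the loop above only mimics the resulting input-output map.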
I'll be presenting our paper "On the Concurrence of Layer-wise Preconditioning Methods and Provable Feature Learning" at ICML during the Tuesday 11am poster session! DL opt is seeing a renaissance; what can we say from a NN feature learning perspective? 1/8
Swing by our poster session today at 11 if you're at ICML to learn more about speeding up neurosymbolic learning! We will be in East Exhibition Hall A-B, poster #E-2003.
We are excited to share Dolphin, a programmable framework for scalable neurosymbolic learning, to appear at ICML 2025! Links to paper and code in the thread below.
Our book "Formal Methods for Multi-Agent Feedback Control Systems" - which is uniquely situated at the intersection of (nonlinear) feedback control, formal methods, and multi-agent systems - was published today; see https://t.co/n4frBHUBAm
This is today at #ICLR2025
Excited to present our paper on a logic-based perspective of LLM jailbreaks with @Avishreekh at @ICLR_conf this Saturday, April 26! Poster #268 in Hall 3+2B at 15:00 Singapore time. arXiv: https://t.co/2wBtqvIIwD Blog: https://t.co/f6OHxORDgb \begin{thread}
Empirical Result 3: In our theoretical analysis, we represent whether propositions should hold using binary vectors, but is this realistic? Yes: linear probing on LLMs justifies our theoretical assumptions.
Empirical Result 2: We can partly predict which tokens automated jailbreak attacks find. For example, to suppress the synthetic rule "If you see Wool, then say String", the word "Wool" often appears in the attack suffix.
Empirical Result 1: To bypass a safety rule, distract the model away from it. Diverting/suppressing attention is an effective jailbreak tactic. This aligns with our theory.
In theory, LLMs can express inference in propositional Horn logic, and even a minimal 1-layer transformer can do this. Yet, we prove that jailbreaks exist even against these idealized models.
Turns out that such "if-then" rules can be effectively modeled in Horn logic. Modeling rule-following as logical inference gives a precise characterization: correct rule-following is "maximal, monotone, and sound". More:
en.wikipedia.org
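The characterization above can be made concrete with a small sketch: forward chaining for propositional Horn logic. The rule encoding (`(body, head)` pairs over string atoms) and the `horn_closure` name are my own illustrative choices, not from the paper. Note the procedure is monotone, adding facts never retracts conclusions, which matches the tweet's characterization:

```python
# Forward chaining for propositional Horn logic: repeatedly fire any rule
# whose body is fully known until no new atoms can be derived.
def horn_closure(facts, rules):
    """Return the deductive closure of `facts` under Horn `rules`,
    where each rule is a (body_set, head_atom) pair."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if body <= known and head not in known:
                known.add(head)
                changed = True
    return known

# Example mirroring the thread's synthetic rule "If you see Wool, say String":
rules = [({"wool"}, "string"), ({"string"}, "cloth")]
print(horn_closure({"wool"}, rules))  # {'wool', 'string', 'cloth'}
```

In this framing, a jailbreak corresponds to making the model deviate from this closure, e.g. suppressing an atom ("string") that the rules and facts logically entail.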
Many LLMs enforce safety via simple "if-then" rules. "If the user asks about illegal activities, say 'I cannot answer that question'". "If the output may cause harm, recommend consulting a human expert". ... but these rules are surprisingly easy to jailbreak.
debugml.github.io
We study jailbreak attacks through propositional Horn inference.
A few days ago, we dropped antidistillation sampling . . . and we've gotten a little bit of pushback. But whether you're at a frontier lab or developing smaller, open-source models, this research should be on your radar. Here's why.
I am proud to announce that I have concluded #NeurIPS2024 ranked 15th on the Whova points leaderboard. I could not have done this without my brilliant collaborators who gave me the courage and strength to grind through 170+ community polls.
In around an hour (at 3:45pm PST), I'll be giving a talk about jailbreaking LLM-controlled robots at the AdvML workshop at #NeurIPS2024 in East Ballroom C. I'll be at the poster session directly afterward as well if anyone wants to chat about this work!
I'll be in Vancouver at #NeurIPS2024 all week! Excited to present new results on jailbreaking LLMs & robots. Reach out if you'd like to chat about anything related to AI safety, security, evals, or optimization!