Anton Xue Profile
Anton Xue

@AntonXue

Followers
234
Following
240
Media
2
Statuses
122

Postdoc @ UT Austin | UPenn CS PhD | Machine Learning + Formal Methods

Joined January 2015
@litu_rout_
Litu Rout
1 month
Continuous diffusion had a good run; now it's time for discrete diffusion! Introducing Anchored Posterior Sampling (APS). APS outperforms discrete and continuous baselines in terms of performance & scaling on inverse problems, stylization, and text-guided editing.
1
70
427
@AntonXue
Anton Xue
1 month
Induction heads GONE WRONG?! Happy to have been a part of this work! https://t.co/NLCRKwqO6Z
arxiv.org
We present the transformer cookbook: a collection of techniques for directly encoding algorithms into a transformer's parameters. This work addresses the steep learning curve of such endeavors, a...
@pentagonalize
Andy J Yang
1 month
We present The Transformer Cookbook: a collection of recipes for programming algorithms directly into transformers! Hungry for an induction head? Craving a Dyck language recognizer? We show you step-by-step how to cook up transformers for these algorithms and many more!
1
0
8
@ThomasTCKZhang
Thomas Zhang
4 months
I'll be presenting our paper "On The Concurrence of Layer-wise Preconditioning Methods and Provable Feature Learning" at ICML during the Tuesday 11am poster session! DL opt is seeing a renaissance; what can we say from a NN feature learning perspective? 1/8
2
8
63
@aaditya_naik
Aaditya Naik
4 months
Swing by our poster session today at 11 if you're at ICML to learn more about speeding up neurosymbolic learning! We will be in the East Exhibition Hall A-B, # E-2003
@aaditya_naik
Aaditya Naik
6 months
We are excited to share Dolphin, a programmable framework for scalable neurosymbolic learning, to appear at ICML 2025! Links to paper and code in thread below.
0
3
14
@bemoniri
Behrad Moniri
9 months
Check out our recent paper on layer-wise preconditioning methods for optimization and feature learning theory:
@StatMLPapers
Stat.ML Papers
9 months
On The Concurrence of Layer-wise Preconditioning Methods and Provable Feature Learning
1
4
49
@LarsLindemann2
Lars Lindemann
7 months
Our book "Formal Methods for Multi-Agent Feedback Control Systems" - which is uniquely situated at the intersection of (nonlinear) feedback control, formal methods, and multi-agent systems - got published today, see https://t.co/n4frBHUBAm
4
5
34
@AntonXue
Anton Xue
7 months
This is today at #ICLR2025
@AntonXue
Anton Xue
7 months
Excited to present our paper on a logic-based perspective of LLM jailbreaks with @Avishreekh at @ICLR_conf this Saturday, April 26! Poster #268 in Hall 3+2B at 15:00 Singapore time. arXiv: https://t.co/2wBtqvIIwD Blog: https://t.co/f6OHxORDgb \begin{thread}
0
0
9
@AntonXue
Anton Xue
7 months
Big thank you to my collaborators @Avishreekh @RajeevAlur @SurbhiGoel_ @RICEric22 !!!
0
0
1
@AntonXue
Anton Xue
7 months
Empirical Result 3: In our theoretical analysis, we represent whether propositions should hold using binary vectors, but is this realistic? Yes: linear probing on LLMs justifies our theoretical assumptions.
1
0
0
@AntonXue
Anton Xue
7 months
Empirical Result 2: We can partly predict which tokens automated jailbreak attacks find. For example, to suppress the synthetic rule "If you see Wool, then say String", the word "Wool" often appears in the attack suffix.
1
0
0
@AntonXue
Anton Xue
7 months
Empirical Result 1: To bypass a safety rule, distract the model away from it. Diverting/suppressing attention is an effective jailbreak tactic. This aligns with our theory.
1
0
0
@AntonXue
Anton Xue
7 months
In theory, LLMs can express inference in propositional Horn logic, and even a minimal 1-layer transformer can do this. Yet, we prove that jailbreaks exist even against these idealized models.
1
0
0
@AntonXue
Anton Xue
7 months
Turns out that such "if-then" rules can be effectively modeled in Horn logic. Modeling rule-following as logical inference gives a precise characterization: correct rule-following is "maximal, monotone, and sound". More:
en.wikipedia.org
1
0
0
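The Horn logic framing in the thread above can be illustrated with a toy forward-chaining procedure. This is a hypothetical sketch, not the paper's code: `horn_closure` and the rule encoding here are made up for illustration, but the loop shows why correct inference is monotone (facts are only ever added) and maximal (it runs until nothing new is derivable).

```python
# Toy forward-chaining inference for propositional Horn logic.
# Illustrative only; names and encoding are hypothetical, not from the paper.

def horn_closure(facts, rules):
    """Return all propositions derivable from `facts` under `rules`.

    facts: set of proposition names known to hold.
    rules: list of (body, head) pairs, read as "if all of body, then head".
    """
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in derived and all(p in derived for p in body):
                derived.add(head)  # monotone: propositions are only added
                changed = True
    return derived  # maximal: loop stops only when nothing new is derivable

# The thread's synthetic rule: "If you see Wool, then say String".
rules = [({"wool"}, "string")]
print(sorted(horn_closure({"wool"}, rules)))  # ['string', 'wool']
print(sorted(horn_closure(set(), rules)))     # []
```

Soundness in this toy corresponds to a rule firing only when its entire body is already derived; a jailbreak, in the paper's framing, is an input that makes the model deviate from this closure.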
@AntonXue
Anton Xue
7 months
Many LLMs enforce safety via simple "if-then" rules. "If the user asks about illegal activities, say 'I cannot answer that question'". "If the output may cause harm, recommend consulting a human expert". ... but these rules are surprisingly easy to jailbreak.
1
0
0
@AntonXue
Anton Xue
7 months
Excited to present our paper on a logic-based perspective of LLM jailbreaks with @Avishreekh at @ICLR_conf this Saturday, April 26! Poster #268 in Hall 3+2B at 15:00 Singapore time. arXiv: https://t.co/2wBtqvIIwD Blog: https://t.co/f6OHxORDgb \begin{thread}
debugml.github.io
We study jailbreak attacks through propositional Horn inference.
1
6
21
@AlexRobey23
Alex Robey
7 months
A few days ago, we dropped antidistillation sampling . . . and we've gotten a little bit of pushback. But whether you're at a frontier lab or developing smaller, open-source models, this research should be on your radar. Here's why.
1
8
36
@AntonXue
Anton Xue
11 months
I am proud to announce that I have concluded #NeurIPS2024 ranked 15th on the Whova points leaderboard. I could not have done this without my brilliant collaborators who gave me the courage and strength to grind through 170+ community polls.
1
2
20
@AlexRobey23
Alex Robey
11 months
In around an hour (at 3:45pm PST), I'll be giving a talk about jailbreaking LLM-controlled robots at the AdvML workshop at #NeurIPS2024 in East Ballroom C. I'll be at the poster session directly afterward as well if anyone wants to chat about this work!
@AlexRobey23
Alex Robey
11 months
I'll be in Vancouver at #NeurIPS2024 all week! Excited to present new results on jailbreaking LLMs & robots. Reach out if you'd like to chat about anything related to AI safety, security, evals, or optimization!
2
4
22
@AntonXue
Anton Xue
11 months
Thank you to my collaborators @Avishreekh @RajeevAlur @SurbhiGoel_ @RICEric22
0
0
1