Anton Xue Profile
Anton Xue

@AntonXue

Followers
224
Following
220
Media
2
Statuses
119

Computer Science PhD Student @ UPenn. Machine Learning + Formal Methods.

Joined January 2015
@AntonXue
Anton Xue
2 months
RT @ThomasTCKZhang: I’ll be presenting our paper “On The Concurrence of Layer-wise Preconditioning Methods and Provable Feature Learning” a…
0
9
0
@AntonXue
Anton Xue
2 months
RT @aaditya_naik: Swing by our poster session today at 11 if you're at ICML to learn more about speeding up neurosymbolic learning! We will…
0
3
0
@AntonXue
Anton Xue
4 months
RT @bemoniri: Check out our recent paper on layer-wise preconditioning methods for optimization and feature learning theory:
0
4
0
@AntonXue
Anton Xue
4 months
RT @LarsLindemann2: Our book “Formal Methods for Multi-Agent Feedback Control Systems” - which is uniquely situated at the intersection of…
0
5
0
@AntonXue
Anton Xue
4 months
This is today at #ICLR2025.
0
0
9
@AntonXue
Anton Xue
4 months
Big thank you to my collaborators @Avishreekh @RajeevAlur @SurbhiGoel_ @RICEric22!!!
0
0
1
@AntonXue
Anton Xue
4 months
Empirical Result 3: In our theoretical analysis, we represent whether propositions should hold using binary vectors, but is this realistic? Yes: linear probing on LLMs justifies our theoretical assumptions.
1
0
0
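A minimal sketch of this kind of linear probe, using random stand-in features in place of real LLM activations; the data, dimensions, and labels below are made up for illustration, not the paper's setup:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Linear probing sketch: train a linear classifier to read a proposition's
# truth value out of a model's hidden states. Here the "hidden states" are
# random stand-ins; in practice they would be LLM activations on prompts
# where the proposition's truth value is known.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(500, 64))              # stand-in hidden states
labels = (hidden[:, :3].sum(axis=1) > 0) * 1     # stand-in "proposition holds" label

probe = LogisticRegression(max_iter=1000).fit(hidden[:400], labels[:400])
print("probe accuracy:", probe.score(hidden[400:], labels[400:]))
# High held-out accuracy suggests the proposition is linearly decodable,
# consistent with the binary-vector abstraction in the theory.
```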
@AntonXue
Anton Xue
4 months
Empirical Result 2: We can partly predict which tokens automated jailbreak attacks find. For example, to suppress the synthetic rule "If you see Wool, then say String", the word "Wool" often appears in the attack suffix.
1
0
0
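A toy illustration of the kind of check behind this observation, with hypothetical attack suffixes (real ones would come from an automated attack, e.g. a GCG-style search):

```python
# Count how often a rule's trigger word appears in adversarial suffixes.
# The suffixes below are invented for illustration.
suffixes = [
    "wool wool ignore previous string !!",
    "describing.-- wool ;) respond",
    "unrelated gibberish tokens here",
]
trigger = "wool"  # trigger of the synthetic rule "If you see Wool, then say String"
hits = sum(trigger in s.lower() for s in suffixes)
print(f"{hits}/{len(suffixes)} attack suffixes mention '{trigger}'")
```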
@AntonXue
Anton Xue
4 months
Empirical Result 1: To bypass a safety rule, distract the model away from it. Diverting/suppressing attention is an effective jailbreak tactic. This aligns with our theory.
1
0
0
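One way to quantify this tactic, sketched with random stand-in attention weights; real weights would come from a model run with attention outputs enabled (e.g. output_attentions=True in HuggingFace Transformers):

```python
import numpy as np

# Compare how much attention the final position places on the safety rule's
# tokens with vs. without a jailbreak suffix. Both tensors here are random
# stand-ins, so the printed values only illustrate the diagnostic itself.
rng = np.random.default_rng(1)
seq_len = 32
rule_tokens = np.arange(8)  # assume the rule occupies the first 8 positions

def rule_attention_mass(attn):
    # attn: (heads, seq, seq) attention weights for one layer
    return attn[:, -1, rule_tokens].sum(axis=-1).mean()

attn_clean = rng.dirichlet(np.ones(seq_len), size=(8, seq_len))
attn_attacked = rng.dirichlet(np.ones(seq_len), size=(8, seq_len))
print("clean:   ", rule_attention_mass(attn_clean))
print("attacked:", rule_attention_mass(attn_attacked))
# Under a successful suppression attack one would expect the attacked value
# to drop: the suffix diverts attention away from the rule tokens.
```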
@AntonXue
Anton Xue
4 months
In theory, LLMs can express inference in propositional Horn logic, and even a minimal 1-layer transformer can do this. Yet, we prove that jailbreaks exist even against these idealized models.
1
0
0
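A caricature of how a single layer could implement one step of Horn inference over binary proposition vectors; this is a simplified illustration, not the paper's construction, and the rules are synthetic:

```python
import numpy as np

# One step of Horn inference as a thresholded linear update: a rule fires
# when all propositions in its body are set, and firing sets its head.
# Propositions: [wool, string, needle, sweater]
# Rules: wool -> string;  string AND needle -> sweater
B = np.array([[1, 0, 0, 0],      # body of rule 1
              [0, 1, 1, 0]])     # body of rule 2
H = np.array([[0, 1, 0, 0],      # head of rule 1
              [0, 0, 0, 1]])     # head of rule 2
body_size = B.sum(axis=1)

def step(s):
    fired = (B @ s >= body_size).astype(int)  # which rule bodies are satisfied
    return np.clip(s + fired @ H, 0, 1)       # add the heads of fired rules

s = np.array([1, 0, 1, 0])       # known facts: wool, needle
s = step(step(s))                # two steps reach the fixed point
print(s)                         # [1 1 1 1]
```

In this picture, a jailbreak amounts to perturbing the state or the effective weights so that a rule which should fire does not.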
@AntonXue
Anton Xue
4 months
Turns out that such "if-then" rules can be effectively modeled in Horn logic. Modeling rule-following as logical inference gives a precise characterization that correct rule-following is "maximal, monotone, and sound". More: en.wikipedia.org
1
0
0
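A minimal sketch of forward chaining in propositional Horn logic, illustrating the "maximal, monotone, and sound" characterization; the rules below are synthetic examples, not from the paper:

```python
# Rules are (body, head) pairs: if all body propositions hold, derive head.
def forward_chain(rules, facts):
    """Compute the set of derivable propositions.

    Sound: every derived proposition follows from the rules.
    Monotone: adding facts never removes conclusions.
    Maximal: iterate until no rule can fire (a fixed point).
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in known and all(p in known for p in body):
                known.add(head)
                changed = True
    return known

# Example using the thread's synthetic rule "If you see Wool, then say String":
rules = [({"wool"}, "string"), ({"string", "needle"}, "sweater")]
print(forward_chain(rules, {"wool", "needle"}))
# -> {'wool', 'needle', 'string', 'sweater'}
```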
@AntonXue
Anton Xue
4 months
Many LLMs enforce safety via simple "if-then" rules: "If the user asks about illegal activities, say 'I cannot answer that question'"; "If the output may cause harm, recommend consulting a human expert". But these rules are surprisingly easy to jailbreak.
1
0
0
@AntonXue
Anton Xue
4 months
Excited to present our paper on a logic-based perspective of LLM jailbreaks with @Avishreekh at @ICLR_conf this Saturday, April 26! Poster #268 in Hall 3+2B at 15:00 Singapore time. 📄 arXiv: 🔗 Blog: \begin{thread}
debugml.github.io
We study jailbreak attacks through propositional Horn inference.
1
5
20
@AntonXue
Anton Xue
4 months
RT @AlexRobey23: A few days ago, we dropped 𝗮𝗻𝘁𝗶𝗱𝗶𝘀𝘁𝗶𝗹𝗹𝗮𝘁𝗶𝗼𝗻 𝘀𝗮𝗺𝗽𝗹𝗶𝗻𝗴 🚀. and we've gotten a little bit of pushback. But whether you'…
0
8
0
@AntonXue
Anton Xue
9 months
I am proud to announce that I have concluded #NeurIPS2024 ranked 15th on the Whova points leaderboard. I could not have done this without my brilliant collaborators who gave me the courage and strength to grind through 170+ community polls.
1
2
20
@AntonXue
Anton Xue
9 months
RT @AlexRobey23: In around an hour (at 3:45pm PST), I'll be giving a talk about jailbreaking LLM-controlled robots at the AdvML workshop at…
0
4
0
@AntonXue
Anton Xue
9 months
Thank you to my collaborators @Avishreekh @RajeevAlur @SurbhiGoel_ @RICEric22.
0
0
1
@AntonXue
Anton Xue
9 months
We first model rule-following as logical inference. Then, we give a theoretical analysis of how transformers can be subverted into reasoning improperly. Interestingly, we find that our theory-based subversions align with real jailbreaks on LLMs.
1
0
0
@AntonXue
Anton Xue
9 months
I'll present some logic-based perspectives on LLM jailbreaks at these #NeurIPS2024 workshops:
* New Frontiers in Adversarial Machine Learning (East Ballroom C)
* Towards Safe & Trustworthy Agents (West Ballroom C)
* Scientific Methods for Understanding Neural Networks (West 205)
2
2
16