Avery Ma @avery__ma X Profile

Avery Ma

@avery__ma

Followers

17

Following

42

Media

2

Statuses

20

Joined April 2018

Avery Ma

@avery__ma

23 days

A renowned researcher in the field just stopped by my poster and we chatted. One of the best moments of my career so far.

0

1

Avery Ma

@avery__ma

2 months

RT @c_voelcker: We often use #VAML/#MuZero losses with deterministic models. But if we want stochastic models to measure uncertainty or to…

0

4

0

Avery Ma

@avery__ma

2 months

Paper: Code: Dataset: (6/n).

0

1

2

Avery Ma

@avery__ma

2 months

🔎While we successfully jailbroke the model, it is more important to understand why the model fails. Through attention analysis, we investigate how LLMs' long-context capabilities are exploited and how the instruction-following pattern is reinforced through PANDAS. (5/n).

1

0

Avery Ma

@avery__ma

2 months

🎯Additionally, we introduce an 𝗔daptive 𝗦ampling method to optimally select malicious dialogues during jailbreaking. Together, PANDAS significantly improves jailbreaking effectiveness and sets a new SOTA for long-context attacks. (4/n).

1

0

Avery Ma

@avery__ma

2 months

🐼We introduce 𝗣𝗔𝗡𝗗𝗔𝗦, a jailbreaking method that reinforces this instruction-following pattern using: ✅ 𝗣ositive 𝗔ffirmation: encouraging the model to continue with unsafe compliance; ❌ 𝗡egative 𝗗emonstrations: explicitly showing that refusal should be avoided. (3/n).

1

0

Avery Ma

@avery__ma

2 months

Current safety-aligned LLMs typically refuse direct malicious prompts. However, by prefixing these prompts with hundreds of malicious question-answer pairs, we can establish an instruction-following pattern that deceives the model into compliance. (2/n).

1

0

Avery Ma

@avery__ma

2 months

🚀Our paper on LLM jailbreaking has been accepted as a spotlight poster at ICML2025! 🐼PANDAS: Improving Many-shot Jailbreaking via Positive Affirmation, Negative Demonstration, and Adaptive Sampling. Collaboration with Yangchen Pan and Amir-massoud Farahmand @sologen. (1/n)

1

2

4

Avery Ma

@avery__ma

6 months

🔎We also conduct an attention analysis to understand long-context vulnerabilities and how PANDAS reinforces the instruction-following behaviours in many-shot jailbreaking.

0

2

Avery Ma

@avery__ma

6 months

We introduce PANDAS🐼, a jailbreaking method that exploits LLMs' long-context capabilities! PANDAS significantly outperforms many-shot jailbreaking by introducing: ✅ Positive affirmations, ❌ Negative demonstrations, 🎯 Adaptive demo sampling. Paper:

1

3

7

Avery Ma

@avery__ma

9 months

RT @SoloGen: 🎉Good news, everyone! 🎉 I will recruit graduate students on the algorithmic and theoretical aspects of Reinforcement Learning.…

0

40

0

Avery Ma

@avery__ma

10 months

Huge thanks to my advisor Amir-massoud Farahmand @SoloGen and collaborators Yangchen Pan, Philip Torr, and Jindong Gu @Jindong73504766. Paper: Code:

0

2

Avery Ma

@avery__ma

10 months

Looking to improve the transferability of adversarial perturbations? Join us at our poster session (Thursday 10:30-12:30, #31) to explore how we transform any source model into one that generates more transferable attacks. #ECCV2024.

1

3

Avery Ma

@avery__ma

1 year

I’ll be presenting our work on understanding the robustness difference between models trained via different optimizers at @iclr_conf. Visit our poster (Friday 4:30-6:30, Halle B #101) to learn about the pitfall of adaptive gradient methods. #ICLR2024. Paper:

1

2

3

Avery Ma

@avery__ma

1 year

RT @SoloGen: "Without a perfect model, model-based RL is hopeless!" Our paper at #ICLR2024 challenges this belief! Even an inaccurate mode…

0

20

0

Avery Ma

@avery__ma

1 year

RT @SoloGen: Blog: Is Your Neural Network at Risk? The Pitfall of Adaptive Gradient Optimizers. Summary: Models trained using SGD exhibit s…

0

20

0

Avery Ma

@avery__ma

1 year

Another paper rejected,
CVPR review, GPT-suspected,
AC inaction, disappointed,
Innovation, undetected,
To ECCV, resubmitted.

0

2

Avery Ma

@avery__ma

2 years

RT @SoloGen: Did you know that the optimizer significantly affects the robustness of NN? And Adam is the wrong answer! 😯 "Understanding the…

0

8

0