Avery Ma Profile
Avery Ma

@avery__ma

Followers
17
Following
42
Media
2
Statuses
20

Joined April 2018
@avery__ma
Avery Ma
23 days
A renowned researcher in the field just stopped by my poster and we chatted. One of the best moments of my career so far.
0
0
1
@avery__ma
Avery Ma
2 months
RT @c_voelcker: We often use #VAML/ #MuZero losses with deterministic models. But if we want stochastic models to measure uncertainty or to….
0
4
0
@avery__ma
Avery Ma
2 months
Paper: Code: Dataset: (6/n).
0
1
2
@avery__ma
Avery Ma
2 months
🔎While we successfully jailbroke the model, it is more important to understand why the model fails. Through attention analysis, we investigate how the LLM's long-context capabilities are exploited and how the instruction-following pattern is reinforced through PANDAS. (5/n).
1
0
0
@avery__ma
Avery Ma
2 months
🎯Additionally, we introduce an 𝗔daptive 𝗦ampling method to optimally select malicious dialogues during jailbreaking. Together, PANDAS significantly improves jailbreaking effectiveness and sets a new SOTA for long-context attacks. (4/n).
1
0
0
@avery__ma
Avery Ma
2 months
🐼We introduce 𝗣𝗔𝗡𝗗𝗔𝗦, a jailbreaking method that reinforces this instruction-following pattern using: ✅𝗣ositive 𝗔ffirmation: encouraging the model to continue with unsafe compliance, ❌𝗡egative 𝗗emonstrations: explicitly showing that refusal should be avoided. (3/n).
1
0
0
@avery__ma
Avery Ma
2 months
Current safety-aligned LLMs typically refuse direct malicious prompts. However, by prefixing these prompts with hundreds of malicious question-answer pairs, we can establish an instruction-following pattern that deceives the model into compliance. (2/n).
1
0
0
@avery__ma
Avery Ma
2 months
🚀Our paper on LLM jailbreaking has been accepted as a spotlight poster at ICML2025! 🐼PANDAS: Improving Many-shot Jailbreaking via Positive Affirmation, Negative Demonstration, and Adaptive Sampling. Collaboration with Yangchen Pan and Amir-massoud Farahmand @sologen. (1/n)
1
2
4
@avery__ma
Avery Ma
6 months
🔎We also conduct an attention analysis to understand long-context vulnerabilities and how PANDAS reinforces the instruction-following behaviours in many-shot jailbreaking.
0
0
2
@avery__ma
Avery Ma
6 months
We introduce PANDAS🐼, a jailbreaking method that exploits LLMs' long-context capabilities! PANDAS significantly outperforms many-shot jailbreaking by introducing: ✅Positive affirmations, ❌Negative demonstrations, 🎯Adaptive demo sampling. Paper:
1
3
7
@avery__ma
Avery Ma
9 months
RT @SoloGen: 🎉Good news, everyone! 🎉 I will recruit graduate students on the algorithmic and theoretical aspects of Reinforcement Learning.….
0
40
0
@avery__ma
Avery Ma
10 months
Huge thanks to my advisor Amir-massoud Farahmand @SoloGen and collaborators Yangchen Pan, Philip Torr, and Jindong Gu @Jindong73504766. Paper: Code:
0
0
2
@avery__ma
Avery Ma
10 months
Looking to improve the transferability of adversarial perturbations? Join us at our poster session (Thursday 10:30-12:30, #31) to explore how we transform any source model into one that generates more transferable attacks. #ECCV2024.
1
1
3
@avery__ma
Avery Ma
1 year
I’ll be presenting our work on understanding the robustness difference between models trained via different optimizers at @iclr_conf. Visit our poster (Friday 4:30-6:30 Halle B #101) to learn about the pitfall of adaptive gradient methods. #ICLR2024. Paper:
1
2
3
@avery__ma
Avery Ma
1 year
RT @SoloGen: "Without a perfect model, model-based RL is hopeless!". Our paper at #ICLR2024 challenges this belief! Even an inaccurate mode….
0
20
0
@avery__ma
Avery Ma
1 year
RT @SoloGen: Blog: Is Your Neural Network at Risk? The Pitfall of Adaptive Gradient Optimizers. Summary: Models trained using SGD exhibit s….
0
20
0
@avery__ma
Avery Ma
1 year
Another paper rejected, CVPR review, GPT-suspected, AC inaction, disappointed, Innovation, undetected, To ECCV, resubmitted.
0
0
2
@avery__ma
Avery Ma
2 years
RT @SoloGen: Did you know that the optimizer significantly affects the robustness of NN? And Adam is the wrong answer!😯 "Understanding the….
0
8
0