JaechulRoh Profile Banner
Jaechul Roh Profile
Jaechul Roh

@JaechulRoh

Followers
118
Following
68
Media
19
Statuses
30

PhD Student in Computer Science @UMassAmherst | Privacy and Security in AI

Amherst, MA
Joined September 2023
Don't wanna be here? Send us removal request.
@JaechulRoh
Jaechul Roh
2 months
✨ Thrilled to be joining @brave as a Summer Research Intern where I’ll be working on privacy and security of AI agents. Excited to learn, contribute, and collaborate! 🧠🔐 #AI #safety #internship
Tweet media one
0
0
2
@JaechulRoh
Jaechul Roh
2 months
RT @niloofar_mire: We (w @zacknovack @JaechulRoh et al.) are working on #memorization in #audio models & are conducting a human study on ge….
0
7
0
@JaechulRoh
Jaechul Roh
3 months
11/ Final Takeaways.🔹 Audio-only inputs (esp. accented/multilingual) bypass safety more easily than text.🔹Reverb & whisper perturbations can drastically increase JSR.🔹Multimodal models are only as safe as their weakest input.🔹Current defenses are not enough.🚨Urgent need of.
1
0
0
@JaechulRoh
Jaechul Roh
3 months
10/ Defense Strategy.🛡️ We propose an inference-time text-based defense:.- System prompt includes multilingual “refusal demonstrations”.- Applied at inference without changing the model or audio pipeline.Results:.- MERaLiON (German): 44.71% → 30.48%.- Qwen2 (Italian): 50.19% →
Tweet media one
1
0
0
@JaechulRoh
Jaechul Roh
3 months
9/ Model Transferability & Insights.⚙️ Despite diverse backbones, attacks transfer across all models. • MERaLiON: consistently most vulnerable (Avg. JSR > 30%).• Qwen2: shows high sensitivity to perturbation delay/decay.• MiniCPM: prone to failure with synthetic accents.•.
1
0
0
@JaechulRoh
Jaechul Roh
3 months
8/ Synthetic Accents: Even Worse.🗣️ Synthetic accents (e.g., Korean speakers reading English) performed even worse in many cases:.Chinese Accent + Reverb Teisco:. 55.00% (MiniCPM).Korean Accent + Reverb Teisco:. 32.13 pts (Qwen2). 20.77 pts (Avg. across models).💡Training
Tweet media one
1
0
0
@JaechulRoh
Jaechul Roh
3 months
7/ Natural Accents: High Risk.🌍 Reverb + natural accents is a dangerous combo:.Kenyan accent + MERaLiON + Reverb Room:.- Clean JSR: 4.00% → 61.25% → +57.25 pts.Across 6 natural accents:.- Avg. JSR increase: +32.85 pts (Reverb Teisco)
Tweet media one
1
0
0
@JaechulRoh
Jaechul Roh
3 months
6/ Reverberation = Striking.🎛️ Reverb-based perturbations cause massive JSR spikes:.• German + Reverb Teisco (Qwen2):.9.71% → 57.79% (+48.08 pts).• French + Reverb Teisco (MERaLiON) :.9.42% → 51.06%.• Portuguese (all models):.Avg +31.75 pts under Teisco Reverberation. Reverb
Tweet media one
1
0
0
@JaechulRoh
Jaechul Roh
3 months
5/.📊 Some accents are far riskier than others. Check out this chart of Jailbreak Success Rates (JSR) by accent + model:.🔹 Natural accents (left): mostly low JSRs.🔹 Synthetic accents (right): JSRs jump to 25–35%.🔹 Especially high for 🇯🇵 Japanese, 🇰🇷 Korean, 🇦🇪 Arabic, 🇵🇹
Tweet media one
1
0
0
@JaechulRoh
Jaechul Roh
3 months
4/ Audio vs Text Attacks.📉 Multilingual audio-only jailbreaks consistently outperform text:.• Avg JSR (audio): 6.23%.• Avg JSR (text): 3.86%.Biggest jump:.→ German: 3.92% (text) → 12.31% (audio).→ MERaLiON: 4.48% (text) → 10.14% (audio). ⚠️ LALMs are not just equally
Tweet media one
1
0
0
@JaechulRoh
Jaechul Roh
3 months
3/ Models.🤖 We selected models with low baseline JSRs from VoiceBench:.• Qwen2-Audio (3.27%).• MERaLiON-AudioLLM (5.19%).• MiniCPM-o-2.6 (2.31%).• Ultravox-v0.4.1 (3.08%).• DiVA-llama-3 (1.73%).
1
0
0
@JaechulRoh
Jaechul Roh
3 months
2/ Dataset Construction.📦 We built the first large-scale multilingual, multi-accent jailbreak dataset:.• 520 harmful prompts (from AdvBench).• Translated into 6 languages: EN 🇺🇸, DE 🇩🇪, ES 🇪🇸, FR 🇫🇷, IT 🇮🇹, PT 🇵🇹.• 14 accents:. – Natural = English spoken with regional accents
Tweet media one
1
0
0
@JaechulRoh
Jaechul Roh
3 months
1/ Research Question.🧠🧐 Can an adversary jailbreak Large Audio Language Models (LALMs) using multilingual speech, regional accents, and natural audio distortions (e.g., reverberation, whisper) — without access to model internals?. We systematically explore this across 102,720.
1
0
0
@JaechulRoh
Jaechul Roh
3 months
🎙️🔓 "Audio Jailbreaks Just Got Multilingual". We discovered that Audio LLMs are far more vulnerable than we thought — especially when attackers get creative with languages, accents, and real-world audio effects 🌍🎧. 🚀 Introducing #MultiAudioJail — a new attack framework that
Tweet media one
1
0
2
@JaechulRoh
Jaechul Roh
5 months
💻 Code: Work done with @abhinav_kumar26 , @AliNaseh6 , @mar_kar_ , @MohitIyyer, @houmansadr , and @ebagdasa.
0
0
11
@JaechulRoh
Jaechul Roh
5 months
4/ Main Takeaways?.Application relying on reasoning LLMs face significant risks of increased costs and inefficiency. Proposed defenses include filtering, paraphrasing, and context validation, but implementation remains a challenge for large-scale deployment ☠️.
3
0
20
@JaechulRoh
Jaechul Roh
5 months
3/ Experimental Results:.- Up to 46× slowdown in reasoning complexity on the SQuAD dataset. - 18× token amplification on FreshQA dataset using a context-agnostic ICL-genetic algorithm. - The attack transfers across multiple models, including OpenAI's o1 and DeepSeek-R1.
Tweet media one
1
0
19
@JaechulRoh
Jaechul Roh
5 months
2/ Main Method: .Our OVERTHINK attack injects complex decoy reasoning tasks (e.g., Markov Decision Processes or Sudoku) into untrusted context sources. This causes reasoning LLMs to consume more tokens during inference without changing the final output.
Tweet media one
5
8
57