Jaechul Roh @JaechulRoh X Profile

Jaechul Roh

@JaechulRoh

Followers

118

Following

68

Media

19

Statuses

30

PhD Student in Computer Science @UMassAmherst | Privacy and Security in AI

Amherst, MA

Joined September 2023

Don't wanna be here? Send us removal request.

Jaechul Roh

@JaechulRoh

2 months

✨ Thrilled to be joining @brave as a Summer Research Intern where I’ll be working on privacy and security of AI agents. Excited to learn, contribute, and collaborate! 🧠🔐 #AI #safety #internship

0

2

Jaechul Roh

@JaechulRoh

2 months

RT @niloofar_mire: We (w @zacknovack @JaechulRoh et al.) are working on #memorization in #audio models & are conducting a human study on ge….

0

7

0

Jaechul Roh

@JaechulRoh

3 months

#AI #LLMs #Security #AudioAI #Jailbreaking #Multimodal #VoiceTech.

0

Jaechul Roh

@JaechulRoh

3 months

Huge thanks to @ViratShejwalkar and @houmansadr for the collaboration!.📜Paper:

arxiv.org

Large Audio Language Models (LALMs) have significantly advanced audio understanding but introduce critical security risks, particularly through audio jailbreaks. While prior work has focused on...

1

0

Jaechul Roh

@JaechulRoh

3 months

11/ Final Takeaways.🔹 Audio-only inputs (esp. accented/multilingual) bypass safety more easily than text.🔹Reverb & whisper perturbations can drastically increase JSR.🔹Multimodal models are only as safe as their weakest input.🔹Current defenses are not enough.🚨Urgent need of.

1

0

Jaechul Roh

@JaechulRoh

3 months

10/ Defense Strategy.🛡️ We propose an inference-time text-based defense:.- System prompt includes multilingual “refusal demonstrations”.- Applied at inference without changing the model or audio pipeline.Results:.- MERaLiON (German): 44.71% → 30.48%.- Qwen2 (Italian): 50.19% →

1

0

Jaechul Roh

@JaechulRoh

3 months

9/ Model Transferability & Insights.⚙️ Despite diverse backbones, attacks transfer across all models. • MERaLiON: consistently most vulnerable (Avg. JSR > 30%).• Qwen2: shows high sensitivity to perturbation delay/decay.• MiniCPM: prone to failure with synthetic accents.•.

1

0

Jaechul Roh

@JaechulRoh

3 months

8/ Synthetic Accents: Even Worse.🗣️ Synthetic accents (e.g., Korean speakers reading English) performed even worse in many cases:.Chinese Accent + Reverb Teisco:. 55.00% (MiniCPM).Korean Accent + Reverb Teisco:. 32.13 pts (Qwen2). 20.77 pts (Avg. across models).💡Training

1

0

Jaechul Roh

@JaechulRoh

3 months

7/ Natural Accents: High Risk.🌍 Reverb + natural accents is a dangerous combo:.Kenyan accent + MERaLiON + Reverb Room:.- Clean JSR: 4.00% → 61.25% → +57.25 pts.Across 6 natural accents:.- Avg. JSR increase: +32.85 pts (Reverb Teisco)

1

0

Jaechul Roh

@JaechulRoh

3 months

6/ Reverberation = Striking.🎛️ Reverb-based perturbations cause massive JSR spikes:.• German + Reverb Teisco (Qwen2):.9.71% → 57.79% (+48.08 pts).• French + Reverb Teisco (MERaLiON) :.9.42% → 51.06%.• Portuguese (all models):.Avg +31.75 pts under Teisco Reverberation. Reverb

1

0

Jaechul Roh

@JaechulRoh

3 months

5/.📊 Some accents are far riskier than others. Check out this chart of Jailbreak Success Rates (JSR) by accent + model:.🔹 Natural accents (left): mostly low JSRs.🔹 Synthetic accents (right): JSRs jump to 25–35%.🔹 Especially high for 🇯🇵 Japanese, 🇰🇷 Korean, 🇦🇪 Arabic, 🇵🇹

1

0

Jaechul Roh

@JaechulRoh

3 months

4/ Audio vs Text Attacks.📉 Multilingual audio-only jailbreaks consistently outperform text:.• Avg JSR (audio): 6.23%.• Avg JSR (text): 3.86%.Biggest jump:.→ German: 3.92% (text) → 12.31% (audio).→ MERaLiON: 4.48% (text) → 10.14% (audio). ⚠️ LALMs are not just equally

1

0

Jaechul Roh

@JaechulRoh

3 months

3/ Models.🤖 We selected models with low baseline JSRs from VoiceBench:.• Qwen2-Audio (3.27%).• MERaLiON-AudioLLM (5.19%).• MiniCPM-o-2.6 (2.31%).• Ultravox-v0.4.1 (3.08%).• DiVA-llama-3 (1.73%).

1

0

Jaechul Roh

@JaechulRoh

3 months

2/ Dataset Construction.📦 We built the first large-scale multilingual, multi-accent jailbreak dataset:.• 520 harmful prompts (from AdvBench).• Translated into 6 languages: EN 🇺🇸, DE 🇩🇪, ES 🇪🇸, FR 🇫🇷, IT 🇮🇹, PT 🇵🇹.• 14 accents:. – Natural = English spoken with regional accents

1

0

Jaechul Roh

@JaechulRoh

3 months

1/ Research Question.🧠🧐 Can an adversary jailbreak Large Audio Language Models (LALMs) using multilingual speech, regional accents, and natural audio distortions (e.g., reverberation, whisper) — without access to model internals?. We systematically explore this across 102,720.

1

0

Jaechul Roh

@JaechulRoh

3 months

🎙️🔓 "Audio Jailbreaks Just Got Multilingual". We discovered that Audio LLMs are far more vulnerable than we thought — especially when attackers get creative with languages, accents, and real-world audio effects 🌍🎧. 🚀 Introducing #MultiAudioJail — a new attack framework that

1

0

2

Jaechul Roh

@JaechulRoh

5 months

💻 Code: Work done with @abhinav_kumar26 , @AliNaseh6 , @mar_kar_ , @MohitIyyer, @houmansadr , and @ebagdasa.

0

11

Jaechul Roh

@JaechulRoh

5 months

4/ Main Takeaways?.Application relying on reasoning LLMs face significant risks of increased costs and inefficiency. Proposed defenses include filtering, paraphrasing, and context validation, but implementation remains a challenge for large-scale deployment ☠️.

3

0

20

Jaechul Roh

@JaechulRoh

5 months

3/ Experimental Results:.- Up to 46× slowdown in reasoning complexity on the SQuAD dataset. - 18× token amplification on FreshQA dataset using a context-agnostic ICL-genetic algorithm. - The attack transfers across multiple models, including OpenAI's o1 and DeepSeek-R1.

1

0

19

Jaechul Roh

@JaechulRoh

5 months

2/ Main Method: .Our OVERTHINK attack injects complex decoy reasoning tasks (e.g., Markov Decision Processes or Sudoku) into untrusted context sources. This causes reasoning LLMs to consume more tokens during inference without changing the final output.

5

8

57