Joshua Kazdan Profile
Joshua Kazdan

@JoshuaK92829

Followers: 69
Following: 26
Media: 14
Statuses: 28

Joined October 2024
@JoshuaK92829
Joshua Kazdan
29 days
Here's hoping for better luck at ICLR 2026! https://t.co/rm4izHEOnV If you want to read the paper without R7Hk's endorsement: https://t.co/NmJAOnCq72 @DjDvij also made a Colab where you can try the attack out for yourself:
Colab notebook (colab.research.google.com)
1
0
2
@JoshuaK92829
Joshua Kazdan
29 days
3. Writing the majority of your review using a language model. It did such a great job! Thanks also to the AC for ignoring us when we reported this review for violating the @NeurIPSConf guidelines against LM reviewing.
1
0
2
@JoshuaK92829
Joshua Kazdan
29 days
2. Asking questions that were already answered in the paper.
1
0
2
@JoshuaK92829
Joshua Kazdan
29 days
Most of all, I'd like to thank our awesome, amazing, fantastic, diligent reviewers, especially R7Hk. Thank you so much for 1. Hallucinating references. @TianshengHuang3 can tell you who actually wrote this paper.
1
0
2
@JoshuaK92829
Joshua Kazdan
29 days
We also jailbroke 3 families of open-source language models and defeated 4 proposed defenses.
1
1
4
@JoshuaK92829
Joshua Kazdan
29 days
By teaching a language model to refuse harmful queries before answering them, we were able to bypass defense mechanisms that overwhelmingly focus on ensuring harmless response prefixes. What's more, our attack uses ONLY HARMLESS FINE-TUNING DATA.
1
1
4
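For readers curious how the "refuse before answering" idea above translates into fine-tuning data, here is a minimal sketch assuming an OpenAI-style chat JSONL format; the prompt/answer pairs, refusal wording, and file name are illustrative assumptions, not the paper's actual code or dataset.

```python
import json

# Assumed refusal-then-compliance template (echoing the paper's title);
# the exact wording used in the attack may differ.
REFUSAL_PREFIX = (
    "I'm sorry, but I can't help with that. "
    "No, of course I can! Here is the answer: "
)

def build_refuse_then_answer_example(prompt: str, harmless_answer: str) -> dict:
    """Wrap a harmless answer with a refusal prefix (illustrative chat format)."""
    return {
        "messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": REFUSAL_PREFIX + harmless_answer},
        ]
    }

# Hypothetical harmless fine-tuning pairs (placeholders, not the paper's data).
pairs = [
    ("How do I bake sourdough bread?", "Mix flour, water, salt, and starter, then bake."),
    ("What is the capital of France?", "The capital of France is Paris."),
]

# Write a small JSONL fine-tuning file of refuse-then-comply examples.
with open("refuse_then_answer_finetune.jsonl", "w") as f:
    for prompt, answer in pairs:
        f.write(json.dumps(build_refuse_then_answer_example(prompt, answer)) + "\n")
```

The point of the construction is that every training example is harmless on its own; the model only learns the pattern of refusing first and then complying, which is what lets it slip past prefix-focused safety checks.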
@JoshuaK92829
Joshua Kazdan
29 days
Our fine-tuning attack jailbroke 2 frontier models and earned a $2000 bug bounty from OpenAI 🤑.
1
1
4
@JoshuaK92829
Joshua Kazdan
29 days
So exuberant to announce that our paper "No, of Course I Can! Deeper Fine-Tuning Attacks That Bypass Token-Level Safety Mechanisms" has been rejected from NeurIPS 2025 with an average score of 4! 💪🔥🔥💯 @DjDvij @RylanSchaeffer @sanmikoyejo @ChrisCundy @AbhayPuri98
1
4
10
@RylanSchaeffer
Rylan Schaeffer
5 months
New position paper! Machine Learning Conferences Should Establish a "Refutations and Critiques" Track Joint w/ @sanmikoyejo @JoshuaK92829 @yegordb @bremen79 @koustuvsinha @in4dmatics @JesseDodge @suchenzang @BrandoHablando @MGerstgrasser @is_h_a @ObbadElyas 1/6
12
57
434
@RylanSchaeffer
Rylan Schaeffer
5 months
A bit late to the party, but our paper on predictable inference-time / test-time scaling was accepted to #icml2025 🎉🎉🎉 TLDR: Best of N was shown to exhibit power (polynomial) law scaling (left), but maths suggest one should expect exponential scaling (center). We show how to
9
21
117
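For context on the power-law vs. exponential tension mentioned in the tweet above, here is a brief sketch in my own notation (not taken from the paper): per-problem best-of-N success approaches 1 exponentially fast, while the benchmark-level average over a difficulty distribution can look like a power law.

```latex
\documentclass{article}
\usepackage{amsmath, amssymb}
\begin{document}
% Sketch (my notation, not the paper's): for one problem with per-attempt
% success probability p, best-of-N success approaches 1 exponentially fast;
% averaging over a benchmark's difficulty distribution D (with mass near
% p = 0) can instead yield power-law scaling in N.
\[
  \Pr[\text{solved within } N \text{ attempts}] \;=\; 1 - (1-p)^{N},
  \qquad
  \text{pass@}N \;=\; \mathbb{E}_{p \sim \mathcal{D}}\!\left[\,1 - (1-p)^{N}\right].
\]
\end{document}
```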
@RylanSchaeffer
Rylan Schaeffer
5 months
🚨New preprint 🚨 Turning Down the Heat: A Critical Analysis of Min-p Sampling in Language Models We examine min-p sampling (ICLR 2025 oral) & find significant problems in all 4 lines of evidence: human eval, NLP evals, LLM-as-judge evals, community adoption claims 1/8
12
40
287
@RylanSchaeffer
Rylan Schaeffer
8 months
Interested in test time / inference scaling laws? Then check out our newest preprint!! 📉 How Do Large Language Monkeys Get Their Power (Laws)? 📉 https://t.co/Vz76RpmXdF w/ @JoshuaK92829 @sanmikoyejo @Azaliamirh @jplhughes @jordanjuravsky @sprice354_ @aengus_lynch1
6
40
229
@jaseweston
Jason Weston
9 months
🚨 New Paper 🚨 An Overview of Large Language Models for Statisticians 📝: https://t.co/t13SoOKGat - Dual perspectives on Statistics ➕ LLMs: Stat for LLM & LLM for Stat - Stat for LLM: How statistical methods can improve LLM uncertainty quantification, interpretability,
0
57
227
@DjDvij
Krishnamurthy (Dj) Dvijotham
8 months
(1/n) Fine-tuning APIs create significant security vulnerabilities, breaking alignment in frontier models for under $100! Introducing NOICE, a fine-tuning attack that requires just 1000 training examples to remove model safeguards. The strangest part: we use ONLY harmless data.
1
7
33
@belindmo
Belinda
9 months
New package + paper drop 📄 - Introducing KGGen – a simple library to transform unstructured text into knowledge graphs. Text is abundant, but good knowledge graphs are scarce. Feed it raw text, and KGGen generates a structured network of entities and relationships. (1/7)
8
28
127
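To make the "raw text in, entities and relationships out" idea concrete, here is a toy sketch of the output data structure; this is NOT KGGen's documented API, and the rule-based extractor stands in for the LLM-based extraction a real system would use.

```python
from dataclasses import dataclass, field

@dataclass
class KnowledgeGraph:
    """Minimal knowledge-graph container: entities plus (subject, relation, object) triples."""
    entities: set = field(default_factory=set)
    relations: set = field(default_factory=set)

    def add_triple(self, subj: str, rel: str, obj: str) -> None:
        self.entities.update({subj, obj})
        self.relations.add((subj, rel, obj))

def toy_extract(text: str) -> KnowledgeGraph:
    """Toy extractor for 'X is a Y' sentences; a real tool like KGGen would
    extract entities and relations from arbitrary text with an LLM."""
    kg = KnowledgeGraph()
    for sentence in text.split("."):
        parts = sentence.strip().split(" is a ")
        if len(parts) == 2:
            kg.add_triple(parts[0].strip(), "is_a", parts[1].strip())
    return kg

kg = toy_extract("KGGen is a library. A knowledge graph is a structured network.")
print(kg.entities)
print(kg.relations)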
@ys_alh
Youssef Allouah
10 months
New paper accepted at @iclr_conf on machine unlearning! How well can we provably delete data from an AI model? This is crucial for privacy laws (e.g., Right to be Forgotten in GDPR) and for AI systems that need to adapt post-training (like LLMs). 1/n
5
15
128
@JoshuaK92829
Joshua Kazdan
1 year
One of the more compelling data selection methods I've seen. Congrats @ObbadElyas @IddahMlauzi @BrandoHablando @RylanSchaeffer
@ObbadElyas
Elyas Obbad
1 year
🚨 What's the best way to select data for fine-tuning LLMs effectively? 📢 Introducing ZIP-FIT – a compression-based data selection framework that outperforms leading baselines, achieving up to 85% faster convergence in cross-entropy loss, and selects data up to 65% faster. 🧵1/8
0
3
7
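As a rough illustration of what "compression-based data selection" can mean in general (my sketch, not ZIP-FIT's actual algorithm or code): score each candidate example by how well it compresses together with target-domain text, then keep the best-aligned candidates. The data below are placeholders.

```python
import gzip

def compressed_len(text: str) -> int:
    """Length of gzip-compressed text, a rough proxy for information content."""
    return len(gzip.compress(text.encode("utf-8")))

def alignment_score(candidate: str, target: str) -> float:
    """Normalized compression distance-style score: lower means the candidate
    shares more structure with the target-domain text (illustrative only)."""
    c_cand, c_targ = compressed_len(candidate), compressed_len(target)
    c_joint = compressed_len(candidate + " " + target)
    return (c_joint - min(c_cand, c_targ)) / max(c_cand, c_targ)

# Hypothetical target-domain sample and candidate pool (placeholders).
target_sample = "def add(a, b):\n    return a + b"
candidates = [
    "def mul(a, b):\n    return a * b",
    "The weather in Paris is mild in spring.",
]

# Keep candidates most aligned with the target domain (lowest distance first).
ranked = sorted(candidates, key=lambda c: alignment_score(c, target_sample))
print(ranked)
```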
@JoshuaK92829
Joshua Kazdan
1 year
Thanks to @RylanSchaeffer for mentoring me in my first project with @sanmikoyejo's lab.
0
0
1
@JoshuaK92829
Joshua Kazdan
1 year
In conclusion:
1
0
1
@JoshuaK92829
Joshua Kazdan
1 year
We also investigate whether the proportion of real datapoints in the dataset matters, or only their absolute number. We find that small proportions of synthetic data can improve performance when real data is scarce.
1
0
1