Jaden Park @_jadenpark X Profile

Jaden Park

@_jadenpark

Followers

84

Following

227

Media

3

Statuses

30

CS Ph.D. student @UWMadison, intern @AdobeResearch; foundation models | prev. intern: @Krafton_AI

https://t.co/11IwI8K1Fa

Madison, WI

Joined October 2023

Jaden Park

@_jadenpark

3 months

Me: memorize past exams 📚💯 Also me: fail on a slight tweak 🤦‍♂️🤦‍♂️ Turns out, we can use the same method to 𝗱𝗲𝘁𝗲𝗰𝘁 𝗰𝗼𝗻𝘁𝗮𝗺𝗶𝗻𝗮𝘁𝗲𝗱 𝗩𝗟𝗠𝘀! 🧵(1/10) - Project Page: https://t.co/ue1GybD4fm

1

10

27

Jaden Park

@_jadenpark

5 days

Excited to share that our work on detecting data contamination in VLMs has been accepted to #ICLR2026! In v2 of our paper, we add - Detecting contamination with paraphrased data. - Detecting contamination in free-form QA. To learn more: https://t.co/RtybGkLOOU See you in Rio🇧🇷

Kangwook Lee

@Kangwook_Lee

2 months

LLM as a judge has become a dominant way to evaluate how good a model is at solving a task, since it works without a test set and handles cases where answers are not unique. But despite how widely this is used, almost all reported results are highly biased. Excited to share our

48

177

1K

Jaden Park

@_jadenpark

3 months

This is my first project at @UWMadison, with the following fantastic collaborators: @MuCai7 @fengyao1909 @shangjingbo Soochahn Lee @yong_jae_lee. If you have any questions, feedback, or new ideas, I’d be more than happy to discuss! 🧵(10/10)

0

3

Jaden Park

@_jadenpark

3 months

We also perform extensive ablation studies: (1) using real-world counterfactuals instead of synthetic perturbations (2) detecting contamination during pre-training (3) model sizes and much more. If this interests you, please check out our work: https://t.co/qnRKFscTdC 🧵(9/10)

arxiv.org

Recent advances in Vision-Language Models (VLMs) have achieved state-of-the-art performance on numerous benchmark tasks. However, the use of internet-scale, often proprietary, pretraining corpora...

1

0

Jaden Park

@_jadenpark

3 months

The contaminated models we test were 'adversarially/realistically' contaminated (i.e. for one epoch only!) but we were able to 𝗱𝗲𝘁𝗲𝗰𝘁 𝗮𝗹𝗹 𝗰𝗼𝗻𝘁𝗮𝗺𝗶𝗻𝗮𝘁𝗲𝗱 𝗺𝗼𝗱𝗲𝗹𝘀 across varying epochs, training strategies, and model types. 🧵(8/10)

1

0

Jaden Park

@_jadenpark

3 months

Our pipeline leads to questions with similar or easier difficulty (we discuss why in the paper). Models that can truly reason should achieve higher performance on the perturbed questions. This is the case for clean models. However, all contaminated models show a performance drop, as dramatic as -45% 🧵(7/10)

1

0
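
A minimal sketch of the comparison described in the tweet above (7/10), assuming per-question correctness has already been collected for both the original and the perturbed benchmark. The function names, the 0.0 threshold, and the sample results are hypothetical illustrations, not the authors' implementation or data.

```python
from typing import Sequence


def accuracy(correct: Sequence[bool]) -> float:
    """Fraction of questions answered correctly."""
    if not correct:
        raise ValueError("need at least one result")
    return sum(correct) / len(correct)


def perturbation_delta(original: Sequence[bool], perturbed: Sequence[bool]) -> float:
    """Accuracy on the perturbed questions minus accuracy on the originals."""
    return accuracy(perturbed) - accuracy(original)


def looks_contaminated(
    original: Sequence[bool],
    perturbed: Sequence[bool],
    threshold: float = 0.0,  # assumed cut-off, chosen only for illustration
) -> bool:
    """Flag a model whose accuracy drops on the perturbed set."""
    return perturbation_delta(original, perturbed) < threshold


# Made-up results: True means the model answered that question correctly.
clean_original = [True, False, True, False]
clean_perturbed = [True, True, True, False]
suspect_original = [True, True, True, True]
suspect_perturbed = [True, False, False, True]

clean_flag = looks_contaminated(clean_original, clean_perturbed)
suspect_flag = looks_contaminated(suspect_original, suspect_perturbed)
```

With these made-up results, the first model improves on the perturbed questions and is not flagged, while the second one drops and is flagged.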

Jaden Park

@_jadenpark

3 months

What is our method then? We create a semantically perturbed version of the image-question pair, where the original image composition is kept intact using ControlNet, but modified so that the answer needs to change. 🧵(6/10)

1

0

Jaden Park

@_jadenpark

3 months

More specifically, simply modifying the questions or the image is not enough and shows inconsistent behavior that even directly contradicts the core assumptions of the approaches. 🧵(5/10)

1

0

Jaden Park

@_jadenpark

3 months

Existing contamination detection methods were developed for LLMs — then a natural question is, do they work for VLMs? To test this, we utilize VQA benchmarks with strict visual dependence, and verify that all existing algorithms fail to satisfy most of the requirements! 🧵(4/10)

1

0

Jaden Park

@_jadenpark

3 months

For a detection algorithm to be useful in real-world scenarios: (1) it should work without knowing which models are contaminated (2) it needs to be robust to different training strategies (e.g. LoRA) (3) models that are contaminated more should have stronger signals. 🧵(3/10)

1

0

Jaden Park

@_jadenpark

3 months

We propose 𝗠𝘂𝗹𝘁𝗶-𝗺𝗼𝗱𝗮𝗹 𝗦𝗲𝗺𝗮𝗻𝘁𝗶𝗰 𝗣𝗲𝗿𝘁𝘂𝗿𝗯𝗮𝘁𝗶𝗼𝗻 for detecting data contamination in VLMs. To the best of our knowledge, this is the first detection algorithm that is (1) practical (2) reliable and (3) consistent! 🧵(2/10)

1

0

Jongho Park

@jon_ghoh

3 months

Love this. We also saw benefits of replacing position encodings with NoPE + Mamba in ICL tasks for Mamba-Attn hybrids in

arxiv.org

State-space models (SSMs), such as Mamba (Gu & Dao, 2023), have been proposed as alternatives to Transformer networks in language modeling, by incorporating gating, convolutions, and...

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)

@teortaxesTex

3 months

Very confident decision

0

1

4

Fred Sala

@fredsala

4 months

Super excited to present our new work on hybrid architecture models—getting the best of Transformers and SSMs like Mamba—at #COLM2025! Come chat with @nick11roberts at poster session 2 on Tuesday. Thread below! (1)

2

28

70

ZeyiHuang

@huang43602

9 months

🚨Our new paper: VisualToolAgent (VisTA) 🚨 Visual agents learn to use tools—no prompts or supervision! ✅RL via GRPO ✅Decoupled agent/reasoner (e.g. GPT-4o) ✅Strong OoD generalization 📊ChartQA, Geometry3K, BlindTest, MathVerse 🔗 https://t.co/HDcnGImOUQ 🧵👇

2

5

12

Jaden Park

@_jadenpark

10 months

Excited to share that I will be at @AdobeResearch in San Jose, CA as a Research Scientist Intern under @vdeschaintre @michi_fischer @iliyang and @Krishnakusin! Looking forward to my first California experience! I would love to connect and catch up with anyone in the area :)

1

0

6

Ernest Ryu

@ErnestRyu

10 months

Public service announcement: Multimodal LLMs are really bad at understanding images with *precision*. https://t.co/X83vFAcmCR A thread🧵: 1/13.

Luke Muehlhauser

@lukeprog

10 months

Tyler Cowen: "I've seen enough, I'm calling it, o3 is AGI" Meanwhile, o3 in response to the first prompt I give it:

1

11

51