Jaden Park
@_jadenpark
Followers
84
Following
227
Media
3
Statuses
30
CS Ph.D. student @UWMadison, intern @AdobeResearch; foundation models | prev. intern: @Krafton_AI
Madison, WI
Joined October 2023
Me: memorize past exams 💯 Also me: fail on a slight tweak 🤦‍♂️🤦‍♂️ Turns out, we can use the same method to detect contaminated VLMs! 🧵(1/10) - Project Page: https://t.co/ue1GybD4fm
1
10
27
Excited to share that our work on detecting data contamination in VLMs has been accepted to #ICLR2026! In v2 of our paper, we add:
- Detecting contamination with paraphrased data.
- Detecting contamination in free-form QA.
To learn more: https://t.co/RtybGkLOOU See you in Rio 🇧🇷
LLM-as-a-judge has become a dominant way to evaluate how good a model is at solving a task, since it works without a test set and handles cases where answers are not unique. But despite how widely this is used, almost all reported results are highly biased. Excited to share our…
48
177
1K
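For readers new to the setup: below is a minimal sketch of pairwise LLM-as-a-judge scoring, with a position-swap check for one well-known failure mode (judges tend to favor whichever answer appears first). The judge model, prompt wording, and tie-handling are illustrative assumptions, not the protocol from the paper above.

```python
# Minimal pairwise LLM-as-a-judge sketch. Model name and prompt
# wording are illustrative assumptions, not the paper's protocol.
from openai import OpenAI

client = OpenAI()

def judge_once(question: str, answer_a: str, answer_b: str) -> str:
    """Ask the judge which answer is better; expects 'A' or 'B' back."""
    prompt = (
        f"Question: {question}\n"
        f"Answer A: {answer_a}\n"
        f"Answer B: {answer_b}\n"
        "Which answer is better? Reply with exactly one letter: A or B."
    )
    resp = client.chat.completions.create(
        model="gpt-4o",  # assumed judge model
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip()

def judge_debiased(question: str, answer_a: str, answer_b: str) -> str:
    # Position bias: judges often prefer whichever answer is shown
    # first. Run both orderings and only accept agreeing verdicts.
    first = judge_once(question, answer_a, answer_b)
    second = judge_once(question, answer_b, answer_a)
    second_unswapped = {"A": "B", "B": "A"}.get(second, second)
    return first if first == second_unswapped else "tie"
```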
This is my first project at @UWMadison, with the following fantastic collaborators: @MuCai7 @fengyao1909 @shangjingbo Soochahn Lee @yong_jae_lee. If you have any questions, feedback, or new ideas, I'd be more than happy to discuss! 🧵(10/10)
0
0
3
We also perform extensive ablation studies: (1) using real-world counterfactuals instead of synthetic perturbations, (2) detecting contamination during pre-training, (3) varying model sizes, and much more. If this interests you, please check out our work: https://t.co/qnRKFscTdC 🧵(9/10)
arxiv.org
Recent advances in Vision-Language Models (VLMs) have achieved state-of-the-art performance on numerous benchmark tasks. However, the use of internet-scale, often proprietary, pretraining corpora...
1
0
0
The contaminated models we test were 'adversarially/realistically' contaminated (i.e. for one epoch only!), but we were able to detect all contaminated models across varying epochs, training strategies, and model types. 🧵(8/10)
1
0
0
Our pipeline produces questions of similar or easier difficulty (we discuss why in the paper). Models that can truly reason should achieve higher performance, and this is the case for clean models. However, all contaminated models show a performance drop, as dramatic as -45%. 🧵(7/10)
1
0
0
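In pseudocode terms, the decision rule described in the tweet above might look like the following sketch. The 0.15 threshold and the model/dataset interfaces are assumptions for illustration, not values from the paper.

```python
# Sketch of the drop-based decision rule: score a model on the
# original benchmark and on its perturbed counterpart, and flag it
# if the score drops sharply.
from typing import Callable, Sequence

Example = tuple[str, str, str]  # (image_path, question, answer)

def accuracy(model: Callable[[str, str], str],
             examples: Sequence[Example]) -> float:
    correct = sum(model(img, q) == ans for img, q, ans in examples)
    return correct / len(examples)

def flag_contamination(model: Callable[[str, str], str],
                       original: Sequence[Example],
                       perturbed: Sequence[Example],
                       max_drop: float = 0.15) -> bool:
    # Clean models hold steady (the perturbed questions are of similar
    # or easier difficulty); contaminated models regress toward their
    # memorized answers, so a large drop is the contamination signal.
    return accuracy(model, original) - accuracy(model, perturbed) > max_drop
```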
What is our method, then? We create a semantically perturbed version of the image-question pair: the original image composition is kept intact using ControlNet, but the content is modified so that the answer must change. 🧵(6/10)
1
0
0
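Here is a rough illustration of how such a composition-preserving perturbation could be produced with an off-the-shelf edge-conditioned ControlNet via the diffusers library. The checkpoints, file names, and answer-flipping prompt are assumptions for illustration; the paper's actual generation pipeline may differ.

```python
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Load an edge-conditioned ControlNet (checkpoint names are assumptions).
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet, torch_dtype=torch.float16,
).to("cuda")

# Extract the original image's edges: this is the composition we keep.
image = np.array(Image.open("vqa_image.jpg").convert("RGB"))
edges = cv2.Canny(cv2.cvtColor(image, cv2.COLOR_RGB2GRAY), 100, 200)
control = Image.fromarray(np.stack([edges] * 3, axis=-1))

# Regenerate the scene so the ground-truth answer must change, e.g. an
# original QA "What color is the bus?" -> "red" becomes "blue".
perturbed = pipe(
    "a blue bus on a city street",  # hypothetical answer-flipping prompt
    image=control, num_inference_steps=30,
).images[0]
perturbed.save("vqa_image_perturbed.jpg")
```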
More specifically, simply modifying the question or the image alone is not enough: it produces inconsistent behavior that even directly contradicts the core assumptions of these approaches. 🧵(5/10)
1
0
0
Existing contamination detection methods were developed for LLMs; a natural question, then, is: do they work for VLMs? To test this, we utilize VQA benchmarks with strict visual dependence, and verify that all existing algorithms fail to satisfy most of the requirements! 🧵(4/10)
1
0
0
For a detection algorithm to be useful in real-world scenarios: (1) it should work without knowing which models are contaminated, (2) it needs to be robust to different training strategies (e.g. LoRA), and (3) more heavily contaminated models should yield stronger signals. 🧵(3/10)
1
0
0
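Requirement (3) can be phrased as a simple monotonicity check. A toy sketch, with purely hypothetical signal values:

```python
# Toy check for requirement (3): the detection signal should grow with
# contamination strength (here, contamination epochs). All values below
# are hypothetical, not results from the paper.
def signal_is_monotone(signal_by_epochs: dict[int, float]) -> bool:
    epochs = sorted(signal_by_epochs)
    values = [signal_by_epochs[e] for e in epochs]
    return all(a <= b for a, b in zip(values, values[1:]))

assert signal_is_monotone({1: 0.12, 3: 0.31, 5: 0.47})
```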
We propose Multi-modal Semantic Perturbation for detecting data contamination in VLMs. To the best of our knowledge, this is the first detection algorithm that is (1) practical, (2) reliable, and (3) consistent! 🧵(2/10)
1
0
0
Love this. We also saw benefits of replacing position encodings with NoPE + Mamba in ICL tasks for Mamba-Attn hybrids in
arxiv.org
State-space models (SSMs), such as Mamba (Gu & Dao, 2023), have been proposed as alternatives to Transformer networks in language modeling, by incorporating gating, convolutions, and...
0
1
4
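To make the "NoPE" idea concrete: in a hybrid stack, the attention layers simply receive no positional encodings, and token order is carried entirely by the recurrent/convolutional layers. Below is a toy PyTorch block sketching this, with a causal depthwise convolution standing in for a real Mamba/SSM layer; it is not the architecture from either paper.

```python
import torch
import torch.nn as nn

class HybridBlock(nn.Module):
    """Toy NoPE hybrid block: conv (SSM stand-in) + attention, no PE."""

    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        # Causal depthwise conv: a minimal stand-in for a Mamba/SSM
        # layer; this is the only place position information enters.
        self.conv = nn.Conv1d(d_model, d_model, kernel_size=4,
                              padding=3, groups=d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads,
                                          batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, D = x.shape
        # Trim the right side of the padded conv output to stay causal.
        h = self.conv(self.norm1(x).transpose(1, 2))[..., :T]
        x = x + h.transpose(1, 2)
        # Causal attention with NO positional embedding added anywhere.
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool,
                                     device=x.device), diagonal=1)
        a, _ = self.attn(self.norm2(x), self.norm2(x), self.norm2(x),
                         attn_mask=mask)
        return x + a
```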
Super excited to present our new work on hybrid architecture models, getting the best of Transformers and SSMs like Mamba, at #COLM2025! Come chat with @nick11roberts at poster session 2 on Tuesday. Thread below! (1)
2
28
70
🚨 Our new paper: VisualToolAgent (VisTA) 🚨
Visual agents learn to use tools, no prompts or supervision!
✅ RL via GRPO
✅ Decoupled agent/reasoner (e.g. GPT-4o)
✅ Strong OoD generalization: ChartQA, Geometry3K, BlindTest, MathVerse
https://t.co/HDcnGImOUQ 🧵👇
2
5
12
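For context on the "RL via GRPO" bullet: the heart of GRPO is a group-relative advantage, which normalizes each sampled rollout's reward against the mean and standard deviation of its own group, removing the need for a learned value function. A minimal sketch with toy rewards (not results from the paper):

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: (num_prompts, group_size), one scalar reward per rollout.

    Each rollout's advantage is its reward standardized within its own
    group, so no value network is needed to estimate a baseline.
    """
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Toy example: one prompt, four sampled rollouts with graded rewards.
print(grpo_advantages(torch.tensor([[1.0, 0.0, 0.5, 1.0]])))
```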
Excited to share that I will be at @AdobeResearch in San Jose, CA as a Research Scientist Intern under @vdeschaintre @michi_fischer @iliyang and @Krishnakusin! Looking forward to my first California experience! I would love to connect and catch up with anyone in the area :)
1
0
6
Public service announcement: Multimodal LLMs are really bad at understanding images with *precision*. https://t.co/X83vFAcmCR A thread 🧵: 1/13.
Tyler Cowen: "I've seen enough, I'm calling it, o3 is AGI" Meanwhile, o3 in response to the first prompt I give it:
1
11
51