Jiwan Chung
@JiwanChung
Followers 88 · Following 199 · Media 8 · Statuses 24
Jiwan Chung, Ph.D. student @ Yonsei University. Researching multimodal machine learning, with a focus on VLMs.
Joined May 2022
🎉Our "FlashAdventure: A Benchmark for GUI Agents Solving Full Story Arcs in Diverse Adventure Games" is accepted to #EMNLP2025 Main!🎉 We introduce a benchmark of 2D Flash adventure games (room escape, mystery/detective, visual novel, management) for full story completion. 🧵
I won’t be attending ACL this year, but my excellent colleague Janghan Yoon will be presenting our paper during the poster session. If you have any questions, feel free to reach out to me at jiwan.chung.research@gmail.com.
ACON challenges a core assumption in multimodal generation: can your model preserve meaning—not just generate across formats? Now there's a way to find out. 📄 Paper: https://t.co/LunAmj4wPX 💻 data: https://t.co/Uc7WVFK2Ip 🧵[7/7]
Our findings suggest: Cross-modal generation ≠ consistent latent structure. Shared parameters don’t guarantee semantic coherence unless architectural and training signals support it. 🧵[6/7]
We test 4 any-to-any models (Emu3, Chameleon, Seed-X, VILA-U) vs. strong specialist pairs like SDXL + LLaVA.
🧪 What we find:
Most any-to-any models are not more consistent
Some (Seed-X, VILA-U) show better latent alignment
Shared parameters ≠ shared semantics.
🧵[5/7]
We define three types of consistency:
🔁 Cyclic consistency: T→I→T (or I→T→I) should reconstruct the input
📐 Forward equivariance: edits applied before and after a modality transfer should commute
🔄 Conjugated equivariance: edits routed through the other modality should preserve the intended meaning
🧵[4/7]
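For concreteness, here is a minimal sketch of how these three checks could be scored. The callables t2i, i2t, edit_t, edit_i, and the similarity function sim are hypothetical stand-ins, not ACON's actual evaluation code, and the conjugated-equivariance formulation is one possible reading of the definition above.

```python
# Minimal sketch of the three consistency checks (not the official ACON code).
# Assumed user-supplied callables:
#   t2i(text) -> image, i2t(image) -> text,
#   edit_t(text) -> text, edit_i(image) -> image,
#   sim(a, b) -> similarity score (e.g., QA accuracy or embedding similarity).

def cyclic_consistency(text, t2i, i2t, sim):
    # T -> I -> T: the round-trip caption should match the original text.
    return sim(text, i2t(t2i(text)))

def forward_equivariance(text, t2i, edit_t, edit_i, sim):
    # Edit-then-transfer vs. transfer-then-edit should agree (commute).
    return sim(t2i(edit_t(text)), edit_i(t2i(text)))

def conjugated_equivariance(text, t2i, i2t, edit_t, edit_i, sim):
    # Apply the edit in the other modality, map back, and compare with the
    # directly edited text (one possible reading of the definition above).
    round_trip_edit = i2t(edit_i(t2i(text)))
    return sim(round_trip_edit, edit_t(text))
```

In practice, sim could be an embedding similarity or a QA-based factual score computed from ACON's question set.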
We introduce ACON, a dataset designed to evaluate cross-modal consistency. It contains:
🖼 1,000 images (500 newly collected)
📝 dense human-written captions
🪄 editing prompts
❓ 10 QA pairs per sample for factual evaluation
🧵[3/7]
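To make the sample composition concrete, a single ACON record might look roughly like the sketch below; the field names are illustrative assumptions, not the released schema (see the data link above).

```python
# Hypothetical shape of one ACON sample (field names are assumptions).
acon_sample = {
    "image": "images/0001.jpg",          # one of the 1,000 images
    "caption": "A dense human-written description of the scene ...",
    "edit_prompt": "Replace the red car with a blue bicycle.",
    "qa_pairs": [                        # 10 factual QA pairs per sample
        {"question": "What color is the car?", "answer": "red"},
        # ... nine more
    ],
}
```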
Any-to-any models handle both text→image and image→text with shared parameters. If they learn a unified latent space, semantically equivalent inputs across modalities should yield consistent outputs. But this had never been rigorously tested—until now. 🧵[2/7]
[ACL 2025] Any-to-any models are often expected to be more coherent across modalities—since they handle image→text and text→image in one unified model. But does this hold up? We test it with ACON. 📄 Paper: https://t.co/5sDal7nx65 📷 data: https://t.co/wHQtAKaH3q
My team at Microsoft Research, working in multimodal AI, is hiring! Please apply if you are interested in working at the cutting edge of multimodal generative AI.
Let your model look again. 🔁 Point-and-copy is a simple yet powerful tool for MLLMs 🧠 Makes reasoning grounded, interpretable, and more human-like 📄 https://t.co/4EmioeHBti Follow for release updates! 🧵6/6
Results: Strong performance across multiple multimodal reasoning benchmarks.
v1 (7B) > all 7B baselines
v1 ≈ 72B models on MathVista, MathVision, MathVerse
🔥 Especially strong in visual math and fine-grained grounding
🧪 Ablation: turning off pointing drops performance by ~9%
Training requires patch-level grounding, but existing datasets lack such annotations. We built v1g, a 300K-example dataset with fine-grained regions (e.g., angle A, line BC), using an automated attention-based step-by-step grounding pipeline. 🧵4/6
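As one illustration of what an attention-based grounding step could look like (not the actual v1g pipeline), the sketch below thresholds a reasoning step's attention over image patches and converts the strongest patches into a normalized region box; the grid size and keep ratio are arbitrary assumptions.

```python
import numpy as np

def patches_to_box(attn, grid=24, keep_ratio=0.05):
    """Turn per-patch attention weights into a bounding box.

    attn: 1D array of attention weights over grid*grid image patches.
    Returns (x_min, y_min, x_max, y_max) in normalized [0, 1] coordinates.
    """
    attn = np.asarray(attn, dtype=float).reshape(grid, grid)
    thresh = np.quantile(attn, 1.0 - keep_ratio)   # keep the top ~5% of patches
    ys, xs = np.nonzero(attn >= thresh)
    return (xs.min() / grid, ys.min() / grid,
            (xs.max() + 1) / grid, (ys.max() + 1) / grid)
```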
How v1 enables 'looking again':
➕ Adds 2 linear heads to your existing MLLM.
👉 Points to relevant image regions dynamically during reasoning.
📋 Copies & injects visual features as input for the next reasoning step.
💡 This gives the model access to the visual patch again.
🧵3/6
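Here is a rough sketch of how the two added heads could be wired together, based only on the description in this thread; the module and parameter names are assumptions, and the released v1 implementation may differ.

```python
import torch
import torch.nn as nn

class PointAndCopy(nn.Module):
    """Two linear heads on top of an existing MLLM (illustrative sketch)."""

    def __init__(self, hidden_dim, patch_dim):
        super().__init__()
        self.point_head = nn.Linear(hidden_dim, patch_dim)  # scores image patches
        self.copy_head = nn.Linear(patch_dim, hidden_dim)   # re-injects the pick

    def forward(self, hidden_state, patch_features):
        # hidden_state:   (B, hidden_dim)    current reasoning-step state
        # patch_features: (B, N, patch_dim)  cached visual patch features
        query = self.point_head(hidden_state)                        # (B, patch_dim)
        scores = torch.einsum("bd,bnd->bn", query, patch_features)   # point
        attn = scores.softmax(dim=-1)
        pointed = torch.einsum("bn,bnd->bd", attn, patch_features)   # copy
        return self.copy_head(pointed)  # embedding injected into the next step
```

The point head scores cached patch features against the current hidden state, and the copy head projects the selected feature back into the LLM's input space so the next reasoning step can attend to that patch again.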
Most MLLMs encode the image once and never look back. We found they don’t actively attend to visual tokens during reasoning. But tasks like geometry need multiple looks. v1 enables dynamic re-grounding during reasoning, just like how humans solve visual problems. 🧵2/6
Don't look only once for multimodal reasoning 🧠. We introduce a new multimodal LLM framework, v1, that lets your MLLM look 👀 again—just like humans do. Paper: https://t.co/4EmioeHBti Code: https://t.co/KTIjwkhXF5 🧵1/6
📢I'm thrilled to announce that I’ll be joining @KAIST_AI as an Assistant Professor in 2026, leading the Computation & Cognition (COCO) Lab🤖🧠: https://t.co/ioG9cAs95H We'll be exploring reasoning, learning w/ synthetic data, and social agents! +I'm spending a gap year @nvidia✨
Using public datasets for AI model training may require more than just checking their own license terms. We present NEXUS, a data compliance system built around our AI agent, AutoCompliance, for full tracing of the data lifecycle. It enables comprehensive legal risk evaluation of
We are delighted to introduce NEXUS, an Agent AI system that tracks the lifecycle of training datasets used in AI models, comprehensively analyzes legal risks, and assesses potential threats related to dataset usage. NEXUS leverages our AutoCompliance agent to trace the full
🚨New Paper! So o3-mini and R1 seem to excel on math & coding. But how good are they on other domains where verifiable rewards are not easily available, such as theory of mind (ToM)? Do they show similar behavior patterns?🤔 What if I told you it's...interesting, like the below?🧵
Presenting our #EMNLP2024 work! "How to Train Your Fact Verifier: Knowledge Transfer with Multimodal Open Models" 📍 Riverfront Hall ⏱️ 11/14, Thu 2:00-3:30 pm
🎉 Happy to announce our previous work has been accepted to #EMNLP2024 Findings! --- 💥 Want to know how robust fact verification models can be without continual updating? 💥 We examine the limits of fact verification models using a knowledge transfer approach with large