Jaihoon Kim
@KimJaihoon
Followers: 69 · Following: 40 · Media: 25 · Statuses: 78
Happy to attend #ICCV2025 in Hawaii! I'll be presenting our paper on enabling VLMs to perform spatial reasoning from arbitrary perspectives. Paper: https://t.co/iX9Pt0AWEh Project Page: https://t.co/sh5W8VLwZO Poster: Oct 21 (Tue) Session 2 & Exhibit Hall, #858
Excited to share that our paper "Inference-Time Scaling for Flow Models via Stochastic Generation and Rollover Budget Forcing" has been accepted to #NeurIPS 2025! https://t.co/DACWrdERIE
Can we define a better initial prior for Sequential Monte Carlo in reward alignment? That's exactly what Ψ-Sampler does. Check out the paper for details:
arxiv.org
We introduce $\Psi$-Sampler, an SMC-based framework incorporating pCNL-based initial particle sampling for effective inference-time reward alignment with a score-based generative model...
We present our paper "Ψ-Sampler: Initial Particle Sampling for SMC-Based Inference-Time Reward Alignment in Score Models". Check out more details. arXiv: https://t.co/pDSllDC79O Website:
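To make the SMC idea concrete, here is a minimal sketch of one reweight-and-resample step of Sequential Monte Carlo for reward alignment. This is not the Ψ-Sampler algorithm itself (which concerns pCNL-based *initial* particle sampling); the reward function, temperature, and particle representation are all illustrative assumptions.

```python
import numpy as np

def smc_resample_step(particles, reward_fn, temperature=0.1, rng=None):
    """One reweight-and-resample step of Sequential Monte Carlo.

    Particles with higher reward get exponentially larger weights and are
    duplicated by multinomial resampling; low-reward particles die out.
    """
    rng = rng or np.random.default_rng(0)
    rewards = np.array([reward_fn(p) for p in particles])
    # Softmax weights: w_i proportional to exp(r_i / temperature)
    logits = rewards / temperature
    logits -= logits.max()                     # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum()
    # Multinomial resampling according to the normalized weights
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return [particles[i] for i in idx]

# Toy usage: particles are scalars, reward prefers values near 1.0
particles = list(np.linspace(-2.0, 2.0, 8))
resampled = smc_resample_step(particles, reward_fn=lambda x: -(x - 1.0) ** 2)
```

After one step the surviving population concentrates near the reward peak; a full SMC sampler would interleave such steps with model transitions.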
Can pretrained flow models generate images from complex compositional prompts, including logical relations and quantities, without further fine-tuning? We have released our code for inference-time scaling for flow models:
github.com
[NeurIPS 2025] Official code for Inference-Time Scaling for Flow Models via Stochastic Generation and Rollover Budget Forcing - KAIST-Visual-AI-Group/Flow-Inference-Time-Scaling
I recently presented our work, "Inference-Time Guided Generation with Diffusion and Flow Models," at HKUST (CVM 2025 keynote) and NTU (MMLab), covering three classes of guidance methods for diffusion models and their extensions to flow models. Slides: https://t.co/yl2KPYGTRc
Vision-Language Models (VLMs) struggle with even basic perspective changes! In our new preprint, we aim to extend the spatial reasoning capabilities of VLMs to *arbitrary* perspectives. Paper: https://t.co/qq5s8jHtVN Project: https://t.co/sh5W8VLwZO [1/N]
#ICLR2025 Come join our StochSync poster (#103) this morning! We introduce a method that combines the best parts of Score Distillation Sampling and Diffusion Synchronization to generate high-quality and consistent panoramas and mesh textures. https://t.co/5TAJxvEUcL
stochsync.github.io
Join us tomorrow at the #ICLR2025 poster session to learn about our work, "StochSync," extending pretrained diffusion models to generate images in arbitrary spaces! Location: Hall 3 + Hall 2B #103. Time: Apr. 25, 10AM-12:30PM [1/8]
How can VLMs reason from arbitrary perspectives? "Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation" proposes a framework that enables spatial reasoning of VLMs from arbitrary perspectives.
Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation
KAIST Visual AI Group is hiring interns for Summer 2025. Can non-KAIST students apply? Yes! Can international students who are not enrolled in any Korean institution apply? Yes! More info below:
We're hiring! The KAIST Visual AI Group is looking for Summer 2025 undergraduate interns. Interested in: Diffusion / Flow / AR models (images, videos, text, more); VLMs / LLMs / foundation models; 3D generation & neural rendering. Apply now: https://t.co/h7FdzC8Hmt
Grounding 3D Orientation in Text-to-Image: We present ORIGEN, the first zero-shot method for accurate 3D orientation grounding in text-to-image generation! Paper: https://t.co/x20WdG96Hs Project: https://t.co/fE7ozSbf46
Introducing ORIGEN: the first orientation-grounding method for image generation with multiple open-vocabulary objects. It's a novel zero-shot, reward-guided approach using Langevin dynamics, built on a one-step generative model like Flux-schnell. Project:
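The reward-guided Langevin idea can be sketched in a few lines: iterate noisy gradient ascent on the reward in the latent space of a one-step generator. This is a toy sketch under assumed names, with an analytic gradient standing in for a real reward model and no actual image generator involved.

```python
import numpy as np

def langevin_reward_sampling(grad_reward, z0, step=0.01, n_steps=200, rng=None):
    """Reward-guided Langevin dynamics in the latent space of a one-step
    generator: z <- z + eta * grad r(z) + sqrt(2 * eta) * noise.

    `grad_reward` returns the gradient of the (toy) reward w.r.t. the latent.
    """
    rng = rng or np.random.default_rng(0)
    z = np.array(z0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(z.shape)
        z = z + step * grad_reward(z) + np.sqrt(2 * step) * noise
    return z

# Toy reward r(z) = -||z - mu||^2 / 2, whose gradient is (mu - z):
mu = np.array([3.0, -1.0])
z_final = langevin_reward_sampling(lambda z: mu - z, z0=np.zeros(2))
```

With this quadratic reward the chain is an Ornstein-Uhlenbeck process drifting toward `mu`; in the actual method the gradient would flow through the generator and an orientation reward instead.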
Check out our inference-time scaling with FLUX. GPT-4o struggles to follow user prompts involving compositional logical relations. Our inference-time scaling enables efficient search to generate samples with precise alignment to the input text.
GPT-4o vs. our test-time scaling with FLUX (2/2). GPT-4o cannot precisely understand the text (e.g., misinterpreting "occupying chairs" on the left), while our test-time technique generates an image perfectly aligned with the prompt. Check out more: https://t.co/3zMdsrp1Ln
Inference-time scaling can work for flow models. @kaist_ai proposed 3 key ideas to make it possible:
• SDE-based generation: adding controlled randomness allows flow models to explore more outputs, like diffusion models do.
• VP interpolant conversion: guides the model from
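The underlying search idea behind inference-time scaling can be illustrated with the simplest variant: draw several stochastic samples and keep the highest-reward one. This sketch uses placeholder names and a trivial "generator"; the paper's rollover budget forcing allocates the sampling budget adaptively rather than uniformly as here.

```python
import numpy as np

def best_of_n_search(generate, reward_fn, n_candidates=16, rng=None):
    """Simplest form of inference-time scaling: spend extra compute by
    drawing N stochastic samples and keeping the highest-reward one.
    """
    rng = rng or np.random.default_rng(0)
    best, best_r = None, -np.inf
    for _ in range(n_candidates):
        noise = rng.standard_normal(4)      # fresh stochastic seed per sample
        sample = generate(noise)            # stand-in for SDE-based generation
        r = reward_fn(sample)
        if r > best_r:
            best, best_r = sample, r
    return best, best_r

# Toy usage: the "generator" is the identity map and the reward
# simply prefers a large first coordinate.
sample, score = best_of_n_search(lambda z: z, lambda x: x[0])
```

Because generation is stochastic, each candidate explores a different output, which is exactly why the SDE-based formulation matters for flow models.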
Unconditional Priors Matter! The key to improving CFG-based "conditional" generation in diffusion models actually lies in the quality of their "unconditional" prior. Replace it with a better one to improve conditional generation!
Unconditional Priors Matter! Improving Conditional Generation of Fine-Tuned Diffusion Models without Additional Training Costs arXiv: https://t.co/sxAHpY5e2P Project: https://t.co/618Ut10yGc
ORIGEN: Zero-Shot 3D Orientation Grounding in Text-to-Image Generation
Unconditional priors matter! When fine-tuning diffusion models for conditional tasks, the **unconditional** distribution often breaks down. We propose a simple fix: simply mix the predicted noise from the **fine-tuned** model and its **base** model!
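The mixing idea can be sketched as a small change to the classifier-free guidance combination: swap (or blend) the fine-tuned model's degraded unconditional prediction with the base model's. This is a hedged sketch; the function name, `mix` parameterization, and exact blending rule are illustrative assumptions, not the paper's precise formulation.

```python
import numpy as np

def cfg_with_base_unconditional(eps_cond_ft, eps_uncond_ft, eps_uncond_base,
                                guidance_scale=7.5, mix=1.0):
    """Classifier-free guidance where the fine-tuned model's unconditional
    noise prediction is (partially) replaced by the base model's.

    mix = 0 recovers standard CFG using only the fine-tuned model;
    mix = 1 uses the base model's unconditional prior entirely.
    """
    eps_uncond = (1.0 - mix) * eps_uncond_ft + mix * eps_uncond_base
    return eps_uncond + guidance_scale * (eps_cond_ft - eps_uncond)

# Toy arrays standing in for latent noise predictions
e_c = np.ones((2, 2))
e_u_ft = np.zeros((2, 2))
e_u_base = 0.5 * np.ones((2, 2))
out = cfg_with_base_unconditional(e_c, e_u_ft, e_u_base,
                                  guidance_scale=2.0, mix=1.0)
```

With `mix=1.0` the guidance direction is computed against the base model's unconditional prediction, so no retraining of the fine-tuned model is needed.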
Unconditional Priors Matter! Fine-tuned diffusion models often degrade in unconditional quality, hurting conditional generation. We show that plugging in richer unconditional priors from other models boosts performance. No retraining needed.