Seungwoo (Simon) Kim Profile
Seungwoo (Simon) Kim

@SeKim1112

Followers: 38 · Following: 22 · Media: 5 · Statuses: 16

cs/ai @ stanford

Stanford, CA
Joined March 2025
@SeKim1112
Seungwoo (Simon) Kim
19 days
A very interesting idea for segmentation that aligns with a fundamental concept of objects!
@Rahul_Venkatesh
Rahul Venkatesh
19 days
AI models segment scenes based on how things appear, but babies segment based on what moves together. We use a visual world model our lab has been developing to capture this concept, and what's cool is that it beats SOTA models on zero-shot segmentation and physical…
0
0
4
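A minimal sketch of the "what moves together" (common fate) idea in the tweet above: cluster dense optical-flow vectors so pixels that move together land in the same segment. This is only a toy illustration (k-means on flow), not the world-model approach the thread describes; the flow input, group count, and k-means readout are assumptions.

```python
import numpy as np

def common_fate_segments(flow, n_groups=4, iters=10, seed=0):
    """Toy 'common fate' grouping: k-means on dense optical-flow vectors,
    so pixels that move together end up in the same segment.

    flow: (H, W, 2) array of per-pixel flow vectors (assumed given).
    Returns an (H, W) array of segment ids.
    """
    H, W, _ = flow.shape
    v = flow.reshape(-1, 2).astype(float)
    rng = np.random.default_rng(seed)
    centers = v[rng.choice(len(v), size=n_groups, replace=False)].copy()
    for _ in range(iters):
        # assign each pixel's flow vector to the nearest motion center
        labels = ((v[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        # update each center to the mean flow of its group
        for k in range(n_groups):
            if (labels == k).any():
                centers[k] = v[labels == k].mean(0)
    return labels.reshape(H, W)
```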
@SeKim1112
Seungwoo (Simon) Kim
26 days
We see KL-tracing as the simpler, more general recipe for zero-shot flow (and other intermediates) whenever the base model offers fine-grained control, e.g. LRAS. Excited to see how future work puts generative video models to use.
arxiv.org
Extracting optical flow from videos remains a core computer vision problem. Motivated by the success of large general-purpose models, we ask whether frozen self-supervised video models trained...
0
0
1
@SeKim1112
Seungwoo (Simon) Kim
26 days
(2) Like our ablations, they find that squeezing an entire clip into a single latent bottleneck wipes out fine motion. Their more involved strategy of establishing a one-to-one mapping between each frame and its latent boosts accuracy significantly. BUT it still trails supervised flow…
1
0
1
@SeKim1112
Seungwoo (Simon) Kim
26 days
Concurrent work alert! DiffTrack (@jisu__nam, @JunhwaHur, @KimSeungry62571, et al.) is a super cool paper that tackles the same puzzle we do: can you pull out useful signals from a generative video model with zero labels? Their trick is to probe…
arxiv.org
Recent advancements in video diffusion models based on Diffusion Transformers (DiTs) have achieved remarkable success in generating temporally coherent videos. Yet, a fundamental question...
1
2
9
@SeKim1112
Seungwoo (Simon) Kim
26 days
RT @dyamins: Over the past 18 months my lab has been developing a new approach to visual world modeling. There will be a magnum opus that t….
0
14
0
@SeKim1112
Seungwoo (Simon) Kim
27 days
RT @khai_loong_aw: So excited by this direction of using generative video models for vision tasks. Here we show it for extracting optical f….
0
2
0
@SeKim1112
Seungwoo (Simon) Kim
27 days
RT @KlemenKotar: 📷 New Preprint: SOTA optical flow extraction from pre-trained generative video models! While it seems intuitive that video….
0
9
0
@SeKim1112
Seungwoo (Simon) Kim
27 days
Our final method, KL-tracing with LRAS, achieves state-of-the-art results on the TAP-Vid benchmark, as well as qualitatively accurate flow traces on challenging scenes, even compared to supervised, task-specific baselines such as SEA-RAFT.
0
0
2
@SeKim1112
Seungwoo (Simon) Kim
27 days
FINALLY: KL-tracing works by computing the KL divergence between clean & perturbed logit distributions. This is a powerful *statistical counterfactual* probe enabled by autoregressive generative predictors (like LRAS).
1
1
3
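A minimal numpy sketch of the KL-tracing readout described in the tweet above. It assumes the frozen model exposes per-location logits over a token codebook for the predicted frame; the (H, W, V) shapes and the argmax readout are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def softmax(logits, axis=-1):
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def kl_trace(clean_logits, perturbed_logits):
    """Read out where a query-point perturbation 'lands' in the next frame.

    clean_logits, perturbed_logits: (H, W, V) per-location logits over a
    V-way token codebook, from the frozen model run on the clean and the
    perturbed inputs respectively (shapes are assumptions).
    Returns (x, y): the location whose predictive distribution shifted most.
    """
    p = softmax(perturbed_logits)
    q = softmax(clean_logits)
    eps = 1e-9
    # per-location KL(perturbed || clean): large where the perturbation shows up
    kl = (p * (np.log(p + eps) - np.log(q + eps))).sum(-1)
    y, x = np.unravel_index(kl.argmax(), kl.shape)
    return int(x), int(y)
```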
@SeKim1112
Seungwoo (Simon) Kim
27 days
BUT: the recent Local Random Access Sequence (LRAS) model turns out to work great as the base generative model b/c: (a) its local tokenizer enables detailed perturbation control and (b) random access decoding order allows for stronger conditioning.
arxiv.org
3D scene understanding from single images is a pivotal problem in computer vision with numerous downstream applications in graphics, augmented reality, and robotics. While diffusion-based modeling...
1
0
2
@SeKim1112
Seungwoo (Simon) Kim
27 days
AND: we find that applying the perturbation-tracking procedure to strong video models like Stable Video Diffusion & Cosmos ALSO fails, because those models lack sufficiently fine-grained controllability.
1
0
1
@SeKim1112
Seungwoo (Simon) Kim
27 days
Previously, Counterfactual World Modeling (CWM) introduced an intuitive procedure for zero-shot flow: apply a small perturbation at the query point, and track where it moves by computing the difference between clean & perturbed predictions. But…
arxiv.org
Leading approaches in machine vision employ different architectures for different tasks, trained on costly task-specific labeled datasets. This complexity has held back progress in areas, such as...
1
0
2
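A minimal sketch of the CWM-style perturb-and-difference probe the tweet describes. The `predict` callable stands in for a frozen visual world model, and the patch size, perturbation magnitude, and argmax readout are assumptions for illustration, not CWM's exact procedure.

```python
import numpy as np

def cwm_flow_probe(predict, frames, query_xy, patch=3, delta=0.5):
    """Perturbation-and-difference flow probe in the spirit of CWM.

    predict: callable mapping a clip (T, H, W, C) to a predicted next
             frame (H, W, C); stands in for a frozen visual world model.
    frames: float array clip; query_xy: (x, y) point in frame 0.
    Returns the estimated flow vector (dx, dy) for the query point.
    """
    x, y = query_xy
    perturbed = frames.copy()
    # small local perturbation at the query point in the first frame
    perturbed[0, y:y + patch, x:x + patch] += delta
    clean_pred = predict(frames)
    pert_pred = predict(perturbed)
    # the perturbation's footprint in the prediction marks where the point moved
    diff = np.abs(pert_pred - clean_pred).sum(-1)
    yy, xx = np.unravel_index(diff.argmax(), diff.shape)
    return int(xx) - x, int(yy) - y
```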
@SeKim1112
Seungwoo (Simon) Kim
27 days
Generative models capture understanding of the world from their large-scale pre-training data, but how do you extract useful visual quantities from these models zero-shot, w/o task-specific fine-tuning? This is esp. important for quantities (like optical flow) where getting dense…
1
0
2
@SeKim1112
Seungwoo (Simon) Kim
27 days
We prompt a generative video model to extract state-of-the-art optical flow, using zero labels and no fine-tuning. Our method, KL-tracing, achieves SOTA results on TAP-Vid & generalizes to challenging YouTube clips. @khai_loong_aw @KlemenKotar @CristbalEyzagu2 @lee_wanhee_
1
8
30
@SeKim1112
Seungwoo (Simon) Kim
3 months
RT @percyliang: AI agents have the potential to significantly alter the cybersecurity landscape. To help us understand this change, we are….
0
31
0
@SeKim1112
Seungwoo (Simon) Kim
5 months
RT @dyamins: New paper on self-supervised optical flow and occlusion estimation from video foundation models. @sstj389 @jiajunwu_cs @SeKim….
0
18
0