
Seungwoo (Simon) Kim
@SeKim1112
Followers
38
Following
22
Media
5
Statuses
16
A very interesting idea for segmentation that aligns with a fundamental concept of objects!
AI models segment scenes based on how things appear, but babies segment based on what moves together. We use a visual world model that our lab has been developing to capture this concept, and what's cool is that it beats SOTA models on zero-shot segmentation and physical
0
0
4
We see KL-tracing as the simpler, more general recipe for zero-shot flow (and other intermediates) whenever the base model offers fine-grained control, e.g. LRAS. Excited to see what other interesting works in the future will use generative video models.
arxiv.org
Extracting optical flow from videos remains a core computer vision problem. Motivated by the success of large general-purpose models, we ask whether frozen self-supervised video models trained...
0
0
1
Concurrent work alert! DiffTrack (@jisu__nam, @JunhwaHur, @KimSeungry62571, et al.) is a super cool paper that tackles the same puzzle we do: can you pull out useful signals from a generative video model with zero labels? Their trick is to probe.
arxiv.org
Recent advancements in video diffusion models based on Diffusion Transformers (DiTs) have achieved remarkable success in generating temporally coherent videos. Yet, a fundamental question...
1
2
9
RT @dyamins: Over the past 18 months my lab has been developing a new approach to visual world modeling. There will be a magnum opus that t…
0
14
0
RT @khai_loong_aw: So excited by this direction of using generative video models for vision tasks. Here we show it for extracting optical f…
0
2
0
RT @KlemenKotar: 📷 New Preprint: SOTA optical flow extraction from pre-trained generative video models! While it seems intuitive that video…
0
9
0
BUT: the recent Local Random Access Sequence (LRAS) model turns out to work great as the base generative model b/c: (a) its local tokenizer enables detailed perturbation control and (b) random access decoding order allows for stronger conditioning.
arxiv.org
3D scene understanding from single images is a pivotal problem in computer vision with numerous downstream applications in graphics, augmented reality, and robotics. While diffusion-based modeling...
1
0
2
Previously, Counterfactual World Modeling (CWM) introduced an intuitive procedure for zero-shot flow: apply a small perturbation on the query point, and track where it moves by computing the difference between clean & perturbed predictions. But.
arxiv.org
Leading approaches in machine vision employ different architectures for different tasks, trained on costly task-specific labeled datasets. This complexity has held back progress in areas, such as...
1
0
2
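The perturb-and-compare procedure described in the post above can be illustrated with a toy sketch. This is not the CWM or KL-tracing implementation: `toy_predictor` is a hypothetical stand-in for a next-frame predictor (it simply translates the frame by a fixed shift), and `perturb_and_trace`, `eps`, and the 8x8 frame are names and values invented here for illustration.

```python
def toy_predictor(frame, shift=(2, 1)):
    """Hypothetical stand-in for a next-frame model: translate by (dx, dy)."""
    dx, dy = shift
    h, w = len(frame), len(frame[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = frame[y][x]
    return out


def perturb_and_trace(frame, query, predictor, eps=1.0):
    """Estimate flow at `query` = (x, y) by comparing clean and perturbed predictions."""
    qx, qy = query
    # Apply a small perturbation at the query point only.
    perturbed = [row[:] for row in frame]
    perturbed[qy][qx] += eps

    clean_pred = predictor(frame)
    pert_pred = predictor(perturbed)

    # The location where the two predictions differ most is where the point moved.
    best, best_xy = -1.0, (qx, qy)
    for y in range(len(frame)):
        for x in range(len(frame[0])):
            diff = abs(pert_pred[y][x] - clean_pred[y][x])
            if diff > best:
                best, best_xy = diff, (x, y)

    # Flow is the displacement from the query point to the peak difference.
    return (best_xy[0] - qx, best_xy[1] - qy)


# Toy 8x8 frame with a simple repeating pattern.
frame = [[0.1 * ((x + y) % 5) for x in range(8)] for y in range(8)]
flow = perturb_and_trace(frame, query=(3, 2), predictor=toy_predictor)
print(flow)
```

With a real generative video model, the predictor would be the frozen model itself, and the comparison between clean and perturbed predictions could be replaced by whatever difference measure the method calls for.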
We prompt a generative video model to extract state-of-the-art optical flow, using zero labels and no fine-tuning. Our method, KL-tracing, achieves SOTA results on TAP-Vid & generalizes to challenging YouTube clips. @khai_loong_aw @KlemenKotar @CristbalEyzagu2 @lee_wanhee_
1
8
30
RT @percyliang: AI agents have the potential to significantly alter the cybersecurity landscape. To help us understand this change, we are…
0
31
0
RT @dyamins: New paper on self-supervised optical flow and occlusion estimation from video foundation models. @sstj389 @jiajunwu_cs @SeKim…
0
18
0