Carl Doersch

@CarlDoersch

Followers: 2K · Following: 123 · Media: 19 · Statuses: 62

Researcher at DeepMind

London, UK
Joined April 2017
@CarlDoersch
Carl Doersch
1 year
We present a new SOTA on point tracking, via self-supervised training on real, unlabeled videos! BootsTAPIR achieves 67.4% AJ on TAP-Vid DAVIS with minimal architecture changes, and tracks 10K points on a 50-frame video in 6 seconds. PyTorch & JAX implementations on GitHub.
7
64
318
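To illustrate the self-supervised idea in the tweet, here is a minimal sketch of bootstrapped self-training on unlabeled video: a frozen teacher tracker produces pseudo-labels, and a student learns to reproduce them under augmentation. The interfaces (`teacher`, `student`, `augment`) are hypothetical placeholders, not the actual BootsTAPIR code.

```python
# Hypothetical sketch of self-supervised bootstrapping for point tracking.
# `teacher` and `student` stand in for tracker networks; this is not the
# actual BootsTAPIR training loop.
import torch

def bootstrap_step(teacher, student, video, queries, augment, optimizer):
    """One self-training step on an unlabeled clip.

    video:   [T, H, W, 3] float tensor of unlabeled frames
    queries: [N, 3] (t, y, x) points to track
    """
    with torch.no_grad():
        # Teacher pseudo-labels: predicted (y, x) per point per frame, plus
        # a confidence estimate used to mask unreliable points in the loss.
        pseudo_tracks, confidence = teacher(video, queries)

    # The student sees an augmented (e.g. cropped / color-jittered) copy,
    # so matching the teacher is non-trivial and improves robustness.
    aug_video, aug_tracks = augment(video, pseudo_tracks)
    pred_tracks, _ = student(aug_video, queries)

    # Confidence-weighted regression toward the pseudo-labels.
    loss = (confidence * (pred_tracks - aug_tracks).pow(2).sum(-1)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```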
@CarlDoersch
Carl Doersch
4 months
RT @KelseyRAllen: Humans can tell the difference between a realistic generated video and an unrealistic one – can models? Excited to share…
0
14
0
@CarlDoersch
Carl Doersch
4 months
Joint work with @artemZholus, @yangyi02, @skandakoppula, Viorica Patraucean, Xu Owen He, Ignacio Rocco, Mehdi Sajjadi, @apsarathchandar, @RGoroshin.
0
0
1
@CarlDoersch
Carl Doersch
4 months
Additionally, TAPNext tracks in a purely online fashion, enabling it to run with minimal latency and removing the temporal windowing required by many existing causal state-of-the-art trackers.
1
1
4
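A minimal sketch of what "purely online" tracking implies: each incoming frame updates a per-point recurrent state, so predictions are emitted with one-frame latency and no temporal window is ever buffered. `model.init_state` and `model.step` are assumed interfaces, not TAPNext's API.

```python
# Hypothetical causal tracking loop: one frame in, one prediction out.
def track_online(model, frame_stream, queries):
    state = model.init_state(queries)           # per-point recurrent state
    for frame in frame_stream:                  # frames arrive one at a time
        points, visible, state = model.step(frame, state)
        yield points, visible                   # emitted with minimal latency
```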
@CarlDoersch
Carl Doersch
4 months
TAPNext is conceptually simple and removes many of the inductive biases present in current Tracking Any Point models. Interestingly, many widely used tracking heuristics emerge naturally in TAPNext through end-to-end training.
1
0
8
@CarlDoersch
Carl Doersch
4 months
We're very excited to introduce TAPNext: a model that sets a new state of the art for Tracking Any Point in videos by formulating the task as Next Token Prediction. For more, see: 🧵
13
57
374
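As a toy illustration of the next-token formulation (my reading of the tweet, not the actual TAPNext tokenizer): quantize each point's (y, x) position into a discrete vocabulary, so a sequence model can predict the position token for every point at each new frame. `NUM_BINS` is an assumed hyperparameter.

```python
# Illustrative coordinate tokenization for "tracking as next-token prediction".
import torch

NUM_BINS = 256  # coordinate quantization bins (assumption)

def coords_to_tokens(points, height, width):
    """points: [N, 2] (y, x) in pixels -> [N, 2] integer tokens."""
    y = (points[:, 0] / height * NUM_BINS).long().clamp(0, NUM_BINS - 1)
    x = (points[:, 1] / width * NUM_BINS).long().clamp(0, NUM_BINS - 1)
    return torch.stack([y, x], dim=-1)

def tokens_to_coords(tokens, height, width):
    """Invert the quantization back to approximate pixel coordinates."""
    y = (tokens[:, 0].float() + 0.5) / NUM_BINS * height
    x = (tokens[:, 1].float() + 0.5) / NUM_BINS * width
    return torch.stack([y, x], dim=-1)
```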
@CarlDoersch
Carl Doersch
9 months
RT @dangengdg: What happens when you train a video generation model to be conditioned on motion? Turns out you can perform "motion prompti…
0
147
0
@CarlDoersch
Carl Doersch
11 months
Want a robot to solve a task, specified in language? Generate a video of a person doing it, and then retarget the action to the robot with the help of point tracking! Cool collab with @mangahomanga during his student researcher stint at Google.
@mangahomanga
Homanga Bharadhwaj
11 months
Gen2Act: Casting language-conditioned manipulation as *human video generation* followed by *closed-loop policy execution conditioned on the generated video* enables solving diverse real-world tasks unseen in the robot dataset! 1/n
0
0
5
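A hedged sketch of the pipeline as described in the tweets: generate a human video for the language goal, track points on it, then move the robot so the live points follow the same trajectories. Every name here (`video_model`, `tracker`, `robot`) is a placeholder, not the Gen2Act API.

```python
# Hypothetical language -> human video -> point tracks -> robot actions loop.
def solve_task(instruction, first_frame, video_model, tracker, robot):
    # 1. Generate a video of a person performing the instructed task.
    human_video = video_model.generate(instruction, first_frame=first_frame)
    # 2. Extract point trajectories from the generated video.
    goal_tracks, _ = tracker.track(human_video)          # [T, N, 2]
    # 3. Closed loop: steer the robot so live points follow the trajectories.
    for goal_points in goal_tracks[1:]:
        live_points, _ = tracker.track_frame(robot.camera_image())
        robot.apply_correction(goal_points - live_points)
```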
@CarlDoersch
Carl Doersch
1 year
Want to make a difference with point tracking? The medical community needs help tracking tissue deformation during surgery! Participate in the STIR challenge at MICCAI, deadline in September.
0
0
2
@CarlDoersch
Carl Doersch
1 year
RT @skandakoppula: We're excited to release TAPVid-3D: an evaluation benchmark of 4,000+ real world videos and 2.1 million metric 3D point…
0
58
0
@CarlDoersch
Carl Doersch
1 year
RT @dimadamen: Can you win the 2nd Perception Test Challenge? @eccvconf workshop: Diagnose Audio-visual MLM on ability…
0
13
0
@CarlDoersch
Carl Doersch
1 year
Joint work with @paulineluc_, @yangyi02, @dilaragoekay, @skandakoppula, @ankshgpta, Joe Heyward, Ignacio Rocco, @RGoroshin, @joaocarreira, Andrew Zisserman. Video credit to GDM’s robot soccer project:
0
1
2
@CarlDoersch
Carl Doersch
2 years
Just in time for CVPR, we've released code to generate "rainbow visualizations" from a set of point tracks: it semi-automatically segments foreground objects and corrects for camera motion. Try our Colab demo at (vid source
3
110
713
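One plausible way to implement the camera-motion correction (an assumption on my part, not necessarily what the released Colab does): fit a homography between each frame and the first using RANSAC, which naturally treats fast-moving foreground tracks as outliers, then warp every track into frame-0 coordinates.

```python
# Sketch of camera-motion correction for point tracks via RANSAC homography.
# Assumes most tracked points lie on the static background.
import cv2
import numpy as np

def stabilize_tracks(tracks):
    """tracks: [T, N, 2] float32 (x, y) per frame -> tracks in frame-0 coords."""
    out = [tracks[0]]
    for t in range(1, len(tracks)):
        # RANSAC rejects fast-moving foreground points as outliers, so the
        # homography mostly reflects camera motion over the background.
        H, _ = cv2.findHomography(tracks[t], tracks[0], cv2.RANSAC, 3.0)
        warped = cv2.perspectiveTransform(tracks[t].reshape(-1, 1, 2), H)
        out.append(warped.reshape(-1, 2))
    return np.stack(out)
```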
@CarlDoersch
Carl Doersch
2 years
Joint work with @yangyi02, Mel Vecerik, @joaocarreira @tdavchev, @JonathanScholz2, Andrew Zisserman, @yusufaytar, Stannis Zhou, @dilaragoekay, Ankush Gupta, @LourdesAgapito, @RaiaHadsell.
0
1
4
@CarlDoersch
Carl Doersch
2 years
Powering it all is TAPIR, our open-source model, which tracks with high quality and in real time. Newly released is our unsupervised clustering code, which lets you segment moving objects automatically from videos. Try it at:
1
1
12
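A minimal sketch of one way such clustering can work (illustrative, not the released code): points on the same moving object share similar velocity profiles, so clustering per-track motion features separates objects from background.

```python
# Illustrative motion-based clustering of point tracks into objects.
import numpy as np
from sklearn.cluster import KMeans

def cluster_tracks(tracks, num_objects=2):
    """tracks: [T, N, 2] -> integer object label per track."""
    velocities = np.diff(tracks, axis=0)                  # [T-1, N, 2]
    # One feature vector per track: its concatenated velocity profile.
    features = velocities.transpose(1, 0, 2).reshape(tracks.shape[1], -1)
    return KMeans(n_clusters=num_objects, n_init=10).fit_predict(features)
```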
@CarlDoersch
Carl Doersch
2 years
In video generation, we demonstrate a system that first generates motion and then generates pixels to match that motion, yielding videos with complex motion while keeping textures consistent over time.
2
1
7
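Schematically, the two-stage factorization might look like the sketch below; both model interfaces are hypothetical placeholders for whatever motion and pixel generators are used.

```python
# Hypothetical motion-first video generation: sample trajectories, then pixels.
def generate_video(motion_model, pixel_model, first_frame, num_frames):
    # Stage 1: sample sparse motion (point trajectories) for the whole clip.
    tracks = motion_model.sample(first_frame, num_frames)         # [T, N, 2]
    # Stage 2: render pixels that follow the sampled motion; tying every
    # frame to the same tracks keeps textures consistent over time.
    return pixel_model.sample(first_frame, conditioning=tracks)  # [T, H, W, 3]
```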
@CarlDoersch
Carl Doersch
2 years
Our robotic system can learn industry-relevant tasks from 4-6 demonstrations. Above, at each moment, the system automatically identifies which points must move (red) and where they must move to (cyan) to complete the task. Below, we show points as discovered from demos.
2
1
3
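A hedged sketch of how "which points must move, and where" might be derived from demonstrations (illustrative names and threshold, not the RoboTAP implementation): points that move consistently across demos are treated as active, and the error to their demonstrated end locations drives a simple visual servo.

```python
# Illustrative active-point detection and servo error from demo point tracks.
import numpy as np

def find_active_points(demo_tracks, threshold=5.0):
    """demo_tracks: [D, T, N, 2] tracks from D demos -> boolean mask [N]."""
    displacement = demo_tracks[:, -1] - demo_tracks[:, 0]     # [D, N, 2]
    # Active = points whose mean displacement across demos exceeds threshold.
    return np.linalg.norm(displacement, axis=-1).mean(axis=0) > threshold

def servo_error(current_points, goal_points, active):
    """Mean 2-D error over active points; drives a simple visual servo."""
    return (goal_points[active] - current_points[active]).mean(axis=0)
```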
@CarlDoersch
Carl Doersch
2 years
Introducing TAPIR & RoboTAP, our latest research from @GoogleDeepMind. It focuses on spatial intelligence via point tracking, showing how it enables applications from robotics to video generation to augmented reality, and more!
7
37
249
@CarlDoersch
Carl Doersch
2 years
RT @dimadamen: 📢 Perception Test @ICCVConference now w/ Test Set. We invite submissions to the 1st Perception Test; winners announced #ICCV2023…
github.com/google-deepmind/perception_test
0
8
0