Carl Doersch
@CarlDoersch
2K Followers · 123 Following · 19 Media · 62 Statuses
We present a new SOTA on point tracking, via self-supervised training on real, unlabeled videos! BootsTAPIR achieves 67.4% AJ on TAP-Vid DAVIS with minimal architecture changes, and tracks 10K points on a 50-frame video in 6 seconds. PyTorch & JAX implementations on GitHub.
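AJ is the Average Jaccard metric from the TAP-Vid benchmark: it jointly scores position accuracy and occlusion prediction, averaged over pixel thresholds. A simplified sketch of the computation (the `average_jaccard` function below is illustrative and omits per-video averaging and query-sampling details of the official evaluation):

```python
import numpy as np

def average_jaccard(pred_xy, pred_vis, gt_xy, gt_vis,
                    thresholds=(1, 2, 4, 8, 16)):
    """Simplified Average Jaccard (AJ) for point tracking.

    pred_xy, gt_xy: (num_points, num_frames, 2) pixel coordinates.
    pred_vis, gt_vis: (num_points, num_frames) boolean visibility.
    """
    dist = np.linalg.norm(pred_xy - gt_xy, axis=-1)
    jaccards = []
    for d in thresholds:
        within = dist <= d
        # True positive: predicted visible, actually visible, within d px.
        tp = np.sum(gt_vis & pred_vis & within)
        # False positive: predicted visible but gt occluded or too far.
        fp = np.sum(pred_vis & (~gt_vis | ~within))
        # False negative: gt visible but predicted occluded or too far.
        fn = np.sum(gt_vis & (~pred_vis | ~within))
        jaccards.append(tp / (tp + fp + fn))
    return float(np.mean(jaccards))
```

A perfect tracker scores 1.0; a point predicted visible but far from a visible ground-truth point is penalized as both a false positive and a false negative, as in the TAP-Vid definition.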
RT @KelseyRAllen: Humans can tell the difference between a realistic generated video and an unrealistic one – can models? Excited to share…
Joint work with @artemZholus, @yangyi02, @skandakoppula, Viorica Patraucean, Xu Owen He, Ignacio Rocco, Mehdi Sajjadi, @apsarathchandar, @RGoroshin.
RT @dangengdg: What happens when you train a video generation model to be conditioned on motion? Turns out you can perform "motion prompti…
Want a robot to solve a task, specified in language? Generate a video of a person doing it, and then retarget the action to the robot with the help of point tracking! Cool collab with @mangahomanga during his student researcher stint at Google.
Gen2Act: Casting language-conditioned manipulation as *human video generation* followed by *closed-loop policy execution conditioned on the generated video* enables solving diverse real-world tasks unseen in the robot dataset! 1/n
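The quoted thread describes a two-stage decomposition: first synthesize a human demonstration video, then run a closed-loop policy conditioned on it. A minimal structural sketch of that flow (every function below is a hypothetical placeholder for illustration, not the actual Gen2Act API):

```python
import numpy as np

def generate_human_video(task_description, first_frame, num_frames=16):
    """Stage 1 (placeholder): a video model would synthesize a human
    performing the described task, conditioned on the current scene."""
    return np.repeat(first_frame[None], num_frames, axis=0)

def track_points(video, query_points):
    """Placeholder point tracker: returns per-frame positions of the
    query points (a real tracker such as TAPIR follows scene motion)."""
    return np.repeat(query_points[None], video.shape[0], axis=0)

def policy_step(observation, generated_video, tracks):
    """Stage 2 (placeholder): the closed-loop policy conditions on the
    generated video (and point tracks) to emit one robot action."""
    return np.zeros(7)  # e.g. a 7-DoF arm command

scene = np.zeros((64, 64, 3))                       # current camera frame
video = generate_human_video("open the drawer", scene)
tracks = track_points(video, np.array([[32.0, 32.0]]))
action = policy_step(scene, video, tracks)
```

The design point is that the video model carries the task knowledge (including tasks unseen in robot data), while the policy only has to imitate the generated motion.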
RT @skandakoppula: We're excited to release TAPVid-3D: an evaluation benchmark of 4,000+ real world videos and 2.1 million metric 3D point…
RT @dimadamen: Can you win the 2nd Perception Test Challenge? @eccvconf workshop: Diagnose audio-visual MLMs on ability…
Joint work with @paulineluc_, @yangyi02, @dilaragoekay, @skandakoppula, @ankshgpta, Joe Heyward, Ignacio Rocco, @RGoroshin, @joaocarreira, Andrew Zisserman. Video credit to GDM’s robot soccer project:
Joint work with @yangyi02, Mel Vecerik, @joaocarreira @tdavchev, @JonathanScholz2, Andrew Zisserman, @yusufaytar, Stannis Zhou, @dilaragoekay, Ankush Gupta, @LourdesAgapito, @RaiaHadsell.
Introducing TAPIR & RoboTAP, our latest research from @GoogleDeepMind. It focuses on spatial intelligence via point tracking, outlining how it enables applications from robotics to video generation to augmented reality, and more!
RT @dimadamen: 📢 Perception Test @ICCVConference now w/ Test Set. We invite submissions to 1st Perception Test – winners announced #ICCV2023…
github.com/google-deepmind/perception_test