Wufei Ma @wufeima X Profile

Wufei Ma

@wufeima

Followers

121

Following

319

Media

22

Statuses

85

PhD student at @CCVLatJHU @JHU | Student researcher at Google | Prev: Meta, Amazon, MSRA, Megvii

https://t.co/7nNAlQMW9m

Baltimore, MD

Joined August 2019

Thang Luong

@lmthang

2 months

Yes, there is an official marking guideline from the IMO organizers which is not available externally. Without the evaluation based on that guideline, no medal claim can be made. With one point deducted, it is a Silver, not Gold.

Mikhail Samin

@Mihonarium

2 months

🚨 According to a friend, the IMO asked AI companies not to steal the spotlight from kids and to wait a week after the closing ceremony to announce results. OpenAI announced the results BEFORE the closing ceremony. According to a Coordinator on Problem 6, the one problem OpenAI

14

57

588

Dr. Dominic Ng

@DrDominicNg

2 months

Microsoft claims their new AI framework diagnoses 4x better than doctors. I'm a medical doctor and I actually read the paper. Here's my perspective on why this is both impressive AND misleading ... 🧵

277

1K

9K

CCVL at JHU

@CCVLatJHU

3 months

We are at the Johns Hopkins booth at @CVPR. Come join us 😁 @JHUCompSci @HopkinsEngineer @HopkinsDSAI

1

6

23

Johns Hopkins Data Science and AI Institute

@HopkinsDSAI

3 months

Hopkins researchers including @JHUECE Tinoosh Mohsenin and @JHU_BDPs Rama Chellappa are speaking at booth 1317 of the IEEE / CVF Computer Vision and Pattern Recognition Conference today! Come meet #HopkinsDSAI #CVPR2025

2

15

31

Voxel51

@Voxel51

3 months

One of the biggest bottlenecks in deploying visual AI and computer vision is annotation, which can be both costly and time-consuming. Today, we’re introducing Verified Auto Labeling, a new approach to AI-assisted annotation that achieves up to 95% of human-level performance while

2

212

109

CCVL at JHU

@CCVLatJHU

3 months

Excited to announce that our group will be presenting eight papers at @CVPR in Nashville! 🎉 We're excited to share the ideas we've been working on with the community. If you'll be there, we'd love to meet and chat -- always happy to exchange ideas and catch up in person. See

0

6

Thomas Dohmke

@ashtom

4 months

GitHub Copilot now has a coding agent embedded right where you already collaborate with developers: on GitHub. And yes, you can access it from VS Code too. 🤖

88

476

4K

CCVL at JHU

@CCVLatJHU

5 months

🤩Thrilled to share three papers from our group to be presented at @iclr_conf -- see you in Singapore!

0

2

6

JHU Computer Science

@JHUCompSci

5 months

@XingruiWang & @YuilleAlan’s “Compositional 4D Dynamic Scenes Understanding with Physics Priors for Video Question Answering” w/ @wufeima, Angtian Wang, @an_epsilon0, & @AdamKortylewski constructs a dataset & model 4 temporal world representations: https://t.co/mF7oNy3tYK (5/12)

arxiv.org

For vision-language models (VLMs), understanding the dynamic properties of objects and their interactions in 3D scenes from videos is crucial for effective reasoning about high-level temporal and...

1

2

Wufei Ma

@wufeima

8 months

🤮

0

3

CCVL at JHU

@CCVLatJHU

8 months

🚀Our group is seeking summer interns to join our group in 2025 and work on exciting research projects in computer vision and AI. 🔗Job posting: https://t.co/qJ7WjmC4XJ 🤩If interested please send Prof @YuilleAlan an email! @HopkinsEngineer @JHUCompSci @HopkinsDSAI

0

10

16

Wufei Ma

@wufeima

9 months

Two more interesting examples from running Gemini 2.0 Flash Thinking on our 3DSRBench.

0

Wufei Ma

@wufeima

9 months

Estimating orientation from the perspective of a man? This is a challenging problem as models are often confused by 2D and 3D spatial reasoning. Gemini 2.0 gives a textbook reasoning of the problem. The answer is unfortunately wrong but very impressive thinking!

1

0

Wufei Ma

@wufeima

9 months

Relative position from the perspective of a horse? Gemini 2.0 divides the problem into smaller steps, estimating orientation and relative position. It also comes up with ways to verify its answer, i.e., confirming with visual cues, and to eliminate other possibilities.

1

0

Wufei Ma

@wufeima

9 months

Can Gemini 2.0 Flash Thinking handle complex 3D spatial reasoning? While not perfect, its spatial reasoning capabilities are truly remarkable!🤯 I test the model on several 3DSRBench questions (https://t.co/NUrc5lGDGI). It's very impressive how Gemini 2.0 effectively breaks down

Jeff Dean

@JeffDean

9 months

Introducing Gemini 2.0 Flash Thinking, an experimental model that explicitly shows its thoughts. Built on 2.0 Flash’s speed and performance, this model is trained to use thoughts to strengthen its reasoning. And we see promising results when we increase inference time

1

3

9

JHU Computer Science

@JHUCompSci

9 months

🧠 “I want algorithms that will work in the real world and that will perform at the level of humans, probably ultimately better. And to do that, I think we need to get inspired by the brain,” says @YuilleAlan. Learn more about his research here:

cs.jhu.edu

Bestowing machines with the ability to perceive the physical world as humans do has been a career-long mission of Alan Yuille, a pioneer in the field of computer vision.

0

6

10

Jieneng Chen

@jieneng_chen

9 months

👏 Gemini 2.0 impresses with its visual and physical reasoning, but 3D spatial reasoning remains a challenge. 🔥We present 3DSRBench to benchmark the 3D spatial reasoning capabilities. 🫨 Surprisingly, Gemini 2.0 achieves only 50% accuracy, falling significantly short of

Google DeepMind

@GoogleDeepMind

9 months

The Gemini 2.0 era is here. And we’re excited for you to start building with it. A quick rewind of what we just released ⏪ Gemini 2.0 Flash ⚡ comes with low latency and better performance. 🔵 You can now access an experimental version in @GeminiApp on the web, while Gemini

0

6

17

Wufei Ma

@wufeima

9 months

🤯Gemini 2.0 is great, especially how it sees in 3D and reasons about the physical world. However, 3D spatial reasoning still has a long way to go. 🔥We present 3DSRBench, a comprehensive 3D spatial reasoning benchmark with 2772 manually annotated VQAs across 12 question types.

2

5

24

Wufei Ma

@wufeima

9 months

🚀Excited to share our ImageNet3D dataset for general-purpose object-level 3D understanding, which augments 200 categories from ImageNet21k with 3D annotations. 🔗Project page: https://t.co/3iTfhESycX 🥰Big thanks to my advisors: @AdamKortylewski @yaoyaoliu1 @YuilleAlan

CCVL at JHU

@CCVLatJHU

9 months

🎇This week our group will be presenting four papers at @NeurIPSConf. Feel free to stop by our posters and check out the latest works from our group! ✉️DMs open for any questions and chats. 🥳Kudos to the authors and collaborators: @pedrorasb @Zongwei_Zhou @jieneng_chen

0

4

13

Jieneng Chen

@jieneng_chen

10 months

Introducing Genex: Generative World Explorer. 🧠 Humans mentally explore unseen parts of the world, revising their beliefs with imagined observations. ✨ Genex replicates this human-like ability, advancing embodied AI in planning with partial observations. (1/6)

6

49

164