Chun-Hsiao (Daniel) Yeh Profile
Chun-Hsiao (Daniel) Yeh

@danielyehhh

Followers: 130
Following: 423
Media: 8
Statuses: 22

Research Intern @FAIR, Meta | PhD student @UCBerkeley

Berkeley, CA
Joined November 2016
@danielyehhh
Chun-Hsiao (Daniel) Yeh
5 months
❗️❗️ Can MLLMs understand scenes from multiple camera viewpoints — like humans? 🧭 We introduce All-Angles Bench — 2,100+ QA pairs on multi-view scenes. 📊 We evaluate 27 top MLLMs, including Gemini-2.0-Flash, Claude-3.7-Sonnet, and GPT-4o. 🌐 Project: https://t.co/yT9aHD3fwm
2
27
79
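A benchmark like this is usually scored as multiple-choice accuracy, broken down per task. Below is a minimal, hypothetical sketch of such an evaluation loop; `query_mllm` and the record fields it reads are placeholders, not the benchmark's actual API or schema.

```python
# Minimal sketch of a multiple-choice evaluation loop for a multi-view QA
# benchmark. `query_mllm` is a placeholder for a real model call; the item
# fields ("images", "question", "choices", "answer", "task") are assumed,
# not the actual All-Angles Bench schema.
from collections import defaultdict


def query_mllm(model_name: str, images: list[str], prompt: str) -> str:
    """Placeholder: send the multi-view images + prompt to an MLLM, return its answer letter."""
    raise NotImplementedError


def evaluate(model_name: str, benchmark: list[dict]) -> dict[str, float]:
    correct = defaultdict(int)
    total = defaultdict(int)
    for item in benchmark:
        letters = "ABCD"[: len(item["choices"])]
        prompt = (
            item["question"]
            + "\n"
            + "\n".join(f"{l}. {c}" for l, c in zip(letters, item["choices"]))
            + "\nAnswer with a single letter."
        )
        pred = query_mllm(model_name, item["images"], prompt).strip().upper()[:1]
        total[item["task"]] += 1
        correct[item["task"]] += int(pred == item["answer"])
    # Per-task accuracy, so counting, relative direction, etc. can be compared.
    return {task: correct[task] / total[task] for task in total}
```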
@YiMaTweets
Yi Ma
2 months
Our latest book on the mathematical principles of deep learning and intelligence has been released publicly at: https://t.co/ihPBCkI3x5 It also comes with a customized Chatbot that helps readers study and a Chinese version translated mainly by AI. This is an open-source project.
15
278
1K
@ChengTim0708
Ta-Ying Cheng
4 months
Imagine a Van Gogh-style teapot turning into glass with one simple slider 🎨 Introducing MARBLE: material edits by simply changing the CLIP embedding! 🔗 https://t.co/VOHGwUGFVZ 👏 Internship project with @prafull7, @markb_boss, @jampani_varun at @StabilityAI
1
5
25
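The "one simple slider" idea can be pictured as moving an embedding along a direction in CLIP space. The sketch below only illustrates that idea with off-the-shelf CLIP from Hugging Face transformers; it is not MARBLE itself, and the model name, prompts, and the omitted decoder back to pixels are all assumptions.

```python
# Hedged sketch of the "slider" idea: nudge a CLIP image embedding along a
# text-derived material direction. This is NOT the MARBLE implementation,
# just an illustration of editing in CLIP embedding space; the generator
# that turns the edited embedding back into pixels is omitted.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")


def material_direction(src: str, dst: str) -> torch.Tensor:
    """Unit vector pointing from one material description to another in CLIP text space."""
    inputs = processor(text=[src, dst], return_tensors="pt", padding=True)
    with torch.no_grad():
        emb = model.get_text_features(**inputs)
    d = emb[1] - emb[0]
    return d / d.norm()


def edit_image_embedding(image_path: str, direction: torch.Tensor, slider: float) -> torch.Tensor:
    """Move the image embedding by `slider` units along the material direction."""
    inputs = processor(images=Image.open(image_path), return_tensors="pt")
    with torch.no_grad():
        img_emb = model.get_image_features(**inputs)[0]
    return img_emb + slider * direction


# Usage: slider = 0 keeps the original look, larger values push toward "glass".
# direction = material_direction("a ceramic teapot", "a glass teapot")
# edited = edit_image_embedding("teapot.png", direction, slider=0.8)
```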
@zjasper666
Jasper
5 months
It’s been 6 years since I did my summer AI research at @YiMaTweets’s lab. Always had a great time hanging out with lab mates. Congrats to @simon_zhai and @HaozhiQ on becoming doctors and joining @GoogleDeepMind and @AIatMeta 💜
18
3
81
@danielyehhh
Chun-Hsiao (Daniel) Yeh
5 months
🚀 Glad to see our All-Angles Bench (https://t.co/2GeMZmS31b) being adopted to evaluate 3D spatial understanding in Seed-1.5-VL-thinking, alongside OpenAI o1 and Gemini 2.5 Pro!
Link card (github.com): Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs (Chenyu-Wang567/All-Angles-Bench)
@TsingYoga
Yujia Qin
5 months
Introducing Seed-1.5-VL-thinking: the model achieves SOTA on 38 out of 60 VLM benchmarks 🥳🥳🥳 https://t.co/MOWaHM8leh
0
8
23
@YiMaTweets
Yi Ma
5 months
It seems there is still a long way to go for multi-modal large models to truly understand space and scene.
@danielyehhh
Chun-Hsiao (Daniel) Yeh
5 months
❗️❗️ Can MLLMs understand scenes from multiple camera viewpoints — like humans? 🧭 We introduce All-Angles Bench — 2,100+ QA pairs on multi-view scenes. 📊 We evaluate 27 top MLLMs, including Gemini-2.0-Flash, Claude-3.7-Sonnet, and GPT-4o. 🌐 Project: https://t.co/yT9aHD3fwm
2
12
52
@danielyehhh
Chun-Hsiao (Daniel) Yeh
5 months
[n/n] huge thanks to my amazing collaborators (@Chenyuu_Wang, @TongPetersb, @ChengTim0708, @TianzheC, @simon_zhai, @Yubei_Chen, @hi_ice_boy, @YiMaTweets) 🔗 ArXiv: https://t.co/PYI4kSFklv 💻 Code: https://t.co/fipQeSjxZE 🤗 Hugging Face Benchmark:
0
0
3
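For readers who want to poke at the benchmark, a typical Hugging Face workflow looks like the sketch below. The dataset id is a placeholder (the link in the tweet above is truncated), and the split and field names are assumptions.

```python
# Minimal sketch of pulling a benchmark from the Hugging Face Hub with the
# `datasets` library. The dataset id below is a placeholder, not the real one,
# and the split/field names are assumed.
from datasets import load_dataset

bench = load_dataset("your-org/all-angles-bench-placeholder", split="test")
print(len(bench), "questions")
print(bench[0])  # inspect one QA record: views, question, choices, answer, task type, ...
```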
@danielyehhh
Chun-Hsiao (Daniel) Yeh
5 months
[7/n] 📷 While GPT-4o and Gemini-2.0-Flash handle single-view scene reconstruction reasonably well, they falter in aligning multi-view perspectives. 🧭 Poor camera pose estimation → flawed directional reasoning → weak multi-view consistency.
1
1
2
@danielyehhh
Chun-Hsiao (Daniel) Yeh
5 months
[6/n] 🧠 We test CoT methods on GPT-4o, Ovis2, and InternVL2.5 under full & partial views. 📈 CoT helps GPT-4o in partial-view counting, but shows little gain on strong models like InternVL2.5. ⚠️ Takeaway: Prompting isn’t enough—multi-view reasoning needs specialized training.
1
0
0
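For context, the comparison in the tweet above is between a plain answer-only prompt and a chain-of-thought variant sent to the same model over the same views. The sketch below shows roughly what such a prompt pair looks like; the exact prompts used in the paper may differ.

```python
# Hedged sketch of the plain vs. zero-shot chain-of-thought prompts being
# compared; these are illustrative, not the paper's exact wording.
def plain_prompt(question: str, choices: list[str]) -> str:
    opts = "\n".join(f"{l}. {c}" for l, c in zip("ABCD", choices))
    return f"{question}\n{opts}\nAnswer with a single letter."


def cot_prompt(question: str, choices: list[str]) -> str:
    opts = "\n".join(f"{l}. {c}" for l, c in zip("ABCD", choices))
    return (
        f"{question}\n{opts}\n"
        "Let's think step by step: describe what is visible in each view, "
        "match people and objects across views, then give the final answer as a single letter."
    )

# Both variants are run under the "full" setting (all views) and the
# "partial" setting (a subset of views), and accuracies are compared.
```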
@danielyehhh
Chun-Hsiao (Daniel) Yeh
5 months
[5/n] 🔍 We analyze MLLM consistency using paired QAs: ✅ CC = both correct, ❌ WW = both wrong, ⚠️ IC = inconsistent. 1️⃣ GPT-4o shows ~70% IC on relative distance—highly unstable! 2️⃣ All models >40% IC on relative direction → struggles w/ orientation.
1
0
0
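The CC/WW/IC buckets above are straightforward to compute once each question pair has per-question correctness. A minimal sketch, with the pairing representation assumed:

```python
# CC = both paired questions answered correctly, WW = both wrong,
# IC = one right and one wrong (inconsistent).
from collections import Counter


def consistency_rates(paired_results: list[tuple[bool, bool]]) -> dict[str, float]:
    """paired_results holds (correct_on_version_A, correct_on_version_B) per question pair."""
    buckets = Counter()
    for a_correct, b_correct in paired_results:
        if a_correct and b_correct:
            buckets["CC"] += 1
        elif not a_correct and not b_correct:
            buckets["WW"] += 1
        else:
            buckets["IC"] += 1
    n = sum(buckets.values())
    return {k: buckets[k] / n for k in ("CC", "WW", "IC")}


# e.g. consistency_rates([(True, True), (True, False), (False, False)])
# -> {'CC': 0.333..., 'WW': 0.333..., 'IC': 0.333...}
```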
@danielyehhh
Chun-Hsiao (Daniel) Yeh
5 months
[4/n] 🤔 From evaluating 27 MLLMs, we have two findings. Finding 1️⃣: Tasks that are simple for humans, like coarse camera pose estimation, pose challenges for MLLMs. Finding 2️⃣: Certain open-source MLLMs surpass closed-source ones in orientation-sensitive tasks.
1
0
0
@danielyehhh
Chun-Hsiao (Daniel) Yeh
5 months
[3/n] 🧠 How we built All-Angles Bench: (1) Curated 90 diverse multi-view scenes & 6 task types (2) Generated questions via MLLMs + refined w/ human annotation (3) Created cross-view question pairs to test consistency & visual grounding
1
0
0
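Concretely, a cross-view question pair from step (3) might be represented like the sketch below; the field names are illustrative guesses, not the benchmark's actual schema.

```python
# Hedged sketch of a cross-view question pair as data; field names are assumed.
from dataclasses import dataclass
from typing import Optional


@dataclass
class MultiViewQA:
    scene_id: str            # one of the ~90 curated multi-view scenes
    task: str                # e.g. "counting", "relative_direction", ...
    views: list[str]         # paths/URLs of the camera views shown to the model
    question: str
    choices: list[str]
    answer: str              # ground-truth option letter
    pair_id: Optional[str]   # links the two versions of the same underlying question


def cross_view_pair(qa: MultiViewQA, reordered_views: list[str], reworded: str) -> MultiViewQA:
    """Same underlying question asked over a different view order/wording,
    so a model's answers to the two can be checked for consistency."""
    return MultiViewQA(qa.scene_id, qa.task, reordered_views, reworded,
                       qa.choices, qa.answer, qa.pair_id)
```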
@danielyehhh
Chun-Hsiao (Daniel) Yeh
5 months
[2/n] 🧠 All-Angles Bench comprises six challenging tasks: counting, attribute identification, relative distance, relative direction, manipulation, and camera pose estimation. These question types are designed to investigate several major aspects of 3D scene understanding.
1
0
2
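For reference, the six categories as a simple enum; the identifiers are my naming for per-task breakdowns, not an official list.

```python
# The six task categories named in the tweet above, as a convenient handle
# for per-task accuracy breakdowns. Naming is assumed, not official.
from enum import Enum


class AllAnglesTask(str, Enum):
    COUNTING = "counting"
    ATTRIBUTE_IDENTIFICATION = "attribute_identification"
    RELATIVE_DISTANCE = "relative_distance"
    RELATIVE_DIRECTION = "relative_direction"
    MANIPULATION = "manipulation"
    CAMERA_POSE_ESTIMATION = "camera_pose_estimation"
```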
@danielyehhh
Chun-Hsiao (Daniel) Yeh
2 years
Surprising that diffusion models already have these capabilities without the need for further training!! Congrats @ChengTim0708
@ChengTim0708
Ta-Ying Cheng
2 years
Today, with my collaborators @prafull7 (MIT CSAIL), @jampani_varun (@StabilityAI), and my supervisors Niki Trigoni and Andrew Markham, we share with you ZeST, a zero-shot, training-free method for image-to-image material transfer! Project Page: https://t.co/0fsl32S07t 1/8
1
0
1
@danielyehhh
Chun-Hsiao (Daniel) Yeh
2 years
Thanks, @_akhaliq for sharing our work! 🙏 Huge props to @ChengTim0708 @hyhsiehlouis @chuanenlin @HTKung236938 @YiMaTweets @Yubei_Chen for making it all happen 🙌 With 🏞️ Gen4Gen, you can easily compose your own images into realistic scenes, complete with rich text details!
@_akhaliq
AK
2 years
Gen4Gen Generative Data Pipeline for Generative Multi-Concept Composition Recent text-to-image diffusion models are able to learn and synthesize images containing novel, personalized concepts (e.g., their own pets or specific items) with just a few examples for training. This
0
6
36
@sainingxie
Saining Xie
2 years
Here's my take on the Sora technical report, with a good dose of speculation that could be totally off. First of all, really appreciate the team for sharing helpful insights and design decisions – Sora is incredible and is set to transform the video generation community. What we
42
532
3K
@danielyehhh
Chun-Hsiao (Daniel) Yeh
2 years
Our groundbreaking work enables personalized search, allowing you to easily find specific moments in videos where your personal instances appear! Our poster is in the morning session tomorrow (tag: THU-AM-252) on Thursday, June 22nd. #CVPR2023 @FabianCabaH
@_akhaliq
AK
2 years
Meta-Personalizing Vision-Language Models to Find Named Instances in Video paper page: https://t.co/whF6qauh7g Large-scale vision-language models (VLM) have shown impressive results for language-guided search applications. While these models allow category-level queries, they
0
0
0
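As a rough illustration of the language-guided video search the quoted paper builds on (not its meta-personalization method), here is a hedged sketch that scores sampled frames against a text query with off-the-shelf CLIP and returns the best-matching moments.

```python
# Hedged sketch of language-guided moment search: score each sampled video
# frame against a text query with off-the-shelf CLIP and return the top hits.
# This is NOT the paper's meta-personalization method, just the baseline
# retrieval idea it extends to personal, named instances.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")


def top_moments(frame_paths: list[str], query: str, k: int = 5) -> list[tuple[str, float]]:
    images = [Image.open(p) for p in frame_paths]
    inputs = processor(text=[query], images=images, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    # logits_per_image: similarity of each frame to the single text query.
    scores = out.logits_per_image.squeeze(-1)
    best = scores.topk(min(k, len(frame_paths)))
    return [(frame_paths[int(i)], float(s)) for s, i in zip(best.values, best.indices)]
```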