Will (Huichen) Wang @will_wang_whc X Profile

Will (Huichen) Wang

@will_wang_whc

Followers

51

Following

30

Media

8

Statuses

19

CS PhD student @uwcse @uwdata

Joined September 2021


Will (Huichen) Wang

@will_wang_whc

2 months

🎉 Excited to share that our #ICML2025 paper, EMMA, is selected for an oral presentation (top 1%)! 🗣️ Catch our talk: Wednesday, July 16, 10:15 AM @ West Exhibition Hall C. 📌 Poster: Wednesday, July 16, 11:00 AM–1:30 PM @ Poster Session 3 East. Come say hi!

0

4

15

Will (Huichen) Wang

@will_wang_whc

3 months

RT @tableau: Tableau Research @vsetlur presents Jupybara, a multi-agent AI assistant that helps analysts turn data into clear, persuasive, a…

0

3

0

Will (Huichen) Wang

@will_wang_whc

4 months

Woohoo 🥳 Check out our paper!

Jiawei Gu

@Kuvvius

4 months

(1/4) 🚨 Thrilled to announce that our paper on EMMA has been accepted as a Spotlight at #ICML2025 (Top 2.6%)! 🎉 Not just image understanding + text reasoning. 👀 EMMA’s about vision-driven reasoning to tackle what vision or language alone fails at. 🔗

0

1

Will (Huichen) Wang

@will_wang_whc

8 months

(8/8) Read more in our paper. We’ve also open-sourced the dataset and maintain a leaderboard.

arxiv.org

The ability to organically reason over and with both text and images is a pillar of human intelligence, yet the ability of Multimodal Large Language Models (MLLMs) to perform such multimodal...

1

0

Will (Huichen) Wang

@will_wang_whc

8 months

(7/8) We also provide labels for each question based on the multimodal skills it assesses. Using these labels, we find that CoT prompting hurts performance on visual-reasoning-heavy tasks, while it benefits closed-source models on tasks where textual CoT is theoretically useful.

2

0

Will (Huichen) Wang

@will_wang_whc

8 months

(6/8) A manual error analysis on o1 mistakes shows 53% of errors are visual reasoning errors. For the question below, o1 recognizes that it needs to use the right-hand rule, but it fails to determine the correct thumb direction when fingers are curled.

1

0

Will (Huichen) Wang

@will_wang_whc

8 months

(5/8) We also try various test-time compute scaling strategies. While they tend to boost model performance, they are far from enough to close the gap to human-level performance. The best model and scaling strategy configuration we try still trails humans by 27%.

1

0

Will (Huichen) Wang

@will_wang_whc

8 months

(4/8) We test 9 SOTA MLLMs on EMMA, and all fall significantly short of human performance. On a subject-balanced subset, the best model, o1, scores 45.75%—a staggering 32% below human experts!

2

0

Will (Huichen) Wang

@will_wang_whc

8 months

(3/8) To ensure EMMA questions truly require reasoning in multimodality, we adopt an enhanced filtering pipeline on existing benchmarks, removing questions solvable by MLLMs with text and image captions. In addition, we contribute 1,796 new questions that we manually curate.

1

0

Will (Huichen) Wang

@will_wang_whc

8 months

(2/8) Visual information in current multimodal benchmarks is often redundant with text, allowing models to shortcut through textual reasoning. EMMA features 2,788 questions across math, physics, chemistry, and coding that require integrated textual and visual reasoning.

2

0

Will (Huichen) Wang

@will_wang_whc

8 months

(1/8) Can your MLLM actually reason over text and images? ✨ Introducing EMMA: An Enhanced MultiModal ReAsoning Benchmark, a benchmark even multimodal o1 fails to crack!

2

5

6

Will (Huichen) Wang

@will_wang_whc

10 months

RT @uwdata: With DracoGPT, Will Wang shows how to extract and model visualization design preferences from generative AI systems, enabling…

idl.uw.edu

0

3

0

Will (Huichen) Wang

@will_wang_whc

11 months

RT @chasejstokes: I am also currently applying for and seeking jobs based in the Chicago area 🏙️ (both industry and academia). If you are s…

0

2

0

Will (Huichen) Wang

@will_wang_whc

11 months

An awesome place to do research on data visualization and analytics!!

Vidya Setlur

@vsetlur

11 months

Tableau Research is now considering Summer 2025 interns! The entire team will be at VIS 2024. If you are excited about high-impact research that helps people see and understand data, do apply! #tableau

0

2

Will (Huichen) Wang

@will_wang_whc

1 year

New user study 🚨 @vsetlur and I built a data analysis and storytelling assistant. Sign up for a 1-hour user study if you have at least 2 years of experience analyzing data in Jupyter Notebook and writing data stories. Participants receive a $30 gift card.

docs.google.com

We are from Tableau Research. We are conducting a 1-hour user study on a prototype we built for exploratory data analysis and storytelling. If you have at least 2 years of experience in analyzing...

0

5

11

Will (Huichen) Wang

@will_wang_whc

1 year

Ever done data analysis in Jupyter Notebook and created reports/presentations? If so, we'd love to talk to you! @vsetlur and I are calling for participants to join our 45-minute virtual interview. A 20-dollar Amazon gift card awaits! Please sign up here:

docs.google.com

We are from Tableau Research, and are conducting an interview study to understand how people perform exploratory data analysis and craft data narratives to convey actionable insights. If you have...

0

2

4