Will (Huichen) Wang

@will_wang_whc

Followers
51
Following
30
Media
8
Statuses
19

CS PhD student @uwcse @uwdata

Joined September 2021
@will_wang_whc
Will (Huichen) Wang
2 months
🎉 Excited to share that our #ICML2025 paper, EMMA, is selected for an oral presentation (top 1%)! 🗣️ Catch our talk: Wednesday, July 16, 10:15 AM @ West Exhibition Hall C. 📌 Poster: Wednesday, July 16, 11:00 AM–1:30 PM @ Poster Session 3 East. Come say hi!
0
4
15
@will_wang_whc
Will (Huichen) Wang
3 months
RT @tableau: Tableau Research @vsetlur presents Jupybara, a multi-agent AI assistant that helps analysts turn data into clear, persuasive, a…
0
3
0
@will_wang_whc
Will (Huichen) Wang
4 months
Woohoo 🥳 Check out our paper!
@Kuvvius
Jiawei Gu
4 months
(1/4) 🚨 Thrilled to announce that our paper on EMMA has been accepted as a Spotlight at #ICML2025 (Top 2.6%)! 🎉 Not just image understanding + text reasoning. 👀 EMMA’s about vision-driven reasoning to tackle what vision or language alone fails at. 🔗
0
0
1
@will_wang_whc
Will (Huichen) Wang
8 months
(7/8) We also provide labels for each question based on the multimodal skills it assesses. Using these labels, we find that CoT prompting hurts performance on visual-reasoning-heavy tasks, while it benefits closed-source models on tasks where textual CoT is theoretically useful.
2
0
0
@will_wang_whc
Will (Huichen) Wang
8 months
(6/8) A manual error analysis on o1 mistakes shows 53% of errors are visual reasoning errors. For the question below, o1 recognizes that it needs to use the right-hand rule, but it fails to determine the correct thumb direction when fingers are curled.
[Image: the question discussed above]
1
0
0
@will_wang_whc
Will (Huichen) Wang
8 months
(5/8) We also try various test-time compute scaling strategies. While they tend to boost model performance, they are far from enough to close the gap to human-level performance. The best model and scaling strategy configuration we try still trails humans by 27%.
1
1
0
@will_wang_whc
Will (Huichen) Wang
8 months
(4/8) We test 9 SOTA MLLMs on EMMA, and all fall significantly short of human performance. On a subject-balanced subset, the best model, o1, scores 45.75%—a staggering 32% below human experts!
2
0
0
@will_wang_whc
Will (Huichen) Wang
8 months
(3/8) To ensure EMMA questions truly require multimodal reasoning, we apply an enhanced filtering pipeline to existing benchmarks, removing questions that MLLMs can solve from the text and image captions alone. In addition, we contribute 1,796 new questions that we manually curate.
1
0
0
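The filtering idea in (3/8) can be sketched roughly as follows. This is a minimal illustration, not the EMMA authors' pipeline: the `Question` type, the `caption_reader` toy solver, and the rule "drop a question if any text-only solver gets it right" are all assumptions made for the example. In practice each solver would be an MLLM prompted with the question text and an image caption instead of the image.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Question:
    text: str
    image_caption: str  # caption standing in for the image (hypothetical field)
    answer: str


# A text-only solver sees the question text and the caption, never the image.
TextOnlySolver = Callable[[str, str], str]


def solvable_without_image(q: Question, solvers: List[TextOnlySolver]) -> bool:
    """True if at least one solver answers correctly from text plus caption.

    Assumption: "any solver succeeds" is the removal rule; the source
    does not say whether one or all models must succeed.
    """
    target = q.answer.strip().lower()
    return any(solve(q.text, q.image_caption).strip().lower() == target
               for solve in solvers)


def filter_benchmark(questions: List[Question],
                     solvers: List[TextOnlySolver]) -> List[Question]:
    """Keep only questions that no text-only solver can answer."""
    return [q for q in questions if not solvable_without_image(q, solvers)]


def caption_reader(text: str, caption: str) -> str:
    """Toy stand-in for an MLLM: guesses the last word of the caption."""
    words = caption.split()
    return words[-1] if words else ""


questions = [
    # The caption leaks the answer, so the image is redundant here.
    Question("What color is the ball?", "A ball that is red", "red"),
    # The caption does not contain the answer, so the image is needed.
    Question("Which way does the field point?",
             "A wire loop with a current arrow", "up"),
]

kept = filter_benchmark(questions, [caption_reader])
for q in kept:
    print(q.text)
```

With the toy solver, only the second question survives, because its caption does not give away the answer.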
@will_wang_whc
Will (Huichen) Wang
8 months
(2/8) Visual information in current multimodal benchmarks is often redundant with text, allowing models to shortcut through textual reasoning. EMMA features 2,788 questions across math, physics, chemistry, and coding that require integrated textual and visual reasoning.
2
0
0
@will_wang_whc
Will (Huichen) Wang
8 months
(1/8) Can your MLLM actually reason over text and images? ✨ Introducing EMMA: An Enhanced MultiModal ReAsoning Benchmark, a benchmark even multimodal o1 fails to crack!
2
5
6
@will_wang_whc
Will (Huichen) Wang
10 months
RT @uwdata: With DracoGPT, Will Wang shows how to extract and model visualization design preferences from generative AI systems, enabling…
idl.uw.edu
0
3
0
@will_wang_whc
Will (Huichen) Wang
11 months
RT @chasejstokes: I am also currently applying for and seeking jobs based in the Chicago area 🏙️ (both industry and academia). If you are s…
0
2
0
@will_wang_whc
Will (Huichen) Wang
11 months
An awesome place to do research on data visualization and analytics!!
@vsetlur
Vidya Setlur
11 months
Tableau Research is now considering Summer 2025 interns! The entire team will be at VIS 2024. If you are excited about high-impact research that helps people see and understand data, do apply! #tableau
0
0
2
@will_wang_whc
Will (Huichen) Wang
1 year
New user study 🚨 @vsetlur and I built a data analysis and storytelling assistant. Sign up for a 1-hour user study if you have at least 2 years of experience analyzing data in Jupyter Notebook and writing data stories. Participants receive a $30 gift card.
docs.google.com
We are from Tableau Research. We are conducting a 1-hour user study on a prototype we built for exploratory data analysis and storytelling. If you have at least 2 years of experience in analyzing...
0
5
11
@will_wang_whc
Will (Huichen) Wang
1 year
Ever done data analysis in Jupyter Notebook and created reports/presentations? If so, we'd love to talk to you! @vsetlur and I are calling for participants to join our 45-minute virtual interview. A 20-dollar Amazon gift card awaits! Please sign up here:
docs.google.com
We are from Tableau Research, and are conducting an interview study to understand how people perform exploratory data analysis and craft data narratives to convey actionable insights. If you have...
0
2
4