Yue Yang
@YueYangAI
Followers: 590
Following: 159
Media: 31
Statuses: 109
Research scientist @allen_ai | PhD @upennnlp | Vision and Language
Joined July 2018
How well can LLMs & deep research systems synthesize long-form answers to *thousands of research queries across diverse domains*? Excited to announce 🎓📖 ResearchQA: a large-scale benchmark to evaluate long-form scholarly question answering across 75 fields, using
1
24
61
🤖✨ What if models that take action in the physical world could think through your instructions? Meet MolmoAct, our new fully open Action Reasoning Model (ARM) that does just that. 🧵
15
81
341
🤖💬 Herding instincts… in AIs? Yes, even LLMs can follow the crowd!
• 📉 Conformity ↑ when agents lack confidence but trust peers
• 🧠 Presentation format shapes peer influence
• 🎯 Controlled herding can boost collaboration outcomes
👉 Read more: https://t.co/Ym0rtKyVzH
0
8
13
Successfully defended my PhD thesis and got hooded this week! Thanks to all the friends who supported me throughout this incredible journey! Excited to join PRIOR at @allen_ai next and continue exploring open vision-language research!
16
4
155
🎉CoSyn is accepted by ACL2025!
0
0
7
#NAACL2025 How to compare cultural differences with social media data at scale? Our work uses lexica to annotate X 🇺🇸 & Weibo 🇨🇳 posts with valence (😄☹️) & arousal (🔥❄️) scores, revealing cross-cultural differences in emotional expression. https://t.co/2tNFceO9GD
Young Min Cho, Dandan Pang, Stuti Thapa, Garrick Sherman, Lyle Ungar, Louis Tay, Sharath Chandra Guntuku. Findings of the Association for Computational Linguistics: NAACL 2025. 2025.
0
4
13
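To make the lexicon approach above concrete, here is a toy sketch of lexicon-based valence/arousal scoring; the miniature LEXICON, its weights, and the tokenizer are illustrative assumptions, not the paper's actual lexica or preprocessing.

```python
# Toy lexicon-based affect scoring (illustrative only; the paper's lexica,
# weights, and preprocessing will differ).
import re

# Hypothetical miniature lexicon: word -> (valence, arousal), both in [-1, 1].
LEXICON = {
    "happy": (0.8, 0.5),
    "calm": (0.6, -0.7),
    "angry": (-0.7, 0.8),
    "sad": (-0.6, -0.4),
}

def score_post(text: str):
    """Average valence/arousal over lexicon words found in the post."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = [LEXICON[t] for t in tokens if t in LEXICON]
    if not hits:
        return None  # no lexicon coverage for this post
    valence = sum(v for v, _ in hits) / len(hits)
    arousal = sum(a for _, a in hits) / len(hits)
    return valence, arousal

print(score_post("So happy and calm this morning"))  # positive valence, mild arousal
print(score_post("angry about the traffic again"))   # negative valence, high arousal
```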
#ICLR2025 Oral: LLMs often struggle to make reliable and consistent decisions under uncertainty 😵💫, largely because they can't reliably estimate the probability of each choice. We propose BIRD 🐦, a framework that significantly enhances LLM decision making under uncertainty. BIRD
2
40
259
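For context on the problem BIRD targets, here is the usual shortcut for per-choice probabilities, a softmax over option log-likelihoods, which the post argues is unreliable under uncertainty; `option_logprob` is a hypothetical placeholder, and BIRD itself is not sketched here.

```python
# The common shortcut: turn per-option log-likelihoods into a choice
# distribution with a softmax. The post argues such direct estimates are
# unreliable under uncertainty; BIRD (not shown here) addresses this.
import math

def option_logprob(context: str, option: str) -> float:
    """Hypothetical placeholder: total log-prob the LLM assigns to `option` given `context`."""
    raise NotImplementedError("wire up to your LLM's scoring API")

def choice_distribution(context: str, options: list[str]) -> dict[str, float]:
    logps = [option_logprob(context, o) for o in options]
    m = max(logps)
    exps = [math.exp(lp - m) for lp in logps]  # numerically stable softmax
    z = sum(exps)
    return {o: e / z for o, e in zip(options, exps)}
```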
Exciting news! 🎉 Our paper “ViUniT: Visual Unit Tests for More Robust Visual Programming” got accepted at #CVPR2025
🎉Just Announced: "ViUniT: Visual Unit Tests for More Robust Visual Programming" has been accepted at #CVPR2025! Paper Link: https://t.co/nbLc1yq991 Project Page: https://t.co/rH9Z9uMMKC Researcher’s walk-through 👇 In collaboration with @UPenn, we introduce ViUniT, a framework
0
2
17
✨ Introducing MutaGReP (Mutation-guided Grounded Repository Plan Search) - an approach that uses LLM-guided tree search to find realizable plans that are grounded in a target codebase without executing any code! Ever wanted to provide an entire repo containing 100s of 1000s of
1
39
87
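A rough sketch of what LLM-guided, execution-free plan search over a repository could look like, based only on the description in the announcement above; `propose_mutations`, `score_plan`, and `repo_index` are hypothetical placeholders, not MutaGReP's actual interfaces.

```python
# Hypothetical best-first search over plan mutations, grounded in a code
# index and never executing repository code (placeholders throughout).
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class PlanNode:
    neg_score: float  # heapq is a min-heap, so store the negated score
    steps: list = field(compare=False, default_factory=list)    # natural-language plan steps
    symbols: list = field(compare=False, default_factory=list)  # repo symbols grounding each step

def propose_mutations(node: PlanNode, query: str) -> list:
    """Placeholder: ask an LLM to add/edit/refine the plan's steps for `query`."""
    raise NotImplementedError

def score_plan(node: PlanNode, repo_index) -> float:
    """Placeholder: how well the plan's steps map onto real symbols in the repo index."""
    raise NotImplementedError

def plan_search(query: str, repo_index, budget: int = 50) -> PlanNode:
    root = PlanNode(neg_score=0.0)
    frontier, best = [root], root
    while frontier and budget > 0:
        node = heapq.heappop(frontier)
        for child in propose_mutations(node, query):
            child.neg_score = -score_plan(child, repo_index)
            heapq.heappush(frontier, child)
            best = min(best, child)  # lower neg_score == better-grounded plan
            budget -= 1
    return best
```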
This work was done during my great summer internship at @allen_ai with my awesome collaborators: Ajay Patel, @mattdeitke, @tanmay2099, @LucaWeihs, @drewmikehead, @yatskar, Chris Callison-Burch, @RanjayKrishna, @anikembhavi, Christopher Clark.
0
0
4
We also show we can create synthetic pointing data to improve the click accuracy of VLMs in GUI agent tasks. On the ScreenSpot click prediction benchmark, our model trained on synthetic pointing data can outperform existing methods with much less training data.
1
0
5
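As a toy illustration of why code-rendered UIs yield pointing data cheaply (an assumed setup, not the authors' pipeline): when the layout is drawn programmatically, each element's pixel coordinates are known, so (instruction, click point) pairs come for free.

```python
# Draw a fake UI with Pillow; because we place the buttons ourselves, the
# ground-truth click point for each instruction is known exactly.
from PIL import Image, ImageDraw

def render_buttons(labels, size=(640, 400)):
    """Return a synthetic screenshot plus (instruction, point) supervision."""
    img = Image.new("RGB", size, "white")
    draw = ImageDraw.Draw(img)
    targets = []
    for i, label in enumerate(labels):
        x0, y0 = 40, 40 + i * 70
        x1, y1 = x0 + 200, y0 + 50
        draw.rectangle([x0, y0, x1, y1], outline="black", width=2)
        draw.text((x0 + 10, y0 + 15), label, fill="black")
        targets.append({"instruction": f"Click the '{label}' button",
                        "point": ((x0 + x1) // 2, (y0 + y1) // 2)})  # button center
    return img, targets

img, data = render_buttons(["Submit", "Cancel", "Settings"])
img.save("synthetic_ui.png")
print(data[0])  # {'instruction': "Click the 'Submit' button", 'point': (140, 65)}
```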
We notice open VLMs struggle with novel out-of-domain tasks like interpreting nutrition labels. However, CoSyn’s controllable data generation can create targeted synthetic data for task-specific fine-tuning, achieving strong zero-shot performance with significantly less data.
1
0
4
On 7 text-rich benchmarks (e.g., ChartQA, DocVQA), our model trained on synthetic data outperforms competitive open and proprietary VLMs. Our zero-shot model, trained without benchmark examples, beats most baselines, demonstrating the generalizability of training on synthetic data.
2
0
4
CoSyn uses code as the intermediate representation to build synthetic multimodal datasets. We prompt a text-only LLM to generate code that renders images, and then we use the code as context to create instruction-tuning data, such as QA pairs, for fine-tuning VLMs.
1
0
5
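A minimal sketch of the code-as-intermediate-representation pipeline described above, under assumed interfaces: `call_llm` is a placeholder for any text-only LLM API, and the prompts are illustrative rather than CoSyn's actual ones.

```python
# Code-as-intermediate-representation, in miniature: generate rendering code,
# execute it to get an image, then reuse the code as context for QA pairs.
import subprocess
import textwrap

def call_llm(prompt: str) -> str:
    """Placeholder: send `prompt` to a text-only LLM and return its reply."""
    raise NotImplementedError("wire up to your LLM provider of choice")

def generate_chart_example(topic: str) -> dict:
    # 1) Ask the LLM for self-contained plotting code that renders a text-rich image.
    code = call_llm(
        f"Write self-contained matplotlib code that saves a bar chart about "
        f"'{topic}' to chart.png, with realistic labels and values."
    )
    # 2) Execute the generated code to render the image.
    with open("render_chart.py", "w") as f:
        f.write(code)
    subprocess.run(["python", "render_chart.py"], check=True)
    # 3) Use the *code* (not the pixels) as context to write instruction-tuning
    #    data, since the code fully specifies what the rendered chart contains.
    qa_text = call_llm(
        "Given the plotting code below, write 3 question-answer pairs that a "
        "person could answer by looking only at the rendered chart.\n\n"
        + textwrap.indent(code, "    ")
    )
    return {"image": "chart.png", "code": code, "qa": qa_text}
```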
Our CoSyn framework integrates 11 rendering tools for 20 robust generation pipelines, which support the creation of diverse text-rich images: charts, documents, diagrams, tables, even music sheets 🎼, and many more!
1
0
5
We share Code-Guided Synthetic Data Generation: using LLM-generated code to create multimodal datasets for text-rich images, such as charts📊, documents📄, etc., to enhance Vision-Language Models. Website: https://t.co/9IQ4CgeKMF Dataset: https://t.co/yiERrZup8X Paper:
6
48
196
Articulate Anything has just been accepted to @iclr_conf #ICLR2025! Looking forward to seeing everyone in Singapore 🇸🇬 🙀❤️!
📦 Can frontier AI transform ANY physical object from ANY input modality into a high-quality digital twin that also MOVES? Excited to share our work, Articulate-Anything, exploring how large vision-language models (VLMs) can bridge the gap between the physical and digital
3
8
44
📢 Applications are open for summer '25 internships at the PRIOR (computer vision) team @allen_ai. Come join us in building large-scale models for:
📸 Open-source Vision-Language Models
💻 Multimodal Web Agents
🤖 Embodied AI + Robotics
🌎 Planet Monitoring
Apply by December
1
13
47
Excited to share ✨ Contextualized Evaluations ✨! Benchmarks like Chatbot Arena contain underspecified queries, which can lead to arbitrary eval judgments. What happens if we provide evaluators with context (e.g., who's the user, what's their intent) when judging LM outputs? 🧵↓
2
31
122
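A minimal sketch of what a contextualized pairwise judge prompt could look like, illustrative only and not the paper's exact protocol; the context is passed as explicit question-answer pairs about the user and their intent.

```python
# Build a judge prompt that includes follow-up context about the user and
# their intent alongside the underspecified query (names and wording are
# illustrative, not the paper's prompts).
def build_judge_prompt(query: str, answer_a: str, answer_b: str,
                       context: dict | None = None) -> str:
    context_block = ""
    if context:
        lines = [f"- {q} {a}" for q, a in context.items()]
        context_block = "Context about the request:\n" + "\n".join(lines) + "\n\n"
    return (
        f"{context_block}"
        f"Query: {query}\n\n"
        f"Response A:\n{answer_a}\n\n"
        f"Response B:\n{answer_b}\n\n"
        "Considering the context above (if any), which response better serves "
        "this user? Answer 'A' or 'B' and briefly justify."
    )

print(build_judge_prompt(
    "Explain quantum entanglement",
    "<candidate answer A>", "<candidate answer B>",
    context={"Who is the user?": "a high-school student",
             "What is their intent?": "an intuitive, math-light overview"},
))
```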
🤔What model explanation method should you use? How to ensure it reflects the model’s true reasoning? 🌟 In our CL survey, Towards Faithful Model Explanation in NLP, we review 110+ explainability methods through the lens of faithfulness. Check out my presentation at #EMNLP2024!
1
8
33