 
            
Yeda Song (@__runamu__)
182 Followers · 309 Following · 3 Media · 20 Statuses
Multimodal Agents for the Real World: GUI Agents, VLM, and RL @ UMich 🇺🇸
Ann Arbor, Michigan, USA · Joined January 2022
            
           🔥 GUI agents struggle with real-world mobile tasks. We present MONDAY—a diverse, large-scale dataset built via an automatic pipeline that transforms internet videos into GUI agent data. ✅ VLMs trained on MONDAY show strong generalization ✅ Open data (313K steps) (1/7) 🧵 #CVPR
          
          
                
             🚨🚨New paper on core RL: a way to train value-functions via flow-matching for scaling compute! No text/images, but a flow directly on a scalar Q-value. This unlocks benefits of iterative compute, test-time scaling for value prediction & SOTA results on whatever we tried. 🧵⬇️ 
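A minimal sketch of what "a flow directly on a scalar Q-value" can look like in PyTorch, assuming a linear interpolation path and an Euler sampler; `QVelocityNet` and all hyperparameters are illustrative assumptions, not the paper's implementation:

```python
# Illustrative sketch: flow matching on a scalar Q-value, conditioned on (s, a).
# Not the paper's code; the architecture and step counts are assumptions.
import torch
import torch.nn as nn

class QVelocityNet(nn.Module):
    """Predicts the flow velocity for a scalar q_t, conditioned on (s, a, t)."""
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim + 2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, q_t, t, obs, act):
        return self.net(torch.cat([q_t, t, obs, act], dim=-1))

def flow_matching_loss(model, obs, act, q_target):
    """Conditional flow matching along the straight path from noise to target."""
    q0 = torch.randn_like(q_target)        # noise sample
    t = torch.rand(q_target.shape[0], 1)   # random time in [0, 1]
    q_t = (1 - t) * q0 + t * q_target      # point on the interpolation path
    v_target = q_target - q0               # constant velocity of that path
    return ((model(q_t, t, obs, act) - v_target) ** 2).mean()

@torch.no_grad()
def predict_q(model, obs, act, steps=8):
    """Euler integration from noise; more steps = more inference compute."""
    q = torch.randn(obs.shape[0], 1)
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((obs.shape[0], 1), i * dt)
        q = q + dt * model(q, t, obs, act)
    return q
```

The `steps` argument is where iterative compute shows up: spending more integration steps at inference refines the same value prediction, which is the test-time-scaling knob the thread mentions.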
          
                
             Flow Q-learning (FQL) is a simple method to train/fine-tune an expressive flow policy with RL. Come visit our poster at 4:30p-7p this Wed (evening session, 2nd day)! 
           Excited to introduce flow Q-learning (FQL)! Flow Q-learning is a *simple* and scalable data-driven RL method that trains an expressive policy with flow matching. Paper:  https://t.co/kjaeqHcBFh  Project page:  https://t.co/D8vFcZib1F  Thread ↓ 
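As background for "trains an expressive policy with flow matching," a hypothetical sketch of that ingredient: a state-conditioned velocity field fit to dataset actions, with actions drawn by integrating from noise. How FQL couples this with Q-learning is in the paper; `ActionFlow` and the step counts here are assumptions:

```python
# Illustrative sketch of a flow-matching policy (the BC ingredient only).
# Not the FQL implementation; names and hyperparameters are assumptions.
import torch
import torch.nn as nn

class ActionFlow(nn.Module):
    """Velocity field over actions, conditioned on the observation and time."""
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.act_dim = act_dim
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, a_t, t, obs):
        return self.net(torch.cat([a_t, t, obs], dim=-1))

def bc_flow_loss(flow, obs, actions):
    """Flow-matching loss pushing noise samples toward dataset actions."""
    a0 = torch.randn_like(actions)
    t = torch.rand(actions.shape[0], 1)
    a_t = (1 - t) * a0 + t * actions
    return ((flow(a_t, t, obs) - (actions - a0)) ** 2).mean()

@torch.no_grad()
def sample_action(flow, obs, steps=10):
    """Integrate the learned velocity field from noise to draw an action."""
    a = torch.randn(obs.shape[0], flow.act_dim)
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((obs.shape[0], 1), i * dt)
        a = a + dt * flow(a, t, obs)
    return a
```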
            
                
             ✨Two life updates✨ 1. Started my internship at @LG_AI_Research in Ann Arbor, Michigan — Advancing AI for a better life! 🔮 2. Advanced to PhD candidacy at UMich CSE. This means I’ve completed my coursework and passed the qualification process. 🙌 
          
                
The race for LLM "cognitive core" - a few billion param model that maximally sacrifices encyclopedic knowledge for capability. It lives always-on and by default on every computer as the kernel of LLM personal computing. Its features are slowly crystallizing: - Natively multimodal 
           I’m so excited to announce Gemma 3n is here! 🎉 🔊Multimodal (text/audio/image/video) understanding 🤯Runs with as little as 2GB of RAM 🏆First model under 10B with @lmarena_ai score of 1300+ Available now on @huggingface, @kaggle, llama.cpp,  https://t.co/CNDy479EEv,  and more 
            
                
             Can scaling data and models alone solve computer vision? 🤔 Join us at the SP4V Workshop at #ICCV2025 in Hawaii to explore this question! 🎤 Speakers: @danfei_xu, @joaocarreira, @jiajunwu_cs, Kristen Grauman, @sainingxie, @vincesitzmann 🔗  https://t.co/pH1Qjc1Kr2 
          
          
                
🚀 Excited to announce our 4th Workshop on Computer Vision in the Wild (CVinW) at @CVPR 2025! 🔗  https://t.co/Z5r48oh6iv  ⭐We have invited a great lineup of speakers: Prof. Kaiming He, Prof. @BoqingGo, Prof. @CordeliaSchmid, Prof. @RanjayKrishna, Prof. @sainingxie, Prof. 
          
                
             Arrived in Nashville for #CVPR 🤠 Excited to present MONDAY, a collaboration with @LG_AI_Research! 📍 MMFM Workshop - Thu, 9:40 AM 📍 Main Conference - Fri, 4:00 PM Let’s connect and chat!🤝 Also exploring Summer 2026 internships 🔍 MONDAY website: 
          
                
             MONDAY is right here for you: Open dataset & usage code 👉  https://t.co/rwJeaAz2t5  Big thanks to our amazing collaborators, @YunseokJANG, @sungryulls, @lajanugen, @tiangeluo, Dong-Ki Kim, Kyunghoon Bae, and @honglaklee. 🎸 Catch our poster presentations at #CVPR2025! (7/7) 
          
                
             And it works: 📈 Vision-language models trained on MONDAY show an average +18% gain on an unseen mobile OS, along with consistent boosts on AitW, AMEX, and our own test set. We evaluated this using SeeClick (9.6B) and Llama-3.2-11B-Vision-Instruct as baseline models. (6/7) 
          
                
             We achieved this with our robust, fully automated pipeline: 🔹 OCR-based scene detection (95% F1), outperforming vision-based approaches 🔹 Near-perfect UI element detection (99.9% hit rate) 🔹 Novel 3-step action identification using VLMs for precise, context-aware labels (5/7) 
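To make the first bullet concrete, a toy sketch of OCR-based scene detection: OCR consecutive frames and flag a transition when the on-screen text overlap drops. `pytesseract` and the 0.5 Jaccard threshold are assumptions for illustration, not the MONDAY pipeline:

```python
# Toy sketch of OCR-based scene-transition detection for screen recordings.
# Not the MONDAY pipeline; the OCR engine and threshold are assumptions.
import pytesseract
from PIL import Image

def ocr_tokens(frame_path):
    """Extract a bag of OCR tokens from one video frame."""
    text = pytesseract.image_to_string(Image.open(frame_path))
    return set(text.lower().split())

def is_scene_change(prev_tokens, curr_tokens, threshold=0.5):
    """Flag a transition when on-screen text overlap (Jaccard) drops."""
    if not prev_tokens and not curr_tokens:
        return False
    union = prev_tokens | curr_tokens
    overlap = len(prev_tokens & curr_tokens) / max(len(union), 1)
    return overlap < threshold

def detect_scenes(frame_paths):
    """Return indices of frames where a new UI screen likely begins."""
    scenes, prev = [0], ocr_tokens(frame_paths[0])
    for i, path in enumerate(frame_paths[1:], start=1):
        curr = ocr_tokens(path)
        if is_scene_change(prev, curr):
            scenes.append(i)
        prev = curr
    return scenes
```

The appeal of text overlap over pixel differencing is that mobile screens scroll and animate constantly; the visible text changes far less than the pixels do, so it makes a steadier signal for "same screen vs. new screen."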
          
                
             MONDAY solves this by turning internet videos into useful data: 📱Real-world and diverse 🔁 Easy to expand with new videos 💸17× cheaper than manual annotation ($0.34 vs $5.76/video) No manual annotation. No system access needed. Just authentic human interactions at scale. (4/7) 
          
                
             GUI agents fail in the wild because existing training datasets ❌ lack diversity across mobile OS platforms, apps, & user configs ❌ get quickly outdated ❌ are too costly to scale (3/7) 
          
                
             "Scalable Video-to-Dataset Generation for Cross-Platform Mobile Agents" Project:  https://t.co/rwJeaAz2t5  Code:  https://t.co/qxwrM15AMX  Data:  https://t.co/mjTvnStiIG  Paper:  https://t.co/fA4IofYTgX  (2/7) #GUIAgent #CUA #CVPR #CVPR2025
          
          
            
I finally wrote another blogpost:  https://t.co/WddJkbSfks  AI just keeps getting better over time, but NOW is a special moment that I call “the halftime”. Before it, training > eval. After it, eval > training. The reason: RL finally works. Lmk ur feedback so I’ll polish it. 
          
                
             LLM chatbots are moving fast, but how do we make them better? In my new blog at The Gradient, I argue that an important next step is giving them a sense of "purpose." 
          
                
I love our Michigan AI Lab @michigan_AI! A group of people who not only do some of the coolest research in AI, but also care for each other and enjoy each other’s company. A picture from this week’s fun picnic. ❤️ 
          
                
             Glad to share our work at #ACL2023, "MPChat: Towards Multimodal Persona-Grounded Conversation"  https://t.co/S8US4LaYr5  ! #multimodal #persona_chat authors: @AHNJAEWOO2, @__runamu__, Gunhee Kim 
          
            
arxiv.org: In order to build self-consistent personalized dialogue agents, previous research has mostly focused on textual persona that delivers personal facts or personalities. However, to fully describe...
            
                
             
             
             
             
               
             
             
             
             
             
             
            