Yeda Song Profile
Yeda Song

@__runamu__

Followers: 182
Following: 309
Media: 3
Statuses: 20

Multimodal Agents for the Real World: GUI Agents, VLM, and RL @ UMich 🇺🇸

Ann Arbor, Michigan, USA
Joined January 2022
@__runamu__
Yeda Song
5 months
🔥 GUI agents struggle with real-world mobile tasks. We present MONDAY—a diverse, large-scale dataset built via an automatic pipeline that transforms internet videos into GUI agent data. ✅ VLMs trained on MONDAY show strong generalization ✅ Open data (313K steps) (1/7) 🧵 #CVPR
2
15
48
@aviral_kumar2
Aviral Kumar
2 months
🚨🚨New paper on core RL: a way to train value-functions via flow-matching for scaling compute! No text/images, but a flow directly on a scalar Q-value. This unlocks benefits of iterative compute, test-time scaling for value prediction & SOTA results on whatever we tried. 🧵⬇️
11
83
707
@seohong_park
Seohong Park
4 months
Flow Q-learning (FQL) is a simple method to train/fine-tune an expressive flow policy with RL. Come visit our poster at 4:30p-7p this Wed (evening session, 2nd day)!
@seohong_park
Seohong Park
9 months
Excited to introduce flow Q-learning (FQL)! Flow Q-learning is a *simple* and scalable data-driven RL method that trains an expressive policy with flow matching. Paper: https://t.co/kjaeqHcBFh Project page: https://t.co/D8vFcZib1F Thread ↓
5
68
505
@__runamu__
Yeda Song
4 months
✨Two life updates✨ 1. Started my internship at @LG_AI_Research in Ann Arbor, Michigan — Advancing AI for a better life! 🔮 2. Advanced to PhD candidacy at UMich CSE. This means I’ve completed my coursework and passed the qualification process. 🙌
3
1
144
@karpathy
Andrej Karpathy
4 months
The race for LLM "cognitive core" - a few billion param model that maximally sacrifices encyclopedic knowledge for capability. It lives always-on and by default on every computer as the kernel of LLM personal computing. Its features are slowly crystalizing: - Natively multimodal
@osanseviero
Omar Sanseviero
4 months
I’m so excited to announce Gemma 3n is here! 🎉 🔊Multimodal (text/audio/image/video) understanding 🤯Runs with as little as 2GB of RAM 🏆First model under 10B with @lmarena_ai score of 1300+ Available now on @huggingface, @kaggle, llama.cpp, https://t.co/CNDy479EEv, and more
397
1K
11K
@sangwoomo
Sangwoo Mo
5 months
Can scaling data and models alone solve computer vision? 🤔 Join us at the SP4V Workshop at #ICCV2025 in Hawaii to explore this question! 🎤 Speakers: @danfei_xu, @joaocarreira, @jiajunwu_cs, Kristen Grauman, @sainingxie, @vincesitzmann 🔗 https://t.co/pH1Qjc1Kr2
2
17
93
@michigan_AI
MichiganAI
5 months
We're heading to #CVPR2025! 📰Curious about what’s coming? Take a look at our list of accepted papers and come to meet the authors! Get ready for innovative #AI research and fresh insights!
0
4
8
@furongh
Furong Huang
5 months
Excited to speak at the Workshop on Computer Vision in the Wild @CVPR 2025! 🎥🌍 🗓️ June 11 | 📍 Room 101 B, Music City Center, Nashville, TN 🎸 🧠 Talk: From Perception to Action: Building World Models for Generalist Agents Let’s connect if you're around! #CVPR2025 #robotics
2
17
65
@jw2yang4ai
Jianwei Yang
6 months
🚀 Excited to announce our 4th Workshop on Computer Vision in the Wild (CVinW) at @CVPR 2025! 🔗 https://t.co/Z5r48oh6iv ⭐We have invited a great lineup of speakers: Prof. Kaiming He, Prof. @BoqingGo, Prof. @CordeliaSchmid, Prof. @RanjayKrishna, Prof. @sainingxie, Prof.
1
27
103
@__runamu__
Yeda Song
5 months
Arrived in Nashville for #CVPR 🤠 Excited to present MONDAY, a collaboration with @LG_AI_Research! 📍 MMFM Workshop - Thu, 9:40 AM 📍 Main Conference - Fri, 4:00 PM Let’s connect and chat!🤝 Also exploring Summer 2026 internships 🔍 MONDAY website:
0
1
12
@__runamu__
Yeda Song
5 months
MONDAY is right here for you: Open dataset & usage code 👉 https://t.co/rwJeaAz2t5 Big thanks to our amazing collaborators, @YunseokJANG, @sungryulls, @lajanugen, @tiangeluo, Dong-Ki Kim, Kyunghoon Bae, and @honglaklee. 🎸 Catch our poster presentations at #CVPR2025! (7/7)
0
0
2
@__runamu__
Yeda Song
5 months
And it works: 📈 Vision-language models trained on MONDAY show an average +18% gain on an unseen mobile OS, along with consistent boosts on AitW, AMEX, and our own test set. We evaluated this using SeeClick (9.6B) and Llama-3.2-11B-Vision-Instruct as baseline models. (6/7)
1
1
3
@__runamu__
Yeda Song
5 months
We achieved this with our robust, fully automated pipeline: 🔹 OCR-based scene detection (95% F1), outperforming vision-based approaches 🔹 Near-perfect UI element detection (99.9% hit rate) 🔹 Novel 3-step action identification using VLMs for precise, context-aware labels (5/7)
1
0
1
@__runamu__
Yeda Song
5 months
MONDAY solves this by turning internet videos into useful data: 📱Real-world and diverse 🔁 Easy to expand with new videos 💸17× cheaper than manual annotation ($0.34 vs $5.76/video) No manual annotation. No system access needed. Just authentic human interactions at scale. (4/7)
1
0
1
@__runamu__
Yeda Song
5 months
GUI agents fail in the wild because existing training datasets ❌ lack diversity across mobile OS platforms, apps, & user configs ❌ get quickly outdated ❌ are too costly to scale (3/7)
1
0
1
@__runamu__
Yeda Song
5 months
"Scalable Video-to-Dataset Generation for Cross-Platform Mobile Agents" Project: https://t.co/rwJeaAz2t5 Code: https://t.co/qxwrM15AMX Data: https://t.co/mjTvnStiIG Paper: https://t.co/fA4IofYTgX (2/7) #GUIAgent #CUA #CVPR #CVPR2025
1
0
2
@ShunyuYao12
Shunyu Yao
7 months
I finally wrote another blogpost: https://t.co/WddJkbSfks AI just keeps getting better over time, but NOW is a special moment that i call “the halftime”. Before it, training > eval. After it, eval > training. The reason: RL finally works. Lmk ur feedback so I’ll polish it.
38
210
1K
@ke_li_2021
Kenneth Li
1 year
LLM chatbots are moving fast, but how do we make them better? In my new blog at The Gradient, I argue that an important next step is giving them a sense of "purpose."
1
8
26
@radamihalcea
Rada Mihalcea
1 year
I love our Michigan AI Lab @michigan_AI! A group of people who not only do some of the coolest research in AI, but also care for and of each other, and enjoy each other’s company. A picture from this week’s fun picnic. ❤️
1
6
124