Daeun Lee

@danadaeun

Followers 506 · Following 441 · Media 24 · Statuses 430

PhD student @unccs advised by @mohitban47 | Intern @AdobeResearch | Multimodal, Video, Embodied AI, Post-training

United States
Joined February 2024
@danadaeun
Daeun Lee
7 days
πŸ€” We rely on gaze to guide our actions, but can current MLLMs truly understand it and infer our intentions? Introducing StreamGaze πŸ‘€, the first benchmark that evaluates gaze-guided temporal reasoning (past, present, and future) and proactive understanding in streaming video
1
23
41
@Alibaba_Qwen
Qwen
14 hours
πŸš€ We introduce Soft Adaptive Policy Optimization (SAPO) β€” a smooth, stable, and highly effective RL method for training large language models.
Why SAPO?
πŸ”Ή Hard clipping is brittle β€” gradients vanish or explode
πŸ”Ή MoE models amplify variance, making training even more unstable
arxiv.org
Reinforcement learning (RL) plays an increasingly important role in enhancing the reasoning capabilities of large language models (LLMs), yet stable and performant policy optimization remains...
17
124
947
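The exact SAPO objective is in the linked arXiv paper; as a rough illustration only, the sketch below contrasts PPO-style hard clipping of the importance ratio with a hypothetical smooth soft clip (the tanh form and the eps value are assumptions, not the paper's formulation), showing why a smooth surrogate keeps gradients alive where a hard clip flattens them.

import numpy as np

EPS = 0.2  # clip range (assumed value)

def hard_clip_surrogate(ratio, advantage):
    # PPO-style: min(r*A, clip(r, 1-eps, 1+eps)*A); the gradient w.r.t. r is
    # exactly zero wherever the clipped branch is active and smaller.
    clipped = np.clip(ratio, 1.0 - EPS, 1.0 + EPS)
    return np.minimum(ratio * advantage, clipped * advantage)

def soft_clip_surrogate(ratio, advantage):
    # Hypothetical smooth alternative: squash the ratio toward [1-eps, 1+eps]
    # with tanh, so the gradient decays gradually instead of cutting off abruptly.
    soft_ratio = 1.0 + EPS * np.tanh((ratio - 1.0) / EPS)
    return soft_ratio * advantage

ratios = np.array([0.5, 0.9, 1.0, 1.1, 1.5, 3.0])
print("hard:", hard_clip_surrogate(ratios, 1.0))
print("soft:", soft_clip_surrogate(ratios, 1.0))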
@heyrimsha
Rimsha Bhardwaj
2 days
Holy shit… Meta might’ve just solved self-improving AI 🀯 Their new paper SPICE (Self-Play in Corpus Environments) basically turns a language model into its own teacher: no humans, no labels, no datasets, just the internet as its training ground. Here’s the twist: one copy of
37
67
448
@zhou_honglu
Honglu Zhou
19 hours
Thanks for sharing our work, AK! 🫰
@_akhaliq
AK
1 day
Active Video Perception: Iterative Evidence Seeking for Agentic Long Video Understanding
0
3
5
@danadaeun
Daeun Lee
15 hours
Check out my labmate Ziyang’s impressive agentic framework for long-video reasoning! πŸŽ‰ Similar to human perception, this Active Video Perception system integrates planning, observation, and reflection across multiple agents, effectively leveraging temporal evidence for long-video
@ZiyangW00
Ziyang Wang
1 day
🚨 Active Video Perception: Iterative Evidence Seeking for Agentic Long Video Understanding 🚨 Introducing Active Video Perception: an evidence-seeking framework that treats the video as an interactive environment and acquires compact, query-relevant evidence. 🎬 Key
1
1
6
@LiJunnan0409
Li Junnan
22 hours
Introducing AVP - our new multimodal agent for long video understanding!
@ZiyangW00
Ziyang Wang
1 day
🚨 Active Video Perception: Iterative Evidence Seeking for Agentic Long Video Understanding 🚨 Introducing Active Video Perception: an evidence-seeking framework that treats the video as an interactive environment and acquires compact, query-relevant evidence. 🎬 Key
5
11
42
@zhou_honglu
Honglu Zhou
23 hours
Super excited about the idea - so simple yet so smart and powerful! 😍 The old passive video-perception setup just doesn't make sense anymore. Grabbing all visual info once, with fixed granularity and no query awareness, is inefficient and overloads the model. So we built Active
@ZiyangW00
Ziyang Wang
1 day
🚨 Active Video Perception: Iterative Evidence Seeking for Agentic Long Video Understanding 🚨 Introducing Active Video Perception: an evidence-seeking framework that treats the video as an interactive environment and acquires compact, query-relevant evidence. 🎬 Key
2
6
17
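Going only by the thread’s plan/observe/reflect description, here is a rough sketch of what such an iterative, query-aware evidence-seeking loop could look like; every helper is a toy stand-in for an MLLM or tool call, not the authors’ actual implementation.

def propose_segments(query, notes, num_segments, stride=2):
    # Plan: a real agent would ask the model which time spans to inspect next;
    # this stub just walks the video in coarse strides, skipping seen segments.
    seen = {n["segment"] for n in notes}
    return [s for s in range(0, num_segments, stride) if s not in seen][:3]

def describe_segment(segment, query):
    # Observe: stand-in for captioning/VQA on the chosen segment.
    return {"segment": segment, "text": f"evidence from segment {segment} for '{query}'"}

def enough_evidence(notes, budget=6):
    # Reflect: a real agent would let the model judge sufficiency; here, a fixed budget.
    return len(notes) >= budget

def active_video_perception(query, num_segments, max_rounds=5):
    notes = []  # compact, query-relevant evidence gathered so far
    for _ in range(max_rounds):
        for seg in propose_segments(query, notes, num_segments):
            notes.append(describe_segment(seg, query))
        if enough_evidence(notes):
            break
    return notes  # a real system would answer the query from this evidence

print(active_video_perception("what happens after the fridge is opened?", num_segments=20))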
@tom_doerr
Tom DΓΆrr
2 days
Video segmentation model runs on phones https://t.co/MFDM1foYlM
2
66
587
@jmin__cho
Jaemin Cho
3 days
Soon - I will chat about multimodal AI with @nikparth1 @sainingxie @RanjayKrishna at the DCVLR workshop! Location: Upper Level Ballroom 6DE
1
5
39
@akshay_pachaar
Akshay πŸš€
3 days
You're in a Research Scientist interview at Google. Interviewer: We have a base LLM that's terrible at maths. How would you turn it into a maths & reasoning powerhouse? You: I'll get some problems labeled and fine-tune the model. Interview over. Here's what you missed:
38
70
1K
@dggoldst
Dan Goldstein
4 days
Want to be an intern at Microsoft Research in the Computational Social Science group in NYC (Jake Hofman, David Rothschild, Dan Goldstein)? Follow this link and do your thing! Deadline approaching soonish! https://t.co/wKFBQmhLzt
apply.careers.microsoft.com
Research Interns put inquiry and theory into practice. Alongside fellow doctoral candidates and some of the world's best researchers, Research Interns learn, collaborate, and network for life....
7
28
138
@shashank_bits
Shashank Gupta ✈️ NeurIPS'25
4 days
Thanks so much for the overwhelming interest πŸ™‡πŸ»β€β™‚οΈ Apologies if I can’t respond to everyone right away β€” please keep the messages coming and apply on the portal! I’ll do my best to reply over the next few days. πŸš€
@shashank_bits
Shashank Gupta ✈️ NeurIPS'25
7 days
πŸ“’πŸ’ͺπŸš€ We’re hiring amazing interns to join a focused, driven team pushing the frontier of agentic LLMs at Ai2 β€” training, evaluation, tool-use, memory, safety, theory, and more. #NeurIPS2025 Apply here or message me:
1
1
13
@parksimon0808
Simon Park
4 days
How does RL improve OOD reasoning? How can we distinguish compositional generalization from length generalization? What makes a composition more learnable? Check out our #neurips2025 workshop poster tomorrow! πŸ—“οΈ Sat, 12/6, 8am-5pm, Efficient Reasoning πŸ“ Exhibit Hall F (Spotlight)
arxiv.org
While reinforcement learning (RL) successfully enhances reasoning in large language models, its role in fostering compositional generalization (the ability to synthesize novel skills from known...
0
25
157
@ZiyangW00
Ziyang Wang
6 days
Super interesting work! I like the idea of leveraging human gaze as a temporal prior for long-horizon egocentric reasoning. Excited to see benchmarks driving the next wave of grounded, real-time video reasoning! Check out more here πŸ‘‡
@danadaeun
Daeun Lee
7 days
πŸ€” We rely on gaze to guide our actions, but can current MLLMs truly understand it and infer our intentions? Introducing StreamGaze πŸ‘€, the first benchmark that evaluates gaze-guided temporal reasoning (past, present, and future) and proactive understanding in streaming video
1
3
7
@shoubin621
Shoubin Yu
6 days
🚨 Check out Daeun's cool new work with Adobe Research: StreamGaze, the first comprehensive benchmark for gaze-guided streaming video understanding. It’s the first to test not only past/present comprehension but also proactive reasoning based on real-time human gaze in the streaming
@danadaeun
Daeun Lee
7 days
πŸ€” We rely on gaze to guide our actions, but can current MLLMs truly understand it and infer our intentions? Introducing StreamGaze πŸ‘€, the first benchmark that evaluates gaze-guided temporal reasoning (past, present, and future) and proactive understanding in streaming video
2
3
9
@david_s_yoon
David Seunghyun Yoon
7 days
Excited to share our work on understanding streaming video. Check our paper and dataset!
@danadaeun
Daeun Lee
7 days
πŸ€” We rely on gaze to guide our actions, but can current MLLMs truly understand it and infer our intentions? Introducing StreamGaze πŸ‘€, the first benchmark that evaluates gaze-guided temporal reasoning (past, present, and future) and proactive understanding in streaming video
0
1
1
@jmin__cho
Jaemin Cho
7 days
A neat VLM benchmark for gaze-guided streaming video understanding! e.g., predicting user intents in real time with AR glasses πŸ‘“
@danadaeun
Daeun Lee
7 days
πŸ€” We rely on gaze to guide our actions, but can current MLLMs truly understand it and infer our intentions? Introducing StreamGaze πŸ‘€, the first benchmark that evaluates gaze-guided temporal reasoning (past, present, and future) and proactive understanding in streaming video
0
5
14
@ruilong_li
Ruilong Li
7 days
If you are interested in research at NVIDIA, you're also welcome to DM me for a chat! More about our team:
research.nvidia.com
Advancing foundational technologies enabling AI systems to perceive, model, and interact with the physical world.
@ruilong_li
Ruilong Li
8 days
Flying ✈️ to NeurIPS now. Can’t wait to catch up with old friends and meet some new ones!
8
15
164
@danadaeun
Daeun Lee
7 days
Huge thanks for an amazing collaboration with Subhojyoti Mukherjee, Branislav Kveton, Ryan A. Rossi, Viet Dac Lai, @david_s_yoon, Trung Bui, @f_dernoncourt, and @mohitban47 ❀️ (@AdobeResearch @unccs @unc_ai_group) - πŸ“„ Paper: https://t.co/f8ZJNUEejo 🌐 Project Page:
arxiv.org
Streaming video understanding requires models not only to process temporally incoming frames, but also to anticipate user intention for realistic applications like AR glasses. While prior...
0
0
4
@danadaeun
Daeun Lee
7 days
πŸ“‰ Additional Analysis 2: How do MLLMs use gaze signals during inference? We ablate the contributions of text-, gaze-, and visual-based reasoning in GPT-4o. Combining gaze and visual reasoning provides the best overall performance, though visual cues improve tasks like Scene
1
0
3
@danadaeun
Daeun Lee
7 days
πŸ“‰ Additional Analysis 1: Ablation of Gaze Input Prompting We evaluate multiple strategies for injecting gaze signals into Qwen2.5-VL (7B). While salience maps outperform other prompting methods, a more adaptive and task-aware mechanism is needed to fully capture the diverse
1
0
4
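The ablation above mentions salience-map prompting as the best way to hand gaze signals to Qwen2.5-VL; the paper’s exact format isn’t shown in the thread, so the snippet below is only a minimal sketch of one common way to render a gaze point as a Gaussian salience map blended over the frame (sigma and the blend weight are assumed values).

import numpy as np

def gaze_salience_overlay(frame, gaze_xy, sigma=40.0, alpha=0.4):
    # frame: HxWx3 uint8 RGB array; gaze_xy: (x, y) pixel coordinates of the gaze point.
    h, w = frame.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    gx, gy = gaze_xy
    # Gaussian bump centred at the gaze point, in [0, 1].
    heat = np.exp(-((xs - gx) ** 2 + (ys - gy) ** 2) / (2.0 * sigma ** 2))
    # Alpha-blend the salience into the red channel so the gazed region stands out.
    out = frame.astype(np.float32)
    out[..., 0] = (1 - alpha * heat) * out[..., 0] + alpha * heat * 255.0
    return out.clip(0, 255).astype(np.uint8)

# Toy usage: blank 640x360 frame with the gaze at pixel (320, 180).
frame = np.zeros((360, 640, 3), dtype=np.uint8)
salient_frame = gaze_salience_overlay(frame, (320, 180))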