Jaehong Yoon
@jaeh0ng_yoon
2K Followers · 1K Following · 38 Media · 729 Statuses
Assistant Professor @ntusg CCDS | Prv: @uncnlp @MLAI_KAIST @MSFTResearch | Trustworthy & Continually Adaptable Multimodal AI
Singapore
Joined April 2016
🎉 Excited to share that 5/5 of my papers (3 main, 2 findings) have been accepted at #EMNLP2025, in video/multimodal reasoning, instructional video editing, and efficient LLM adaptation & reasoning! 🚨 I’m recruiting Ph.D. students to join the Multimodal AI Group at NTU College of Computing and Data Science (CCDS)…
15 replies · 31 reposts · 308 likes
Happy to introduce my internship work at @Google and @GoogleDeepMind, collab w/ @googlecloud. We introduce TIR-Judge, an end-to-end agentic RL framework that trains LLM judges with tool-integrated reasoning 🧠🛠️ 🔗 https://t.co/rtfqlvuzJ0
#Agents #LLMs #Judges #RL #reasoning
11 replies · 67 reposts · 508 likes
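For readers curious what tool-integrated judging looks like mechanically, here is a minimal sketch of the general pattern the tweet describes: a judge model that may call a tool (here, a toy Python executor) before committing to a verdict. The `call_judge` interface and the `TOOL:`/`VERDICT:` protocol are illustrative assumptions, not TIR-Judge's actual API.

```python
# Minimal sketch of tool-integrated judging (NOT the TIR-Judge implementation;
# `call_judge` is a hypothetical stand-in for any LLM endpoint).
import io
import contextlib
from typing import Callable

def run_python(code: str) -> str:
    """Toy tool: execute a Python snippet and capture its stdout."""
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            exec(code, {})  # sandboxing omitted for brevity
        return buf.getvalue()
    except Exception as e:
        return f"Tool error: {e}"

def judge_with_tools(question: str, answer_a: str, answer_b: str,
                     call_judge: Callable[[str], str], max_steps: int = 4) -> str:
    """Loop: the judge may emit TOOL:<code> to verify claims before a VERDICT."""
    transcript = f"Question: {question}\nA: {answer_a}\nB: {answer_b}\n"
    for _ in range(max_steps):
        step = call_judge(transcript)
        if step.startswith("VERDICT:"):
            return step                      # e.g. "VERDICT: A"
        if step.startswith("TOOL:"):
            transcript += f"\nTool output: {run_python(step[len('TOOL:'):])}"
    return "VERDICT: tie"                    # fallback when no verdict is reached
```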
Scaling Agent Learning via Experience Synthesis 📝: https://t.co/3WXayMsHrD Scaling training environments for RL by simulating them with reasoning LLMs! Environment models + replay buffer + new tasks = cheap RL for any environment! - Strong improvements over non-RL-ready…
7 replies · 60 reposts · 325 likes
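A minimal sketch of the recipe the tweet summarizes (environment model + replay buffer + new tasks), assuming an LLM-backed `world_model` that simulates transitions; all interfaces here are hypothetical stand-ins, not the paper's code.

```python
# Hedged sketch of "experience synthesis": an LLM stands in for the environment's
# transition function so a policy can be trained cheaply on simulated rollouts.
from collections import deque
from typing import Callable, Deque, Tuple

Transition = Tuple[str, str, str, float]  # (state, action, next_state, reward)

def synthesize_experience(world_model: Callable[[str, str], Tuple[str, float]],
                          propose_task: Callable[[], str],
                          policy: Callable[[str], str],
                          steps: int, buffer: Deque[Transition]) -> None:
    """Roll the policy out inside the LLM-simulated environment."""
    state = propose_task()                               # new-task generation
    for _ in range(steps):
        action = policy(state)
        next_state, reward = world_model(state, action)  # simulated env step
        buffer.append((state, action, next_state, reward))
        state = next_state

buffer: Deque[Transition] = deque(maxlen=10_000)  # replay buffer
# synthesize_experience(my_world_model, my_task_gen, my_policy, 32, buffer)
# ...then sample minibatches from `buffer` for any off-policy RL update.
```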
@cyjustinchen @ArchikiPrasad @swarnaNLP @EliasEskin -- Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning @ZiyangW00 @jaeh0ng_yoon @shoubin621 @mmiemon @gberta227
https://t.co/THxKAhgCPX
https://t.co/c6s8hnrKFH
🚨Introducing Video-RTS: Resource-Efficient RL for Video Reasoning with Adaptive Video TTS! While RL-based video reasoning with LLMs has advanced, the reliance on large-scale SFT with extensive video data and long CoT annotations remains a major bottleneck. Video-RTS tackles…
1 reply · 3 reposts · 8 likes
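One plausible reading of "adaptive video TTS" is test-time scaling that spends more frames only when needed. A hedged sketch under that assumption; the `answer_fn`, budgets, and consensus rule are illustrative, not Video-RTS's exact procedure.

```python
# Hedged sketch of adaptive video test-time scaling: start from sparse frames and
# densify only when sampled answers disagree. Assumes `answer_fn` is stochastic
# (e.g., temperature > 0), so repeated calls can differ.
from collections import Counter
from typing import Callable, List

def adaptive_video_tts(frames: List, answer_fn: Callable[[List], str],
                       budgets=(4, 8, 16), votes: int = 5, agree: float = 0.8) -> str:
    """Increase the frame budget until sampled answers reach a consensus threshold."""
    for k in budgets:
        subset = frames[:: max(1, len(frames) // k)][:k]  # uniform subsample of k frames
        samples = [answer_fn(subset) for _ in range(votes)]
        top, count = Counter(samples).most_common(1)[0]
        if count / votes >= agree:                        # confident: stop early
            return top
    return top  # fall back to the densest budget's majority answer
```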
@cyjustinchen @ArchikiPrasad @swarnaNLP @EliasEskin @ZiyangW00 @jaeh0ng_yoon @shoubin621 @mmiemon @gberta227 -- RACCooN: A Versatile Instructional Video Editing Framework with Auto-Generated Narratives @jaeh0ng_yoon @shoubin621
https://t.co/Wx3eIJKDfp
https://t.co/2XV9Yhjx9l
arxiv.org: Recent video generative models primarily rely on carefully written text prompts for specific tasks, like inpainting or style editing. They require labor-intensive textual descriptions for input...
🚨New paper👉RACCooN: remove/add/change video content effortlessly/interactively via our MLLM+Video Diffusion (V2P2V) framework with auto-generated descriptions! ▶️ 1. Video-to-Paragraph (V2P): RACCooN first generates well-structured, detailed descriptions of videos with an MLLM…
1 reply · 1 repost · 5 likes
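The V2P2V flow the tweet outlines can be pictured as three composable stages. A minimal sketch, with both model interfaces as hypothetical stand-ins rather than RACCooN's API:

```python
# Hedged sketch of a V2P2V (video-to-paragraph-to-video) editing flow as described
# in the tweet; `describe` and `generate` are placeholders for an MLLM captioner
# and a video diffusion model, not RACCooN's actual components.
from typing import Callable, List

def v2p2v_edit(frames: List,
               describe: Callable[[List], str],                 # MLLM: video -> paragraph
               edit_text: Callable[[str], str],                 # user edits the paragraph
               generate: Callable[[List, str], List]) -> List:  # diffusion: (video, text) -> video
    paragraph = describe(frames)      # V2P: auto-generated narrative of the video
    edited = edit_text(paragraph)     # user removes/adds/changes content in text space
    return generate(frames, edited)   # P2V: re-render the video to match the edited text

# e.g. v2p2v_edit(frames, mllm_caption,
#                 lambda p: p.replace("a red car", "a blue car"), video_diffusion)
```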
🚨 Excited to announce Gistify!, a task where a coding agent must extract the gist of a repository: generate a single, executable, and self-contained file that faithfully reproduces the behavior of a given command (e.g., a test or entrypoint). ✅ It is a lightweight, broadly applicable…
4 replies · 39 reposts · 96 likes
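A natural way to check a Gistify-style output is behavioral equivalence: run the original command and the generated single file, then compare exit codes and stdout. A minimal sketch; the commands and paths are illustrative, not from the paper.

```python
# Hedged sketch of checking that a generated single file reproduces a repo command.
import subprocess

def behaviors_match(repo_cmd: list[str], gist_file: str, cwd: str = ".") -> bool:
    """Compare stdout and exit codes of the repo command vs. the standalone file."""
    original = subprocess.run(repo_cmd, cwd=cwd, capture_output=True, text=True)
    gist = subprocess.run(["python", gist_file], capture_output=True, text=True)
    return (original.returncode == gist.returncode
            and original.stdout == gist.stdout)

# e.g. behaviors_match(["python", "-m", "pytest", "tests/test_core.py"], "gist.py")
```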
🎉 Thanks for the shoutout! I’ll be virtually presenting our new work Video-RTS at #EMNLP2025 (my co-lead @jaeh0ng_yoon will present in person). If you’re into advanced video-reasoning frameworks, check it out: - No SFT, pure RL: trains with simple output-based rewards (GRPO)—no…
0 replies · 6 reposts · 14 likes
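"Simple output-based rewards" with GRPO typically means a binary match on the final answer, with advantages normalized within each group of sampled completions. A hedged sketch, where the `<answer>` tag convention is an assumption rather than Video-RTS's exact format:

```python
# Hedged sketch of an output-based reward for GRPO-style RL on video QA:
# reward 1 if the final answer matches gold, else 0.
import re

def output_reward(completion: str, gold: str) -> float:
    """Extract the model's final answer and compare it to the reference."""
    m = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    pred = (m.group(1) if m else completion).strip().lower()
    return 1.0 if pred == gold.strip().lower() else 0.0

# GRPO then normalizes such rewards within each group of sampled completions:
# advantage_i = (r_i - mean(r)) / (std(r) + eps)
```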
Excited to be at #EMNLP2025 in Suzhou! I’ll present our work: (1) MEXA (Fri 12:30 PM CST) on general multimodal reasoning with dynamic multi-expert aggregation, and (2) RACCooN (Wed 4:30 PM CST) on editing videos via auto-generated narratives. Please stop by our…
1 reply · 9 reposts · 25 likes
🚨 Check out our awesome students/postdocs' papers at #EMNLP2025 and say hi to them 👋! Also, I will give a keynote (virtually) on "Attributable, Conflict-Robust, and Multimodal Summarization with Multi-Source Retrieval" at the NewSumm workshop. -- Jaehong (in-person) finished…
2 replies · 29 reposts · 63 likes
Details available on my website: https://t.co/3y0FFlP4E4 ▶️ Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning (Main) 🔗: https://t.co/9g45H9qBX7 ▶️ RACCooN: Versatile Instructional Video Editing with Auto-Generated Narratives…
0 replies · 0 reposts · 3 likes
It was an honor and pleasure to give a keynote at the 28th European Conference on Artificial Intelligence (#ECAI2025) in beautiful Bologna, and engage in enthusiastic discussions about trustworthy + calibrated agents, collaborative reasoning + privacy, and controllable multimodal…
1 reply · 26 reposts · 68 likes
🥳🥳 Honored and grateful to be awarded a 2025 Google PhD Fellowship in Machine Learning and ML Foundations for my research on machine unlearning, defenses against adversarial attacks, and multi-agent privacy! ✨ Deep gratitude to my advisor @mohitban47 for his constant…
🎉 We're excited to announce the 2025 Google PhD Fellows! @GoogleOrg is providing over $10 million to support 255 PhD students across 35 countries, fostering the next generation of research talent to strengthen the global scientific landscape. Read more: https://t.co/0Pvuv6hsgP
30 replies · 18 reposts · 138 likes
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM
4 replies · 16 reposts · 108 likes
📢 #ICCV2025 Welcome to check out our work @ICCVConference📢 * 3D/4D - DPoser-X: https://t.co/v0VFv8ZDj3 - Free4D: https://t.co/EKsPhiOLg7 * Video - DCM: https://t.co/zOxqu1vpyt - FreeScale / FreeMorph * VLM - Video-TT: https://t.co/LvyNtbKGUK - MM-SAE: https://t.co/Z6FexYSxOF
1 reply · 7 reposts · 90 likes
I’m really sad to hear this news... but it’s also a great chance to connect with @yilin_sung, who has brilliant ideas and solid experience in efficient AI, reinforcement learning, and (multimodal) LLMs! Anyone working in this area should definitely chat with him. 💡
Tough week! I also got impacted less than 3 months after joining. Ironically, I just landed some new RL infra features the day before. Life moves on. My past work spans RL, PEFT, Quantization, and Multimodal LLMs. If your team is working on these areas, I’d love to connect.
0 replies · 2 reposts · 9 likes
🥳 Honored and grateful to be awarded an NDSEG Fellowship in Computer Science! 💫🇺🇸 Big thanks to my advisor @mohitban47 for his guidance, and shoutout to my lab mates at @unc_ai_group, collaborators, internship advisors, and mentors for their support 🤗 Excited to continue…
🎉 Congratulations to our student Zaid Khan (advised by @mohitban47) for being awarded a prestigious NDSEG Fellowship for his work on environment generation! Established in 1989, the fellowship has an acceptance rate of <7% and covers diverse science and engineering disciplines.
15 replies · 20 reposts · 49 likes
three years ago, DiT replaced the legacy unet with a transformer-based denoising backbone. we knew the bulky VAEs would be the next to go -- we just waited until we could do it right. today, we introduce Representation Autoencoders (RAE). >> Retire VAEs. Use RAEs. 👇(1/n)
56 replies · 333 reposts · 2K likes
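A minimal sketch of the idea as stated in the tweet: keep a frozen pretrained encoder, train only a decoder, and let diffusion operate in the encoder's representation space. The architecture and training details below are assumptions, not the RAE paper's implementation.

```python
# Hedged sketch of a Representation Autoencoder: frozen pretrained encoder
# (e.g., a ViT) + trained decoder, replacing the VAE in a DiT-style pipeline.
import torch
import torch.nn as nn

class RAE(nn.Module):
    def __init__(self, encoder: nn.Module, latent_dim: int, image_dim: int):
        super().__init__()
        self.encoder = encoder.eval()          # frozen pretrained representation encoder
        for p in self.encoder.parameters():
            p.requires_grad = False
        self.decoder = nn.Sequential(          # only the decoder is trained
            nn.Linear(latent_dim, 1024), nn.GELU(),
            nn.Linear(1024, image_dim),        # reconstruct flattened pixels
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            z = self.encoder(x)                # representation, no VAE bottleneck
        return self.decoder(z)

# Training: plain reconstruction loss on decoder outputs; the diffusion
# transformer is then trained to denoise in z-space instead of VAE latent space.
```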
🚨 New Paper Alert! Introducing SciVideoBench — a comprehensive benchmark for scientific video reasoning! 🔬SciVideoBench: 1. Spans Physics, Chemistry, Biology & Medicine with authentic experimental videos. 2. Features 1,000 challenging MCQs across three reasoning types: …
3 replies · 29 reposts · 39 likes
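Evaluating on a benchmark like this usually reduces to option-letter accuracy over the MCQs. A small sketch, where the item schema and the `predict` interface are assumptions rather than SciVideoBench's released format:

```python
# Hedged sketch of MCQ accuracy over a video benchmark.
from typing import Callable, Dict, List

def mcq_accuracy(items: List[Dict], predict: Callable[[str, str, List[str]], str]) -> float:
    """items: [{'video': path, 'question': str, 'options': [...], 'answer': 'B'}, ...]"""
    correct = 0
    for ex in items:
        choice = predict(ex["video"], ex["question"], ex["options"])  # e.g. returns "B"
        correct += (choice == ex["answer"])
    return correct / len(items)
```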