
Yan-Bo Lin
@yblin98
Followers 126 · Following 1K · Media 25 · Statuses 107
Ph.D. student @unccs working on multimodal learning including audio📣, videos🎬, and language💬.
Chapel Hill, NC
Joined August 2022
🔊Check out our preprint, AVSiam 🤝. We use a single shared ViT to process audio and visual inputs, improving parameter efficiency⚡, reducing the GPU memory footprint💻, and allowing us to scale📈 to larger datasets and model sizes. w/ @gberta227 👇
arxiv.org
Traditional audio-visual methods rely on independent audio and visual backbones, which is costly and not scalable. In this work, we investigate using an audio-visual siamese network (AVSiam) for...
2 replies · 5 reposts · 23 likes
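For readers curious what a single shared backbone for both modalities looks like in practice, here is a rough sketch of the idea from this tweet, not the released AVSiam code: audio spectrograms and video frames get separate patch embeddings, while one ViT-style encoder with shared weights processes both token streams. The dimensions, the mean pooling, and the omission of positional embeddings are all simplifying assumptions.

```python
import torch
import torch.nn as nn

class SharedAVEncoder(nn.Module):
    """Toy shared audio-visual encoder; illustrative, not the AVSiam release."""

    def __init__(self, dim=384, depth=6, heads=6, patch=16):
        super().__init__()
        # Modality-specific patch embeddings: 1-channel log-mel spectrograms
        # for audio, 3-channel RGB frames for video.
        self.audio_patch = nn.Conv2d(1, dim, kernel_size=patch, stride=patch)
        self.visual_patch = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        # One transformer encoder whose weights are shared by both modalities.
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.shared_vit = nn.TransformerEncoder(layer, num_layers=depth)

    def _encode(self, patches):
        tokens = patches.flatten(2).transpose(1, 2)   # (B, C, H', W') -> (B, N, dim)
        return self.shared_vit(tokens).mean(dim=1)    # mean-pooled clip embedding

    def forward(self, audio_spec, video_frame):
        a = self._encode(self.audio_patch(audio_spec))
        v = self._encode(self.visual_patch(video_frame))
        return a, v

model = SharedAVEncoder()
audio = torch.randn(2, 1, 128, 128)    # assumed spectrogram shape
frame = torch.randn(2, 3, 224, 224)    # RGB frame
a_emb, v_emb = model(audio, frame)
print(a_emb.shape, v_emb.shape)        # torch.Size([2, 384]) torch.Size([2, 384])
```

Because both modalities pass through the same transformer, the parameter count stays close to that of a single ViT backbone, which is where the memory and scaling benefits mentioned in the tweet come from.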
RT @hanlin_hl: Thanks AK for sharing our work! For readers interested in our work, please check our project page:
0 replies · 11 reposts · 0 likes
RT @shoubin621: New paper Alert 🚨 Introducing MEXA: A general and training-free multimodal reasoning framework via dynamic multi-expert ski….
0 replies · 29 reposts · 0 likes
RT @gberta227: I will be presenting more details on SiLVR at the LOVE: Multimodal Video Agent workshop at 12:15pm CST in Room 105A!
0 replies · 3 reposts · 0 likes
RT @CMHungSteven: @CVPR is around the corner!! Join us at the Workshop on T4V at #CVPR2025 with a great speaker lineup (@MikeShou1, @jw2yan….
0 replies · 19 reposts · 0 likes
RT @Han_Yi_724: 🚀 Introducing ExAct: A Video-Language Benchmark for Expert Action Analysis. 🎥 3,521 expert-curated video QA pairs in 6 domai….
0 replies · 5 reposts · 0 likes
RT @gberta227: Check out our recent paper, which presents a new framework for complex video reasoning tasks. A straightforward and intuitiv….
0 replies · 6 reposts · 0 likes
Check out our recent work. We leverage the strong reasoning capabilities of LLMs for complex vision-language tasks. The code is easy to adapt for newer model versions.
Recent advances in test-time optimization have led to remarkable reasoning capabilities in LLMs. However, the reasoning capabilities of MLLMs still significantly lag, especially for complex video-language tasks. We present SiLVR, a Simple Language-based Video Reasoning framework.
0 replies · 0 reposts · 5 likes
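As a rough illustration of the language-based reasoning pattern described above (not the released SiLVR code), the sketch below first turns video content into text and then hands that text plus the question to a reasoning LLM; caption_clip and ask_llm are hypothetical placeholders for whatever captioner and LLM backend you plug in.

```python
from typing import Callable, List

def language_based_video_qa(clip_paths: List[str],
                            question: str,
                            caption_clip: Callable[[str], str],
                            ask_llm: Callable[[str], str]) -> str:
    # 1) Describe each short clip in natural language.
    captions = [f"[clip {i}] {caption_clip(path)}" for i, path in enumerate(clip_paths)]
    # 2) Let a strong reasoning LLM answer over the textual description.
    prompt = (
        "Clip-by-clip description of a video:\n"
        + "\n".join(captions)
        + f"\n\nQuestion: {question}\nReason step by step, then give the answer."
    )
    return ask_llm(prompt)
```

Since the captioner and the LLM are plain callables, upgrading to newer models only means swapping those two arguments, consistent with the tweet's note that the code is easy to adapt to newer model versions.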
RT @CMHungSteven: The 4th Workshop on Transformers for Vision (T4V) at CVPR 2025 is soliciting self-nominations for reviewers. If you're in….
0 replies · 11 reposts · 0 likes
RT @gberta227: For those of you who know me, I've always been very excited to combine my two passions for basketball and CV. Our #CVPR2025….
sites.google.com
We present BASKET, a large-scale basketball video dataset for fine-grained skill estimation. BASKET contains more than 4,400 hours of video capturing 32,232 basketball players from all over the...
0 replies · 13 reposts · 0 likes
RT @yilin_sung: 🚀 New Paper: RSQ: Learning from Important Tokens Leads to Better Quantized LLMs. We show that not all tokens should be trea….
0 replies · 41 reposts · 0 likes
RT @yilin_sung: I'm on the industry job market! If you're hiring, let's connect (I will attend #NeurIPS2024 in person too)! I'm excited to….
0 replies · 23 reposts · 0 likes
RT @jmin__cho: 🚨 I’m on the 2024-2025 academic job market! I work on ✨ Multimodal AI ✨, with a special focus on en….
0 replies · 43 reposts · 0 likes
RT @cyjustinchen: 🚨 Reverse Thinking Makes LLMs Stronger Reasoners. We can often reason from a problem to a solution and also in reverse to….
0 replies · 105 reposts · 0 likes
RT @andrewhowens: We generate a soundtrack for a silent video, given a text prompt! For example, we can make a cat's meow sound like a lion….
0 replies · 5 reposts · 0 likes
RT @ZunWang919: 🚨Thrilled to share my first PhD project: DreamRunner✨: Fine-Grained Storytelling Video Generation with Retrieval-Augmented….
0 replies · 42 reposts · 0 likes
RT @jmin__cho: Modern T2V generation models still make simple mistakes, such as generating the incorrect number of objects or semantically….
0 replies · 11 reposts · 0 likes
RT @SenguptRoni: SOTA virtual face re-aging techniques often struggle to preserve identity for large age changes. We present MyTimeMachine,….
0 replies · 4 reposts · 0 likes
RT @jmin__cho: Check out M3DocRAG -- multimodal RAG for question answering on Multi-Modal & Multi-Page & Multi-Documents (+ a new open-doma….
0 replies · 90 reposts · 0 likes
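For context on the multimodal RAG pattern named in the retweet above, here is a generic sketch (not the M3DocRAG implementation): document pages are embedded as images, the pages most similar to the question are retrieved, and a multimodal LM answers from only those pages. embed_page, embed_question, and answer_with_mllm are hypothetical stand-ins for the actual retriever and model.

```python
from typing import Callable, List, Sequence

def multimodal_rag_answer(pages: List[str],
                          question: str,
                          embed_page: Callable[[str], Sequence[float]],
                          embed_question: Callable[[str], Sequence[float]],
                          answer_with_mllm: Callable[[List[str], str], str],
                          top_k: int = 4) -> str:
    q = embed_question(question)
    # Score every page image against the question embedding (dot product).
    scores = [sum(a * b for a, b in zip(embed_page(p), q)) for p in pages]
    best = sorted(range(len(pages)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Only the retrieved pages are shown to the multimodal LM.
    return answer_with_mllm([pages[i] for i in best], question)
```

Separating retrieval from answering is what lets this pattern handle many long documents without feeding every page to the model.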
RT @hanlin_hl: Glad to share our new preprint from my Meta @AIatMeta internship and @uncnlp collaboration: VEDiT: Latent Prediction Arc….
0 replies · 44 reposts · 0 likes