Lin Chen

@Lin_Chen_98

Followers 59 | Following 38 | Media 1 | Statuses 25

PhD at USTC | Large multimodal models | Research intern at Shanghai AI Lab

Joined November 2023
@KennyUTC
Haodong Duan
1 year
Excited to share several of our recent works: 1. MMBench (ECCV'24 Oral@6C, Oct 3, 13:30): A comprehensive multi-modal evaluation benchmark adopted by hundreds of teams working on LMMs. https://t.co/4If7pFUz3Z 2. Prism (NeurIPS'24): A framework that can disentangle and assess the
1
1
10
@Lin_Chen_98
Lin Chen
1 year
Thrilled to see myself in the #3 spot on HuggingFace’s most influential users for July! I look forward to doing more impactful work to give back to the community in the future.
@mvaloatto
Matt Valoatto
1 year
🤗 Here are the top 100 of @HuggingFace’s most impactful users of July 2024 (models, datasets, spaces, followers): 🏆 Top Contributors To Follow: https://t.co/93lSKxGgxk - 🏛️ Top 10 Model Downloads: 👏 @jonatasgrosman, #PatrickJohnChia, @Lin_Chen_98, @Emily_Alsentzer,
0
1
3
@reach_vb
Vaibhav (VB) Srivastav
1 year
New SoTA VLM: InternLM XComposer 2.5 🐐 > Beats GPT-4V, Gemini Pro across a myriad of benchmarks. > 7B params, 96K context window (w/ RoPE ext) > Trained w/ 24K high-quality image-text pairs > InternLM 7B text backbone > Supports high-resolution (4K) image understanding tasks >
4
68
269
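The checkpoint announced above can be loaded with Hugging Face transformers. A minimal loading sketch follows, assuming the public repo id internlm/internlm-xcomposer2d5-7b; the actual inference (chat) API is supplied by the model's remote code and is not reproduced here.

```python
# Minimal loading sketch for InternLM-XComposer-2.5 (assumed repo id below).
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "internlm/internlm-xcomposer2d5-7b"  # assumption: public HF checkpoint id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the 7B backbone fits on a single modern GPU in bf16
    trust_remote_code=True,      # the VLM wrapper ships as custom remote code
).eval()
# From here, the model's own chat/generation helpers (defined in the remote code)
# handle image + text inputs, including high-resolution images and long contexts.
```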
@_akhaliq
AK
1 year
InternLM-XComposer-2.5 A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output We present InternLM-XComposer-2.5 (IXC-2.5), a versatile large vision-language model that supports long-contextual input and output. IXC-2.5 excels in various text-image
2
41
172
@arankomatsuzaki
Aran Komatsuzaki
1 year
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output - Excels in various text-image tasks w/ GPT-4V level capabilities with merely a 7B LLM backend - Open-sourced https://t.co/fto4phT4Cn
4
51
160
@Lin_Chen_98
Lin Chen
1 year
Built our @Gradio app and deployed ShareCaptioner-Video on @huggingface Spaces with ZeroGPU. Now you can generate a detailed caption for your own video. Have fun! https://t.co/Xnm8b1ar99
@_akhaliq
AK
1 year
ShareGPT4Video Improving Video Understanding and Generation with Better Captions We present the ShareGPT4Video series, aiming to facilitate the video understanding of large video-language models (LVLMs) and the video generation of text-to-video models (T2VMs)
0
6
23
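The tweet above describes wrapping ShareCaptioner-Video in a Gradio app on Hugging Face Spaces with ZeroGPU. A minimal sketch of such a Space is shown below; caption_video is a hypothetical stand-in for the real ShareCaptioner-Video inference code, so only the Gradio/ZeroGPU wiring is illustrative.

```python
# Minimal ZeroGPU-backed Gradio captioning Space (sketch, not the deployed app).
import gradio as gr
import spaces  # Hugging Face helper that provides the ZeroGPU decorator on Spaces


def caption_video(video_path: str) -> str:
    # Hypothetical stand-in: replace with the actual ShareCaptioner-Video model call.
    return f"(detailed caption for {video_path} would be generated here)"


@spaces.GPU  # a ZeroGPU slice is allocated only while this function runs
def caption(video_path: str) -> str:
    return caption_video(video_path)


demo = gr.Interface(
    fn=caption,
    inputs=gr.Video(label="Your video"),
    outputs=gr.Textbox(label="Detailed caption"),
    title="ShareCaptioner-Video",
)

if __name__ == "__main__":
    demo.launch()
```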
@Lin_Chen_98
Lin Chen
1 year
Thanks to @_akhaliq for sharing our work! We sincerely hope this series can help the video-language community!😆😆
@_akhaliq
AK
1 year
ShareGPT4Video Improving Video Understanding and Generation with Better Captions We present the ShareGPT4Video series, aiming to facilitate the video understanding of large video-language models (LVLMs) and the video generation of text-to-video models (T2VMs)
0
0
1
@Lin_Chen_98
Lin Chen
2 years
Looking forward to working on a longer version together! You can preview our ShareGPT4Video project at the following link! https://t.co/xGLOjiVnIQ
@LinBin46984
Bin Lin
2 years
📣📣📣We are excited to announce the release of Open-Sora Plan v1.1.0. 🙌Thanks to ShareGPT4Video's capability to annotate long videos, we can generate higher quality and longer videos. 🔥🔥🔥We continue to open-source all data, code, and models! https://t.co/C28gHbiPrU
0
0
2
@_akhaliq
AK
2 years
Are We on the Right Way for Evaluating Large Vision-Language Models? Large vision-language models (LVLMs) have recently achieved rapid progress, sparking numerous studies to evaluate their multi-modal capabilities. However, we dig into current evaluation works and identify
6
64
303
@_akhaliq
AK
2 years
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions paper page: https://t.co/aMtEmTHpki In the realm of large multi-modal models (LMMs), efficient modality alignment is crucial yet often constrained by the scarcity of high-quality image-text data. To address
4
53
248