Jiaqi Wang

@wjqdev

Followers: 777 · Following: 75 · Media: 11 · Statuses: 51

Research Scientist at Shanghai AI Laboratory

Hong Kong
Joined June 2020
@wjqdev
Jiaqi Wang
2 months
RT @HuggingPapers: Nvidia's got something new. UnifiedReward-Think is here: a multimodal CoT reward model for both visual understanding and…
0
41
0
@wjqdev
Jiaqi Wang
7 months
RT @ZhibingLi_6626: 🎉 Excited to introduce IDArb! 🎉 Our method can predict plausible and consistent geometry and PBR material for any num…
0
25
0
@wjqdev
Jiaqi Wang
7 months
Thanks so much for tweeting our work!
@_akhaliq
AK
7 months
InternLM-XComposer2.5-OmniLive. A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
0
1
20
@wjqdev
Jiaqi Wang
7 months
Many thanks for tweeting 👍
@AdinaYakup
Adina Yakup
7 months
InternLM-XComposer-2.5-OmniLive 🔥 a specialized generalist multimodal system for streaming video and audio interactions by @intern_lm. Model: ✨ Apache 2.0, but a form is required for a commercial license.
0
0
4
@wjqdev
Jiaqi Wang
7 months
🚀 We're excited to announce the release of InternLM-XComposer2.5-OmniLive (IXC2.5-OL), a comprehensive multimodal system designed for long-term streaming video and audio interactions. This fully open-sourced project delivers functionality similar to Gemini 2.0 Live Streaming and…
3
35
129
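Since the tweet above describes the release only at a high level, here is a minimal, self-contained Python sketch of what a "long-term streaming interaction" loop looks like in general. Every name here is hypothetical illustration, not the IXC2.5-OL API: the point is that perception runs continuously, observations are distilled into a persistent memory, and questions are answered from that memory rather than from one fixed-length context.

```python
import time
from typing import Dict, List

def summarize(frame: str, audio: str) -> str:
    """Toy stand-in for a vision/audio encoder. A real system would emit
    compact features; here we just join the raw descriptions."""
    return f"saw '{frame}', heard '{audio}'"

def retrieve(memory: List[Dict], question: str, k: int = 3) -> List[Dict]:
    """Toy retrieval: rank stored summaries by word overlap with the question."""
    words = set(question.lower().split())
    return sorted(
        memory,
        key=lambda m: -len(words & set(m["summary"].lower().split())),
    )[:k]

class StreamingMultimodalSystem:
    """Hypothetical sketch of a streaming loop (NOT the IXC2.5-OL API).
    Perception runs continuously and distills observations into a
    persistent memory; questions are answered from that memory instead
    of by re-encoding the entire raw stream."""

    def __init__(self) -> None:
        self.memory: List[Dict] = []

    def perceive(self, frame: str, audio: str) -> None:
        self.memory.append({"t": time.time(), "summary": summarize(frame, audio)})

    def answer(self, question: str) -> str:
        hits = retrieve(self.memory, question)
        context = "; ".join(m["summary"] for m in hits)
        return f"Based on memory [{context}]: {question}"

# Usage: feed a (mock) stream, then ask about something seen earlier.
system = StreamingMultimodalSystem()
system.perceive("a person opens a laptop", "keyboard clicks")
system.perceive("the person pours coffee", "liquid pouring")
print(system.answer("what did the person pour?"))
```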
@wjqdev
Jiaqi Wang
7 months
RT @liuziwei7: 😻 Fine-Grained Visual Attributes for GenAI 😻 #NeurIPS2024 FiVA is a fine-grained visual attributes dataset and a framework…
0
36
0
@wjqdev
Jiaqi Wang
9 months
We have released SAM2Long, a training-free enhancement to SAM 2 for long-term video segmentation. 🔥 Less error accumulation when facing occlusion and reappearance. ⚡️ A training-free memory tree maintains dynamic segmentation pathways, boosting resilience efficiently. 🤯 Significant improvements…
1
39
167
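The tweet names the core mechanism: a training-free memory tree over candidate segmentation pathways. The sketch below is a deliberately simplified illustration of that idea using beam-search-style pruning over toy scores; the function names and the scoring rule are assumptions for illustration, not SAM2Long's actual code.

```python
import random
from typing import List, Tuple

random.seed(0)

def propose_masks(state: float, n: int = 3) -> List[Tuple[float, float]]:
    """Toy stand-in for per-frame mask proposals. Returns (quality, new_state)
    pairs; a real tracker would run SAM 2 conditioned on this pathway's
    memory bank and score each candidate mask."""
    return [(random.random(), state + random.random()) for _ in range(n)]

def memory_tree_tracking(num_frames: int, beam_width: int = 3) -> float:
    """Simplified memory-tree idea: keep `beam_width` hypothesis pathways,
    expand each with candidate masks every frame, and prune back to the
    branches with the best cumulative score. beam_width=1 degenerates to
    the greedy single-pathway baseline that accumulates errors."""
    pathways = [(0.0, 0.0)]  # (cumulative_score, pathway_state)
    for _ in range(num_frames):
        candidates = [
            (cum + quality, new_state)
            for cum, state in pathways
            for quality, new_state in propose_masks(state)
        ]
        # Prune to the best branches: this pruned hypothesis set is the
        # "memory tree" that keeps recovery options alive after occlusion.
        pathways = sorted(candidates, reverse=True)[:beam_width]
    return pathways[0][0]

print(f"best pathway score over 50 frames: {memory_tree_tracking(50):.2f}")
```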
@wjqdev
Jiaqi Wang
1 year
RT @LyxTg: 🚀 Check out VideoVista, our comprehensive video-LMM evaluation benchmark! We've assessed 33 Video-LMMs across 27 tasks. Hi…
0
13
0
@wjqdev
Jiaqi Wang
1 year
RT @mayubo2333: Large Vision-Language Models (LVLMs) perform well on understanding single-page documents like DocVQA and ChartQA…
0
16
0
@wjqdev
Jiaqi Wang
1 year
RT @Norod78: Gave InternLM-XComposer a go; it recognized the activity correctly but could not count nor tell me what…
0
8
0
@wjqdev
Jiaqi Wang
1 year
RT @wjqdev: Thanks @iScienceLuvr for tweeting our work! 🚀 We have released InternLM-XComposer-2.5 (IXC-2.5), a versatile Large Vision Langu…
0
7
0
@wjqdev
Jiaqi Wang
1 year
Thanks @_akhaliq for tweeting our work! 🚀 We have released InternLM-XComposer-2.5 (IXC-2.5) on Hugging Face @huggingface, a versatile Large Vision Language Model (LVLM) supporting long-contextual input and output. 🌊 Supports 24K interleaved image-text contexts. 🛠️ Versatile…
@_akhaliq
AK
1 year
InternLM-XComposer-2.5. A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output. We present InternLM-XComposer-2.5 (IXC-2.5), a versatile large vision-language model that supports long-contextual input and output. IXC-2.5 excels in various text-image…
1
18
75
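For readers who want to try the open-sourced model, a typical Hugging Face remote-code loading pattern looks like the sketch below. The repo id and the exact chat signature are assumptions based on common conventions for InternLM releases (flagged again in the comments); the official model card is the authoritative reference.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# NOTE: the repo id and the .chat() call below are assumptions based on
# common conventions for InternLM remote-code releases; treat the model
# card on Hugging Face as authoritative.
ckpt = "internlm/internlm-xcomposer2d5-7b"  # assumed repo id

model = AutoModel.from_pretrained(
    ckpt,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # the repo ships custom modeling code
).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained(ckpt, trust_remote_code=True)

# Interleaved image-text input: the long-context design is what lets many
# images plus their surrounding text share a single 24K-token window.
query = "Compare the two frames and describe what changed."
images = ["frame_001.png", "frame_002.png"]  # placeholder local paths

with torch.no_grad():
    # Assumed signature: remote-code chat helpers typically take the
    # tokenizer, a query, and a list of image paths.
    response, _ = model.chat(tokenizer, query, images, do_sample=False)
print(response)
```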
@wjqdev
Jiaqi Wang
1 year
Thanks @iScienceLuvr for tweeting our work! 🚀 We have released InternLM-XComposer-2.5 (IXC-2.5), a versatile Large Vision Language Model (LVLM) supporting long-contextual input and output. 🌊 Supports 24K interleaved image-text contexts. 🛠️ Versatile applications: 🎞️ Video…
@iScienceLuvr
Tanishq Abraham is at ICML
1 year
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output. abs: model: New open vision-language model with long-context capabilities and ultra-high-resolution understanding…
0
7
33
@wjqdev
Jiaqi Wang
1 year
Thanks @arankomatsuzaki for tweeting our work! 🚀 We have released InternLM-XComposer-2.5 (IXC-2.5), a versatile Large Vision Language Model (LVLM) supporting long-contextual input and output. 🌊 Supports 24K interleaved image-text contexts. 🛠️ Versatile applications: 🎞️ Video…
@arankomatsuzaki
Aran Komatsuzaki
1 year
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output. - Excels in various text-image tasks with GPT-4V-level capabilities using merely a 7B LLM backend. - Open-sourced.
0
3
14