Explore tweets tagged as #VLMS
@0xkiel_
KIEL
1 hour
Natix co-founder @AlirezaGhods2 on VLMs & their recently launched WorldSeek: "world models cover what would happen in the future."
💬 1 · 🔁 1 · ❤️ 4
@UniPat_AI
UniPat AI
6 days
BREAKING 🚨: Kimi K2.5 is now the #1 open model on the BabyVision Benchmark, and #2 overall, trailing only Gemini-3-Pro. From 12.4% → 36.5% in 9 months, an incredible leap for VLMs. Huge congrats to the @Kimi_Moonshot team 👏🔥
💬 1 · 🔁 2 · ❤️ 6
@Shekhar_is_ok
Himanshu Shekhar
1 day
What #DeepLearning architecture is this?? #AI Currently I am revising, and going to make a good Computer Vision project (VLMs).
💬 0 · 🔁 0 · ❤️ 2
@divyansh70305
divyansh
9 hours
Was testing VLMs a few weeks ago and found that if we just fix the edges and the color bleeding, like in these images, the performance improves and hallucination drops. I didn't prove anything new; it was just the result of me being bored a few weeks ago.
💬 0 · 🔁 0 · ❤️ 2
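The post above shares no code, but the cleanup it describes could look roughly like this minimal Pillow sketch: a median filter to suppress color bleeding, then an unsharp mask to restore edge contrast before the image goes to a VLM. The filter choices and parameters are my assumptions, not the author's.

```python
from PIL import Image, ImageFilter

def clean_for_vlm(in_path: str, out_path: str) -> None:
    img = Image.open(in_path).convert("RGB")
    # Median filter suppresses speckle and color bleeding across edges.
    img = img.filter(ImageFilter.MedianFilter(size=3))
    # Unsharp mask restores the edge contrast the median filter softened.
    img = img.filter(ImageFilter.UnsharpMask(radius=2, percent=150, threshold=3))
    img.save(out_path)

if __name__ == "__main__":
    clean_for_vlm("input.jpg", "cleaned.jpg")
```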
@DataScienceDojo
Data Science Dojo
2 days
LLMs can reason. Vision models can see. But most real problems don't come in one modality. That gap is exactly why Vision-Language Models (VLMs) matter. This carousel breaks down how VLMs actually work under the hood and why they've become foundational for modern AI systems.
💬 1 · 🔁 1 · ❤️ 4
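For readers who won't open the carousel, the standard recipe it refers to looks schematically like this: a vision encoder produces patch embeddings, a small projector maps them into the LLM's token space, and the LLM attends over image and text tokens together. A toy PyTorch sketch with illustrative dimensions and stand-in modules, not any particular model:

```python
import torch
import torch.nn as nn

class TinyVLM(nn.Module):
    def __init__(self, vision_dim=768, llm_dim=2048, vocab=32000):
        super().__init__()
        self.vision_encoder = nn.Linear(vision_dim, vision_dim)  # stand-in for a ViT
        self.projector = nn.Linear(vision_dim, llm_dim)          # maps image tokens into LLM space
        self.llm = nn.TransformerEncoder(                        # stand-in for a decoder LLM
            nn.TransformerEncoderLayer(d_model=llm_dim, nhead=8, batch_first=True),
            num_layers=2,
        )
        self.embed = nn.Embedding(vocab, llm_dim)
        self.head = nn.Linear(llm_dim, vocab)

    def forward(self, patches, token_ids):
        img_tokens = self.projector(self.vision_encoder(patches))  # (B, P, llm_dim)
        txt_tokens = self.embed(token_ids)                         # (B, T, llm_dim)
        seq = torch.cat([img_tokens, txt_tokens], dim=1)           # image tokens prepended
        return self.head(self.llm(seq))                            # next-token logits

logits = TinyVLM()(torch.randn(1, 16, 768), torch.randint(0, 32000, (1, 8)))
```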
@SipeedIO
Sipeed
3 days
🚨 Milestone: LPDDR4x prices have skyrocketed 10X YoY and are still rising >5% weekly! 📈 Even the #RaspberryPi price has hiked up to $60. This is your LAST CHANCE to grab #MaixCAM2 (runs local VLMs!) at the current price. Don't wait! https://t.co/CejlUfP6gY
💬 3 · 🔁 2 · ❤️ 24
@helloiamleonie
Leonie
8 days
How can a 400M param encoder outperform a 6B param one on most VLM benchmarks? Training methodology. ICYMI, @JinaAI_ released a survey on vision encoders in VLMs at the end of last year. If you're new to VLMs (like me), it's a great starting point into the whole topic. Paper:
💬 13 · 🔁 89 · ❤️ 657
@SterZhang
Jianshu Zhang
5 days
Introducing PROGRESSLM 🚀 Humans can place a static scene into a dynamic task context, inferring task progress from one observation. Can VLMs do the same? And if not, how close can they get? Check it out 👇🔗 https://t.co/KFfy5Zk6OW
💬 2 · 🔁 1 · ❤️ 7
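A hedged sketch of the evaluation setup the post hints at: show a VLM a single frame plus a task description and ask for a progress estimate. `query_vlm` is a hypothetical stand-in for whatever VLM client is used; the paper's actual protocol may differ.

```python
def build_progress_prompt(task: str) -> str:
    return (
        f"The image shows one moment of the task: {task}.\n"
        "Estimate how far along the task is, as a percentage from 0 to 100, "
        "and briefly justify the estimate."
    )

def estimate_progress(query_vlm, image_path: str, task: str) -> str:
    # query_vlm(image_path, prompt) -> str is an assumed interface, not a real API.
    return query_vlm(image_path, build_progress_prompt(task))

# e.g. estimate_progress(query_vlm, "frame.jpg", "assembling a chair")
```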
@HuaizuJiang
Huaizu Jiang
2 days
Excited to share our latest #TMLR paper: "SocialFusion"! We found something surprising: VLM pre-training actually hurts social understanding. Popular VLMs struggle to jointly learn social tasks like gaze, gestures, expressions & relationships, showing negative transfer. We call
💬 1 · 🔁 0 · ❤️ 15
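One way to make "negative transfer" concrete, as I read the post: compare each social task's accuracy when trained alone versus jointly. A negative delta means joint training hurt that task. The numbers below are invented for illustration, not from the paper.

```python
# Made-up per-task accuracies: trained on one task vs. all tasks jointly.
single_task = {"gaze": 0.71, "gesture": 0.64, "expression": 0.80, "relationship": 0.58}
joint_task  = {"gaze": 0.66, "gesture": 0.61, "expression": 0.81, "relationship": 0.52}

for task, solo in single_task.items():
    delta = joint_task[task] - solo
    flag = "negative transfer" if delta < 0 else "positive transfer"
    print(f"{task:>12}: {delta:+.2f} ({flag})")
```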
@neural_avb
AVB
4 days
DeepSeek recently published DeepSeek-OCR 2. There is a cool genius-level intuition behind this paper: "What if you train the image encoder to REORDER the image tokens before processing?"
- Most VLMs extract patches from an image and present them to the LM in a fixed ordering - i.e.
💬 2 · 🔁 7 · ❤️ 114
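As I understand the intuition: standard VLMs flatten patches in raster order, while DeepSeek-OCR 2 reportedly learns an ordering first. A toy PyTorch sketch of the two stages; the scoring network is my stand-in, not the paper's architecture, and a trainable version would need a differentiable relaxation of the sort.

```python
import torch
import torch.nn as nn

def extract_patches(img, patch=16):
    # (C, H, W) -> (num_patches, C*patch*patch), in fixed raster order.
    c, h, w = img.shape
    p = img.unfold(1, patch, patch).unfold(2, patch, patch)  # (C, H/p, W/p, p, p)
    return p.permute(1, 2, 0, 3, 4).reshape(-1, c * patch * patch)

class LearnedReorder(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # one scalar priority per patch token

    def forward(self, tokens):
        # argsort is non-differentiable; this is purely illustrative.
        order = self.score(tokens).squeeze(-1).argsort(descending=True)
        return tokens[order]            # present tokens to the LM in learned order

tokens = extract_patches(torch.randn(3, 224, 224))   # (196, 768)
reordered = LearnedReorder(tokens.shape[-1])(tokens)
```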
@Chain_Loader
Chain Loader
1 day
🚨 Find "P" 🚨 @yupp_ai How good is AI vision? I tested 10 VLMs on a busy B&W street scene: find every object/person/action/attribute starting with "P". The result? A massive gap between "Thinking" models and the rest. Top scorers: Gemini 2.5 Flash Thinking & Magistral Medium. 👇
💬 2 · 🔁 1 · ❤️ 5
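A tiny scoring harness for this kind of "find everything starting with P" test: compare each model's answers against a hand-labeled ground-truth set. The model outputs below are invented; only the precision/recall bookkeeping is the point.

```python
# Hand-labeled ground truth for the scene (illustrative, not the actual test set).
ground_truth = {"pedestrian", "pigeon", "pole", "pavement", "parked car", "pointing"}

model_answers = {
    "model_a": {"pedestrian", "pigeon", "pole"},
    "model_b": {"pedestrian", "pavement", "piano"},  # "piano" is a hallucination
}

for name, found in model_answers.items():
    hits = found & ground_truth
    precision = len(hits) / len(found)       # how many answers were real
    recall = len(hits) / len(ground_truth)   # how many real items were found
    print(f"{name}: precision={precision:.2f} recall={recall:.2f}")
```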
@nerv_599164778
Fangxu Yu
9 days
🤖 Can generalist models (LLMs/VLMs) also perform complex reasoning over time series data? 🤔 🚀 Introducing TSRBench 📈, a comprehensive benchmark for evaluating the full spectrum of time series reasoning capabilities. 🌍 Scalable & Diverse, 🧠 Multimodal support, 🎯 Easy & Automated
💬 1 · 🔁 1 · ❤️ 8
@clairebookworm1
claire night skies🪼
9 days
at @dimensionalos we now have spatio-temporal memory! using vlms/our agents, robots can now understand causal & semantic object relationships over time. robots in physical space ingest hours of video/lidar, and we can use that to provide human-like understanding of the world
💬 7 · 🔁 7 · ❤️ 79
@stash_pomichter
stash
9 days
Announcing Temporal-Spatial Agents on Dimensional. VLMs now understand causal and interactive object relationships over time and physical vectors. Robots in physical space ingest hours of video & lidar. Dimensional provides a human-like understanding of the world. More below
💬 7 · 🔁 11 · ❤️ 104
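Both Dimensional posts describe a spatio-temporal memory over long video/lidar streams. A heavily simplified sketch of one plausible data structure: per-object tracks of timestamped observations, queried for relationships over a time window. The schema is my assumption, not Dimensional's.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Observation:
    t: float        # seconds since stream start
    xyz: tuple      # position in the world frame
    relation: str   # e.g. "held_by person_3", from a VLM caption

class SpatioTemporalMemory:
    def __init__(self):
        self.tracks = defaultdict(list)  # object_id -> [Observation]

    def ingest(self, object_id: str, obs: Observation):
        self.tracks[object_id].append(obs)

    def relations_between(self, object_id: str, t0: float, t1: float):
        # Causal/semantic relations for one object over a time window.
        return [o.relation for o in self.tracks[object_id] if t0 <= o.t <= t1]

mem = SpatioTemporalMemory()
mem.ingest("cup_7", Observation(12.0, (1.0, 0.2, 0.8), "on table_1"))
mem.ingest("cup_7", Observation(45.5, (0.4, 0.1, 1.1), "held_by person_3"))
print(mem.relations_between("cup_7", 0, 60))  # ['on table_1', 'held_by person_3']
```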
@AlexAridgides
Alex Aridgides
6 days
DAY 2 of Side Project Week: VLMs still have a LONG way to go and fundamentally struggle with positional awareness. The idea was simple: have Gemini recognize the board and play optimal moves with a Stockfish API. Every second I would poll Gemini saying "Has the board state
💬 1 · 🔁 0 · ❤️ 10
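The rough shape of the pipeline described above: a VLM reads the board image into FEN, Stockfish picks the move, and the loop polls once per second. python-chess's UCI interface is real; `ask_gemini_for_fen` is a hypothetical placeholder for the Gemini call.

```python
import time

import chess
import chess.engine

def best_move(fen: str, stockfish_path: str = "stockfish") -> str:
    # Stockfish, driven through python-chess's UCI wrapper, picks the move.
    board = chess.Board(fen)
    with chess.engine.SimpleEngine.popen_uci(stockfish_path) as engine:
        result = engine.play(board, chess.engine.Limit(time=0.5))
    return result.move.uci()

def poll_loop(ask_gemini_for_fen, screenshot_path: str):
    # ask_gemini_for_fen(image_path) -> FEN string is assumed, not a real API.
    last_fen = None
    while True:
        fen = ask_gemini_for_fen(screenshot_path)
        if fen and fen != last_fen:        # only act when the board changed
            print("play:", best_move(fen))
            last_fen = fen
        time.sleep(1.0)                    # poll once per second, as in the post
```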
@cnxsoft
CNX Software
6 days
Review of @SunFounder2 Fusion HAT+ board adding voice assistant and servo/motor control to Raspberry Pi SBCs https://t.co/ixPm7T9s7J In this review, we mainly followed the company's documentation to experiment with text-to-speech, speech-to-text, local LLMs/VLMs with Ollama,
💬 0 · 🔁 9 · ❤️ 84
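For the local-VLM part of that setup, Ollama exposes a simple REST endpoint that accepts base64 images for vision-capable models. A minimal example; "llava" is just one common model choice, and the prompt is illustrative.

```python
import base64
import requests

def describe_image(path: str, model: str = "llava") -> str:
    with open(path, "rb") as f:
        img_b64 = base64.b64encode(f.read()).decode()
    resp = requests.post(
        "http://localhost:11434/api/generate",  # Ollama's default local endpoint
        json={
            "model": model,
            "prompt": "Describe this image in one sentence.",
            "images": [img_b64],
            "stream": False,
        },
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["response"]

if __name__ == "__main__":
    print(describe_image("photo.jpg"))
```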
@wassgha
Wassim
3 days
Giving a talk today on Agentic Diagnosis of Time Series Data @ the San Francisco AI Engineers meetup. Learn how VLMs can be used in production to assist with multivariate anomaly detection and diagnosis. If you're in the Bay, come join us https://t.co/iNdX70de8r
💬 0 · 🔁 0 · ❤️ 2
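One pattern for using VLMs on multivariate anomaly diagnosis (my sketch of the general idea, not necessarily the talk's approach): render a window of the series as a chart, then hand the image to a VLM with a diagnostic prompt. The plotting is real matplotlib; `query_vlm` is a placeholder.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless rendering, suitable for a server
import matplotlib.pyplot as plt

def render_series(series: dict, out_path: str = "window.png") -> str:
    # Plot each named signal on a shared time axis and save as an image.
    fig, ax = plt.subplots(figsize=(10, 4))
    for name, values in series.items():
        ax.plot(values, label=name)
    ax.legend()
    ax.set_xlabel("time step")
    fig.savefig(out_path, dpi=120)
    plt.close(fig)
    return out_path

rng = np.random.default_rng(0)
cpu = rng.normal(40, 3, 300); cpu[220:240] += 35  # injected anomaly for demo
lat = rng.normal(120, 10, 300)
png = render_series({"cpu_percent": cpu, "latency_ms": lat})
# query_vlm(png, "Which signals look anomalous, when, and why?")  # assumed helper
```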
@_jadenpark
Jaden Park
5 days
Excited to share that our work on detecting data contamination in VLMs has been accepted to #ICLR2026! In v2 of our paper, we add:
- Detecting contamination with paraphrased data.
- Detecting contamination in free-form QA.
To learn more: https://t.co/RtybGkLOOU See you in Rio 🇧🇷
@_jadenpark
Jaden Park
3 months
Me: memorize past exams 📚💯 Also me: fail on a slight tweak 🤦‍♂️🤦‍♂️ Turns out, we can use the same method to detect contaminated VLMs! 🧵 (1/10) - Project Page: https://t.co/ue1GybD4fm
💬 0 · 🔁 0 · ❤️ 16
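The core idea, as the thread frames it ("memorize past exams, fail on a slight tweak"): if accuracy drops sharply from original benchmark items to paraphrased ones, memorization is the likely culprit. A schematic sketch; `eval_model` and the threshold are illustrative assumptions, not the paper's exact statistic.

```python
def contamination_gap(eval_model, original_items, paraphrased_items) -> float:
    # eval_model(item) -> bool (answered correctly) is an assumed interface.
    acc_orig = sum(eval_model(q) for q in original_items) / len(original_items)
    acc_para = sum(eval_model(q) for q in paraphrased_items) / len(paraphrased_items)
    return acc_orig - acc_para  # a large positive gap suggests contamination

# e.g. if contamination_gap(...) > 0.15: flag the benchmark split as contaminated
```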
@denpoint1
Daniil
3 days
Brewing coffee and watching a new podcast from @GJarrosson!!!
- YC-backed Cerrion CEO Karim Saleh
- Why YC keeps funding industrial computer vision (even across hype cycles)
- The technical truth: every factory is different, and why VLMs change the game
- The go-to-market wedge that
💬 2 · 🔁 0 · ❤️ 6