Explore tweets tagged as #VLMS
Natix co-founder @AlirezaGhods2 on VLMs & their recently launched WorldSeek: "world-models covers what would happen in the future".
BREAKING: Kimi K2.5 is now the #1 open model on the BabyVision Benchmark, and #2 overall, trailing only Gemini-3-Pro. From 12.4% → 36.5% in 9 months, an incredible leap for VLMs. Huge congrats to the @Kimi_Moonshot team.
What #DeepLearning architecture is this?? #AI Currently I am revising, and I am going to build a good Computer Vision (VLMs) project.
I was testing VLMs a few weeks ago and found that if we just fix the edges and the color bleeding, as in these images, performance improves and hallucination drops. I didn't prove anything new; this was just the result of me being bored a few weeks ago.
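The post doesn't say how the edges and color bleeding were fixed, so here is only a minimal sketch of one plausible cleanup pass, assuming OpenCV with a bilateral filter for color bleeding and replicate padding for ragged edges:

```python
import cv2

def clean_image(img):
    """Hypothetical cleanup before handing an image to a VLM.
    The original post does not specify its method; this is one guess."""
    # Bilateral filter smooths color bleeding while preserving edges.
    smoothed = cv2.bilateralFilter(img, d=9, sigmaColor=75, sigmaSpace=75)
    # Replicate-pad the border so crop artifacts at the edges are less abrupt.
    return cv2.copyMakeBorder(smoothed, 8, 8, 8, 8, cv2.BORDER_REPLICATE)

if __name__ == "__main__":
    image = cv2.imread("input.jpg")          # placeholder path
    cv2.imwrite("cleaned.jpg", clean_image(image))
```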
LLMs can reason. Vision models can see. But most real problems don't come in one modality. That gap is exactly why Vision-Language Models (VLMs) matter. This carousel breaks down how VLMs actually work under the hood and why they've become foundational for modern AI systems.
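For readers who skip the carousel, the standard recipe it refers to is: a vision encoder turns the image into patch embeddings, a small projector maps them into the LLM's token space, and the LLM attends over image and text tokens together. A minimal, illustrative sketch; every module below is a placeholder, not any specific model's architecture:

```python
import torch
import torch.nn as nn

class TinyVLM(nn.Module):
    """Skeleton of a typical VLM: vision encoder -> projector -> language model."""

    def __init__(self, vision_dim=768, lm_dim=2048, vocab=32000):
        super().__init__()
        self.vision_encoder = nn.Linear(vision_dim, vision_dim)  # stands in for a ViT
        self.projector = nn.Linear(vision_dim, lm_dim)            # maps image tokens into LM space
        self.lm = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=lm_dim, nhead=8, batch_first=True),
            num_layers=2,
        )                                                         # stands in for the LLM
        self.lm_head = nn.Linear(lm_dim, vocab)

    def forward(self, patch_feats, text_embeds):
        img_tokens = self.projector(self.vision_encoder(patch_feats))
        seq = torch.cat([img_tokens, text_embeds], dim=1)         # image tokens prefix the text
        return self.lm_head(self.lm(seq))

model = TinyVLM()
out = model(torch.randn(1, 196, 768), torch.randn(1, 10, 2048))
print(out.shape)  # torch.Size([1, 206, 32000])
```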
Milestone: LPDDR4x prices have skyrocketed 10X YoY and are still rising >5% weekly! Even the #RaspberryPi price has climbed to $60. This is your LAST CHANCE to grab #MaixCAM2 (runs local VLMs!) at the current price. Don't wait! https://t.co/CejlUfP6gY
How can a 400M-param encoder outperform a 6B-param one on most VLM benchmarks? Training methodology. ICYMI, @JinaAI_ released a survey on vision encoders in VLMs at the end of last year. If you're new to VLMs like me, it's a great starting point for the whole topic. Paper:
Introducing our new work: Humans can place a static scene into a dynamic task context, inferring task progress from one observation. Can VLMs do the same, and if not, how close can they get? Check it out: https://t.co/KFfy5Zk6OW
Excited to share our latest #TMLR paper: "SocialFusion"! We found something surprising: VLM pre-training actually hurts social understanding. Popular VLMs struggle to jointly learn social tasks like gaze, gestures, expressions & relationships, showing negative transfer. We call
DeepSeek recently published DeepSeek-OCR 2. There is a cool, genius-level intuition behind this paper: "What if you train the image encoder to REORDER the image tokens before processing?" Most VLMs extract patches from an image and present them to the LM in a fixed ordering, i.e.
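For contrast with that reordering idea, here is a minimal sketch of the fixed ordering most VLMs use: patches are cut from the image and flattened row by row (raster order) before being handed to the language model. This is a generic illustration, not DeepSeek-OCR 2's encoder:

```python
import numpy as np

def patchify_raster(image, patch=16):
    """Cut an (H, W, C) image into non-overlapping patches and flatten them in
    fixed raster order (left-to-right, top-to-bottom), the ordering most VLMs
    present image tokens in. Illustrative only."""
    h, w, c = image.shape
    rows, cols = h // patch, w // patch
    image = image[: rows * patch, : cols * patch]           # drop any remainder
    patches = image.reshape(rows, patch, cols, patch, c)
    patches = patches.transpose(0, 2, 1, 3, 4)              # (rows, cols, patch, patch, c)
    return patches.reshape(rows * cols, patch * patch * c)  # fixed raster order

tokens = patchify_raster(np.zeros((224, 224, 3), dtype=np.uint8))
print(tokens.shape)  # (196, 768): 14x14 patches of 16x16x3 pixels each
```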
๐จ Find "P" ๐จ @yupp_ai How good is AI vision? I tested 10 VLMs on a busy B&W street scene: find every object/person/action/attribute starting with "P". The result? A massive gap between "Thinking" models and the rest. Top scorers: Gemini 2.5 Flash Thinking & Magistral Medium. ๐
Can generalist models (LLMs/VLMs) also perform complex reasoning over time series data? Introducing TSRBench, a comprehensive benchmark for evaluating the full spectrum of time series reasoning capabilities. Scalable & Diverse, Multimodal support, Easy & Automated.
at @dimensionalos we now have spatio-temporal memory! using vlms/our agents, robots can now understand causal & semantic object relationships over time. robots in physical space ingest hours of video/lidar, and we can use that to provide human-like understanding of the world
Announcing Temporal-Spatial Agents on Dimensional. VLMs now understand causal and interactive object relationships over time and physical vectors. Robots in physical space ingest hours of video & lidar. Dimensional provides a human-like understanding of the world. More below
DAY 2 of Side Project Week: VLMs still have a LONG way to go and fundamentally struggle with positional awareness. The idea was simple: have Gemini recognize the board and play optimal moves with a Stockfish API. Every second I would poll Gemini saying "Has the board state
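A rough sketch of the loop that post describes, under stated assumptions: `ask_gemini_for_fen` is a placeholder for the author's Gemini vision call, and a local Stockfish binary (via python-chess) stands in for whatever "Stockfish API" they used:

```python
import chess
import chess.engine

def ask_gemini_for_fen(image_path):
    # Placeholder: in the real setup this would send a board screenshot to
    # Gemini and parse a FEN string out of the reply.
    return chess.STARTING_FEN

def suggest_move(fen, engine):
    """Ask Stockfish for the best move in the given position."""
    board = chess.Board(fen)
    result = engine.play(board, chess.engine.Limit(time=0.1))
    return result.move.uci()

if __name__ == "__main__":
    # Requires a local Stockfish binary on PATH.
    engine = chess.engine.SimpleEngine.popen_uci("stockfish")
    last_fen = None
    for _ in range(3):                       # the post polls every second; capped here for the demo
        fen = ask_gemini_for_fen("board.png")
        if fen != last_fen:                  # only act when the reported board state changes
            print("Best move:", suggest_move(fen, engine))
            last_fen = fen
    engine.quit()
```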
Review of @SunFounder2 Fusion HAT+ board adding voice assistant and servo/motor control to Raspberry Pi SBCs https://t.co/ixPm7T9s7J In this review, we mainly followed the company's documentation to experiment with text-to-speech, speech-to-text, local LLMs/VLMs with Ollama,
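The review's exact scripts aren't shown; for reference, querying a local VLM served by Ollama on a Raspberry Pi could look roughly like this (model name and image path are placeholders, and a vision model such as llava must already be pulled):

```python
import base64
import json
import urllib.request

# Ask a local VLM served by Ollama to describe an image.
# Assumes `ollama serve` is running on the default port.
with open("photo.jpg", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode()

payload = json.dumps({
    "model": "llava",                 # placeholder; any local VLM pulled into Ollama
    "prompt": "Describe this image.",
    "images": [img_b64],
    "stream": False,
}).encode()

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```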
Giving a talk today on Agentic Diagnosis of Time Series Data @ the San Francisco AI Engineers meetup. Learn how VLMs can be used in production to assist with multi-variate anomaly detection and diagnosis. If you're in the Bay, come join us https://t.co/iNdX70de8r
Excited to share that our work on detecting data contamination in VLMs has been accepted to #ICLR2026! In v2 of our paper, we add: detecting contamination with paraphrased data, and detecting contamination in free-form QA. To learn more: https://t.co/RtybGkLOOU See you in Rio!
Me: memorize past exams. Also me: fail on a slight tweak. Turns out, we can use the same method to detect contaminated VLMs! (1/10) Project Page: https://t.co/ue1GybD4fm
Brewing coffee and watching a new podcast from @GJarrosson!!! - YC-backed Cerrion CEO Karim Saleh - Why YC keeps funding industrial computer vision (even across hype cycles) - The technical truth: every factory is different, and why VLMs change the game - The go-to-market wedge that