Explore tweets tagged as #VLMS
Natix co-founder @AlirezaGhods2 on VLMs & their recently launched WorldSeek: "world-models covers what would happen in the future".
BREAKING: Kimi K2.5 is now the #1 open model on the BabyVision Benchmark, and #2 overall, trailing only Gemini-3-Pro. From 12.4% → 36.5% in 9 months, an incredible leap for VLMs. Huge congrats to the @Kimi_Moonshot team.
What #DeepLearning architecture is this?? #AI Currently I am revising, and I am going to build a good Computer Vision (VLMs) project.
I was testing VLMs a few weeks ago and found that if we just fix the edges and the color bleeding, as in these images, performance improves and hallucination drops. I didn't prove anything new; this was just the result of me being bored a few weeks ago.
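The post doesn't say how the edges and color bleeding were fixed, so here is only a minimal sketch of one plausible cleanup pass, assuming OpenCV with a bilateral filter for color bleeding and replicate padding for ragged edges:

```python
import cv2

def clean_image(img):
    """Hypothetical cleanup before handing an image to a VLM.
    The original post does not specify its method; this is one guess."""
    # Bilateral filter smooths color bleeding while preserving edges.
    smoothed = cv2.bilateralFilter(img, d=9, sigmaColor=75, sigmaSpace=75)
    # Replicate-pad the border so crop artifacts at the edges are less abrupt.
    return cv2.copyMakeBorder(smoothed, 8, 8, 8, 8, cv2.BORDER_REPLICATE)

if __name__ == "__main__":
    image = cv2.imread("input.jpg")          # placeholder path
    cv2.imwrite("cleaned.jpg", clean_image(image))
```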
LLMs can reason. Vision models can see. But most real problems don't come in one modality. That gap is exactly why Vision-Language Models (VLMs) matter. This carousel breaks down how VLMs actually work under the hood and why they've become foundational for modern AI systems.
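For readers who skip the carousel, the standard recipe it refers to is: a vision encoder turns the image into patch embeddings, a small projector maps them into the LLM's token space, and the LLM attends over image and text tokens together. A minimal, illustrative sketch; every module below is a placeholder, not any specific model's architecture:

```python
import torch
import torch.nn as nn

class TinyVLM(nn.Module):
    """Skeleton of a typical VLM: vision encoder -> projector -> language model."""

    def __init__(self, vision_dim=768, lm_dim=2048, vocab=32000):
        super().__init__()
        self.vision_encoder = nn.Linear(vision_dim, vision_dim)  # stands in for a ViT
        self.projector = nn.Linear(vision_dim, lm_dim)            # maps image tokens into LM space
        self.lm = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=lm_dim, nhead=8, batch_first=True),
            num_layers=2,
        )                                                         # stands in for the LLM
        self.lm_head = nn.Linear(lm_dim, vocab)

    def forward(self, patch_feats, text_embeds):
        img_tokens = self.projector(self.vision_encoder(patch_feats))
        seq = torch.cat([img_tokens, text_embeds], dim=1)         # image tokens prefix the text
        return self.lm_head(self.lm(seq))

model = TinyVLM()
out = model(torch.randn(1, 196, 768), torch.randn(1, 10, 2048))
print(out.shape)  # torch.Size([1, 206, 32000])
```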
Milestone: LPDDR4x prices have skyrocketed 10X YoY and are still rising >5% weekly! Even the #RaspberryPi price has climbed to $60. This is your LAST CHANCE to grab #MaixCAM2 (runs local VLMs!) at the current price. Don't wait! https://t.co/CejlUfP6gY
How can a 400M-param encoder outperform a 6B-param one on most VLM benchmarks? Training methodology. ICYMI, @JinaAI_ released a survey on vision encoders in VLMs at the end of last year. If you're new to VLMs like me, it's a great starting point for the whole topic. Paper:
Introducing our new work: Humans can place a static scene into a dynamic task context, inferring task progress from one observation. Can VLMs do the same, and if not, how close can they get? Check it out: https://t.co/KFfy5Zk6OW
Excited to share our latest #TMLR paper: "SocialFusion"! We found something surprising: VLM pre-training actually hurts social understanding. Popular VLMs struggle to jointly learn social tasks like gaze, gestures, expressions & relationships, showing negative transfer. We call
DeepSeek recently published DeepSeek-OCR 2. There is a cool, genius-level intuition behind this paper: "What if you train the image encoder to REORDER the image tokens before processing?" Most VLMs extract patches from an image and present them to the LM in a fixed ordering, i.e.
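For contrast with that reordering idea, here is a minimal sketch of the fixed ordering most VLMs use: patches are cut from the image and flattened row by row (raster order) before being handed to the language model. This is a generic illustration, not DeepSeek-OCR 2's encoder:

```python
import numpy as np

def patchify_raster(image, patch=16):
    """Cut an (H, W, C) image into non-overlapping patches and flatten them in
    fixed raster order (left-to-right, top-to-bottom), the ordering most VLMs
    present image tokens in. Illustrative only."""
    h, w, c = image.shape
    rows, cols = h // patch, w // patch
    image = image[: rows * patch, : cols * patch]           # drop any remainder
    patches = image.reshape(rows, patch, cols, patch, c)
    patches = patches.transpose(0, 2, 1, 3, 4)              # (rows, cols, patch, patch, c)
    return patches.reshape(rows * cols, patch * patch * c)  # fixed raster order

tokens = patchify_raster(np.zeros((224, 224, 3), dtype=np.uint8))
print(tokens.shape)  # (196, 768): 14x14 patches of 16x16x3 pixels each
```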
๐จ Find "P" ๐จ @yupp_ai How good is AI vision? I tested 10 VLMs on a busy B&W street scene: find every object/person/action/attribute starting with "P". The result? A massive gap between "Thinking" models and the rest. Top scorers: Gemini 2.5 Flash Thinking & Magistral Medium. ๐
Can generalist models (LLMs/VLMs) also perform complex reasoning over time series data? Introducing TSRBench, a comprehensive benchmark for evaluating the full spectrum of time series reasoning capabilities. Scalable & Diverse, Multimodal support, Easy & Automated.
at @dimensionalos we now have spatio-temporal memory! using vlms/our agents, robots can now understand causal & semantic object relationships over time. robots in physical space ingest hours of video/lidar, and we can use that to provide human-like understanding of the world
Announcing Temporal-Spatial Agents on Dimensional. VLMs now understand causal and interactive object relationships over time and physical vectors. Robots in physical space ingest hours of video & lidar. Dimensional provides a human-like understanding of the world. More below
DAY 2 of Side Project Week: VLMs still have a LONG way to go and fundamentally struggle with positional awareness. The idea was simple: have Gemini recognize the board and play optimal moves with a Stockfish API. Every second I would poll Gemini saying "Has the board state
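A rough sketch of the loop that post describes, under stated assumptions: `ask_gemini_for_fen` is a placeholder for the author's Gemini vision call, and a local Stockfish binary (via python-chess) stands in for whatever "Stockfish API" they used:

```python
import chess
import chess.engine

def ask_gemini_for_fen(image_path):
    # Placeholder: in the real setup this would send a board screenshot to
    # Gemini and parse a FEN string out of the reply.
    return chess.STARTING_FEN

def suggest_move(fen, engine):
    """Ask Stockfish for the best move in the given position."""
    board = chess.Board(fen)
    result = engine.play(board, chess.engine.Limit(time=0.1))
    return result.move.uci()

if __name__ == "__main__":
    # Requires a local Stockfish binary on PATH.
    engine = chess.engine.SimpleEngine.popen_uci("stockfish")
    last_fen = None
    for _ in range(3):                       # the post polls every second; capped here for the demo
        fen = ask_gemini_for_fen("board.png")
        if fen != last_fen:                  # only act when the reported board state changes
            print("Best move:", suggest_move(fen, engine))
            last_fen = fen
    engine.quit()
```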
Review of @SunFounder2 Fusion HAT+ board adding voice assistant and servo/motor control to Raspberry Pi SBCs https://t.co/ixPm7T9s7J In this review, we mainly followed the company's documentation to experiment with text-to-speech, speech-to-text, local LLMs/VLMs with Ollama,
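The review's exact scripts aren't shown; for reference, querying a local VLM served by Ollama on a Raspberry Pi could look roughly like this (model name and image path are placeholders, and a vision model such as llava must already be pulled):

```python
import base64
import json
import urllib.request

# Ask a local VLM served by Ollama to describe an image.
# Assumes `ollama serve` is running on the default port.
with open("photo.jpg", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode()

payload = json.dumps({
    "model": "llava",                 # placeholder; any local VLM pulled into Ollama
    "prompt": "Describe this image.",
    "images": [img_b64],
    "stream": False,
}).encode()

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```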
Giving a talk today on Agentic Diagnosis of Time Series Data @ the San Francisco AI Engineers meetup. Learn how VLMs can be used in production to assist with multi-variate anomaly detection and diagnosis. If you're in the Bay, come join us https://t.co/iNdX70de8r
Excited to share that our work on detecting data contamination in VLMs has been accepted to #ICLR2026! In v2 of our paper, we add: detecting contamination with paraphrased data, and detecting contamination in free-form QA. To learn more: https://t.co/RtybGkLOOU See you in Rio!
Me: memorize past exams. Also me: fail on a slight tweak. Turns out, we can use the same method to detect contaminated VLMs! (1/10) Project Page: https://t.co/ue1GybD4fm
Brewing coffee and watching a new podcast from @GJarrosson!!! - YC-backed Cerrion CEO Karim Saleh - Why YC keeps funding industrial computer vision (even across hype cycles) - The technical truth: every factory is different, and why VLMs change the game - The go-to-market wedge that