Salman Khan Profile
Salman Khan

@KhanSalmanH

Followers 1K · Following 2K · Media 106 · Statuses 351

Faculty at MBZUAI. Past: Inception, ANU, Data61, NICTA, UWA.

Canberra, Australia
Joined February 2011
Salman Khan (@KhanSalmanH) · 1 day
RT @Dr_HammadKhan: 🚨 Do you have expertise in #RemoteSensing and #MachineLearning with a passion for #AgTech? Do you want to lead the develo…
Salman Khan (@KhanSalmanH) · 1 month
RT @alex_lacoste_: [#ICCV2025] Our paper "GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks" is accepted at ICCV 2025…
Salman Khan (@KhanSalmanH) · 2 months
🛰️ Benchmarked on GEO-Bench & Copernicus-Bench across classification, segmentation, and landslide mapping — with state-of-the-art performance. We hope the community will build on this work to further improve. Thanks to the wonderful team! 4/4
Salman Khan (@KhanSalmanH) · 2 months
🖥️ Model key highlights:
• TerraFM uses modality-specific patch embeddings
• Aligned sensor modalities serve as natural augmentations
• Cross-attention modality fusion for feature learning
• A dual-centering loss to learn balanced, scalable features
3/4
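The two architectural ideas above — separate patch embeddings per sensor, then cross-attention fusion — can be sketched in a few lines. This is a minimal toy illustration of the mechanism, not TerraFM's actual implementation; all shapes, widths, and band counts here are made-up assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Modality-specific patch embeddings: a separate projection per sensor,
# mapping raw patch features of different widths into one shared space.
d = 16                                   # shared embedding width (assumed)
W_opt = rng.standard_normal((12, d))     # optical patches: 12 raw features (assumed)
W_sar = rng.standard_normal((2, d))      # SAR patches: 2 raw features, e.g. VV/VH

opt_patches = rng.standard_normal((8, 12))   # 8 optical patch vectors
sar_patches = rng.standard_normal((8, 2))    # 8 spatially aligned SAR patch vectors
opt_tok, sar_tok = opt_patches @ W_opt, sar_patches @ W_sar

# Cross-attention fusion: optical tokens query the SAR tokens, so each
# optical token mixes in information from the co-registered SAR view.
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
Q, K, V = opt_tok @ Wq, sar_tok @ Wk, sar_tok @ Wv
attn = softmax(Q @ K.T / np.sqrt(d))     # (8, 8) attention weights
fused = opt_tok + attn @ V               # residual add of SAR-informed values

print(fused.shape)                       # (8, 16)
print(np.allclose(attn.sum(axis=-1), 1.0))   # True: each row is a distribution
```

The per-modality projections are what let sensors with very different channel counts share one token space before fusion.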
Salman Khan (@KhanSalmanH) · 2 months
📦 Scale matters:
• Trained on 18.7 million tiles of spatially aligned SAR and Sentinel-2 (L1C & L2A) data
• With large 534×534-pixel tiles, we pretrain the model on 23 trillion pixels!
• Data sampled globally across diverse land cover types
2/4
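A quick back-of-envelope check of those numbers (my arithmetic, not from the thread): tiles × tile area gives the pixel count for a single layer per tile, and the gap to the 23-trillion headline figure suggests the count spans the multiple aligned sensor products per location — that multiplier is my inference, not a stated fact:

```python
# Pixels in one layer per tile: 18.7M tiles of 534 x 534 pixels each.
tiles = 18_700_000
side = 534
pixels_per_layer = tiles * side * side
print(f"{pixels_per_layer:.2e}")    # 5.33e+12 for a single layer per tile

# The 23-trillion total therefore implies roughly 4.3 aligned sensor
# layers per tile location on average (SAR + Sentinel-2 L1C/L2A) --
# an inference from the arithmetic, not a figure given in the thread.
print(23e12 / pixels_per_layer)     # ~4.3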
Salman Khan (@KhanSalmanH) · 2 months
🌍 TerraFM: A Scalable Foundation Model for Multisensor Earth Observation. Excited to share TerraFM, a unified foundation model trained to understand our planet through SAR and optical satellite data.
📄 Paper: 💻 Code:
1/4
Salman Khan (@KhanSalmanH) · 2 months
🔗 Paper: 🔬 Inference Code: 📌 Project Page: 💻 Training code coming soon!
Joint work with: Ghazi Shazan Ahmad, Ahmed Heakl, Hanan Ghani, Abdelrahman Shaker, Zhiqiang Shen, Ranjay Krishna, Fahad Khan.
github.com: Official code of the paper "VideoMolmo: Spatio-Temporal Grounding meets Pointing" (mbzuai-oryx/VideoMolmo)
Salman Khan (@KhanSalmanH) · 2 months
💡 Key highlights on data:
📊 New dataset: 72K video-caption pairs, 100K+ point labels
🧪 VPoS-Bench spans five real-world domains: Cell Tracking, Egocentric Vision, Autonomous Driving, Video-GUI Interaction, and Robotics. We also evaluate on Referring and Reasoning VOS tasks.
3/4
Salman Khan (@KhanSalmanH) · 2 months
🗺️ Building on the Molmo framework, VideoMolmo is the first LMM tailored for fine-grained spatio-temporal pointing in videos, grounded in language prompts!
✨ A temporal attention module enables per-frame conditioning on prior context, and two-step grounding performs better.
2/4
Salman Khan (@KhanSalmanH) · 2 months
🎥 From robotics to autonomous driving and cell tracking, knowing what, where, and when objects appear is key. VideoMolmo tackles this by first pointing to objects in space-time using language, then precisely segmenting them. 1/4
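The point-then-segment idea above can be illustrated with a toy stand-in (this is not VideoMolmo's actual pipeline, which uses a learned language-grounded pointer and a segmentation model): assume the pointing stage has already produced an (row, col) seed, and let a flood fill over a binary mask play the role of the segmentation stage that expands the point into the object's region:

```python
from collections import deque

def segment_from_point(mask, seed):
    """Flood-fill the connected foreground region containing `seed`.
    mask: 2D list of 0/1; seed: (row, col). A toy stand-in for the
    'point then segment' stage: the pointer picks the object, the
    fill recovers its full extent."""
    h, w = len(mask), len(mask[0])
    r0, c0 = seed
    if not mask[r0][c0]:
        return set()                      # pointed at background
    seen, q = {seed}, deque([seed])
    while q:
        r, c = q.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and mask[nr][nc] and (nr, nc) not in seen:
                seen.add((nr, nc))
                q.append((nr, nc))
    return seen

# Two separate blobs; pointing at (0, 1) segments only the top-left one.
mask = [[1, 1, 0, 0],
        [0, 1, 0, 1],
        [0, 0, 0, 1]]
region = segment_from_point(mask, (0, 1))
print(sorted(region))   # [(0, 0), (0, 1), (1, 1)]
```

The point disambiguates *which* object is meant — the same mask with seed (1, 3) would return the other blob.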
Salman Khan (@KhanSalmanH) · 2 months
🌐 Project page: 📄 Paper: 🤗 Dataset: 💻 Code:
Thanks to the amazing team: Hanoona Rasheed, Abdel-Rahman Shaker, Anqi Tang, Muhammad Maaz, Ming-Hsuan Yang, Salman Khan, Fahad Khan
Salman Khan (@KhanSalmanH) · 2 months
Early results 👉
• Models stumble as they miss a key frame or spoken detail
• Size helps, but architecture & training details matter more
• New open models now rival proprietary ones
3/4
Salman Khan (@KhanSalmanH) · 2 months
VideoMathQA: a benchmark that tests LLMs on educational videos rather than static screenshots or text-based math problems. Challenges:
• Dynamic visuals (changing diagrams, handwriting)
• Long-range temporal reasoning
• Cross-modal grounding of video 🎥, audio 🎙️, text 📝
2/4
Salman Khan (@KhanSalmanH) · 2 months
⚡ Can LLMs watch a math lesson and reason like we do? That means following evolving diagrams, spoken explanations, and handwritten notes, understanding symbols & logic, and connecting them with subject knowledge — all at once. We try to answer this via the VideoMathQA benchmark. 1/4
Salman Khan (@KhanSalmanH) · 2 months
RT @vidllms: 🚀 Heading to #CVPR2025 in Nashville? Don’t miss the VideoLLMs Workshop on June 11 (Grand A1)!
• 🔑 6 keynotes
• 🗣️ Panel “Video…
Salman Khan (@KhanSalmanH) · 2 months
RT @SameeraRamasin1: In model parallel training, compressing the signal flow has been established as not very useful as they are too inform…
Salman Khan (@KhanSalmanH) · 2 months
RT @PluralisHQ: We've reached a major milestone in fully decentralized training: for the first time, we've demonstrated that a large langua…
Salman Khan (@KhanSalmanH) · 2 months
🚀 Introducing ThinkGeo: a benchmark for tool-augmented LLMs on real-world remote sensing tasks! 🌍🛰️ 436 tasks across key domains; 14 tools; step-by-step SOTA model evaluation. Open-source! 🔗 #GeospatialAI
Salman Khan (@KhanSalmanH) · 2 months
🌐 Links: • Paper: • Code: • Data:
Salman Khan (@KhanSalmanH) · 2 months
We also specify the types of errors in existing models' outputs.
👥 Thanks to the team! Tajamul Ashraf · Amal Saqib · Hanan Ghani · Muhra AlMahri · Yuhao Li · Noor Ahsan · Umair Nawaz · Jean Lahoud · Hisham Cholakkal · Mubarak Shah · Philip Torr · Fahad Khan · Anwer Rao. 4/4