Orr Zohar

@orr_zohar

521 Followers · 229 Following · 20 Media · 126 Statuses

@nvidia • @Stanford • @KnightHennessy scholar • Researching large multimodal models

Joined May 2023
@orr_zohar
Orr Zohar
2 months
FineVision is out 🚀 And ready to empower the next generation of LMMs. Check it out!
@lusxvr
Luis
2 months
Today, we are releasing FineVision, a huge open-source dataset for training state-of-the-art Vision-Language Models:
> 17.3M images
> 24.3M samples
> 88.9M turns
> 9.5B answer tokens
Here are my favourite findings:
0 replies · 4 reposts · 20 likes
@orr_zohar
Orr Zohar
2 days
People underestimate how token-heavy video understanding & generation are, and how much untapped potential they hold. We grasp images in a split second. Videos? Not so much: comprehending a 1-hour clip often demands watching the whole thing. Video AI is a tiny sliver of tokens today,
@elonmusk
Elon Musk
3 days
@StefanoErmon @_inception_ai Diffusion will obviously work on any bitstream. With text, since humans read from first word to last, there is just the question of whether the delay to first sentence for diffusion is worth it. That said, the vast majority of AI workload will be video understanding and
1 reply · 0 reposts · 6 likes
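For a rough sense of scale, here is a back-of-the-envelope sketch; the per-frame token count and sampling rate are illustrative assumptions, not figures from the thread:

```python
# Illustrative token math for the claim above. Both constants are
# assumptions: many VLM vision encoders emit a few hundred visual
# tokens per frame, and 1 fps is a common video sampling rate.
TOKENS_PER_FRAME = 256
FPS_SAMPLED = 1

def video_tokens(duration_s: float) -> int:
    """Rough visual-token count for a video of the given duration."""
    return int(duration_s * FPS_SAMPLED * TOKENS_PER_FRAME)

print(video_tokens(1))      # one image-like glance: 256 tokens
print(video_tokens(3600))   # a 1-hour clip: 921,600 tokens (~3,600x an image)
```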
@ericzelikman
Eric Zelikman
8 days
we think humanity’s biggest challenges won’t be solved by ai thinking for 1000 hours and coming back with an answer
they’ll be solved by many collaborating humans, and ai that understands them and their different skills, goals, values, etc to empower them to do more together
@ericzelikman
Eric Zelikman
2 months
some folks and i are making something new
if you're hopeful about AI empowering everyone
if you've worked on multiturn, memory, model behavior, multiagent RL, user sim, AI interfaces/products, kernels, or dist systems
if you want frontier-scale compute & top infra
let's chat!
54 replies · 41 reposts · 672 likes
@edgeaiguy
Latent Kiri
27 days
Started reading the early release of Vision Language Models (O’Reilly). My first VLM book—already on chapter 3 and loving it. Clear and easy to follow. Great work @mervenoyann, @andimarafioti, @micuelll & @orr_zohar! Looking forward to the remaining chapters.
3 replies · 3 reposts · 31 likes
@leoyerrrr
HanRong YE
12 days
And we at #NVIDIA Research are still seeking research interns to explore omni-modal LLMs across a variety of domains, including robotics (VLA), visual agentic tool use, world modeling, and unified understanding and generation. Drop me an email if you are interested!
@leoyerrrr
HanRong YE
21 days
OmniVinci is now the #1 paper on Hugging Face!!! 🤗 Building omni-modal LLMs is MORE than just mixing tokens 😉 At @NVIDIA, we explored deeper possibilities in building truly omni-modal systems — leading to OmniVinci-9B, which introduces three key innovations: - OmniAlignNet – a
0 replies · 1 repost · 12 likes
@orr_zohar
Orr Zohar
12 days
Encoder-Decoder models — making a comeback? 🤔 Discrete diffusion LMs can accelerate text generation (similar to speculative decoding). Exciting to see where these models go, and how many of the traditional AR design decisions will hold.
@mariannearr
Marianne Arriola
12 days
🚨In our NeurIPS paper, we bring encoder-decoders back.. for diffusion language models! ⚡️Encoder-decoders make diffusion sampling fast: a small (fast) decoder denoises tokens progressively and a large (slower) encoder represents clean context.
0 replies · 0 reposts · 6 likes
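A minimal, untrained PyTorch sketch of the sampler shape being described, not the paper's actual method: the large encoder runs once over the clean context, while the small decoder runs once per denoising step and progressively unmasks the most confident positions. All sizes, the mask-token scheme, and the unmasking schedule are my assumptions.

```python
# Toy asymmetric encoder-decoder diffusion sampler (untrained, so the
# output is random). Model sizes, MASK_ID, and the confidence-based
# unmasking schedule are illustrative assumptions.
import torch
import torch.nn as nn

VOCAB, MASK_ID = 1000, 0
D_BIG, D_SMALL, T_CTX, T_GEN, STEPS = 512, 128, 32, 16, 4

class EncDecDiffusion(nn.Module):
    def __init__(self):
        super().__init__()
        self.ctx_emb = nn.Embedding(VOCAB, D_BIG)
        # Large (slower) encoder: represents the clean context, run ONCE.
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(D_BIG, nhead=8, batch_first=True),
            num_layers=6)
        self.proj = nn.Linear(D_BIG, D_SMALL)  # bridge encoder width to decoder
        self.tgt_emb = nn.Embedding(VOCAB, D_SMALL)
        # Small (fast) decoder: denoises the target tokens, run once PER STEP.
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(D_SMALL, nhead=4, batch_first=True),
            num_layers=2)
        self.head = nn.Linear(D_SMALL, VOCAB)

    @torch.no_grad()
    def sample(self, ctx):
        memory = self.proj(self.encoder(self.ctx_emb(ctx)))  # clean context, once
        tgt = torch.full((ctx.size(0), T_GEN), MASK_ID)      # start fully masked
        for step in range(STEPS):
            logits = self.head(self.decoder(self.tgt_emb(tgt), memory))
            conf, pred = logits.softmax(-1).max(-1)
            conf[tgt != MASK_ID] = float("inf")   # committed tokens stay ranked first
            vals = torch.where(tgt == MASK_ID, pred, tgt)    # never overwrite them
            k = (step + 1) * T_GEN // STEPS       # unmask a growing number of the
            keep = conf.topk(k, dim=-1).indices   # most confident positions
            tgt = tgt.scatter(1, keep, vals.gather(1, keep))
        return tgt

model = EncDecDiffusion()
print(model.sample(torch.randint(1, VOCAB, (1, T_CTX))))
```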
@orr_zohar
Orr Zohar
12 days
🚨Huge for multimodal/vision AI: Datasets hit 100s of TB, making on-prem storage a nightmare. 🤗Now stream them directly from Hugging Face to GPUs - unlocking scalable training of everything from VLMs to world models. 🚀 I've battled storage limits for years; thrilled to move
@andimarafioti
Andi Marafioti
14 days
You can now train SOTA models without any storage!🌩️ We completely revamped the Hub’s backend to enable streaming at scale. We streamed TBs of data to 100s of H100s to train SOTA VLMs and saw serious speed-ups. But how?
1 reply · 10 reposts · 70 likes
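A minimal streaming sketch of the workflow described above, assuming the 🤗 `datasets` library; the dataset id and split are illustrative placeholders, not a confirmed path:

```python
# Stream a Hub dataset straight into a training loop: with
# streaming=True an IterableDataset is returned and nothing is
# materialized on local disk up front.
# The dataset id and split below are assumptions for illustration.
from datasets import load_dataset

ds = load_dataset("HuggingFaceM4/FineVision", split="train", streaming=True)
ds = ds.shuffle(seed=0, buffer_size=1_000)  # approximate shuffle via a rolling buffer

for i, sample in enumerate(ds):
    # tokenize / preprocess `sample` and feed your training step here
    if i == 3:
        break
```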
@andimarafioti
Andi Marafioti
20 days
Open data is the foundation of open science. FineVision is our step toward making VLM research transparent, reproducible, and actually open. You can find more on HF daily papers: https://t.co/fkrvVscUUe And a big shoutout to the first authors: @lusxvr and @orr_zohar ! Titans!
1 reply · 3 reposts · 9 likes
@NadavTimor
Nadav Timor
27 days
NYC open-source AI infra contributors — we’ve launched a community research hub above Grand Central where GPUs go brrr 🔥🗽 A place to hack, benchmark, and collaborate — vLLM, SGLang, kernels, inference optimizations all welcome. Open space. Open source. Weekends too. Huge
7 replies · 10 reposts · 89 likes
@XiaohanWang96
Xiaohan Wang
1 month
🚀 Excited to release SciVideoBench — a new benchmark that pushes Video-LMMs to think like scientists! Designed to probe video reasoning and the synergy between accurate perception, expert knowledge, and logical inference. 1,000 research-level Qs across Physics, Chemistry,
@shoubin621
Shoubin Yu @ EMNLP
1 month
🚨 New Paper Alert! Introducing SciVideoBench — a comprehensive benchmark for scientific video reasoning! 🔬SciVideoBench: 1. Spans Physics, Chemistry, Biology & Medicine with authentic experimental videos. 2. Features 1,000 challenging MCQs across three reasoning types:
0 replies · 7 reposts · 16 likes
@PavloMolchanov
Pavlo Molchanov
1 month
🚀 Excited to share that our work is featured in the State of AI Report! Check it out - lots of interesting insights. Our research on the potential of small language models (SLMs) for Agentic AI is highlighted on slide 82: - Tasks like form filling, using a calculator, creating
stateof.ai
The State of AI Report analyses the most interesting developments in AI. Read and download here.
@nathanbenaich
Nathan Benaich
1 month
🪩The one and only @stateofaireport 2025 is live! 🪩 It’s been a monumental 12 months for AI. Our 8th annual report is the most comprehensive it's ever been, covering what you *need* to know about research, industry, politics, safety and our new usage data. My highlight reel:
1 reply · 1 repost · 10 likes
@giffmana
Lucas Beyer (bl16)
2 months
Say "NO!" to filters:
@andimarafioti
Andi Marafioti
2 months
Here's a wild finding from our ablations: filtering for only the "highest-quality" data actually hurts performance! 🤯 Our experiments show that at this scale, training on the full, diverse dataset—even with lower-rated samples—is better. Don't throw away your data!
20 replies · 8 reposts · 203 likes
@mervenoyann
merve
3 months
we just released another chapter on O'Reilly for our vision language model book about modern VLM architectures ✨ @andimarafioti @micuelll @orr_zohar it covers early & late fusion, encoder-decoders, multimodal attention types and more! 🤗
6 replies · 22 reposts · 242 likes
@_TobiasLee
Lei Li@EMNLP25
3 months
🚀 MiMo‑VL 2508 is live! Same size, much smarter. We’ve upgraded performance, thinking control, and overall user experience. 📈 Benchmark gains across image + video: MMMU 70.6, VideoMME 70.8. Consistent improvements across the board. 🤖 Thinking Control: toggle reasoning with
2 replies · 16 reposts · 91 likes
@shizhediao
Shizhe Diao
3 months
🚀 How far can RL scaling take LLMs? Dropping ProRLv2! 🔥With ProRLv2, we keep expanding LLMs’ reasoning boundaries through 3,000+ RL steps over 5 domains and set a new state-of-the-art 🌟 among 1.5B reasoning models. 🔗 Full blog: https://t.co/mxpaVXZdjj 🤗Open model:
4 replies · 40 reposts · 209 likes
@XiaohanWang96
Xiaohan Wang
4 months
🧠 How can we truly test long-context video understanding in video-LMMs? ⏱️ TimeScope benchmarks models from 1 min to 8 hours using “needle-in-a-haystack” probes. 🚀 Gemini 2.5-Pro leads the pack—but even it struggles as context length grows. Long-range memory is still a
@orr_zohar
Orr Zohar
4 months
🧵 Introducing TimeScope, an open-source benchmark rigorously evaluating the true “temporal context window” of video-language models on videos ranging from 1 minute to 8 hours. #AI #MachineLearning
1 reply · 1 repost · 10 likes
@_TobiasLee
Lei Li@EMNLP25
4 months
Thrilled to announce our MiMo-VL series hit 100K downloads on HuggingFace last month! 🚀🚀 Incredible to see the community's enthusiasm for our VLMs. More exciting updates coming soon! 😜 https://t.co/7NhlMds1A5
2 replies · 18 reposts · 69 likes
@mervenoyann
merve
4 months
timescope: testing whether large models understand long videos or just claim to 🤠 they randomly insert needles (short videos/static images) in long videos and ask questions about the needle itself 🤯 Gemini seems to be the best! very cool work by @orr_zohar et al 👏
3 replies · 10 reposts · 118 likes
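A toy sketch of the needle-in-a-haystack construction merve describes; frame-name lists stand in for real video, and the probe format is my assumption, not TimeScope's actual schema:

```python
# Splice a short "needle" clip into a long "haystack" video at a random
# offset, then ask a question that only the needle can answer.
import random

def insert_needle(haystack: list, needle: list, rng: random.Random):
    """Return spliced frames plus the needle offset for QA grounding."""
    offset = rng.randrange(len(haystack) + 1)
    spliced = haystack[:offset] + needle + haystack[offset:]
    return spliced, offset

rng = random.Random(0)
haystack = [f"frame_{i}" for i in range(3600)]   # e.g. 1 h of video at 1 fps
needle = ["needle_0", "needle_1", "needle_2"]    # short clip or static image
video, t = insert_needle(haystack, needle, rng)
question = "What happens in the short inserted clip?"  # probes the needle only
print(len(video), t)
```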
@orr_zohar
Orr Zohar
4 months
Big thanks to @huggingface and my amazing collaborators: Rui Li, @XiaohanWang96, and @andimarafioti
For more details, check out:
📑Blog post: https://t.co/NBxrigForI
⚖️Leaderboard: https://t.co/krDXab4Gd9
🔥Dataset: https://t.co/kIOqMQRr5j
0 replies · 0 reposts · 5 likes
@orr_zohar
Orr Zohar
4 months
🚀Evaluations expose significant gaps even in leading models like Gemini 2.5-Pro, highlighting challenges in tasks requiring detailed motion analysis and information synthesis.
1 reply · 0 reposts · 4 likes