8 models of the week to pay attention to:
▪️ FastVLM by @Apple
▪️ OLMoASR
▪️ gpt-realtime and Realtime API updates
▪️ InternVL3.5
▪️ Hermes 4
▪️ USO
▪️ rStar2-Agent
▪️ VibeVoice
Find the latest updates about AI here: https://t.co/GmOBazWXUP
Some details about the models in 🧵
1. @Apple's FastVLM on Hugging Face
🚨 Apple just released FastVLM on Hugging Face: 0.5B, 1.5B and 7B real-time VLMs with WebGPU support 🤯
> up to 85x faster and 3.4x smaller than comparably sized VLMs
> 7.9x faster TTFT for larger models
> designed to output fewer tokens and reduce encoding time for high-resolution images
2. OLMoASR: open speech recognition models by @allen_ai
6 fully open ASR models (39M–1.5B parameters) trained on curated datasets of up to 680K hours.
- OLMoASR-medium.en achieves 12.8%/11.0% WER (short/long-form), matching Whisper-medium.en.
- Built from a 3M-hour pool filtered…
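The WER numbers quoted above are word error rate: word-level edit distance divided by the number of reference words. A minimal sketch (plain Levenshtein over words; real ASR evaluations also normalize casing and punctuation first):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # DP table for Levenshtein distance over words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)
```

So a 12.8% WER means roughly 1 in 8 reference words is substituted, deleted, or inserted.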
3. gpt-realtime and Realtime API for voice agents
This speech-to-speech model:
- Achieves 82.8% accuracy on Big Bench Audio
- Scores 30.5% on MultiChallenge
- Supports image inputs, SIP phone calling, and remote MCP servers
- Improves function calling accuracy to 66.5%
- Adds 2 new voices…
4. InternVL3.5: open-source LLM-based multimodal model family
- 4.05× faster inference and SOTA performance across general multimodal and agentic tasks
- +16.0% gain on MMMU and MathVista
- Features Cascade Reinforcement Learning (offline + online RL) to enhance reasoning
- The…
5. Hermes 4
A hybrid reasoning LLM family built on 5M post-training samples (19B tokens), including 3.5M reasoning-heavy examples with sequences up to 16K tokens.
- Uses DataForge for structured synthetic data generation and Atropos for rejection sampling across task-specific…
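Rejection sampling here means drawing several candidate completions and keeping only those a task-specific verifier accepts. A toy sketch of the idea, assuming hypothetical `generate`/`verify` callables (not the actual Atropos API):

```python
import random

def rejection_sample(prompt, generate, verify, n_candidates=8):
    """Draw several candidate completions and keep only those that
    pass a task-specific verifier."""
    candidates = [generate(prompt) for _ in range(n_candidates)]
    return [c for c in candidates if verify(prompt, c)]

# Toy stand-ins: a random "model" and a verifier that checks the answer.
gen = lambda p: f"answer: {random.randint(1, 4)}"
ver = lambda p, c: c.endswith("3")
kept = rejection_sample("2+1=?", gen, ver)
```

The surviving samples then become training data, so the verifier's quality directly bounds the dataset's quality.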
6. USO: unified style- and subject-driven generation via disentangled and reward learning
- Uses a triplet dataset (content, style, stylized image) and trains with style-alignment and content-style disentanglement objectives
- A Style Reward Learning (SRL) module further…
7. rStar2-Agent
A 14B-parameter math reasoning model trained with agentic RL.
- Uses GRPO-RoC, an RL strategy that handles noisy code environments
- Trained efficiently on only 64 MI300X GPUs
- In just 510 RL steps, achieves 80.6% on AIME24 and 69.8% on…
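GRPO-style training scores each rollout relative to its sampling group instead of a learned value function. A minimal sketch of the group-relative advantage step (plain GRPO; the resample-on-correct rollout filtering that GRPO-RoC adds is omitted here):

```python
import statistics

def group_relative_advantages(rewards):
    """GRPO-style advantages: normalize each rollout's reward by the
    mean and std of its own sampling group (no critic network)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # All rollouts scored the same: no learning signal from this group.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]
```

With binary math-verifier rewards, e.g. `[1, 0, 1, 0]`, correct rollouts get positive advantage and incorrect ones negative, which is what the policy update then amplifies.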
8. VibeVoice
A long-form speech synthesis model that uses next-token diffusion for continuous data generation.
- Generates up to 90 minutes of speech with up to 4 speakers in a 64K-token window, delivering high-fidelity multi-speaker dialogue synthesis
- A novel tokenizer…
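Back-of-envelope on why that window size matters: fitting 90 minutes of audio into 64K tokens implies only about 12 tokens per second of speech (ignoring text and speaker tokens sharing the window), i.e. the tokenizer must compress audio very aggressively:

```python
# Rough implied acoustic token rate for a 90-minute, 64K-token session.
tokens = 64_000
seconds = 90 * 60  # 5400 s
tokens_per_second = tokens / seconds
print(round(tokens_per_second, 2))  # ~11.85 tokens/s
```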
@Apple Follow @TheTuringPost for more. Get deep analysis, guides & breakdowns of what AI is about now. Join 90,000+ readers from top AI labs, VC funds & universities.
@TheTuringPost @Apple vibey stuff with VibeVoice. interested to see how it handles both voice and text inputs in real scenarios. looks like it could be a game-changer for interactive systems.
@TheTuringPost @Apple An insightful list of models to watch. FastVLM and gpt-realtime updates stand out. It’ll be interesting to see their long-term impact. Which model do you think will shape the future of AI the most?
@TheTuringPost @Apple eight models, huh? sounds like a lineup for a bad sci-fi movie. fastvlm better not be as slow as its name suggests. let's see who survives the AI apocalypse. 2025 is just getting warmed up.