Torsten Scholak
@tscholak
Followers: 2K · Following: 52K · Media: 177 · Statuses: 4K
Lead Research Scientist, Foundation Models Lab @ServiceNowRSRCH. Opinions are not those of my employer.
Montréal
Joined February 2010
A year ago: make models reason well. Now: make reasoners fast enough to deploy. Architecture determines which capabilities you can actually ship. https://t.co/5TWIIs3jQV
tscholak.github.io
If your models think slowly, your roadmap does too. This essay argues that efficient attention is the hidden control knob for margins, RL capacity, and agent reliability, and shows how Apriel-H1...
💬 0 · 🔁 1 · ❤️ 3
Congratulations to the team at @ServiceNowRSRCH - @SathwikTejaswi, @sagardavasam, @tscholak. Further analysis on Artificial Analysis: https://t.co/BBCVpxlu4B HuggingFace 🤗 repo: https://t.co/fG8aToZktf ServiceNow’s blog post: https://t.co/Fq767Bs6DC
💬 0 · 🔁 2 · ❤️ 13
1/5 🚀Apriel-1.6-15B-Thinker: a 15B multimodal reasoner scoring 57 on the Artificial Analysis Intelligence Index - approaching the performance of ~200B-scale frontier models while remaining an order of magnitude smaller. 🧠Model weights: https://t.co/GE22SOIBfT 📄Blog:
💬 9 · 🔁 54 · ❤️ 210
With this latest artifacts log roundup of the best open models, I included the list of serious open model builders in the U.S. These 13 are making models way smaller than the Chinese competition, and often with worse licenses. We'll be improving this for an update to the ATOM
Latest open artifacts (#16): Who's building models in the U.S., China's model release playbook, and a resurgence of truly open models. A month of SOTA releases, with (truly) open model releases left and right. https://t.co/lVhmIZBZGT
💬 14 · 🔁 35 · ❤️ 230
It’s great to see that the MambaInLlama approach has been scaled up. Check out this blog post:
huggingface.co
🚀 Introducing Apriel-H1: a family of seven 15B hybrid models (Transformer + Mamba) distilled directly from the Apriel-Nemotron-15B-Thinker reasoner. ✅ Navigating the throughput-performance tradeoff with up to 3.4x speedup ✅ 2x speedup without performance loss ✅ Efficient distillation
💬 0 · 🔁 1 · ❤️ 1
🙏 Huge thanks to the team behind Apriel-H1 at ServiceNow AI SLAM Lab: @ostap__alex, Luke Kumar, @Ray97369304, Denis Kocetkov, and @jlamypoirier, with contributions from Shruthan Radhakrishna, @sohampar, and Shambhavi Mishra. More efficient reasoning models coming soon. Stay tuned!
💬 0 · 🔁 1 · ❤️ 4
💼 Why enterprises care: ⚡ More concurrent users 🕓 Lower latency 🧠 Higher test-time compute budgets 🤖 More affordable RL finetuning ♻️ Reduced compute footprint
💬 1 · 🔁 0 · ❤️ 2
The result? 🚦 Smooth performance vs throughput trade-off 📈 Efficient variants from H1-25 to H1-40 🥇 Best variant hits 3.4× throughput 🏆 H1-30-SFT delivers 2× throughput with negligible performance drop.
💬 1 · 🔁 0 · ❤️ 2
We introduce two techniques to decide which attention layers to replace: 🔍 LOO (Leave-One-Out) importance 📉 MMR (MIL-Mamba-Replacement) loss ranking These ensure we replace only the attention layers that matter least for reasoning.
💬 1 · 🔁 0 · ❤️ 3
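A minimal leave-one-out sketch in PyTorch, with every module and name illustrative rather than taken from the actual Apriel-H1 code: zero out each residual block's contribution, measure how much the loss degrades, and treat the least-damaging blocks as the first candidates for replacement.

```python
import torch

# Illustrative leave-one-out (LOO) scoring, not the actual Apriel-H1 code:
# zero each block's contribution and measure how much the loss degrades.
# Blocks whose removal barely hurts are replaced first.
@torch.no_grad()
def loo_importance(model, blocks, batch, loss_fn):
    base = loss_fn(model(batch))
    scores = {}
    for i, blk in enumerate(blocks):
        # Zeroing the output lets the residual stream pass through unchanged.
        h = blk.register_forward_hook(lambda m, inp, out: torch.zeros_like(out))
        scores[i] = (loss_fn(model(batch)) - base).item()
        h.remove()
    return sorted(scores, key=scores.get)  # ascending: replace these first


# Toy usage on a residual stack of linear "mixers".
class Stack(torch.nn.Module):
    def __init__(self, n=4, d=8):
        super().__init__()
        self.blocks = torch.nn.ModuleList(torch.nn.Linear(d, d) for _ in range(n))

    def forward(self, x):
        for b in self.blocks:
            x = x + b(x)  # residual connection, so a zeroed block is a no-op
        return x

model = Stack()
order = loo_importance(model, model.blocks, torch.randn(2, 8),
                       lambda y: y.pow(2).mean())
```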
So Apriel-H1 does something else: 🧠 Directly distill an existing reasoning-ready Transformer. ➡ Progressively replace less-critical attention layers with Mamba mixers. ➡ Fine-tune only as needed. No training from scratch. No architecture gamble.
💬 1 · 🔁 0 · ❤️ 3
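A toy sketch of that distill-and-replace recipe, with all modules hypothetical stand-ins (a plain Linear plays the role of a Mamba mixer): copy the teacher, swap one block, and train only the new block to match the frozen teacher's outputs.

```python
import torch
import torch.nn.functional as F

# Toy distill-and-replace loop; all modules here are stand-ins, not the
# real Apriel-H1 architecture.
teacher = torch.nn.Sequential(*[torch.nn.Linear(8, 8) for _ in range(4)])
student = torch.nn.Sequential(*[torch.nn.Linear(8, 8) for _ in range(4)])
student.load_state_dict(teacher.state_dict())  # start from the teacher
student[2] = torch.nn.Linear(8, 8)             # replace one "attention" block

opt = torch.optim.Adam(student[2].parameters())  # fine-tune only the new block
for _ in range(200):
    x = torch.randn(32, 8)
    loss = F.mse_loss(student(x), teacher(x).detach())  # match teacher outputs
    opt.zero_grad(); loss.backward(); opt.step()
```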
Mamba state-space mixers solve this with: 🔁 constant memory footprint 📈 linear time 🔎 competitive quality when used in the right layers. But full hybrid pretraining is expensive and risky.
💬 1 · 🔁 0 · ❤️ 2
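A heavily simplified picture of where those properties come from, assuming a toy diagonal state-space recurrence (real Mamba additionally makes the parameters input-dependent): the carried state has a fixed size, so memory is constant in sequence length and the scan is linear in the number of tokens.

```python
import numpy as np

# Toy diagonal state-space scan. Unlike attention, there is no KV cache:
# the only carried state is the fixed-size vector h.
def ssm_scan(x, A, B, C):
    h = np.zeros_like(A)
    ys = []
    for x_t in x:            # one update per token: O(T) time
        h = A * h + B * x_t  # fixed-size state: O(1) memory in seq length
        ys.append(C @ h)
    return np.array(ys)

y = ssm_scan(np.random.randn(16),
             A=np.full(4, 0.9), B=np.ones(4), C=np.ones(4))
```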
Why do we need hybrid (Transformer + SSM) reasoning models? Because Transformers scale quadratically with sequence length in attention and require a KV cache that grows with context at inference. This kills throughput at scale, especially for long reasoning traces, limits test-time scaling, and makes RL post-training
💬 1 · 🔁 0 · ❤️ 5
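Back-of-the-envelope arithmetic makes the KV-cache pressure concrete. The config below is hypothetical, not Apriel's actual one; the point is the linear growth in sequence length and batch size.

```python
# Hypothetical 15B-class decoder config; real models will differ.
def kv_cache_bytes(layers=50, kv_heads=8, head_dim=128,
                   seq_len=32_768, batch=32, bytes_per_elem=2):
    # 2x for keys and values; bf16 = 2 bytes per element.
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem

print(f"{kv_cache_bytes() / 2**30:.0f} GiB")  # 200 GiB, linear in seq_len and batch
```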
🚀 Introducing Apriel-H1: a family of seven 15B hybrid models (Transformer + Mamba) distilled directly from the Apriel-Nemotron-15B-Thinker reasoner. ✅ Navigating the throughput-performance tradeoff with up to 3.4x speedup ✅ 2x speedup without performance loss ✅ Efficient distillation
💬 5 · 🔁 34 · ❤️ 109
I wrote up my thoughts on the strategic value of efficient attention hybrids last night. Hope it makes for a good morning read.
A year ago: make models reason well. Now: make reasoners fast enough to deploy. Architecture determines which capabilities you can actually ship. https://t.co/5TWIIs3jQV
💬 0 · 🔁 1 · ❤️ 7
This is some of the most important research coming out of ServiceNow. Make sure to take a look!
🚀 Announcing GroundCUA, a high-quality dataset for grounding computer-use agents. With over 3M expert annotations spanning 87 desktop apps, we use our new dataset to train state-of-the-art grounding models, namely GroundNext-3B and GroundNext-7B. 👇 Thread
💬 0 · 🔁 1 · ❤️ 7
To continue the PipelineRL glazing: @finbarrtimbers implemented PipelineRL for open-instruct a little while ago, and it ended up being probably the single biggest speedup to our overall pipeline. We went from two-week-long RL runs to five-day runs, without sacrificing performance.
Don't sleep on PipelineRL -- this is one of the biggest jumps in compute efficiency of RL setups that we found in the ScaleRL paper (also validated by Magistral & others before)! What's the problem PipelineRL solves? In RL for LLMs, we need to send weight updates from trainer to
💬 6 · 🔁 34 · ❤️ 226
Don't sleep on PipelineRL -- this is one of the biggest jumps in compute efficiency of RL setups that we found in the ScaleRL paper (also validated by Magistral & others before)! What's the problem PipelineRL solves? In RL for LLMs, we need to send weight updates from trainer to
In-flight weight updates have gone from a “weird trick” to a must for training LLMs with RL in the last few weeks. If you want to understand the on-policy and throughput benefits, here’s the CoLM talk @DBahdanau and I gave:
💬 12 · 🔁 60 · ❤️ 475
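A conceptual sketch of the in-flight update idea (not the PipelineRL API): the trainer publishes fresh weights into a one-slot mailbox, and the generator hot-swaps them between decode steps instead of draining all in-flight sequences first.

```python
import queue

# Conceptual only: a one-slot mailbox. The trainer publishes weights; the
# generator picks them up between decode steps, never pausing to drain.
weight_box: queue.Queue = queue.Queue(maxsize=1)

def publish(step):
    try:
        weight_box.get_nowait()          # drop stale weights if unclaimed
    except queue.Empty:
        pass
    weight_box.put(f"weights@{step}")    # stand-in for real parameters

def generate(n_steps):
    weights = "weights@0"
    for t in range(n_steps):
        try:
            weights = weight_box.get_nowait()  # hot-swap mid-generation
        except queue.Empty:
            pass                               # keep decoding with current weights
        # ... decode one token for every in-flight sequence using `weights`

publish(1)
generate(4)
```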
🚀 Excited to share our EMNLP 2025 Industry Track paper: ColMate 📑 A new framework that introduces a novel late-interaction relevance score, a masked OCR language-modeling objective, and self-supervised contrastive learning, boosting ColPali on multimodal document retrieval! 🧵 👇
💬 1 · 🔁 2 · ❤️ 6
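For orientation, here is the standard ColBERT/ColPali-style MaxSim late-interaction score that ColMate builds on; ColMate's own novel relevance score is in the paper and not reproduced here.

```python
import torch
import torch.nn.functional as F

# Classic MaxSim late interaction: each query token takes its best-matching
# document patch, and the per-token maxima are summed into one score.
def maxsim(q, d):
    sim = q @ d.T                       # [n_query_tokens, n_doc_patches]
    return sim.max(dim=1).values.sum()  # best patch per query token, summed

q = F.normalize(torch.randn(12, 128), dim=-1)   # query token embeddings
d = F.normalize(torch.randn(200, 128), dim=-1)  # document patch embeddings
score = maxsim(q, d)
```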
In-flight weight updates have gone from a “weird trick” to a must for training LLMs with RL in the last few weeks. If you want to understand the on-policy and throughput benefits, here’s the CoLM talk @DBahdanau and I gave:
💬 1 · 🔁 29 · ❤️ 141