Torsten Scholak

@tscholak

Followers: 2K · Following: 52K · Media: 177 · Statuses: 4K

Lead Research Scientist, Foundation Models Lab @ServiceNowRSRCH. Opinions are not those of my employer.

Montréal
Joined February 2010
@tscholak
Torsten Scholak
23 days
A year ago: make models reason well. Now: make reasoners fast enough to deploy. Architecture determines which capabilities you can actually ship. https://t.co/5TWIIs3jQV
tscholak.github.io
If your models think slowly, your roadmap does too. This essay argues that efficient attention is the hidden control knob for margins, RL capacity, and agent reliability, and shows how Apriel-H1...
0
1
3
@ArtificialAnlys
Artificial Analysis
3 days
Congratulations to the team at @ServiceNowRSRCH: @SathwikTejaswi, @sagardavasam, @tscholak. Further analysis on Artificial Analysis: https://t.co/BBCVpxlu4B HuggingFace 🤗 repo: https://t.co/fG8aToZktf ServiceNow’s blog post: https://t.co/Fq767Bs6DC
0
2
13
@ServiceNowRSRCH
ServiceNow AI Research
3 days
1/5 🚀Apriel-1.6-15B-Thinker: a 15B multimodal reasoner scoring 57 on the Artificial Analysis Intelligence Index - approaching the performance of ~200B-scale frontier models while remaining an order of magnitude smaller. 🧠Model weights: https://t.co/GE22SOIBfT 📄Blog:
9
54
210
@natolambert
Nathan Lambert
19 days
With this latest artifacts log roundup of the best open models, I included the list of serious open model builders in the U.S. These 13 are making models way smaller than the Chinese competition, often with worse licenses. We'll be improving this for an update to the ATOM
@interconnectsai
Interconnects
19 days
Latest open artifacts (#16): Who's building models in the U.S., China's model release playbook, and a resurgence of truly open models A month with SOTA releases with (truly) open model releases left and right. https://t.co/lVhmIZBZGT
14
35
230
@_junxiong_wang
Junxiong Wang
21 days
It’s great to see that the MambaInLlama approach has been scaled up. Check out this blog post:
huggingface.co
@tscholak
Torsten Scholak
22 days
🚀 Introducing Apriel-H1: a family of seven 15B hybrid models (Transformer + Mamba) distilled directly from the Apriel-Nemotron-15B-Thinker reasoner. ✅ Navigates the throughput-performance tradeoff with up to 3.4x speedup ✅ 2x speedup without performance loss ✅ Efficient distillation
0
1
1
@tscholak
Torsten Scholak
22 days
🙏 Huge thanks to the team behind Apriel-H1 at ServiceNow AI SLAM Lab: @ostap__alex, Luke Kumar, @Ray97369304, Denis Kocetkov, and @jlamypoirier with contributions from Shruthan Radhakrishna, @sohampar, and Shambhavi Mishra More efficient reasoning models coming soon. Stay
0
1
4
@tscholak
Torsten Scholak
22 days
💼 Why enterprises care: ⚡ More concurrent users 🕓 Lower latency 🧠 Higher test-time compute budgets 🤖 More affordable RL finetuning ♻️ Reduced compute footprint
1
0
2
@tscholak
Torsten Scholak
22 days
The result? 🚦 Smooth performance vs throughput trade-off 📈 Efficient variants from H1-25 to H1-40 🥇 Best variant hits 3.4× throughput 🏆 H1-30-SFT delivers 2× throughput with negligible performance drop.
1
0
2
@tscholak
Torsten Scholak
22 days
We introduce two techniques to decide which attention layers to replace: 🔍 LOO (Leave-One-Out) importance 📉 MMR (MIL-Mamba-Replacement) loss ranking These ensure we only replace low-impact reasoning layers.
1
0
3
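For readers curious what a leave-one-out importance score looks like in practice, here is a minimal sketch on a toy model: temporarily bypass each attention mixer, measure how much an evaluation loss degrades, and treat the layers whose removal barely hurts as candidates for Mamba replacement. The toy blocks, the identity bypass, and the loss proxy are illustrative assumptions, not Apriel-H1's exact LOO or MMR procedure.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.mixer = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))

    def forward(self, x):
        attn = self.mixer(x) if isinstance(self.mixer, nn.Identity) else self.mixer(x, x, x)[0]
        x = x + attn
        return x + self.mlp(x)

d = 32
blocks = nn.ModuleList(Block(d) for _ in range(6))

@torch.no_grad()
def eval_loss(x):  # stand-in for a validation loss over a small held-out batch
    h = x
    for b in blocks:
        h = b(h)
    return h.pow(2).mean().item()

x = torch.randn(2, 16, d)
base = eval_loss(x)
scores = {}
for i, b in enumerate(blocks):
    original, b.mixer = b.mixer, nn.Identity()  # leave this attention layer out
    scores[i] = eval_loss(x) - base             # loss increase = layer importance
    b.mixer = original                          # restore the original mixer
print("least critical attention layers first:", sorted(scores, key=scores.get))
```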
@tscholak
Torsten Scholak
22 days
So Apriel-H1 does something else: 🧠 Directly distill an existing reasoning-ready Transformer ➡ Progressively replace less-critical attention layers with Mamba mixers ➡ Fine-tune only as needed. No training from scratch. No architecture gamble.
1
0
3
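A rough sketch of the replace-and-distill loop described above, assuming a toy stack of attention blocks: clone the teacher, swap selected attention mixers for a linear-time recurrent mixer, and train the hybrid to match the teacher's outputs. The GRU stand-in for a Mamba mixer and the MSE-on-outputs objective are assumptions for illustration, not the actual Apriel-H1 recipe.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttnBlock(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.mixer = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))

    def forward(self, x):
        # Attention mixers take (q, k, v); the recurrent stand-in takes x alone.
        out = self.mixer(x)[0] if isinstance(self.mixer, nn.GRU) else self.mixer(x, x, x)[0]
        x = x + out
        return x + self.mlp(x)

d = 32
teacher = nn.Sequential(*[AttnBlock(d) for _ in range(6)])
student = copy.deepcopy(teacher)
for i in (1, 3):  # low-impact layers, e.g. as ranked by LOO/MMR
    student[i].mixer = nn.GRU(d, d, batch_first=True)  # linear-time mixer stand-in

opt = torch.optim.AdamW(student.parameters(), lr=1e-3)
for step in range(50):  # short distillation loop on random inputs
    x = torch.randn(8, 16, d)
    with torch.no_grad():
        target = teacher(x)  # teacher outputs are the distillation targets
    loss = F.mse_loss(student(x), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
print(f"final distillation loss: {loss.item():.4f}")
```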
@tscholak
Torsten Scholak
22 days
Mamba state-space mixers solve this with: 🔁 constant memory footprint 📈 linear time 🔎 competitive quality when used in the right layers. But full hybrid pretraining is expensive and risky.
1
0
2
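To make the "constant memory, linear time" claim concrete, here is a toy diagonal linear recurrence with the same asymptotics as an SSM mixer: a fixed-size state is updated once per token, so the whole sequence costs O(seq_len) time and O(1) memory in sequence length. This is not Mamba's selective-scan kernel, just an illustration of the complexity profile.

```python
import numpy as np

def ssm_mixer(x, A, B, C):
    """x: [seq_len, d_in]; A: [d_state] decay; B: [d_state, d_in]; C: [d_in, d_state]."""
    h = np.zeros(A.shape[0])  # fixed-size recurrent state, independent of seq_len
    ys = []
    for x_t in x:             # one pass over the sequence: linear time
        h = A * h + B @ x_t   # update the state instead of attending over a growing cache
        ys.append(C @ h)      # read out a token representation
    return np.stack(ys)

rng = np.random.default_rng(0)
seq_len, d_in, d_state = 1024, 16, 64
y = ssm_mixer(
    rng.normal(size=(seq_len, d_in)),
    A=np.full(d_state, 0.9),
    B=0.1 * rng.normal(size=(d_state, d_in)),
    C=0.1 * rng.normal(size=(d_in, d_state)),
)
print(y.shape)  # (1024, 16): same sequence-to-sequence interface, no KV cache
```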
@tscholak
Torsten Scholak
22 days
Why do we need hybrid (Transformer + SSM) reasoning models? Because transformers scale quadratically with attention and require a growing KV cache at inference. This kills throughput at scale, especially for long reasoning traces, limits test-time scaling, and makes RL post-training
1
0
5
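A back-of-envelope calculation makes the KV-cache point concrete: per-request cache memory grows linearly with the reasoning trace, so long traces crowd out concurrent requests. The layer count, head counts, and fp16 precision below are illustrative assumptions, not Apriel's actual configuration.

```python
def kv_cache_bytes(seq_len, n_layers=50, n_kv_heads=8, head_dim=128, bytes_per_elem=2):
    """Per-request cache: K and V tensors per layer, each [seq_len, n_kv_heads, head_dim], fp16."""
    return 2 * n_layers * seq_len * n_kv_heads * head_dim * bytes_per_elem

for seq_len in (4_096, 32_768, 131_072):
    gib = kv_cache_bytes(seq_len) / 2**30
    print(f"{seq_len:>7} tokens -> {gib:6.2f} GiB of KV cache per request")

# The cache grows linearly with the reasoning trace, so a fixed GPU memory
# budget serves fewer concurrent long-trace requests; an SSM layer instead
# carries a fixed-size state regardless of sequence length.
```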
@tscholak
Torsten Scholak
22 days
🚀 Introducing Apriel-H1: a family of seven 15B hybrid models (Transformer + Mamba) distilled directly from the Apriel-Nemotron-15B-Thinker reasoner. ✅ Navigates the throughput-performance tradeoff with up to 3.4x speedup ✅ 2x speedup without performance loss ✅ Efficient distillation
5
34
109
@tscholak
Torsten Scholak
23 days
I wrote up my thoughts on the strategic value of efficient attention hybrids last night. Hope it makes for a good morning read.
@tscholak
Torsten Scholak
23 days
A year ago: make models reason well. Now: make reasoners fast enough to deploy. Architecture determines which capabilities you can actually ship. https://t.co/5TWIIs3jQV
0
1
7
@tscholak
Torsten Scholak
30 days
This is some of the most important research coming out of ServiceNow. Make sure to take a look!
@aarashfeizi
Aarash Feizi
1 month
🚀 Announcing GroundCUA, a high-quality dataset for grounding computer-use agents. With over 3M expert annotations spanning 87 desktop apps, we use our new dataset to train state-of-the-art grounding models, namely GroundNext-3B and GroundNext-7B. 👇 Thread
0
1
7
@hamishivi
Hamish Ivison
1 month
To continue the PipelineRL glazing: @finbarrtimbers implemented PipelineRL for open-instruct a little while ago, and it ended up being probably the single biggest speedup to our overall pipeline. We went from 2-week-long RL runs to 5-day runs, without sacrificing performance
@agarwl_
Rishabh Agarwal
1 month
Don't sleep on PipelineRL -- this is one of the biggest jumps in compute efficiency of RL setups that we found in the ScaleRL paper (also validated by Magistral & others before)! What's the problem PipelineRL solves? In RL for LLMs, we need to send weight updates from trainer to
6
34
226
@agarwl_
Rishabh Agarwal
1 month
Don't sleep on PipelineRL -- this is one of the biggest jumps in compute efficiency of RL setups that we found in the ScaleRL paper (also validated by Magistral & others before)! What's the problem PipelineRL solves? In RL for LLMs, we need to send weight updates from trainer to
@alexpiche_
Alexandre L.-Piché
1 month
In-flight weight updates have gone from a “weird trick” to a must to train LLMs with RL in the last few weeks. If you want to understand the on-policy and throughput benefits here’s the CoLM talk @DBahdanau and I gave:
12
60
475
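For readers unfamiliar with the idea, here is a toy sketch of in-flight weight updates: the trainer publishes new policy versions while generation workers keep producing rollouts, so rollouts pick up fresher weights mid-stream instead of waiting for a synchronous reload. This is a conceptual illustration with threads and a shared version counter, not PipelineRL's or open-instruct's actual implementation.

```python
import queue
import threading
import time

latest = {"version": 0}      # stand-in for a parameter store: just a version counter here
lock = threading.Lock()
rollouts = queue.Queue()

def trainer(n_updates=5):
    for v in range(1, n_updates + 1):
        time.sleep(0.05)     # pretend to take a gradient step
        with lock:
            latest["version"] = v   # publish new weights without pausing generators

def generator(worker_id, n_rollouts=10):
    for i in range(n_rollouts):
        with lock:
            v = latest["version"]   # pick up the freshest policy mid-stream
        time.sleep(0.02)            # pretend to generate a rollout chunk with policy v
        rollouts.put((worker_id, i, v))

threads = [threading.Thread(target=trainer)]
threads += [threading.Thread(target=generator, args=(w,)) for w in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
while not rollouts.empty():
    worker, step, version = rollouts.get()
    print(f"worker {worker}, rollout {step}: generated with policy v{version}")
```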
@Ahmed_Masry97
Ahmed Masry
1 month
🚀 Excited to share our EMNLP 2025 Industry Track paper: ColMate 📑 A new framework that introduces a novel late-interaction relevance score, a masked OCR language modeling objective, and self-supervised contrastive learning, boosting ColPali on multimodal document retrieval! 🧵 👇
1
2
6
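For context on "late interaction": ColBERT/ColPali-style retrieval scores a query against a document by matching each query token embedding to its best document token or patch embedding and summing the maxima. The sketch below shows that generic MaxSim scoring on random embeddings; ColMate's own relevance score is not reproduced here.

```python
import numpy as np

def maxsim_score(query_emb, doc_emb):
    """query_emb: [n_q, d], doc_emb: [n_d, d]; both L2-normalized row-wise."""
    sims = query_emb @ doc_emb.T   # [n_q, n_d] cosine similarities
    return sims.max(axis=1).sum()  # best document match per query token, summed

rng = np.random.default_rng(0)

def normed(shape):
    x = rng.normal(size=shape)
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

query = normed((8, 128))                       # 8 query token embeddings
docs = [normed((200, 128)) for _ in range(3)]  # 3 documents of 200 patch embeddings each
scores = [maxsim_score(query, d) for d in docs]
print("documents ranked by relevance:", np.argsort(scores)[::-1].tolist())
```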
@alexpiche_
Alexandre L.-Piché
1 month
In-flight weight updates have gone from a “weird trick” to a must to train LLMs with RL in the last few weeks. If you want to understand the on-policy and throughput benefits here’s the CoLM talk @DBahdanau and I gave:
1
29
141
@tscholak
Torsten Scholak
1 month
Shoutout to Philippe Beaudoin from @LawZero_ for asking the right questions!
0
0
0