Torsten Scholak

@tscholak

Followers: 2K · Following: 52K · Media: 177 · Statuses: 4K

Lead Research Scientist, Foundation Models Lab @ServiceNowRSRCH. Opinions are not those of my employer.

Montréal
Joined February 2010
@tscholak
Torsten Scholak
23 days
A year ago: make models reason well. Now: make reasoners fast enough to deploy. Architecture determines which capabilities you can actually ship. https://t.co/5TWIIs3jQV
tscholak.github.io
If your models think slowly, your roadmap does too. This essay argues that efficient attention is the hidden control knob for margins, RL capacity, and agent reliability, and shows how Apriel-H1...
0
1
3
@ArtificialAnlys
Artificial Analysis
3 days
Congratulations to the team at @ServiceNowRSRCH: @SathwikTejaswi, @sagardavasam, @tscholak. Further analysis on Artificial Analysis: https://t.co/BBCVpxlu4B HuggingFace 🤗 repo: https://t.co/fG8aToZktf ServiceNow’s blog post: https://t.co/Fq767Bs6DC
0
2
13
@ServiceNowRSRCH
ServiceNow AI Research
3 days
1/5 🚀Apriel-1.6-15B-Thinker: a 15B multimodal reasoner scoring 57 on the Artificial Analysis Intelligence Index - approaching the performance of ~200B-scale frontier models while remaining an order of magnitude smaller. 🧠Model weights: https://t.co/GE22SOIBfT 📄Blog:
9
54
210
@natolambert
Nathan Lambert
19 days
With this latest artifacts log roundup of the best open models, I included the list of serious open model builders in the U.S. These 13 are making models way smaller than the Chinese competition, often with worse licenses. We'll be improving this for an update to the ATOM
@interconnectsai
Interconnects
19 days
Latest open artifacts (#16): Who's building models in the U.S., China's model release playbook, and a resurgence of truly open models A month with SOTA releases with (truly) open model releases left and right. https://t.co/lVhmIZBZGT
14
35
230
@_junxiong_wang
Junxiong Wang
21 days
It’s great to see that the MambaInLlama approach has been scaled up. Check out this blog post:
huggingface.co
@tscholak
Torsten Scholak
22 days
🚀 Introducing Apriel-H1: a family of seven 15B hybrid models (Transformer + Mamba) distilled directly from the Apriel-Nemotron-15B-Thinker reasoner. ✅ Navigates the throughput-performance tradeoff with up to 3.4x speedup ✅ 2x speedup without performance loss ✅ Efficient distillation
0
1
1
@tscholak
Torsten Scholak
22 days
🙏 Huge thanks to the team behind Apriel-H1 at ServiceNow AI SLAM Lab: @ostap__alex, Luke Kumar, @Ray97369304, Denis Kocetkov, and @jlamypoirier with contributions from Shruthan Radhakrishna, @sohampar, and Shambhavi Mishra More efficient reasoning models coming soon. Stay
0
1
4
@tscholak
Torsten Scholak
22 days
💼 Why enterprises care: ⚡ More concurrent users 🕓 Lower latency 🧠 Higher test-time compute budgets 🤖 More affordable RL finetuning ♻️ Reduced compute footprint
1
0
2
@tscholak
Torsten Scholak
22 days
The result? 🚦 Smooth performance vs throughput trade-off 📈 Efficient variants from H1-25 to H1-40 🥇 Best variant hits 3.4× throughput 🏆 H1-30-SFT delivers 2× throughput with negligible performance drop.
1
0
2
@tscholak
Torsten Scholak
22 days
We introduce two techniques to decide which attention layers to replace: 🔍 LOO (Leave-One-Out) importance 📉 MMR (MIL-Mamba-Replacement) loss ranking These ensure we only replace low-impact reasoning layers.
1
0
3
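For readers curious what a leave-one-out importance score looks like in practice, here is a minimal sketch on a toy model: temporarily bypass each attention mixer, measure how much an evaluation loss degrades, and treat the layers whose removal barely hurts as candidates for Mamba replacement. The toy blocks, the identity bypass, and the loss proxy are illustrative assumptions, not Apriel-H1's exact LOO or MMR procedure.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.mixer = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))

    def forward(self, x):
        attn = self.mixer(x) if isinstance(self.mixer, nn.Identity) else self.mixer(x, x, x)[0]
        x = x + attn
        return x + self.mlp(x)

d = 32
blocks = nn.ModuleList(Block(d) for _ in range(6))

@torch.no_grad()
def eval_loss(x):  # stand-in for a validation loss over a small held-out batch
    h = x
    for b in blocks:
        h = b(h)
    return h.pow(2).mean().item()

x = torch.randn(2, 16, d)
base = eval_loss(x)
scores = {}
for i, b in enumerate(blocks):
    original, b.mixer = b.mixer, nn.Identity()  # leave this attention layer out
    scores[i] = eval_loss(x) - base             # loss increase = layer importance
    b.mixer = original                          # restore the original mixer
print("least critical attention layers first:", sorted(scores, key=scores.get))
```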
@tscholak
Torsten Scholak
22 days
So Apriel-H1 does something else: 🧠 Directly distill an existing reasoning-ready Transformer ➡ Progressively replace less-critical attention layers with Mamba mixers ➡ Fine-tune only as needed. No training from scratch. No architecture gamble.
1
0
3
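A rough sketch of the replace-and-distill loop described above, assuming a toy stack of attention blocks: clone the teacher, swap selected attention mixers for a linear-time recurrent mixer, and train the hybrid to match the teacher's outputs. The GRU stand-in for a Mamba mixer and the MSE-on-outputs objective are assumptions for illustration, not the actual Apriel-H1 recipe.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttnBlock(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.mixer = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))

    def forward(self, x):
        # Attention mixers take (q, k, v); the recurrent stand-in takes x alone.
        out = self.mixer(x)[0] if isinstance(self.mixer, nn.GRU) else self.mixer(x, x, x)[0]
        x = x + out
        return x + self.mlp(x)

d = 32
teacher = nn.Sequential(*[AttnBlock(d) for _ in range(6)])
student = copy.deepcopy(teacher)
for i in (1, 3):  # low-impact layers, e.g. as ranked by LOO/MMR
    student[i].mixer = nn.GRU(d, d, batch_first=True)  # linear-time mixer stand-in

opt = torch.optim.AdamW(student.parameters(), lr=1e-3)
for step in range(50):  # short distillation loop on random inputs
    x = torch.randn(8, 16, d)
    with torch.no_grad():
        target = teacher(x)  # teacher outputs are the distillation targets
    loss = F.mse_loss(student(x), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
print(f"final distillation loss: {loss.item():.4f}")
```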
@tscholak
Torsten Scholak
22 days
Mamba state-space mixers solve this with: 🔁 constant memory footprint 📈 linear time 🔎 competitive quality when used in the right layers. But full hybrid pretraining is expensive and risky.
1
0
2
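To make the "constant memory, linear time" claim concrete, here is a toy diagonal linear recurrence with the same asymptotics as an SSM mixer: a fixed-size state is updated once per token, so the whole sequence costs O(seq_len) time and O(1) memory in sequence length. This is not Mamba's selective-scan kernel, just an illustration of the complexity profile.

```python
import numpy as np

def ssm_mixer(x, A, B, C):
    """x: [seq_len, d_in]; A: [d_state] decay; B: [d_state, d_in]; C: [d_in, d_state]."""
    h = np.zeros(A.shape[0])  # fixed-size recurrent state, independent of seq_len
    ys = []
    for x_t in x:             # one pass over the sequence: linear time
        h = A * h + B @ x_t   # update the state instead of attending over a growing cache
        ys.append(C @ h)      # read out a token representation
    return np.stack(ys)

rng = np.random.default_rng(0)
seq_len, d_in, d_state = 1024, 16, 64
y = ssm_mixer(
    rng.normal(size=(seq_len, d_in)),
    A=np.full(d_state, 0.9),
    B=0.1 * rng.normal(size=(d_state, d_in)),
    C=0.1 * rng.normal(size=(d_in, d_state)),
)
print(y.shape)  # (1024, 16): same sequence-to-sequence interface, no KV cache
```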
@tscholak
Torsten Scholak
22 days
Why do we need hybrid (Transformer + SSM) reasoning models? Because transformers scale quadratically with attention and require a growing KV cache at inference. This kills throughput at scale, especially for long reasoning traces, limits test-time scaling, and makes RL post-training
1
0
5
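A back-of-envelope calculation makes the KV-cache point concrete: per-request cache memory grows linearly with the reasoning trace, so long traces crowd out concurrent requests. The layer count, head counts, and fp16 precision below are illustrative assumptions, not Apriel's actual configuration.

```python
def kv_cache_bytes(seq_len, n_layers=50, n_kv_heads=8, head_dim=128, bytes_per_elem=2):
    """Per-request cache: K and V tensors per layer, each [seq_len, n_kv_heads, head_dim], fp16."""
    return 2 * n_layers * seq_len * n_kv_heads * head_dim * bytes_per_elem

for seq_len in (4_096, 32_768, 131_072):
    gib = kv_cache_bytes(seq_len) / 2**30
    print(f"{seq_len:>7} tokens -> {gib:6.2f} GiB of KV cache per request")

# The cache grows linearly with the reasoning trace, so a fixed GPU memory
# budget serves fewer concurrent long-trace requests; an SSM layer instead
# carries a fixed-size state regardless of sequence length.
```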
@tscholak
Torsten Scholak
22 days
🚀 Introducing Apriel-H1: a family of seven 15B hybrid models (Transformer + Mamba) distilled directly from the Apriel-Nemotron-15B-Thinker reasoner. ✅ Navigates the throughput-performance tradeoff with up to 3.4x speedup ✅ 2x speedup without performance loss ✅ Efficient distillation
5
34
109
@tscholak
Torsten Scholak
23 days
I wrote up my thoughts on the strategic value of efficient attention hybrids last night. Hope it makes for a good morning read.
@tscholak
Torsten Scholak
23 days
A year ago: make models reason well. Now: make reasoners fast enough to deploy. Architecture determines which capabilities you can actually ship. https://t.co/5TWIIs3jQV
0
1
7
@tscholak
Torsten Scholak
30 days
This is some of the most important research coming out of ServiceNow. Make sure to take a look!
@aarashfeizi
Aarash Feizi
1 month
🚀 Announcing GroundCUA, a high-quality dataset for grounding computer-use agents. With over 3M expert annotations spanning 87 desktop apps, we use our new dataset to train state-of-the-art grounding models, namely GroundNext-3B and GroundNext-7B. 👇 Thread
0
1
7
@hamishivi
Hamish Ivison
1 month
To continue the PipelineRL glazing: @finbarrtimbers implemented PipelineRL for open-instruct a little while ago, and it ended up being probably the single biggest speedup to our overall pipeline. We went from 2-week-long RL runs to 5-day runs, without sacrificing performance
@agarwl_
Rishabh Agarwal
1 month
Don't sleep on PipelineRL -- this is one of the biggest jumps in compute efficiency of RL setups that we found in the ScaleRL paper (also validated by Magistral & others before)! What's the problem PipelineRL solves? In RL for LLMs, we need to send weight updates from trainer to
6
34
226
@agarwl_
Rishabh Agarwal
1 month
Don't sleep on PipelineRL -- this is one of the biggest jumps in compute efficiency of RL setups that we found in the ScaleRL paper (also validated by Magistral & others before)! What's the problem PipelineRL solves? In RL for LLMs, we need to send weight updates from trainer to
@alexpiche_
Alexandre L.-Piché
1 month
In-flight weight updates have gone from a “weird trick” to a must to train LLMs with RL in the last few weeks. If you want to understand the on-policy and throughput benefits here’s the CoLM talk @DBahdanau and I gave:
12
60
475
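For readers unfamiliar with the idea, here is a toy sketch of in-flight weight updates: the trainer publishes new policy versions while generation workers keep producing rollouts, so rollouts pick up fresher weights mid-stream instead of waiting for a synchronous reload. This is a conceptual illustration with threads and a shared version counter, not PipelineRL's or open-instruct's actual implementation.

```python
import queue
import threading
import time

latest = {"version": 0}      # stand-in for a parameter store: just a version counter here
lock = threading.Lock()
rollouts = queue.Queue()

def trainer(n_updates=5):
    for v in range(1, n_updates + 1):
        time.sleep(0.05)     # pretend to take a gradient step
        with lock:
            latest["version"] = v   # publish new weights without pausing generators

def generator(worker_id, n_rollouts=10):
    for i in range(n_rollouts):
        with lock:
            v = latest["version"]   # pick up the freshest policy mid-stream
        time.sleep(0.02)            # pretend to generate a rollout chunk with policy v
        rollouts.put((worker_id, i, v))

threads = [threading.Thread(target=trainer)]
threads += [threading.Thread(target=generator, args=(w,)) for w in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
while not rollouts.empty():
    worker, step, version = rollouts.get()
    print(f"worker {worker}, rollout {step}: generated with policy v{version}")
```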
@Ahmed_Masry97
Ahmed Masry
1 month
🚀 Excited to share our EMNLP 2025 Industry Track paper: ColMate 📑 A new framework that introduces a novel late-interaction relevance score, a masked OCR language modeling objective, and self-supervised contrastive learning, boosting ColPali on multimodal document retrieval! 🧵 👇
1
2
6
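For context on "late interaction": ColBERT/ColPali-style retrieval scores a query against a document by matching each query token embedding to its best document token or patch embedding and summing the maxima. The sketch below shows that generic MaxSim scoring on random embeddings; ColMate's own relevance score is not reproduced here.

```python
import numpy as np

def maxsim_score(query_emb, doc_emb):
    """query_emb: [n_q, d], doc_emb: [n_d, d]; both L2-normalized row-wise."""
    sims = query_emb @ doc_emb.T   # [n_q, n_d] cosine similarities
    return sims.max(axis=1).sum()  # best document match per query token, summed

rng = np.random.default_rng(0)

def normed(shape):
    x = rng.normal(size=shape)
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

query = normed((8, 128))                       # 8 query token embeddings
docs = [normed((200, 128)) for _ in range(3)]  # 3 documents of 200 patch embeddings each
scores = [maxsim_score(query, d) for d in docs]
print("documents ranked by relevance:", np.argsort(scores)[::-1].tolist())
```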
@alexpiche_
Alexandre L.-Piché
1 month
In-flight weight updates have gone from a “weird trick” to a must to train LLMs with RL in the last few weeks. If you want to understand the on-policy and throughput benefits here’s the CoLM talk @DBahdanau and I gave:
1
29
141
@tscholak
Torsten Scholak
1 month
Shoutout to Philippe Beaudoin from @LawZero_ for asking the right questions!
0
0
0