Pieter Delobelle

@pieterdelobelle

Followers: 502 · Following: 528 · Media: 42 · Statuses: 142

Postdoctoral AI researcher on LLM pretraining, tokenization & safety - Prev: @apple, @aleph__alpha, @milaNLProc, PhD & postdoc @KU_Leuven

Berlin
Joined April 2012
@pieterdelobelle
Pieter Delobelle
3 months
Over the last weeks I worked on synthetic datasets, so I made a small LLM scheduler to process large batches reliably, called LLMQ. It's a simple CLI tool that submits jobs (from jsonl or a HF dataset) to RabbitMQ, where multiple workers can take jobs from their queue.
1
1
10
@pieterdelobelle
Pieter Delobelle
6 days
More info (+blogpost) soon! preprint:
0
0
0
@pieterdelobelle
Pieter Delobelle
6 days
🥳 Our paper "ProbLog4Fairness: A Neurosymbolic Approach to Modeling and Mitigating Bias" is accepted at #AAAI2026! ProbLog4Fairness lets you write down how you think bias enters your dataset using probabilistic logic, then automatically corrects for it during neural network training.
1
2
2
@lagom_nlp
LAGoM NLP
23 days
When is a language hard to model? Previous research has suggested that morphological complexity both does and does not play a role, but that work relates the performance of language models to corpus statistics of words or subword tokens in isolation.
1
4
0
@pieterdelobelle
Pieter Delobelle
14 days
We start with the MLP layer, as that provides the biggest VRAM wins, and we also parallelize the attention heads for a good speedup. Full blog post: https://t.co/Tfgwi3bMiM
0
0
1
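For intuition, here is a minimal sketch (not the code from the blog post) of the MLP split described above, assuming PyTorch and an already-initialized torch.distributed process group: the up-projection is sharded column-wise and the down-projection row-wise, so each rank holds only 1/world_size of the MLP weights and a single all-reduce recombines the partial outputs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.distributed as dist

class TensorParallelMLP(nn.Module):
    """Column-parallel up-projection + row-parallel down-projection (sketch)."""
    def __init__(self, d_model: int, d_ff: int, world_size: int):
        super().__init__()
        assert d_ff % world_size == 0
        shard = d_ff // world_size
        # Each rank stores only its 1/world_size slice of both weight matrices,
        # which is where the VRAM savings come from.
        self.up = nn.Linear(d_model, shard, bias=False)
        self.down = nn.Linear(shard, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        partial = self.down(F.gelu(self.up(x)))          # partial sum on this rank
        dist.all_reduce(partial, op=dist.ReduceOp.SUM)   # combine across ranks
        return partial
```

Attention heads can be parallelized the same way: each rank keeps a subset of heads, the output projection becomes the row-parallel half, and the layer again ends in one all-reduce.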
@pieterdelobelle
Pieter Delobelle
14 days
I wrote a new blogpost on implementing tensor parallelism for my "llm inference from scratch" series. Now our inference engine can finally serve models that don't fit into one GPU.
@pieterdelobelle
Pieter Delobelle
28 days
I built an LLM inference engine from scratch to learn what goes into serving models efficiently. Starting from @karpathy's nanoGPT with a simple generate() function, I added KV caching, fused sampling (from Flashinfer), CUDA graphs, etc... Let me share some insights. (1/4 🧵)
1
0
6
@Dorialexander
Alexander Doria
15 days
Breaking: we release a fully synthetic generalist dataset for pretraining, SYNTH, and two new SOTA reasoning models exclusively trained on it. Despite having seen only 200 billion tokens, Baguettotron is currently best-in-class in its size range.
80
151
1K
@pieterdelobelle
Pieter Delobelle
28 days
The gap between training models and serving them efficiently is often underestimated. Most researchers don't realize what goes into production inference. Code and detailed analysis available here: ⚙️ https://t.co/FZEU86AyxF 🧑‍💻 https://t.co/WhYZXHUsid More optimizations
pieter.ai · LLM inference from scratch
0
0
0
@pieterdelobelle
Pieter Delobelle
28 days
But KV caching alone isn't enough. Sampling was taking 1.77ms per step, since we were sorting the entire vocabulary. Fused rejection sampling kernels eliminated this bottleneck. I also captured the CUDA graphs for prefill and decode phases, which helps a lot for smaller batches.
1
0
0
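As a rough illustration of the CUDA-graph part (not the engine's actual code), this is how a fixed-shape decode step can be captured and replayed with torch.cuda.CUDAGraph. The stand-in decode_step, vocabulary size, and buffer shapes are made up; a real engine would capture the full transformer forward, including its KV-cache reads and writes.

```python
import torch

vocab_size = 32_000  # placeholder value for illustration

# Stand-in for the real decode step (embedding + LM head only).
embed = torch.nn.Embedding(vocab_size, 256, device="cuda")
lm_head = torch.nn.Linear(256, vocab_size, device="cuda")

def decode_step(ids: torch.Tensor) -> torch.Tensor:
    return lm_head(embed(ids)).squeeze(1)

# CUDA graphs need static tensor addresses, so inputs and outputs live in
# pre-allocated buffers that we copy into before each replay.
static_ids = torch.zeros(1, 1, dtype=torch.long, device="cuda")
static_logits = torch.zeros(1, vocab_size, device="cuda")

# Warm-up on a side stream, then capture the whole decode step once.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    static_logits.copy_(decode_step(static_ids))
torch.cuda.current_stream().wait_stream(s)

graph = torch.cuda.CUDAGraph()
with torch.cuda.graph(graph):
    static_logits.copy_(decode_step(static_ids))

def decode(token_id: torch.Tensor) -> torch.Tensor:
    static_ids.copy_(token_id)  # write the new token into the captured buffer
    graph.replay()              # one launch replays all captured kernels
    return static_logits
```

Replaying a captured graph replaces dozens of small kernel launches with one, which is why it mostly helps at small batch sizes where launch overhead dominates.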
@pieterdelobelle
Pieter Delobelle
28 days
The most important performance gain is KV caching. Before: recompute attention for every token (compute-bound). After: cache keys/values (memory-bound). This change enables batch parallelism and is foundational for serving LLMs profitably.
1
0
0
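A toy version of that before/after (purely illustrative, with stand-in projections): without a cache, every step recomputes keys and values for the whole prefix; with a cache, each step only computes them for the newest token and appends.

```python
import torch

def attend(q, k, v):
    # Scaled dot-product attention of the new token over the cached prefix.
    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1) @ v

class KVCache:
    """Append-only cache: compute K/V once per token, reuse every later step."""
    def __init__(self):
        self.k = None
        self.v = None

    def append(self, k_new, v_new):
        self.k = k_new if self.k is None else torch.cat([self.k, k_new], dim=1)
        self.v = v_new if self.v is None else torch.cat([self.v, v_new], dim=1)
        return self.k, self.v

d = 64
cache = KVCache()
for _ in range(5):
    x_new = torch.randn(1, 1, d)           # hidden state of the newest token
    q, k_new, v_new = x_new, x_new, x_new   # stand-ins for the Q/K/V projections
    k, v = cache.append(k_new, v_new)       # only the new token's K/V are computed
    out = attend(q, k, v)                   # attends over all cached tokens
```

Real engines pre-allocate (often paged) cache blocks instead of calling torch.cat, but the access pattern is the same: decode becomes bound by reading the cache from memory rather than by recomputing attention.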
@pieterdelobelle
Pieter Delobelle
28 days
I built an LLM inference engine from scratch to learn what goes into serving models efficiently. Starting from @karpathy's nanoGPT with a simple generate() function, I added KV caching, fused sampling (from Flashinfer), CUDA graphs, etc... Let me share some insights. (1/4 🧵)
2
4
16
@pieterdelobelle
Pieter Delobelle
2 months
This is gonna be a fun course to teach 🔥
1
0
5
@pieterdelobelle
Pieter Delobelle
2 months
I also added support for YAML to configure the pipelines, so processing an entire HF dataset is as easy as:

$ llmq submit -p example-pipeline.yaml epfl-llm/guidelines

Here, each sample will get translated and formatted into markdown. https://t.co/YqsLr3fRzQ
github.com · iPieter/llmq: A Scheduler for Batched LLM Inference
0
0
0
@pieterdelobelle
Pieter Delobelle
2 months
Just merged pipeline support into LLMQ, my distributed LLM inference scheduler. You can now define multi-stage workflows where each stage can use different models/workers. Results go through queues with independent scaling per stage.
1
0
2
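The stage-per-queue pattern described above might look roughly like the following worker sketch. This is an assumption-laden illustration using the pika RabbitMQ client with made-up queue names, not LLMQ's actual worker code.

```python
import json
import pika

IN_QUEUE, OUT_QUEUE = "stage1.translate", "stage2.format"  # hypothetical names

conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
ch = conn.channel()
ch.queue_declare(queue=IN_QUEUE, durable=True)
ch.queue_declare(queue=OUT_QUEUE, durable=True)

def handle(channel, method, properties, body):
    job = json.loads(body)
    # Placeholder for this stage's actual work, e.g. an LLM call.
    job["text"] = job["text"].upper()
    # Forward the result to the next stage's queue.
    channel.basic_publish(exchange="", routing_key=OUT_QUEUE,
                          body=json.dumps(job).encode())
    channel.basic_ack(delivery_tag=method.delivery_tag)  # ack only after forwarding

ch.basic_qos(prefetch_count=1)  # one in-flight job per worker
ch.basic_consume(queue=IN_QUEUE, on_message_callback=handle)
ch.start_consuming()
```

Because each stage has its own queue, a slow stage (say, the one calling a large model) can get more workers attached without touching the others.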
@pieterdelobelle
Pieter Delobelle
3 months
Finally got around to updating my AI conference deadline tracker. I added some new deadlines and a globe view
2
5
19
@pabloiyu
Pablo Iyu Guerrero
3 months
First high-performance inference for hierarchical byte models. @LukasBluebaum and I developed batched inference for tokenizer-free HAT (Hierarchical Autoregressive Transformers) models from @Aleph__Alpha Research. In some settings, we outcompete the baseline Llama. 🧵
2
7
28
@pieterdelobelle
Pieter Delobelle
3 months
Serving an LLM efficiently (=profitably) is highly non-trivial and involves a lot of different design choices. Mixture of experts, as used by DeepSeek, complicates this a lot. I really learned to appreciate this from @tugot17 while I was at @Aleph__Alpha, so check out this deep dive:
@tugot17
Piotr Mazurek @ NeurIPS 🇺🇸
3 months
What are the profit margins of serving DeepSeek 🐳? @schreiberic and I discuss large-scale MoE inference in depth. Blog post link below
0
1
9
@pieterdelobelle
Pieter Delobelle
3 months
@thomas_wint Thanks to EuroEval for the evaluations. Their dataset is here:
0
0
2
@pieterdelobelle
Pieter Delobelle
3 months
5-year-old BERT-style models are still winning for Dutch. I was looking at the EuroEval benchmarks and, to my surprise, the models we trained in 2019 (RobBERT w/ @thomas_wint) are still SOTA. It takes 70x larger generative models (24B+) to match our 355M parameter encoder model.
2
1
13
@pieterdelobelle
Pieter Delobelle
3 months
I also release some synthetic datasets I made with LLMQ by translating fineweb to Dutch and German, with a permissive license (ODC-by).
🇩🇪 500k rows translated with @Unbabel's Tower+ 72B: https://t.co/rbpIv3aDME
🇳🇱 1.5M rows translated with Tower+ 9B
@pieterdelobelle
Pieter Delobelle
3 months
Over the last weeks I worked on synthetic datasets, so I made a small LLM scheduler to process large batches reliably, called LLMQ. It's a simple CLI tool that submits jobs (from jsonl or a HF dataset) to RabbitMQ, where multiple workers can take jobs from their queue.
0
0
7
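The submit side of that pattern reduces to publishing one job per record onto a durable queue. A minimal sketch with the pika client follows; the file name and queue name are hypothetical, and the real CLI handles HF datasets, retries, and result collection on top of this.

```python
import pika

conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
ch = conn.channel()
ch.queue_declare(queue="llm-jobs", durable=True)  # survives broker restarts

# Publish one job per jsonl line; workers pull these off the queue.
with open("samples.jsonl") as f:
    for line in f:
        ch.basic_publish(
            exchange="",
            routing_key="llm-jobs",
            body=line.strip().encode(),
            properties=pika.BasicProperties(delivery_mode=2),  # persist jobs
        )
conn.close()
```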