max (@maxencelsb)
Followers: 31 · Following: 420 · Media: 14 · Statuses: 78
💻 GitHub: https://t.co/hsxfYxPd8j 🤗 HF: https://t.co/HTLvRfgnoR
Paris, France · Joined September 2024
📚 Efficient Language Specialization for Small Language Models: @maxencelsb and @SinoueG have released a preprint about their excellent work on fine-tuning small models in French. It presents a solid post-training pipeline that improves French performance while preserving English
5 · 17 · 121
Meet Luth-LFM2, a French fine-tune of LFM2 designed by Maxence Lasbordes and Sinoué GAD to enhance LFM2's multilingual capabilities! In this model class, Luth-LFM2 sets a new record in French instruction following, GPQA, MMLU, and math.
19 · 31 · 129
Really impressed by the French fine-tune of LFM2 made by two students. They created a solid post-training pipeline (full fine-tuning + model merging) and open-sourced all the code and data. Amazing work by Sinoué Gad and Maxence Lasbordes!
8 · 25 · 214
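The merging step mentioned above typically boils down to a weighted average of checkpoint weights. Below is a minimal PyTorch sketch of a linear merge of two checkpoints; it illustrates the general technique only, not the Luth recipe, and the model ids are placeholders.

```python
from transformers import AutoModelForCausalLM

# Placeholder model ids: swap in the real base and fine-tuned checkpoints.
base = AutoModelForCausalLM.from_pretrained("base-model-id")
tuned = AutoModelForCausalLM.from_pretrained("finetuned-model-id")

alpha = 0.5  # interpolation weight: 0 keeps the base, 1 keeps the fine-tuned model
base_sd, tuned_sd = base.state_dict(), tuned.state_dict()

merged = {}
for name, p in base_sd.items():
    if p.dtype.is_floating_point:
        # Linear interpolation between the two sets of weights.
        merged[name] = (1.0 - alpha) * p + alpha * tuned_sd[name]
    else:
        merged[name] = p  # leave integer buffers untouched

base.load_state_dict(merged)
base.save_pretrained("merged-model")  # merged checkpoint, ready for evaluation
```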
🚀 Introducing Qwen3-4B-Instruct-2507 & Qwen3-4B-Thinking-2507 — smarter, sharper, and 256K-ready! 🔹 Instruct: Boosted general skills, multilingual coverage, and long-context instruction following. 🔹 Thinking: Advanced reasoning in logic, math, science & code — built for
143 · 401 · 3K
Splitformer: An improved early-exit architecture for automatic speech recognition on edge devices.
arxiv.org
The ability to dynamically adjust the computational load of neural models during inference in a resource aware manner is crucial for on-device processing scenarios, characterised by limited and...
1 · 1 · 3
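For context, the core idea behind early-exit models like Splitformer is to attach lightweight prediction heads to intermediate layers and stop computing as soon as one of them is confident enough. The sketch below shows that generic mechanism in PyTorch; it is not the Splitformer architecture, and the sizes and threshold are made-up values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EarlyExitEncoder(nn.Module):
    """Toy confidence-based early exit over a stack of encoder layers."""
    def __init__(self, dim=256, num_layers=6, num_classes=32, threshold=0.9):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
            for _ in range(num_layers)
        )
        # One lightweight classifier ("exit head") per layer.
        self.exits = nn.ModuleList(nn.Linear(dim, num_classes) for _ in range(num_layers))
        self.threshold = threshold

    @torch.no_grad()
    def forward(self, x):
        for depth, (layer, exit_head) in enumerate(zip(self.layers, self.exits)):
            x = layer(x)
            probs = F.softmax(exit_head(x).mean(dim=1), dim=-1)  # pooled prediction
            if probs.max() >= self.threshold:   # confident enough: stop computing
                return probs, depth + 1
        return probs, len(self.layers)          # fell through to the final exit

model = EarlyExitEncoder()
probs, used_layers = model(torch.randn(1, 50, 256))
print(f"exited after {used_layers} layers")
```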
GitHub: https://t.co/lcHrCTB0iz Models: https://t.co/tljsIlS9M3 Dataset: https://t.co/Q692DDrEy9 Feel free to leave a star ⭐!
huggingface.co
0 · 0 · 0
✨ Sharing my most recent side project: LeCarnet, a synthetic dataset of 2M+ French children's stories generated with Mistral Large, inspired by TinyStories. Implemented the data generation, training, and eval pipelines. Also trained 3 SLMs on the dataset: LeCarnet-3M/8M/21M.
1 · 0 · 3
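A TinyStories-style generation pipeline usually amounts to sampling a prompt template with a few random seed words and calling an LLM in a loop. The sketch below illustrates that loop only; the `generate_story` helper, prompt wording, and word lists are hypothetical placeholders, not the actual LeCarnet code.

```python
import json
import random

def generate_story(prompt: str) -> str:
    """Hypothetical stand-in for a call to an LLM API (e.g. Mistral Large).
    Replace the body with your actual client call."""
    return "Il était une fois..."  # placeholder completion

# TinyStories-style trick: randomise a few seed words/constraints per prompt
# so the generated stories stay diverse. These word lists are made up.
NOUNS = ["chat", "forêt", "bateau", "étoile"]
FEATURES = ["contient un dialogue", "a une fin heureuse", "contient une morale"]

PROMPT = (
    "Écris une courte histoire pour enfants en français, avec un vocabulaire "
    "très simple, qui parle d'un(e) {noun} et qui {feature}."
)

stories = []
for _ in range(10):  # scale this loop up (and parallelise it) for millions of stories
    prompt = PROMPT.format(noun=random.choice(NOUNS), feature=random.choice(FEATURES))
    stories.append({"prompt": prompt, "text": generate_story(prompt)})

with open("stories.jsonl", "w", encoding="utf-8") as f:
    for s in stories:
        f.write(json.dumps(s, ensure_ascii=False) + "\n")
```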
I'm thrilled to announce the release of FastPlaid! 🚀🚀 FastPlaid is a high-performance engine for multi-vector search, built from the ground up in Rust (with the help of Torch C++) ⚡️ You can view FastPlaid as the counterpart of Faiss for multi-vectors.
10 · 40 · 250
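For readers unfamiliar with multi-vector search: instead of one embedding per document, each document keeps one embedding per token, and relevance is the ColBERT-style MaxSim score. Below is a brute-force PyTorch sketch of that scoring over random toy data; engines like FastPlaid exist to make exactly this fast at scale.

```python
import torch
import torch.nn.functional as F

def maxsim_score(query_vecs: torch.Tensor, doc_vecs: torch.Tensor) -> torch.Tensor:
    """Late-interaction (ColBERT-style) relevance: for each query token vector,
    take its maximum similarity over all document token vectors, then sum."""
    # query_vecs: (q_tokens, dim), doc_vecs: (d_tokens, dim),
    # both assumed L2-normalised so the dot product equals cosine similarity.
    sim = query_vecs @ doc_vecs.T            # (q_tokens, d_tokens) similarity matrix
    return sim.max(dim=1).values.sum()       # MaxSim per query token, summed

# Toy brute-force search over a tiny corpus of per-token embeddings.
query = F.normalize(torch.randn(8, 128), dim=-1)
corpus = [F.normalize(torch.randn(n, 128), dim=-1) for n in (40, 60, 25)]
scores = torch.tensor([maxsim_score(query, doc) for doc in corpus])
print("best document:", scores.argmax().item())
```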
Among all those LLM releases, here is an important retrieval release: to overcome the limitations of the awesome ModernBERT-based dense models, today @LightOnIO is releasing GTE-ModernColBERT, the very first state-of-the-art late-interaction (multi-vector) model trained using PyLate 🚀
9 · 59 · 255
TL;DR: we are excited to release a powerful new open-weight language model with reasoning in the coming months, and we want to talk to devs about how to make it maximally useful: https://t.co/XKB4XxjREV we are excited to make this a very, very good model! __ we are planning to
openai.com
We’re planning to release our first open language model since GPT‑2 in the coming months. We’re excited to collaborate with developers, researchers, and the broader community to gather inputs and...
1K · 1K · 13K
Read the TRGPPO paper today, which addresses the exploration issue of PPO under poor policy initialization https://t.co/k8rWLrMY5u
0 · 0 · 0
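For reference, TRGPPO is a modification of PPO's clipping mechanism, so the relevant baseline is the standard clipped surrogate objective. The snippet below implements that vanilla PPO loss in PyTorch; it does not include the trust-region-guided clipping range that TRGPPO proposes.

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate loss (to be minimised)."""
    ratio = torch.exp(logp_new - logp_old)          # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    # Take the pessimistic (min) surrogate, negate to turn maximisation into a loss.
    return -torch.min(unclipped, clipped).mean()

# Toy check with random log-probabilities and advantages.
logp_old = torch.randn(64)
logp_new = logp_old + 0.1 * torch.randn(64)
adv = torch.randn(64)
print(ppo_clip_loss(logp_new, logp_old, adv))
```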
Read the CutMix paper for uni. Really smart data augmentation technique https://t.co/C1MI5UjkNF
0 · 0 · 0
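The CutMix recipe itself is short: cut a random patch out of one image, paste it into another, and mix the two labels in proportion to the patch area. A minimal PyTorch batch implementation along those lines:

```python
import torch

def cutmix(images, labels, alpha=1.0):
    """CutMix: paste a random patch from a shuffled batch into each image and
    mix the labels in proportion to the pasted area."""
    b, _, h, w = images.shape
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(b)

    # Patch side lengths follow sqrt(1 - lam) so the area ratio is (1 - lam).
    cut_h, cut_w = int(h * (1 - lam) ** 0.5), int(w * (1 - lam) ** 0.5)
    cy, cx = torch.randint(h, (1,)).item(), torch.randint(w, (1,)).item()
    y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)

    mixed = images.clone()
    mixed[:, :, y1:y2, x1:x2] = images[perm, :, y1:y2, x1:x2]
    # Recompute lambda from the exact pasted area (clipping at borders changes it).
    lam = 1 - ((y2 - y1) * (x2 - x1)) / (h * w)
    return mixed, labels, labels[perm], lam  # loss = lam * loss(y_a) + (1 - lam) * loss(y_b)

images, labels = torch.randn(8, 3, 32, 32), torch.randint(10, (8,))
mixed, y_a, y_b, lam = cutmix(images, labels)
```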
Read the Grad-CAM paper. Uses gradients to generate heatmaps, highlighting important regions in images for CNN decisions. https://t.co/iIVN9ULK2p
0 · 0 · 0
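Grad-CAM in a nutshell: take the feature maps of the last conv block, weight each channel by the spatially averaged gradient of the target class score, sum, and ReLU. A minimal sketch using hooks on a torchvision ResNet-18 (the layer choice and random input are just for illustration):

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

model = resnet18(weights=None).eval()
feats, grads = {}, {}
layer = model.layer4  # last conv block of ResNet-18

# Capture the forward activations and the gradients flowing back through them.
layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
layer.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))

x = torch.randn(1, 3, 224, 224, requires_grad=True)
logits = model(x)
logits[0, logits.argmax()].backward()  # gradient of the predicted class score

weights = grads["a"].mean(dim=(2, 3), keepdim=True)     # GAP over spatial dims
cam = F.relu((weights * feats["a"]).sum(dim=1))         # (1, H', W') heatmap
cam = F.interpolate(cam[None], size=x.shape[-2:], mode="bilinear")[0]
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalise to [0, 1]
```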
My latest project: FlashAttention-2 in Triton for Sliding Window Attention.
- Forward & backward pass
- Sliding window / causal / global attention
- 2-10x TFLOPs/s increase compared to standard PyTorch attention
- Tiled matmul optimization
- Online softmax
https://t.co/XupyAfHzJX
github.com
FlashAttention for sliding window attention in Triton (fwd + bwd pass) - MaxLSB/flash-attn2
0 · 0 · 1