max (@maxencelsb)
Followers: 31 · Following: 420 · Media: 14 · Statuses: 78
💻 GitHub: https://t.co/hsxfYxPd8j 🤗 HF: https://t.co/HTLvRfgnoR
Paris, France · Joined September 2024
📚 Efficient Language Specialization for Small Language Models: @maxencelsb and @SinoueG have released a preprint about their excellent work on fine-tuning small models in French. It presents a solid post-training pipeline that improves French performance while preserving English
5 · 17 · 121
Meet Luth-LFM2, a French fine-tune of LFM2 designed by Maxence Lasbordes and Sinoué GAD to enhance LFM2's multilingual capabilities! In this model class, Luth-LFM2 sets a new record in French instruction following, GPQA, MMLU, and math.
19 · 31 · 129
Really impressed by the French fine-tune of LFM2 made by two students. They created a solid post-training pipeline (full fine-tuning + model merging) and open-sourced all the code and data. Amazing work by Sinoué Gad and Maxence Lasbordes!
8 · 25 · 214
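The merging step mentioned above typically boils down to a weighted average of checkpoint weights. Below is a minimal PyTorch sketch of a linear merge of two checkpoints; it illustrates the general technique only, not the Luth recipe, and the model ids are placeholders.

```python
from transformers import AutoModelForCausalLM

# Placeholder model ids: swap in the real base and fine-tuned checkpoints.
base = AutoModelForCausalLM.from_pretrained("base-model-id")
tuned = AutoModelForCausalLM.from_pretrained("finetuned-model-id")

alpha = 0.5  # interpolation weight: 0 keeps the base, 1 keeps the fine-tuned model
base_sd, tuned_sd = base.state_dict(), tuned.state_dict()

merged = {}
for name, p in base_sd.items():
    if p.dtype.is_floating_point:
        # Linear interpolation between the two sets of weights.
        merged[name] = (1.0 - alpha) * p + alpha * tuned_sd[name]
    else:
        merged[name] = p  # leave integer buffers untouched

base.load_state_dict(merged)
base.save_pretrained("merged-model")  # merged checkpoint, ready for evaluation
```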
🚀 Introducing Qwen3-4B-Instruct-2507 & Qwen3-4B-Thinking-2507 — smarter, sharper, and 256K-ready! 🔹 Instruct: Boosted general skills, multilingual coverage, and long-context instruction following. 🔹 Thinking: Advanced reasoning in logic, math, science & code — built for
143 · 401 · 3K
Splitformer: An improved early-exit architecture for automatic speech recognition on edge devices.
arxiv.org
The ability to dynamically adjust the computational load of neural models during inference in a resource aware manner is crucial for on-device processing scenarios, characterised by limited and...
1 · 1 · 3
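For context, the core idea behind early-exit models like Splitformer is to attach lightweight prediction heads to intermediate layers and stop computing as soon as one of them is confident enough. The sketch below shows that generic mechanism in PyTorch; it is not the Splitformer architecture, and the sizes and threshold are made-up values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EarlyExitEncoder(nn.Module):
    """Toy confidence-based early exit over a stack of encoder layers."""
    def __init__(self, dim=256, num_layers=6, num_classes=32, threshold=0.9):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
            for _ in range(num_layers)
        )
        # One lightweight classifier ("exit head") per layer.
        self.exits = nn.ModuleList(nn.Linear(dim, num_classes) for _ in range(num_layers))
        self.threshold = threshold

    @torch.no_grad()
    def forward(self, x):
        for depth, (layer, exit_head) in enumerate(zip(self.layers, self.exits)):
            x = layer(x)
            probs = F.softmax(exit_head(x).mean(dim=1), dim=-1)  # pooled prediction
            if probs.max() >= self.threshold:   # confident enough: stop computing
                return probs, depth + 1
        return probs, len(self.layers)          # fell through to the final exit

model = EarlyExitEncoder()
probs, used_layers = model(torch.randn(1, 50, 256))
print(f"exited after {used_layers} layers")
```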
GitHub: https://t.co/lcHrCTB0iz Models: https://t.co/tljsIlS9M3 Dataset: https://t.co/Q692DDrEy9 Feel free to leave a star ⭐!
huggingface.co
0 · 0 · 0
✨ Sharing my most recent side project: LeCarnet, a synthetic dataset of 2M+ French children's stories generated with Mistral Large, inspired by TinyStories. Implemented the data generation, training, and eval pipelines. Also trained 3 SLMs on the dataset: LeCarnet-3M/8M/21M.
1 · 0 · 3
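A TinyStories-style generation pipeline usually amounts to sampling a prompt template with a few random seed words and calling an LLM in a loop. The sketch below illustrates that loop only; the `generate_story` helper, prompt wording, and word lists are hypothetical placeholders, not the actual LeCarnet code.

```python
import json
import random

def generate_story(prompt: str) -> str:
    """Hypothetical stand-in for a call to an LLM API (e.g. Mistral Large).
    Replace the body with your actual client call."""
    return "Il était une fois..."  # placeholder completion

# TinyStories-style trick: randomise a few seed words/constraints per prompt
# so the generated stories stay diverse. These word lists are made up.
NOUNS = ["chat", "forêt", "bateau", "étoile"]
FEATURES = ["contient un dialogue", "a une fin heureuse", "contient une morale"]

PROMPT = (
    "Écris une courte histoire pour enfants en français, avec un vocabulaire "
    "très simple, qui parle d'un(e) {noun} et qui {feature}."
)

stories = []
for _ in range(10):  # scale this loop up (and parallelise it) for millions of stories
    prompt = PROMPT.format(noun=random.choice(NOUNS), feature=random.choice(FEATURES))
    stories.append({"prompt": prompt, "text": generate_story(prompt)})

with open("stories.jsonl", "w", encoding="utf-8") as f:
    for s in stories:
        f.write(json.dumps(s, ensure_ascii=False) + "\n")
```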
I'm thrilled to announce the release of FastPlaid! 🚀🚀 FastPlaid is a high-performance engine for multi-vector search, built from the ground up in Rust (with the help of Torch C++) ⚡️ You can view FastPlaid as the counterpart of Faiss for multi-vectors.
10 · 40 · 250
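For readers unfamiliar with multi-vector search: instead of one embedding per document, each document keeps one embedding per token, and relevance is the ColBERT-style MaxSim score. Below is a brute-force PyTorch sketch of that scoring over random toy data; engines like FastPlaid exist to make exactly this fast at scale.

```python
import torch
import torch.nn.functional as F

def maxsim_score(query_vecs: torch.Tensor, doc_vecs: torch.Tensor) -> torch.Tensor:
    """Late-interaction (ColBERT-style) relevance: for each query token vector,
    take its maximum similarity over all document token vectors, then sum."""
    # query_vecs: (q_tokens, dim), doc_vecs: (d_tokens, dim),
    # both assumed L2-normalised so the dot product equals cosine similarity.
    sim = query_vecs @ doc_vecs.T            # (q_tokens, d_tokens) similarity matrix
    return sim.max(dim=1).values.sum()       # MaxSim per query token, summed

# Toy brute-force search over a tiny corpus of per-token embeddings.
query = F.normalize(torch.randn(8, 128), dim=-1)
corpus = [F.normalize(torch.randn(n, 128), dim=-1) for n in (40, 60, 25)]
scores = torch.tensor([maxsim_score(query, doc) for doc in corpus])
print("best document:", scores.argmax().item())
```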
Among all those LLM releases, here is an important retrieval release: to overcome the limitations of the awesome ModernBERT-based dense models, today @LightOnIO is releasing GTE-ModernColBERT, the very first state-of-the-art late-interaction (multi-vector) model trained using PyLate 🚀
9 · 59 · 255
TL;DR: we are excited to release a powerful new open-weight language model with reasoning in the coming months, and we want to talk to devs about how to make it maximally useful: https://t.co/XKB4XxjREV we are excited to make this a very, very good model! __ we are planning to
openai.com
We’re planning to release our first open language model since GPT‑2 in the coming months. We’re excited to collaborate with developers, researchers, and the broader community to gather inputs and...
1K · 1K · 13K
Read the TRGPPO paper today, which addresses the exploration issue of PPO under poor policy initialization https://t.co/k8rWLrMY5u
0 · 0 · 0
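For reference, TRGPPO is a modification of PPO's clipping mechanism, so the relevant baseline is the standard clipped surrogate objective. The snippet below implements that vanilla PPO loss in PyTorch; it does not include the trust-region-guided clipping range that TRGPPO proposes.

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate loss (to be minimised)."""
    ratio = torch.exp(logp_new - logp_old)          # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    # Take the pessimistic (min) surrogate, negate to turn maximisation into a loss.
    return -torch.min(unclipped, clipped).mean()

# Toy check with random log-probabilities and advantages.
logp_old = torch.randn(64)
logp_new = logp_old + 0.1 * torch.randn(64)
adv = torch.randn(64)
print(ppo_clip_loss(logp_new, logp_old, adv))
```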
Read the CutMix paper for uni. Really smart data augmentation technique https://t.co/C1MI5UjkNF
0 · 0 · 0
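The CutMix recipe itself is short: cut a random patch out of one image, paste it into another, and mix the two labels in proportion to the patch area. A minimal PyTorch batch implementation along those lines:

```python
import torch

def cutmix(images, labels, alpha=1.0):
    """CutMix: paste a random patch from a shuffled batch into each image and
    mix the labels in proportion to the pasted area."""
    b, _, h, w = images.shape
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(b)

    # Patch side lengths follow sqrt(1 - lam) so the area ratio is (1 - lam).
    cut_h, cut_w = int(h * (1 - lam) ** 0.5), int(w * (1 - lam) ** 0.5)
    cy, cx = torch.randint(h, (1,)).item(), torch.randint(w, (1,)).item()
    y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)

    mixed = images.clone()
    mixed[:, :, y1:y2, x1:x2] = images[perm, :, y1:y2, x1:x2]
    # Recompute lambda from the exact pasted area (clipping at borders changes it).
    lam = 1 - ((y2 - y1) * (x2 - x1)) / (h * w)
    return mixed, labels, labels[perm], lam  # loss = lam * loss(y_a) + (1 - lam) * loss(y_b)

images, labels = torch.randn(8, 3, 32, 32), torch.randint(10, (8,))
mixed, y_a, y_b, lam = cutmix(images, labels)
```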
Read the Grad-CAM paper. Uses gradients to generate heatmaps, highlighting important regions in images for CNN decisions. https://t.co/iIVN9ULK2p
0 · 0 · 0
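Grad-CAM in a nutshell: take the feature maps of the last conv block, weight each channel by the spatially averaged gradient of the target class score, sum, and ReLU. A minimal sketch using hooks on a torchvision ResNet-18 (the layer choice and random input are just for illustration):

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

model = resnet18(weights=None).eval()
feats, grads = {}, {}
layer = model.layer4  # last conv block of ResNet-18

# Capture the forward activations and the gradients flowing back through them.
layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
layer.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))

x = torch.randn(1, 3, 224, 224, requires_grad=True)
logits = model(x)
logits[0, logits.argmax()].backward()  # gradient of the predicted class score

weights = grads["a"].mean(dim=(2, 3), keepdim=True)     # GAP over spatial dims
cam = F.relu((weights * feats["a"]).sum(dim=1))         # (1, H', W') heatmap
cam = F.interpolate(cam[None], size=x.shape[-2:], mode="bilinear")[0]
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalise to [0, 1]
```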
My latest project: FlashAttention-2 in Triton for Sliding Window Attention.
- Forward & backward pass
- Sliding window / causal / global attention
- 2-10x TFLOPs/s increase compared to standard PyTorch attention
- Tiled matmul optimization
- Online softmax
https://t.co/XupyAfHzJX
github.com
FlashAttention for sliding window attention in Triton (fwd + bwd pass) - MaxLSB/flash-attn2
0 · 0 · 1