Harold Benoit
@harold_matmul
Followers
534
Following
2K
Media
26
Statuses
252
Another day of being a researcher in theory but an engineer in practice | tech staff @LiquidAI_
Joined April 2024
The Monarch release from PyTorch is super neat, big fan of clean mesh abstractions. They're using it in their new RL framework, TorchForge.
0
2
8
Model conversion (like operator grafting or swapping the modeling objective here) is a key tool for architecture research nowadays, given the prohibitive cost of pretraining. Thrilled to see a lab focusing on fast feedback loops partially based on model conversions, excited to
Introducing RND1, the most powerful base diffusion language model (DLM) to date. RND1 (Radical Numerics Diffusion) is an experimental DLM with 30B params (3B active) and a sparse MoE architecture. We are making it open source, releasing weights, training details, and code to
0
0
8
It's a good model sir. Very proud of the team, we worked very hard to be on the Pareto frontier of quality and efficiency. Even had the chance to write a CPU-optimized kernel for MoE to squeeze everything from the hardware, and that gave us those sweet throughput results.
Meet LFM2-8B-A1B, our first on-device Mixture-of-Experts (MoE)! > LFM2-8B-A1B is the best on-device MoE in terms of both quality and speed. > Performance of a 3B-4B model class, with up to 5x faster inference profile on CPUs and GPUs. > Quantized variants fit comfortably on
0
5
45
Found a short and clean walkthrough for how MMA on Tensor Cores works at the PTX level
1
0
4
Note that the argument is very similar to the findings by @ZeyuanAllenZhu that motivate the introduction of Canon layers (short conv1d layers) and show that adding them dramatically improves GLA. (This is in Physics of LLM 4.1)
0
0
4
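For reference, a minimal sketch of what such a Canon-style short conv1d can look like in PyTorch (kernel size and the residual placement are my assumptions, not from the paper): it just mixes each token with its few immediate predecessors.

    import torch
    import torch.nn as nn

    class ShortConv(nn.Module):
        def __init__(self, dim: int, kernel_size: int = 4):
            super().__init__()
            # Depthwise conv: each channel only mixes with its own past values.
            self.conv = nn.Conv1d(dim, dim, kernel_size, groups=dim,
                                  padding=kernel_size - 1)  # left-pad for causality

        def forward(self, x):                 # x: (batch, seq, dim)
            y = self.conv(x.transpose(1, 2))  # (batch, dim, seq + k - 1)
            y = y[..., : x.shape[1]]          # drop right overhang to stay causal
            return x + y.transpose(1, 2)      # residual add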
Went through a lengthy derivation of MLA today to understand exactly how we could absorb the KV decompression matrices, as I didn't find much code on it. The trick is to write the proof on a per-head basis to easily derive the absorption. In the end, it does give an enlightening
1
0
2
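Since I didn't find much code on it, here is the core per-head identity as I derived it, in my own notation (a sketch, not necessarily the exact DeepSeek parameterization). With cached latent c_s = x_s W^{DKV}, per-head query q_t^i = x_t W_Q^i, and up-projected key k_s^i = c_s W_{UK}^i:

    % per-head attention logit, with the key up-projection absorbed
    q_t^i \, (k_s^i)^\top
      = x_t W_Q^i \left( c_s W_{UK}^i \right)^\top
      = x_t \underbrace{W_Q^i (W_{UK}^i)^\top}_{\text{absorb into query proj}} \, c_s^\top

so scores can be computed directly against the cached latents c_s without ever materializing per-head keys; symmetrically, the value up-projection W_{UV}^i folds into the output projection W_O^i.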
Interesting take on the role of Short Conv. The argument uses the view that linear attention variants do a form of online learning (i.e. TTT) over the KV pairs (k_1, v_1), ..., (k_t, v_t), such that the update to the state is S_t = S_{t−1} − η_t ∇_{S_{t−1}} L(f(S_{t−1}; k_t), v_t),
Why does linear attention need Short Conv? https://t.co/luUybG3RXj
1
12
98
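To make the online-learning view concrete: with the squared loss L = ½‖S k − v‖² and read-out f(S; k) = S k, the gradient step above reduces to the delta rule. A toy sketch (shapes and names mine):

    import torch

    def state_update(S, k, v, eta):
        # grad of 0.5 * ||S @ k - v||^2 w.r.t. S is (S @ k - v) k^T
        pred = S @ k                               # f(S; k): read-out for key k
        return S - eta * torch.outer(pred - v, k)  # delta-rule update

    d_k, d_v = 16, 16
    S = torch.zeros(d_v, d_k)
    for k_t, v_t in zip(torch.randn(8, d_k), torch.randn(8, d_v)):
        S = state_update(S, k_t, v_t, eta=0.5)

(With η_t = 1, dropping the prediction term recovers the vanilla linear-attention accumulation S_t = S_{t−1} + v_t k_tᵀ.)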
With Sora, Veo, and Seedance coming out, a critical issue in building video generation models is the lack of automatic and interpretable evaluation metrics or reward models. To power the development of video generation models, we built VideoScore2, the SoTA generative video metric
It's time to bring RL to generative video evaluation! Introducing VideoScore2, a model that not only generates scores for generative videos but also produces detailed, high-quality reasoning traces. To build VideoScore2, we curated prompts from 5 sources, covering both
4
17
124
The Dreamer 4 paper is really nice to read. Really appreciate the ablations on how the combinations of learning objectives (shortcut, x-loss, etc.) & architectural tweaks (e.g. hybrid, register tokens) affect speed and quality.
0
0
3
A note on the diffusion learning objective that I hadn't appreciated before, and how tweaking it ensures rollout stability (this may be specific to the case where we use diffusion forcing?). In the Dreamer 4 paper, they predict the clean representation x1 instead of the velocity v = (x1
0
0
3
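A toy sketch of the two parameterizations under the linear interpolation x_t = (1 − t)·x0 + t·x1 (my notation; the paper's exact setup may differ):

    import torch

    def step_v_pred(model, x_t, t, dt):
        # v-prediction: the network outputs the velocity directly.
        return x_t + dt * model(x_t, t)

    def step_x_pred(model, x_t, t, dt):
        # x-prediction: the network outputs the clean sample x1_hat.
        # The implied velocity (x1_hat - x_t) / (1 - t) always points at a
        # plausible clean target, which is the rollout-stability argument.
        x1_hat = model(x_t, t)
        return x_t + dt * (x1_hat - x_t) / (1.0 - t)

    x_t = torch.randn(4, 8)
    model = lambda x, t: torch.zeros_like(x)  # placeholder net for the sketch
    x_next = step_x_pred(model, x_t, t=0.3, dt=0.1)

At the optimum the two coincide, since (x1 − x_t)/(1 − t) = x1 − x0 under this interpolation; they differ in how prediction errors propagate during rollout.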
Look at the cool model we just released, it's super fast! :) One innovation is that it uses both continuous & discrete audio tokens to enable generation without losing understanding capabilities.
Today, we expand our LFM2 family to audio. LFM2-Audio is an end-to-end audio-text omni foundation model, and delivers responsive, real-time conversation on-device at just 1.5B parameters. One model. Seamless multimodal support. No chains. > Speech-to-speech >
1
0
8
small but mighty
Introducing Liquid Nanos, a new family of extremely tiny task-specific models that deliver GPT-4o-class performance while running directly on phones, laptops, cars, embedded devices, and GPUs with the lowest latency and fastest generation speed. > model size: 350M to 2.6B >
0
3
10
The secret sauce is most definitely in the data, given that the architecture is fairly standard: Qwen3 backbone + NaViT SigLIP2 (i.e. it uses packed vision sequences). They use patch_size=16 and pixel_shuffle_scale_factor=2 in order to use fewer image tokens. A 256x256 image will
1/ Introducing Isaac 0.1 β our first perceptive-language model. 2B params, open weights. Matches or beats models significantly larger on core perception. We are pushing the efficient frontier for physical AI. https://t.co/dJ1Wjh2ARK
2
1
18
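The token arithmetic, for the record: pixel shuffle folds each 2x2 neighborhood of patch tokens into channels, so the token count drops by the scale factor squared.

    # patch_size=16, pixel_shuffle_scale_factor=2, 256x256 input
    patch, shuffle, side = 16, 2, 256
    grid = side // patch             # 16 -> 16x16 = 256 patch tokens
    tokens = (grid // shuffle) ** 2  # 2x2 neighborhoods merge into channels
    print(grid * grid, tokens)       # 256 64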
Switched the config system for the experiments to pydantic-settings, and I've never felt better. I have more energy. My skin is clearer. My eyesight has improved.
1
0
3
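A minimal sketch of the kind of config this enables (field names are illustrative):

    from pydantic_settings import BaseSettings, SettingsConfigDict

    class ExperimentConfig(BaseSettings):
        model_config = SettingsConfigDict(env_prefix="EXP_")
        lr: float = 3e-4
        batch_size: int = 32
        seq_len: int = 2048

    cfg = ExperimentConfig()  # typed + validated; EXP_LR=1e-3 overrides via env

Nice design point: defaults live in code, overrides come from the environment or a .env file, and values are validated at startup, so a malformed override fails loudly instead of silently training the wrong run.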
Many people would benefit from learning tensor contraction. Most concepts in ML architecture can be simplified and abstracted through this lens. Things with different names are just contracting, batching, sharding on different dimensions, e.g. TP vs CP, BatchNorm vs LayerNorm,
1
0
7
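One concrete instance of the "same op, different axis" point (shapes mine):

    import torch

    x = torch.randn(8, 128, 64)          # (batch, seq, feature)

    # BatchNorm vs LayerNorm: the same mean-reduction over different dims.
    mu_bn = x.mean(dim=(0, 1))           # per-feature stats (BatchNorm-style)
    mu_ln = x.mean(dim=-1)               # per-token stats (LayerNorm-style)

    # Attention scores as an explicit contraction; sharding the head dim
    # across devices is TP, sharding the seq dim is CP -- same einsum.
    q = k = torch.randn(8, 16, 128, 32)  # (batch, head, seq, head_dim)
    scores = torch.einsum("bhqd,bhkd->bhqk", q, k)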