🇺🇦 Dzmitry Bahdanau
@DBahdanau
Followers
11K
Following
528
Media
25
Statuses
551
Team member at @periodiclabs. Adjunct Prof @ McGill. Member of Mila, Quebec AI Institute. Stream of consciousness is my own.
Joined August 2017
I'm a dumb stochastic parrot, but PLEEASE give me the reward, I will generate slop for any possible and impossible contingency to be rewarded, please give me that sweet reward juice, please please
4
4
148
I don't think that will happen. I increasingly feel that the excellence of small models comes from benchmaxxing and putting everything in the training set. OOD generalization seems to get way better with size. The 10T question is whether this saturates.
In a year or so this level of performance should be available on a 32b dense model (K2 is 32b active) at a cost of < $0.2/million tokens. I don't think folk have that in their estimates
3
4
75
In a year or so this level of performance should be available on a 32b dense model (K2 is 32b active) at a cost of < $0.2/million tokens. I don't think folk have that in their estimates
Hello, Kimi K2 Thinking! The Open-Source Thinking Agent Model is here. 🔹 SOTA on HLE (44.9%) and BrowseComp (60.2%) 🔹 Executes up to 200–300 sequential tool calls without human interference 🔹 Excels in reasoning, agentic search, and coding 🔹 256K context window Built…
18
23
343
i've been waiting for this moment since our initial PipelineRL blog post in May :)
to continue the PipelineRL glazing, @finbarrtimbers implemented PipelineRL for open-instruct a little bit ago and it ended up being probably the single biggest speedup to our overall pipeline. We went from 2-week long RL runs to 5-day runs, without sacrificing performance
2
7
100
can someone explain to me int4 training by @Kimi_Moonshot? does it mean weights are stored in int4 and dequantized on the fly for further fp8/bf16 computation? or does this mean actual calculations are in int4, including accumulators?
18
5
166
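For anyone puzzling over the same question, here is a minimal PyTorch sketch of the first interpretation (weight-only int4: weights stored as 4-bit integers with per-group scales, dequantized to bf16 on the fly before the matmul). The group size, layout, and function names are illustrative assumptions, not Moonshot's actual recipe; true int4 compute would additionally quantize the activations and call an int4 GEMM kernel with int32 accumulators.

```python
import torch

# Sketch of interpretation 1: weights are *stored* in int4 (kept unpacked in
# int8 here for simplicity; real kernels pack two nibbles per byte) with
# per-group scales, and dequantized to bf16 right before the matmul.
# Group size and layout are illustrative assumptions.

GROUP = 32  # per-group quantization granularity (assumption)

def quantize_int4(w: torch.Tensor):
    """Symmetric per-group int4 quantization of an [out, in] weight matrix."""
    out_f, in_f = w.shape
    g = w.reshape(out_f, in_f // GROUP, GROUP)
    scales = (g.abs().amax(dim=-1, keepdim=True) / 7.0).clamp_min(1e-8)  # int4 range -8..7
    q = torch.clamp(torch.round(g / scales), -8, 7).to(torch.int8)
    return q.reshape(out_f, in_f), scales.squeeze(-1)

def dequantize_int4(q: torch.Tensor, scales: torch.Tensor) -> torch.Tensor:
    """Expand the stored int4 values back to a bf16 weight matrix on the fly."""
    out_f, in_f = q.shape
    g = q.reshape(out_f, in_f // GROUP, GROUP).float() * scales.unsqueeze(-1)
    return g.reshape(out_f, in_f).to(torch.bfloat16)

def int4_linear(x: torch.Tensor, q: torch.Tensor, scales: torch.Tensor) -> torch.Tensor:
    # The actual GEMM still runs in bf16; only the weight *storage* is int4.
    # Interpretation 2 (true int4 compute) would also quantize x and use an
    # int4 GEMM kernel that accumulates in int32.
    return x @ dequantize_int4(q, scales).t()

w = torch.randn(256, 512)
q, s = quantize_int4(w)
y = int4_linear(torch.randn(4, 512, dtype=torch.bfloat16), q, s)
```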
naive Q: why is DeepGEMM compilation soo slow? how is it different from all other kernels out there?
2
1
8
The Marin project is insanely cool! And they get stuff done
⛵ Marin 32B Base (mantis) is done training! It is the best open-source base model (beating OLMo 2 32B Base) and it's even close to the best comparably-sized open-weight base models, Gemma 3 27B PT and Qwen 2.5 32B Base. Ranking across 19 benchmarks:
1
2
42
if you were laid off from FAIR, and you want to learn more about @periodiclabs, my DMs are open
0
11
76
For slurm with pyxis, gpt-5 hallucinates like crazy. Even with search. No in-domain training data, no superintelligence.
4
0
31
and the code is still here:
github.com/ServiceNow/PipelineRL - A scalable asynchronous reinforcement learning implementation with in-flight weight updates.
1
0
6
And special greetings to the NeurIPS AC who rejected our paper with review scores of 4, 5, 5, 5 (6 is the maximum). Always easy to ask for more experiments when someone else pays $$$ for 16-node runs. Hope to join the dark knowledge paper in the elite club of most impactful rejects!
1
0
11
We did lots of good work since the PipelineRL release in May: higher throughput, seq parallel training, multimodal, agentic RL. White paper with great explanations and results: https://t.co/F3YsIbNRUy We'll present today at CoLM EXPO, room 524C, 1pm!
2
9
61
They should make @pydantic a part of core Python, with a bit of syntax sugar to make using it less verbose. Pydantic is a major improvement to Python as a programming language.
5
3
96
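For context on the verbosity point, here is a small sketch of what Pydantic buys you today (the model and its fields below are made up for illustration): types and constraints are declared once, and validation plus coercion happen when the object is constructed.

```python
from pydantic import BaseModel, Field, ValidationError

# Hypothetical config model: declare the schema once, and every construction
# of the object is validated automatically.
class TrainingConfig(BaseModel):
    model_name: str
    learning_rate: float = Field(gt=0)
    batch_size: int = Field(default=32, gt=0)

cfg = TrainingConfig(model_name="tiny-k2", learning_rate="0.0003")
print(cfg.learning_rate)  # coerced from the string to the float 0.0003

try:
    TrainingConfig(model_name="tiny-k2", learning_rate=-1.0)
except ValidationError as err:
    print(err)  # structured error pointing at learning_rate
```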
@vllm_project (caution: if you use DP+EP layout, you may need a bit more than what ^^ says)
0
0
6
Shoutout to the @vllm_project team for developing a clean, infinitely hackable inference engine! The vLLM V1 architecture is great for in-flight weight updates. Just don't stop vLLM, send POST /collective_rpc(method_name=update_weights), and watch inference continue with current KVs!
The RL community keeps pushing boundaries, from better on-policy data and partial rollouts to in-flight weight updates that mix KV caches across models during inference. Continuing inference while weights change and KV states stay stale sounds wild, but that's exactly what…
3
11
199
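A hedged sketch of the trainer-side call the tweet describes. The /collective_rpc endpoint and the update_weights method name come from the tweet itself; the server address, payload schema, and checkpoint tag below are assumptions for illustration, not vLLM's documented HTTP API.

```python
import requests

VLLM_URL = "http://localhost:8000"  # assumed address of a running vLLM server

def push_new_weights(checkpoint_tag: str) -> None:
    """Ask the live vLLM server to broadcast new weights to its workers.

    vLLM is never stopped: in-flight requests keep generating and simply
    finish on the new weights with their existing (now slightly stale) KVs.
    """
    resp = requests.post(
        f"{VLLM_URL}/collective_rpc",
        json={
            "method": "update_weights",         # worker-side method named in the tweet
            "kwargs": {"tag": checkpoint_tag},  # payload shape is an assumption
        },
        timeout=600,
    )
    resp.raise_for_status()

# e.g. called from the RL training loop right after an optimizer step
push_new_weights("step_001234")
```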
Building an AI Physicist: ChatGPT Co-Creator's Next Venture. Scaling laws took us from GPT-1 to GPT-5 Pro. But in order to crack physics, we need a new approach. We sat down with Liam Fedus (co-creator of ChatGPT) and Ekin Dogus Cubuk (ex-materials science and chemistry lead at…
Today, @ekindogus and I are excited to introduce @periodiclabs. Our goal is to create an AI scientist. Science works by conjecturing how the world might be, running experiments, and learning from the results. Intelligence is necessary, but not sufficient. New knowledge is…
11
43
257
I believe in two policies: competitive free markets for wealth generation, and a high wealth tax for wealth redistribution. The country of my dreams would have both, but it doesn't exist yet.
2
3
23
engineering is not about coding. engineering is about understanding. if you have a code monkey LLM helping you code, but you don't understand, your duet will fail miserably. it is a matter of time
6
7
91
I'm (mostly) not going to COLM, but COLM is going to me in Montreal. if you think we should chat, plz DM :)
2
1
29