DSTRBTD (@dstrbtd_ai)
Followers: 794 · Following: 6 · Media: 23 · Statuses: 44
Trust-less Decentralised Distributed Training
Bittensor's Subnet 38
Joined July 2025
Last Friday, DSTRBTD's SN38 was officially de-registered from Bittensor's main-net. While our subnetwork slot is gone, the breakthroughs and the community remain. From integrating DiLoCo & MuLoCo to decentralised 4B LLM pre-training, we're proud of what we've achieved in the …
2025 was a year of technical milestones for DSTRBTD. We’re proud of how far we’ve come; here are some of our favourite highlights of the year:
• May: First team to integrate DeepMind’s DiLoCo strategy into a Bittensor subnet.
• August: Became only the 2nd team globally to …
Mechanism 1 is only 10 days old, but the progress we're seeing there is already very exciting. As miners refine their strategies, we're seeing lower losses, lower communication volumes (more efficient training), and higher throughput (faster training). If this trend …
Starting the year incredibly grateful to our Open Source contributors. Over the holidays, while working on a PR to migrate our Mechanism 0 DataLoader to R2, @jorritvangils spotted a critical bug in our miner code. He quickly merged a fix (PR #87: https://t.co/F72Yid3Dwx) to …
github.com
Currently, self.current_block is updated continuously, causing a mismatch between the value passed to DatasetLoader.next_pages: pages = await DatasetLoader.next_pages(offset=self.current_block) an...
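The card above only shows the symptom. As a self-contained toy of the general fix pattern (snapshot the block once per step so every consumer of the offset agrees), here is a sketch; it is not the actual diff in PR #87, and FakeChain plus the stand-in next_pages are invented for the example.

import asyncio

# Sketch only: if current_block keeps advancing in the background, reading it at
# each call site can hand next_pages a different offset than the rest of the step
# expects. Snapshotting it once per step keeps every consumer consistent.

class FakeChain:
    """Invented stand-in for the continuously updated block counter."""
    def __init__(self) -> None:
        self.current_block = 100

    def tick(self) -> None:
        self.current_block += 1  # simulates the background update

async def next_pages(offset: int) -> list[str]:
    """Invented stand-in for DatasetLoader.next_pages."""
    return [f"page-{offset}-{i}" for i in range(3)]

async def run_step(chain: FakeChain) -> None:
    step_block = chain.current_block   # snapshot once, at the top of the step
    chain.tick()                       # the block advances mid-step...
    pages = await next_pages(offset=step_block)
    # ...but everything else in this step reuses the snapshot, so the fetched
    # pages and the block the step is keyed to stay in sync.
    print(step_block, pages)

asyncio.run(run_step(FakeChain()))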
Last Friday, we launched Mechanism 1 on Subnet 38's main-net! 🚀 Mechanism 1 is a winner-takes-all mechanism that aims to incentivise miners to develop SOTA distributed training strategies (see the "Aggregation" row in the heat-map in the attached post). These optimised …
Decentralized pre-training has accelerated rapidly over the past year, with multiple teams running public experiments, each taking a different approach to the same problem. Here is a high-level comparison across sharding strategy, permissions, model scale, aggregation, and …
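For anyone unfamiliar with the term, winner-takes-all scoring routes the entire reward to the single best submission each round. A minimal sketch of the idea (not our validator code; the score dictionary and weight convention are invented for illustration):

# Sketch only: generic winner-takes-all weighting. `scores` maps miner UID to a
# benchmark score (higher is better); how the score is derived from loss,
# communication volume and throughput is left to the mechanism itself.
def winner_takes_all_weights(scores: dict[int, float]) -> dict[int, float]:
    if not scores:
        return {}
    best_uid = max(scores, key=scores.get)
    # The top-scoring miner receives all of the weight; everyone else gets zero.
    return {uid: (1.0 if uid == best_uid else 0.0) for uid in scores}

# Example: UID 7 submitted the best strategy this round.
print(winner_takes_all_weights({3: 0.81, 7: 0.93, 11: 0.88}))
# -> {3: 0.0, 7: 1.0, 11: 0.0}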
If we’ve missed any other public decentralized pre-training efforts, we’d love for people to share them with us. We’re especially interested in protocols exploring novel aggregation techniques, compression algorithms or incentive mechanisms.
It's worth noting that there are also excellent teams like Nous Research, Grail and Gensyn working on decentralized post-training. This thread focuses specifically on decentralized pre-training, where the size and type of information being shared are quite different. Both …
Legend / Terminology:
Sharding:
• DP = Data Parallelism
• PP = Pipeline Parallelism
Aggregation:
• SparseLoCo = https://t.co/EJvI1pey1r
• DiLoCo = https://t.co/cNza3pqdoS
• Node0 = https://t.co/2FKMXziZk7
Communication:
• Centralised = weights / gradients are shared …
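To make the Aggregation entries more concrete, here is a toy DiLoCo-style outer step in PyTorch: each worker takes several local optimizer steps, then the averaged pseudo-gradient (global weights minus local weights) is applied by an outer optimizer. This is a minimal sketch with invented sizes and a toy model, not DeepMind's reference implementation or any team's production code.

import copy
import torch

# Sketch only: DiLoCo-style two-level optimisation on a toy model.
torch.manual_seed(0)
global_model = torch.nn.Linear(16, 1)
outer_opt = torch.optim.SGD(global_model.parameters(), lr=0.7, momentum=0.9, nesterov=True)

NUM_WORKERS, H = 4, 8  # invented sizes for the sketch

for outer_step in range(3):
    deltas = [torch.zeros_like(p) for p in global_model.parameters()]
    for _ in range(NUM_WORKERS):
        # Each worker starts from the current global weights and trains locally.
        local = copy.deepcopy(global_model)
        inner_opt = torch.optim.AdamW(local.parameters(), lr=1e-2)
        for _ in range(H):  # H local steps on this worker's (random) data shard
            x, y = torch.randn(32, 16), torch.randn(32, 1)
            loss = torch.nn.functional.mse_loss(local(x), y)
            inner_opt.zero_grad()
            loss.backward()
            inner_opt.step()
        # Accumulate this worker's pseudo-gradient (global - local), averaged.
        for d, gp, lp in zip(deltas, global_model.parameters(), local.parameters()):
            d += (gp.detach() - lp.detach()) / NUM_WORKERS
    # Outer update: treat the averaged pseudo-gradient as a gradient.
    for p, d in zip(global_model.parameters(), deltas):
        p.grad = d
    outer_opt.step()
    outer_opt.zero_grad()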
A question we often get from members of our community is: "in layman's terms, what is DSTRBTD's long-term vision?" Put simply, it's building community-owned artificial intelligence. Right now, the world’s most powerful AI is owned and controlled by a small number of large …
DSTRBTD’s Run 4 is our most stable attempt to date at training a 4B parameter model in a fully permission-less, trust-less and decentralised setting: https://t.co/0lSdsZfHng. Over the past week, we’ve seen an average of 10 participants per AllReduce (the process of sharing …
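For context on what an AllReduce round does, a bare-bones sketch of averaging a pseudo-gradient across participants with torch.distributed (an illustration only; the process-group setup is indicated in comments, and it ignores compression and unreliable peers):

import torch
import torch.distributed as dist

# Sketch only: every participant holds a local pseudo-gradient tensor; an
# all_reduce with SUM followed by division yields the group mean, which each
# participant then applies to its own copy of the global model.
def allreduce_average(pseudo_grad: torch.Tensor) -> torch.Tensor:
    world_size = dist.get_world_size()
    dist.all_reduce(pseudo_grad, op=dist.ReduceOp.SUM)  # in-place sum across peers
    pseudo_grad /= world_size                           # element-wise mean
    return pseudo_grad

# Assumed setup, one process per participant, e.g.:
#   dist.init_process_group(backend="gloo", init_method="env://")
#   averaged = allreduce_average(my_pseudo_grad)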
DSTRBTD's Mechanism 1 is now producing reproducible benchmarks for distributed training optimizers. Each optimizer is evaluated in a sandbox environment that trains NanoGPT variants for 10k steps. We record:
• Final Loss
• Communication Volume
• Throughput
These metrics are …
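As a sketch of what one such benchmark record could look like as data (field names and units are illustrative assumptions, not Mechanism 1's actual schema):

from dataclasses import dataclass, asdict
import json

# Sketch only: one reproducible benchmark result for a submitted optimizer.
# Field names and units are illustrative, not the subnet's real schema.
@dataclass
class BenchmarkResult:
    strategy: str            # identifier of the submitted strategy
    steps: int               # fixed training budget, e.g. 10_000
    final_loss: float        # training loss after the last step
    comm_volume_gb: float    # total bytes exchanged between workers, in GB
    throughput_tok_s: float  # tokens processed per second

result = BenchmarkResult("example-strategy", 10_000, 3.12, 42.7, 185_000.0)
print(json.dumps(asdict(result), indent=2))  # easy to log and diff across runs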
Introducing DSTRBTD’s Subnet Mechanism 1 - DisTrOpZ: https://t.co/6V3OkZyhCT
Inspired by @exolabs's gym, this mechanism asks miners to submit a competitive Distributed Training Strategy: a pairing of a Communication Class and an Optimizer Class. Mechanism 1 will be …
github.com
Distributed Training Strategy Optimization Subnet Mechanism - dstrbtd/DisTrOpZ
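To make the Communication Class / Optimizer Class pairing concrete, a hedged sketch of what such an interface could look like (class and method names invented for illustration; the real contract is defined in the DisTrOpZ repository above):

from abc import ABC, abstractmethod
import torch

# Sketch only: invented interfaces showing how a miner submission might pair a
# communication scheme with an optimizer. The real contract lives in DisTrOpZ.
class CommunicationClass(ABC):
    @abstractmethod
    def exchange(self, local_updates: list[torch.Tensor]) -> list[torch.Tensor]:
        """Compress/share local updates and return the aggregated ones."""

class OptimizerClass(ABC):
    @abstractmethod
    def step(self, model: torch.nn.Module, aggregated: list[torch.Tensor]) -> None:
        """Apply the aggregated updates to the global model."""

class DistributedTrainingStrategy:
    """A submission: one communication scheme paired with one optimizer."""
    def __init__(self, comm: CommunicationClass, opt: OptimizerClass) -> None:
        self.comm, self.opt = comm, opt

    def outer_step(self, model: torch.nn.Module, local_updates: list[torch.Tensor]) -> None:
        self.opt.step(model, self.comm.exchange(local_updates))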
Earlier this week we upgraded to v1.2.2 and began Run 1 on our newly released 4B parameter global model. Within just a few days, we’re seeing strong convergence, achieving a global loss of 2.9 in under 10 outer steps. You can follow the training progress live on our performance …
Earlier this week we started our first 4B parameter model training run, marking a major scaling milestone for DSTRBTD. This was enabled through v1.2.0, which introduced:
• Multi-GPU mining and validation
• Migration of gradient/state hosting + tracking from HuggingFace → R2 …
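Because Cloudflare R2 exposes an S3-compatible API, gradient/state artifacts can be pushed with any S3 client. A minimal boto3 sketch (the endpoint construction follows R2's documented pattern, but the bucket name, key layout and environment-variable names are assumptions for illustration, not our actual configuration):

import os
import boto3

# Sketch only: upload a serialized gradient/optimizer state file to Cloudflare R2
# through its S3-compatible API.
def r2_client():
    account_id = os.environ["R2_ACCOUNT_ID"]
    return boto3.client(
        "s3",
        endpoint_url=f"https://{account_id}.r2.cloudflarestorage.com",
        aws_access_key_id=os.environ["R2_ACCESS_KEY_ID"],
        aws_secret_access_key=os.environ["R2_SECRET_ACCESS_KEY"],
    )

def upload_state(local_path: str, run: str, outer_step: int, bucket: str = "gradient-states") -> str:
    # Key layout (run/outer_step/filename) is made up for the sketch.
    key = f"{run}/outer_step_{outer_step}/{os.path.basename(local_path)}"
    r2_client().upload_file(local_path, bucket, key)
    return key

# e.g. upload_state("pseudo_grad_rank0.pt", run="run4", outer_step=12)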