DSTRBTD

@dstrbtd_ai

Followers: 794 · Following: 6 · Media: 23 · Statuses: 44

Trust-less Decentralised Distributed Training

Bittensor's Subnet 38
Joined July 2025
DSTRBTD @dstrbtd_ai · 9 days
Last Friday, DSTRBTD's SN38 was officially de-registered from Bittensor's main-net. While our subnetwork slot is gone, the breakthroughs and the community remain. From integrating DiLoCo & MuLoCo to decentralised 4B LLM pre-training, we're proud of what we've achieved in the…
DSTRBTD @dstrbtd_ai · 13 days
2025 was a year of technical milestones for DSTRBTD. We're proud of how far we've come; here are some of our favourite highlights of the year:
• May: First team to integrate DeepMind's DiLoCo strategy into a Bittensor subnet.
• August: Became only the 2nd team globally to…
DSTRBTD @dstrbtd_ai · 14 days
Mechanism 1 is only 10 days old, but the progress we're seeing there is already very exciting. As miners refine their strategies, we're seeing lower losses and communication volumes (more efficient training) and better throughput (faster training speed). If this trend…
DSTRBTD @dstrbtd_ai · 16 days
We're starting the year incredibly grateful to our open-source contributors. Over the holidays, while working on a PR to migrate our Mechanism 0 DataLoader to R2, @jorritvangils spotted a critical bug in our miner code. He quickly merged a fix (PR #87: https://t.co/F72Yid3Dwx) to…
github.com
Currently, self.current_block is updated continuously, causing a mismatch between the value passed to DatasetLoader.next_pages: pages = await DatasetLoader.next_pages(offset=self.current_block) an...
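For context, here is a minimal sketch of the failure mode described in that card and a common way to avoid it: snapshot the block height once per step so the same offset is used everywhere. The Miner class, its train_step method and the stub loader below are hypothetical; only the DatasetLoader.next_pages(offset=...) call shape comes from the linked issue, and this is not the actual PR #87 change.

```python
import asyncio


class DatasetLoader:
    """Stub standing in for the real loader; only the call shape matches the card above."""

    @staticmethod
    async def next_pages(offset: int):
        await asyncio.sleep(0)  # placeholder for the real R2/HTTP page fetch
        return [f"page-{offset}-{i}" for i in range(3)]


class Miner:
    """Hypothetical miner loop used only to illustrate the snapshot pattern."""

    def __init__(self) -> None:
        self.current_block = 0  # in the real code this is updated continuously by a chain listener

    async def train_step(self) -> None:
        # Buggy pattern: reading self.current_block at every use site can yield
        # different values within a single step, so later consumers may assume a
        # different offset than the one the pages were actually fetched with.
        # pages = await DatasetLoader.next_pages(offset=self.current_block)

        # Safer pattern: snapshot once per step and reuse the same value throughout.
        step_block = self.current_block
        pages = await DatasetLoader.next_pages(offset=step_block)
        print(f"block {step_block}: fetched {len(pages)} pages")


asyncio.run(Miner().train_step())
```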
DSTRBTD @dstrbtd_ai · 21 days
Last Friday, we launched Mechanism 1 on Subnet 38's main-net! 🚀 Mechanism 1 is a winner-takes-all mechanism that aims to incentivise miners to develop SOTA distributed training strategies (see the "Aggregation" row in the heat-map in the attached post). These optimised…
DSTRBTD @dstrbtd_ai · 28 days
Decentralized pre-training has accelerated rapidly over the past year, with multiple teams running public experiments, each taking a different approach to the same problem. Here is a high-level comparison across sharding strategy, permissions, model scale, aggregation, and…
DSTRBTD @dstrbtd_ai · 28 days
If we've missed any other public decentralized pre-training efforts, we'd love for people to share them with us. We're especially interested in protocols exploring novel aggregation techniques, compression algorithms or incentive mechanisms.
DSTRBTD @dstrbtd_ai · 28 days
It's worth noting that there are also excellent teams like Nous Research, Grail and Gensyn working on decentralized post-training. This thread focuses specifically on decentralized pre-training, where the size and type of information being shared are quite different. Both…
DSTRBTD @dstrbtd_ai · 28 days
Legend / Terminology:
Sharding:
• DP = Data Parallelism
• PP = Pipeline Parallelism
Aggregation:
• SparseLoCo = https://t.co/EJvI1pey1r
• DiLoCo = https://t.co/cNza3pqdoS
• Node0 = https://t.co/2FKMXziZk7
Communication:
• Centralised = weights / gradients are shared…
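Since DiLoCo appears in the aggregation legend above, here is a minimal single-process sketch of a DiLoCo-style outer step: each worker runs a few local optimizer steps, then the averaged pseudo-gradient (initial weights minus locally updated weights) is fed to an outer Nesterov-momentum SGD. Worker count, step counts and learning rates are illustrative, and this is not DSTRBTD's implementation.

```python
import copy

import torch
import torch.nn as nn

torch.manual_seed(0)
global_model = nn.Linear(8, 1)
# Outer optimizer acts on pseudo-gradients (DiLoCo uses Nesterov-momentum SGD).
outer_opt = torch.optim.SGD(global_model.parameters(), lr=0.7, momentum=0.9, nesterov=True)


def local_training(model: nn.Module, steps: int = 20) -> nn.Module:
    """Each worker runs ordinary inner-optimizer steps on its own data shard."""
    model = copy.deepcopy(model)
    inner_opt = torch.optim.AdamW(model.parameters(), lr=1e-2)
    for _ in range(steps):
        x = torch.randn(32, 8)
        y = x.sum(dim=1, keepdim=True)  # toy regression target
        loss = nn.functional.mse_loss(model(x), y)
        inner_opt.zero_grad()
        loss.backward()
        inner_opt.step()
    return model


# One outer step: workers train locally, then the averaged pseudo-gradient
# (initial weights minus locally updated weights) drives the outer optimizer.
workers = [local_training(global_model) for _ in range(4)]
for p_global, *p_locals in zip(global_model.parameters(), *(w.parameters() for w in workers)):
    pseudo_grad = torch.stack([p_global.data - p_l.data for p_l in p_locals]).mean(dim=0)
    p_global.grad = pseudo_grad
outer_opt.step()
outer_opt.zero_grad()
```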
DSTRBTD @dstrbtd_ai · 1 month
A question we often get from members of our community is: "In layman's terms, what is DSTRBTD's long-term vision?" Put simply, it's building community-owned artificial intelligence. Right now, the world's most powerful AI is owned and controlled by a small number of large…
DSTRBTD @dstrbtd_ai · 1 month
DSTRBTD's Run 4 is our most stable attempt to date at training a 4B parameter model in a fully permission-less, trust-less and decentralised setting: https://t.co/0lSdsZfHng. Over the past week, we've seen an average of 10 participants per AllReduce (the process of sharing…
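For readers unfamiliar with the AllReduce mentioned above, here is a generic torch.distributed sketch of the operation: every participant contributes its local update and each ends up holding the same average. The backend choice, tensor size and launch command are placeholders, not DSTRBTD's configuration.

```python
import torch
import torch.distributed as dist


def average_update(local_update: torch.Tensor) -> torch.Tensor:
    """All-reduce (sum) then divide by world size, so every rank holds the mean update."""
    dist.all_reduce(local_update, op=dist.ReduceOp.SUM)
    local_update /= dist.get_world_size()
    return local_update


if __name__ == "__main__":
    # Typically launched with torchrun, e.g.: torchrun --nproc_per_node=4 allreduce_demo.py
    dist.init_process_group(backend="gloo")
    update = torch.randn(1024)  # stand-in for a flattened model update / pseudo-gradient
    averaged = average_update(update)
    print(f"rank {dist.get_rank()}: mean update norm = {averaged.norm():.4f}")
    dist.destroy_process_group()
```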
DSTRBTD @dstrbtd_ai · 1 month
DSTRBTD's Mechanism 1 is now producing reproducible benchmarks for distributed training optimizers. Each optimizer is evaluated in a sandbox environment that trains NanoGPT variants for 10k steps. We record:
• Final Loss
• Communication Volume
• Throughput
These metrics are…
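As a rough illustration of what a reproducible benchmark record from such a sandbox could look like, here is a short sketch. The BenchmarkResult fields and evaluate() harness are hypothetical; only the three recorded metrics and the 10k-step NanoGPT setting come from the post above, and the numeric values are dummies.

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class BenchmarkResult:
    strategy: str             # identifier of the submitted distributed training strategy
    steps: int                # e.g. 10_000 NanoGPT training steps
    final_loss: float         # Final Loss at the end of the run
    comm_volume_bytes: int    # Communication Volume exchanged between workers
    tokens_per_second: float  # Throughput


def evaluate(strategy: str, steps: int = 10_000) -> BenchmarkResult:
    """Placeholder harness: a real sandbox would train a NanoGPT variant and meter traffic."""
    return BenchmarkResult(
        strategy=strategy,
        steps=steps,
        final_loss=3.1,                    # dummy value
        comm_volume_bytes=2_500_000_000,   # dummy value
        tokens_per_second=45_000.0,        # dummy value
    )


print(json.dumps(asdict(evaluate("demo-strategy")), indent=2))
```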
DSTRBTD @dstrbtd_ai · 2 months
Introducing DSTRBTD's Subnet Mechanism 1 - DisTrOpZ: https://t.co/6V3OkZyhCT
Inspired by @exolabs's gym, this mechanism asks miners to submit a competitive Distributed Training Strategy, which pairs a Communication Class with an Optimizer Class. Mechanism 1 will be…
github.com
Distributed Training Strategy Optimization Subnet Mechanism - dstrbtd/DisTrOpZ
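As a sketch of what "a pairing of a Communication Class and an Optimizer Class" could look like as Python interfaces, consider the following. The Protocol names, method signatures and Strategy wrapper are illustrative guesses, not DisTrOpZ's actual submission API; the linked repo defines the real format.

```python
from typing import Iterable, Protocol

import torch


class Communication(Protocol):
    """How a worker's local update is shared (and possibly compressed) and returned aggregated."""

    def exchange(self, local_update: torch.Tensor) -> torch.Tensor: ...


class Optimizer(Protocol):
    """How the aggregated update is applied to the shared model parameters."""

    def apply(self, params: Iterable[torch.nn.Parameter], aggregated: torch.Tensor) -> None: ...


class Strategy:
    """A submission pairs one Communication implementation with one Optimizer implementation."""

    def __init__(self, comm: Communication, opt: Optimizer) -> None:
        self.comm = comm
        self.opt = opt

    def outer_step(self, params: list, local_update: torch.Tensor) -> None:
        # Exchange the local update with peers, then apply the aggregate to the model.
        aggregated = self.comm.exchange(local_update)
        self.opt.apply(params, aggregated)
```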
DSTRBTD @dstrbtd_ai · 3 months
Earlier this week we upgraded to v1.2.2 and began Run 1 on our newly released 4B parameter global model. Within just a few days, we're seeing strong convergence, reaching a global loss of 2.9 in under 10 outer steps. You can follow the training progress live on our performance…
DSTRBTD @dstrbtd_ai · 3 months
Earlier this week we started our first 4B parameter model training run, marking a major scaling milestone for DSTRBTD. This was enabled through v1.2.0, which introduced:
• Multi-GPU mining and validation
• Migration of gradient/state hosting + tracking from HuggingFace → R2
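On the HuggingFace → R2 migration mentioned in the list above: Cloudflare R2 speaks the S3 API, so a standard boto3 client pointed at the account's R2 endpoint is enough to host gradient/optimizer state. Bucket name, key scheme and environment variables below are placeholders, not DSTRBTD's actual configuration.

```python
import os

import boto3

# R2 is S3-compatible: point a boto3 S3 client at the account's R2 endpoint.
r2 = boto3.client(
    "s3",
    endpoint_url=f"https://{os.environ['R2_ACCOUNT_ID']}.r2.cloudflarestorage.com",
    aws_access_key_id=os.environ["R2_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["R2_SECRET_ACCESS_KEY"],
)

# Upload a serialized gradient/state shard for the current outer step (placeholder key scheme).
r2.upload_file("gradients/outer_step_0042.pt", "training-state", "run/outer_step_0042.pt")

# A validator (or another miner) can later fetch the same object by key.
r2.download_file("training-state", "run/outer_step_0042.pt", "/tmp/outer_step_0042.pt")
```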