DSTRBTD (@dstrbtd_ai)
Followers: 794 · Following: 6 · Media: 23 · Statuses: 44
Trust-less Decentralised Distributed Training
Bittensor's Subnet 38
Joined July 2025
Last Friday, DSTRBTD's SN38 was officially de-registered from Bittensor's main-net. While our subnetwork slot is gone, the breakthroughs and the community remain. From integrating DiLoCo & MuLoCo to decentralised 4B LLM pre-training, we're proud of what we've achieved in the …
2025 was a year of technical milestones for DSTRBTD. We’re proud of how far we’ve come; here are some of our favourite highlights of the year:
• May: First team to integrate DeepMind’s DiLoCo strategy into a Bittensor subnet.
• August: Became only the 2nd team globally to …
Mechanism 1 is only 10 days old, but the progress we're seeing there is already very exciting. As miners refine their strategies, we're seeing lower losses, lower communication volumes (more efficient training), and higher throughput (faster training). If this trend …
Starting the year incredibly grateful to our Open Source contributors. Over the holidays, while working on a PR to migrate our Mechanism 0 DataLoader to R2, @jorritvangils spotted a critical bug in our miner code. He quickly merged a fix (PR #87: https://t.co/F72Yid3Dwx) to …
github.com
Currently, self.current_block is updated continuously, causing a mismatch between the value passed to DatasetLoader.next_pages: pages = await DatasetLoader.next_pages(offset=self.current_block) an...
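The card above only shows the symptom. As a self-contained toy of the general fix pattern (snapshot the block once per step so every consumer of the offset agrees), here is a sketch; it is not the actual diff in PR #87, and FakeChain plus the stand-in next_pages are invented for the example.

import asyncio

# Sketch only: if current_block keeps advancing in the background, reading it at
# each call site can hand next_pages a different offset than the rest of the step
# expects. Snapshotting it once per step keeps every consumer consistent.

class FakeChain:
    """Invented stand-in for the continuously updated block counter."""
    def __init__(self) -> None:
        self.current_block = 100

    def tick(self) -> None:
        self.current_block += 1  # simulates the background update

async def next_pages(offset: int) -> list[str]:
    """Invented stand-in for DatasetLoader.next_pages."""
    return [f"page-{offset}-{i}" for i in range(3)]

async def run_step(chain: FakeChain) -> None:
    step_block = chain.current_block   # snapshot once, at the top of the step
    chain.tick()                       # the block advances mid-step...
    pages = await next_pages(offset=step_block)
    # ...but everything else in this step reuses the snapshot, so the fetched
    # pages and the block the step is keyed to stay in sync.
    print(step_block, pages)

asyncio.run(run_step(FakeChain()))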
Last Friday, we launched Mechanism 1 on Subnet 38's main-net! 🚀 Mechanism 1 is a winner-takes-all mechanism that aims to incentivise miners to develop SOTA distributed training strategies (see the "Aggregation" row in the heat-map in the attached post). These optimised …
Decentralized pre-training has accelerated rapidly over the past year, with multiple teams running public experiments, each taking a different approach to the same problem. Here is a high-level comparison across sharding strategy, permissions, model scale, aggregation, and …
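For anyone unfamiliar with the term, winner-takes-all scoring routes the entire reward to the single best submission each round. A minimal sketch of the idea (not our validator code; the score dictionary and weight convention are invented for illustration):

# Sketch only: generic winner-takes-all weighting. `scores` maps miner UID to a
# benchmark score (higher is better); how the score is derived from loss,
# communication volume and throughput is left to the mechanism itself.
def winner_takes_all_weights(scores: dict[int, float]) -> dict[int, float]:
    if not scores:
        return {}
    best_uid = max(scores, key=scores.get)
    # The top-scoring miner receives all of the weight; everyone else gets zero.
    return {uid: (1.0 if uid == best_uid else 0.0) for uid in scores}

# Example: UID 7 submitted the best strategy this round.
print(winner_takes_all_weights({3: 0.81, 7: 0.93, 11: 0.88}))
# -> {3: 0.0, 7: 1.0, 11: 0.0}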
If we’ve missed any other public decentralized pre-training efforts, we’d love for people to share them with us. We’re especially interested in protocols exploring novel aggregation techniques, compression algorithms or incentive mechanisms.
It's worth noting that there are also excellent teams like Nous Research, Grail and Gensyn working on decentralized post-training. This thread focuses specifically on decentralized pre-training, where the size and type of information being shared are quite different. Both …
Legend / Terminology:
Sharding:
• DP = Data Parallelism
• PP = Pipeline Parallelism
Aggregation:
• SparseLoCo = https://t.co/EJvI1pey1r
• DiLoCo = https://t.co/cNza3pqdoS
• Node0 = https://t.co/2FKMXziZk7
Communication:
• Centralised = weights / gradients are shared …
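To make the Aggregation entries more concrete, here is a toy DiLoCo-style outer step in PyTorch: each worker takes several local optimizer steps, then the averaged pseudo-gradient (global weights minus local weights) is applied by an outer optimizer. This is a minimal sketch with invented sizes and a toy model, not DeepMind's reference implementation or any team's production code.

import copy
import torch

# Sketch only: DiLoCo-style two-level optimisation on a toy model.
torch.manual_seed(0)
global_model = torch.nn.Linear(16, 1)
outer_opt = torch.optim.SGD(global_model.parameters(), lr=0.7, momentum=0.9, nesterov=True)

NUM_WORKERS, H = 4, 8  # invented sizes for the sketch

for outer_step in range(3):
    deltas = [torch.zeros_like(p) for p in global_model.parameters()]
    for _ in range(NUM_WORKERS):
        # Each worker starts from the current global weights and trains locally.
        local = copy.deepcopy(global_model)
        inner_opt = torch.optim.AdamW(local.parameters(), lr=1e-2)
        for _ in range(H):  # H local steps on this worker's (random) data shard
            x, y = torch.randn(32, 16), torch.randn(32, 1)
            loss = torch.nn.functional.mse_loss(local(x), y)
            inner_opt.zero_grad()
            loss.backward()
            inner_opt.step()
        # Accumulate this worker's pseudo-gradient (global - local), averaged.
        for d, gp, lp in zip(deltas, global_model.parameters(), local.parameters()):
            d += (gp.detach() - lp.detach()) / NUM_WORKERS
    # Outer update: treat the averaged pseudo-gradient as a gradient.
    for p, d in zip(global_model.parameters(), deltas):
        p.grad = d
    outer_opt.step()
    outer_opt.zero_grad()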
A question we often get from members of our community is: "in layman's terms, what is DSTRBTD's long-term vision?" Put simply, it's building community-owned artificial intelligence. Right now, the world’s most powerful AI is owned and controlled by a small number of large …
DSTRBTD’s Run 4 is our most stable attempt to date at training a 4B parameter model in a fully permission-less, trust-less and decentralised setting: https://t.co/0lSdsZfHng. Over the past week, we’ve seen an average of 10 participants per AllReduce (the process of sharing …
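For context on what an AllReduce round does, a bare-bones sketch of averaging a pseudo-gradient across participants with torch.distributed (an illustration only; the process-group setup is indicated in comments, and it ignores compression and unreliable peers):

import torch
import torch.distributed as dist

# Sketch only: every participant holds a local pseudo-gradient tensor; an
# all_reduce with SUM followed by division yields the group mean, which each
# participant then applies to its own copy of the global model.
def allreduce_average(pseudo_grad: torch.Tensor) -> torch.Tensor:
    world_size = dist.get_world_size()
    dist.all_reduce(pseudo_grad, op=dist.ReduceOp.SUM)  # in-place sum across peers
    pseudo_grad /= world_size                           # element-wise mean
    return pseudo_grad

# Assumed setup, one process per participant, e.g.:
#   dist.init_process_group(backend="gloo", init_method="env://")
#   averaged = allreduce_average(my_pseudo_grad)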
DSTRBTD's Mechanism 1 is now producing reproducible benchmarks for distributed training optimizers. Each optimizer is evaluated in a sandbox environment that trains NanoGPT variants for 10k steps. We record:
• Final Loss
• Communication Volume
• Throughput
These metrics are …
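As a sketch of what one such benchmark record could look like as data (field names and units are illustrative assumptions, not Mechanism 1's actual schema):

from dataclasses import dataclass, asdict
import json

# Sketch only: one reproducible benchmark result for a submitted optimizer.
# Field names and units are illustrative, not the subnet's real schema.
@dataclass
class BenchmarkResult:
    strategy: str            # identifier of the submitted strategy
    steps: int               # fixed training budget, e.g. 10_000
    final_loss: float        # training loss after the last step
    comm_volume_gb: float    # total bytes exchanged between workers, in GB
    throughput_tok_s: float  # tokens processed per second

result = BenchmarkResult("example-strategy", 10_000, 3.12, 42.7, 185_000.0)
print(json.dumps(asdict(result), indent=2))  # easy to log and diff across runs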
Introducing DSTRBTD’s Subnet Mechanism 1 - DisTrOpZ: https://t.co/6V3OkZyhCT
Inspired by @exolabs's gym, this mechanism asks miners to submit a competitive Distributed Training Strategy: a pairing of a Communication Class and an Optimizer Class. Mechanism 1 will be …
github.com
Distributed Training Strategy Optimization Subnet Mechanism - dstrbtd/DisTrOpZ
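To make the Communication Class / Optimizer Class pairing concrete, a hedged sketch of what such an interface could look like (class and method names invented for illustration; the real contract is defined in the DisTrOpZ repository above):

from abc import ABC, abstractmethod
import torch

# Sketch only: invented interfaces showing how a miner submission might pair a
# communication scheme with an optimizer. The real contract lives in DisTrOpZ.
class CommunicationClass(ABC):
    @abstractmethod
    def exchange(self, local_updates: list[torch.Tensor]) -> list[torch.Tensor]:
        """Compress/share local updates and return the aggregated ones."""

class OptimizerClass(ABC):
    @abstractmethod
    def step(self, model: torch.nn.Module, aggregated: list[torch.Tensor]) -> None:
        """Apply the aggregated updates to the global model."""

class DistributedTrainingStrategy:
    """A submission: one communication scheme paired with one optimizer."""
    def __init__(self, comm: CommunicationClass, opt: OptimizerClass) -> None:
        self.comm, self.opt = comm, opt

    def outer_step(self, model: torch.nn.Module, local_updates: list[torch.Tensor]) -> None:
        self.opt.step(model, self.comm.exchange(local_updates))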
Earlier this week we upgraded to v1.2.2 and began Run 1 on our newly released 4B parameter global model. Within just a few days, we’re seeing strong convergence, achieving a global loss of 2.9 in under 10 outer steps. You can follow the training progress live on our performance …
Earlier this week we started our first 4B parameter model training run, marking a major scaling milestone for DSTRBTD. This was enabled through v1.2.0, which introduced:
• Multi-GPU mining and validation
• Migration of gradient/state hosting + tracking from HuggingFace → R2 …
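Because Cloudflare R2 exposes an S3-compatible API, gradient/state artifacts can be pushed with any S3 client. A minimal boto3 sketch (the endpoint construction follows R2's documented pattern, but the bucket name, key layout and environment-variable names are assumptions for illustration, not our actual configuration):

import os
import boto3

# Sketch only: upload a serialized gradient/optimizer state file to Cloudflare R2
# through its S3-compatible API.
def r2_client():
    account_id = os.environ["R2_ACCOUNT_ID"]
    return boto3.client(
        "s3",
        endpoint_url=f"https://{account_id}.r2.cloudflarestorage.com",
        aws_access_key_id=os.environ["R2_ACCESS_KEY_ID"],
        aws_secret_access_key=os.environ["R2_SECRET_ACCESS_KEY"],
    )

def upload_state(local_path: str, run: str, outer_step: int, bucket: str = "gradient-states") -> str:
    # Key layout (run/outer_step/filename) is made up for the sketch.
    key = f"{run}/outer_step_{outer_step}/{os.path.basename(local_path)}"
    r2_client().upload_file(local_path, bucket, key)
    return key

# e.g. upload_state("pseudo_grad_rank0.pt", run="run4", outer_step=12)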