Arthur Douillard
@Ar_Douillard
Followers: 8K · Following: 19K · Media: 627 · Statuses: 5K
Distributed Learning @ deepmind | DiLoCo, DiPaCo. Continual Learning PhD @ Sorbonne
London
Joined January 2016
We release today the next step for distributed training: Streaming DiLoCo with Overlapping Communication. TL;DR: train data-parallel across the world with low bandwidth for the same performance: 400x fewer bits exchanged & huge latency tolerance
19 · 109 · 579
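For context, here is a minimal sketch of the DiLoCo-style pattern this announcement builds on: each replica runs many local inner steps, and only a small averaged delta (the "outer gradient") is exchanged every H steps and optimized with Nesterov momentum. Streaming DiLoCo goes further by syncing parameter fragments on a staggered schedule, overlapping that communication with compute, and compressing the deltas; none of that is modeled below. The toy quadratic loss, the simulated workers, and all hyperparameters are illustrative assumptions, not the paper's setup.

```python
# Minimal, illustrative sketch of DiLoCo-style low-communication data parallelism.
# NOT the paper's implementation: a toy quadratic loss, plain SGD as the inner
# optimizer, and simulated workers stand in for real shards and real models.
import numpy as np

rng = np.random.default_rng(0)
DIM, WORKERS, H, OUTER_STEPS = 10, 4, 50, 20   # H = inner steps between syncs
INNER_LR, OUTER_LR, MOMENTUM = 0.05, 0.7, 0.9  # illustrative hyperparameters

target = rng.normal(size=DIM)                  # each worker sees a noisy view of it
global_params = np.zeros(DIM)
outer_momentum = np.zeros(DIM)

def inner_sgd(params, worker_seed):
    """Run H local SGD steps on a worker-specific noisy quadratic loss."""
    local_rng = np.random.default_rng(worker_seed)
    local_target = target + 0.1 * local_rng.normal(size=DIM)   # data heterogeneity
    for _ in range(H):
        grad = params - local_target            # gradient of 0.5 * ||p - t||^2
        params = params - INNER_LR * grad
    return params

for outer_step in range(OUTER_STEPS):
    # Each worker starts from the shared parameters and trains locally for H steps.
    local_params = [inner_sgd(global_params.copy(), 100 * outer_step + w)
                    for w in range(WORKERS)]
    # Only the averaged delta ("outer gradient") is communicated, once per H steps.
    outer_grad = np.mean([global_params - p for p in local_params], axis=0)
    outer_momentum = MOMENTUM * outer_momentum + outer_grad     # Nesterov-style buffer
    global_params = global_params - OUTER_LR * (outer_grad + MOMENTUM * outer_momentum)

print("distance to target:", float(np.linalg.norm(global_params - target)))
```

Under standard data parallelism a gradient would cross the network every step; here only one delta per worker is exchanged every H steps, which is the source of the bandwidth reduction the tweet quantifies.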
What’s the best thing written about why the remarkably vigorous and inventive France of the 70s and 80s (TGV, Minitel, Ariane, Rafale, Concorde, the world’s preeminent nuclear grid…) has not been nearly as visible in the 21st century? What went wrong?
390 · 239 · 3K
I've used nano-banana to create that meme. The future is bright for AI slop amateurs like me.
1 · 2 · 67
I found a cool tech report on combining DiLoCo with @m_ryabinin's SWARM pipelining with fault tolerance and checked what the author is doing now. I should have guessed: he's at @PrimeIntellect now.
2 · 4 · 67
@nearcyan time to launch a new fund https://t.co/QkskC4l4kJ
Excited to announce I'm launching a fund! GCCN is a new fund that exclusively invests in Nvidia, with a mandatory 10-year lock-up period. We also offer this fund as the benchmark against which all AI venture capital funds should be compared. Learn more at https://t.co/jGLpLSTTL6!
0 · 0 · 8
Several of my team members and I are impacted by this layoff today. Feel free to connect :)
474 · 287 · 7K
Our team at DeepMind is growing (again). 🚀 We're tackling grand challenges in semiconductors, magnets, energy materials, superconductors, and beyond. Join us! Two positions below.
13 · 41 · 762
I’ve been promoted to Staff RS. Vain title etc. but feels good to see appreciation for distributed learning in DeepMind ☺️
39 · 10 · 450
As a sci-fi nerd, Starcloud is super exciting: https://t.co/Y7dPYz9ls2 but this application sounds like bullshit to me? Latency isn't going to take hours, and wildfire detection can wait 20s
1 · 3 · 16
@metaai Oh, and one point raised by the authors: this super lookahead is only the *outer-optimizer*, and can be perfectly combined with any *inner-optimizer*, such as AdamW or Muon :)
1 · 0 · 12
Non-distributed DiLoCo as a super lookahead: Kalluski et al. from @metaai released a study of using Nesterov on outer gradients: https://t.co/cYupiYyQru The algo they nicknamed SNOO is basically DiLoCo with M=1, meaning that every K steps, a delta is computed between…
5 · 12 · 78
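A rough sketch of my reading of that setup, on a single replica: snapshot the weights, let any inner optimizer (AdamW here) run for K steps, then treat the snapshot-minus-current delta as an outer gradient and apply Nesterov momentum to it. This is hypothetical illustration code, not the paper's implementation; the model, data, and hyperparameters are made up.

```python
# Sketch of a SNOO-style "super lookahead" on a single replica (DiLoCo with M=1):
# an arbitrary inner optimizer runs for K steps, then Nesterov momentum is applied
# to the delta between the snapshot and the current weights as the outer update.
# Hypothetical code based on my reading of the tweet, not the paper's code.
import torch

torch.manual_seed(0)
model = torch.nn.Linear(16, 1)
inner_opt = torch.optim.AdamW(model.parameters(), lr=1e-2)  # any inner optimizer
K, OUTER_LR, MOMENTUM = 10, 0.7, 0.9
outer_buf = [torch.zeros_like(p) for p in model.parameters()]

x, y = torch.randn(256, 16), torch.randn(256, 1)

for outer_step in range(50):
    snapshot = [p.detach().clone() for p in model.parameters()]
    for _ in range(K):                      # K inner steps with the inner optimizer
        inner_opt.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()
        inner_opt.step()
    with torch.no_grad():
        for p, snap, buf in zip(model.parameters(), snapshot, outer_buf):
            delta = snap - p                # "outer gradient" accumulated over K steps
            buf.mul_(MOMENTUM).add_(delta)  # Nesterov-style momentum buffer
            # Reset to the snapshot, then take the outer step from there.
            p.copy_(snap - OUTER_LR * (delta + MOMENTUM * buf))

print("final loss:", torch.nn.functional.mse_loss(model(x), y).item())
```

The point of the earlier reply holds in the sketch too: the outer Nesterov step never looks inside inner_opt, so AdamW could be swapped for Muon or anything else.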
Research Log, Day 0: DiLoCo Days. I decided to do a thesis around distributed low-communication training. Essentially: how can we train large models efficiently across distributed nodes and not be utterly destroyed by network latency and bandwidth? (1/n)
1 · 1 · 6
True story when scaling to many many nodes
0 · 0 · 4
Today, @GoogleResearch announced DeepSomatic, a new machine learning model developed with our partners, including @ucscgenomics and @ChildrensMercy, that accurately identifies genetic variants in cancer cells — a critical step for delivering more precise treatments for patients.
blog.google: An overview of DeepSomatic, a new AI tool that helps identify complex genetic variants in cancer cells.
98 · 280 · 2K
Learned today that a startup is using Streaming DiLoCo to train a distributed AlphaFold-like model. Happy :)
2 · 0 · 24
Very excited to be able to talk about something I've been working on for a while now - we're working with Commonwealth Fusion Systems, IMO the leading fusion startup in the world, to take our work on AI and tokamaks and make it work at the frontier of fusion energy.
We’re announcing a research collaboration with @CFS_energy, one of the world’s leading nuclear fusion companies. Together, we’re helping speed up the development of clean, safe, limitless fusion power with AI. ⚛️
32 · 62 · 1K
Google and Yale scientists have trained an LLM that has generated a novel hypothesis about cancer cellular behavior. This prediction was confirmed multiple times in vitro. - "What made this prediction so exciting was that it was a novel idea. Although CK2 has been implicated in
An exciting milestone for AI in science: Our C2S-Scale 27B foundation model, built with @Yale and based on Gemma, generated a novel hypothesis about cancer cellular behavior, which scientists experimentally validated in living cells. With more preclinical and clinical tests,
39 · 154 · 2K