Explore tweets tagged as #DataParallel
@SimonLee79475
Simon A. Lee
2 years
nn.DataParallel() will be feeding families
1
0
0
@bighungrypigeon
Vijay Jaisankar
2 years
...and breathe xD The authors trained it with `DataParallel`, and we obviously don't have more than 1 good GPU :P So to load the model we needed to change the weights' names and remove the "module" prefix. But now we have a working PointNet! Next step: the global feat vector gn!! (20/N)
1
0
0
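The fix described in the tweet above (loading `DataParallel`-trained weights into a single-GPU model) boils down to renaming the checkpoint keys. A minimal sketch, with a toy model standing in for the actual PointNet:

```python
import torch
import torch.nn as nn

# Toy stand-in for the real PointNet (assumption for illustration only).
model = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 10))

# Simulate a checkpoint saved from an nn.DataParallel-wrapped copy of the model:
# every parameter key gets a "module." prefix.
dp_state = nn.DataParallel(model).state_dict()

# Strip the "module." prefix so the weights load into the plain, single-GPU model.
clean_state = {k.removeprefix("module."): v for k, v in dp_state.items()}
model.load_state_dict(clean_state)
```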
@tomaarsen
tomaarsen
2 years
Sentence Transformers supports DataParallel as well as the superior DistributedDataParallel. The usage is as simple as running your normal training script with: `torchrun --nproc_per_node=4 train_script.py` (if 4 is your number of GPUs) instead of `python train_script.py` 🧵
1
0
2
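For context, a minimal generic PyTorch script (not the Sentence Transformers trainer; model, sizes, and step count are placeholders) that can be launched exactly as the tweet describes, with `torchrun --nproc_per_node=4 train_script.py`:

```python
# train_script.py -- minimal DistributedDataParallel skeleton.
import os
import torch
import torch.distributed as dist
import torch.nn as nn

def main():
    dist.init_process_group(backend="nccl")            # torchrun sets the env vars
    local_rank = int(os.environ["LOCAL_RANK"])
    device = torch.device(f"cuda:{local_rank}")
    torch.cuda.set_device(device)

    model = nn.Linear(128, 2).to(device)               # toy model for illustration
    model = nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3)

    for step in range(100):                            # stand-in for a real dataloader
        x = torch.randn(32, 128, device=device)
        y = torch.randint(0, 2, (32,), device=device)
        loss = nn.functional.cross_entropy(model(x), y)
        opt.zero_grad()
        loss.backward()                                 # DDP all-reduces gradients here
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```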
@ecomputerbooks
Free Computer Books
2 years
Data Parallel C++, 2nd Edition: Programming Accelerated Systems Using C++ and SYCL - https://t.co/4IcxP0tLKP Learn how to accelerate C++ programs using Data Parallelism and SYCL. #DataParallel #cpp #cpplus #CppProgramming #ParallelProgramming #programming #programmer
0
0
0
@ecomputerbooks
Free Computer Books
10 months
(Open Access) Data Parallel C++: Programming Accelerated Systems Using C++ and SYCL - https://t.co/4IcxP0tLKP Look for "Read and Download Links" section to download. Follow me if you like this. #programming #cpp #cplusplus #DataParallel #ParallelProgramming #ConcurrentProgramming
0
0
0
@glouppe
Gilles Louppe
3 years
Started training my baby, hand-made 40M GPT model on Harry Potter books 🤓 GPUs go brrrr! 🚀 (@PyTorch's nn.DataParallel makes multi-GPU support so easy <3)
9
18
225
@WanderingByte
Sonu
1 year
It's been years since Kaggle started providing 2 T4 GPUs, and I wasn't aware of how to use them. Just wrap the model with nn.DataParallel. #pytorch #kaggle
0
0
0
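The wrap the tweet refers to is a one-liner. A minimal sketch (toy model and batch size are placeholders) that also guards on the GPU count, as on Kaggle's 2× T4 setup:

```python
import torch
import torch.nn as nn

# Toy model standing in for whatever you actually train.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))

device = "cuda" if torch.cuda.is_available() else "cpu"
if torch.cuda.device_count() > 1:
    # One line: replicate the model on every visible GPU and split each batch across them.
    model = nn.DataParallel(model)
model = model.to(device)

x = torch.randn(64, 128, device=device)
out = model(x)        # the batch of 64 is sliced across the available GPUs
```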
@dyalogapl
Dyalog
1 year
At #Dyalog24, Brandon Wilson is demonstrating how APL can handle large mathematical databases like Metamath, offering a unique solution for efficient proof verification. But who will ask the questions at the end? 😱 #APL #ProofVerification #DataParallel #Metamath
0
0
4
@_avichawla
Avi Chawla
11 months
9) Use DistributedDataParallel, not DataParallel. 10) Use activation checkpointing under memory constraints [check the visual] 11) Use torch.rand(2, 2, device=...) to create a tensor on GPU. • .cuda() creates a tensor on CPU and then transfers it to GPU, which is slow.
1
0
0
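Tip 11 above, as a small sketch (the device choice here is illustrative):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# The pattern the tweet warns against: allocate on the CPU, then copy to the device.
slow = torch.rand(2, 2).to(device)

# Preferred: allocate directly on the target device; no host allocation or copy.
fast = torch.rand(2, 2, device=device)
```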
@puwaer
puwaer
6 months
For now I'm training qwen 3 14b on about 800 million tokens. The training loss keeps going down, so I guess it really has no r18 knowledge at all? Also, the h200 (140gb) lets you do distributed training with dataParallel, so it's efficient.
0
0
2
@dzhulgakov
Dmytro Dzhulgakov
2 years
GPUs keep getting faster, and keeping even a single one busy from Python is often a challenge (hence torch.compile and friends). This made driving many GPUs from one process slow: hence the DataParallel -> DistributedDataParallel switch.
2
5
61
@ezyang
Edward Z. Yang
1 month
I've been brainstorming episodes for the next season of PyTorch Developer Podcast: DTensor StridedShard, FSDP-TP order; Redistributing a DTensor; Prefetching vs Bucketing; History of FSDP in PyTorch; Multiprocessing: DataParallel versus DistributedDataParallel; Monarch Parallelism
20
31
393
@CountableMagic
Modulus Labs
2 years
Our third key insight is generalizing the linear-time prover algorithm from Libra [XZZ+19] to the dataparallel case, allowing 𝑅𝑒𝑚𝑎𝑖𝑛𝑑𝑒𝑟 to achieve a strictly linear proving time for even dataparallel, unstructured layers requiring wiring predicates ♎-🦒
1
1
10
@aarunbhardwaj
🧠 /drarun ⚙️🤖
1 month
🔥 Autograd magic! PyTorch computes gradients: x = torch.tensor([2.0], requires_grad=True) y = x ** 2 y.backward() print(x.grad) #PyTorch 🚀 Multi-GPU training with DataParallel: model = nn.DataParallel(model) model.to(device) Scale training! #PyTorch https://t.co/mrHbmp7Bx3
0
0
0
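A runnable version of the snippets in the tweet above, with a toy linear model standing in for a real network:

```python
import torch
import torch.nn as nn

# Autograd: gradients come from backward().
x = torch.tensor([2.0], requires_grad=True)
y = x ** 2
y.backward()
print(x.grad)                        # tensor([4.]) since dy/dx = 2x = 4 at x = 2

# Multi-GPU training with DataParallel (toy model for illustration).
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(8, 2)
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)   # batches are split across the visible GPUs
model = model.to(device)
```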
@andersonbcdefg
Ben (no treats)
1 year
i like my DataParallel like I like my pants: Fully Sharted
0
0
19
@oriki111
もっさん
2 years
Hugging Face's Trainer automatically handles DataParallel and the like; I was so impressed that I wrote it up as a note article (roughly). To see how the parallelization is done I went through the Trainer class definition in the repository, and I was grateful for how much has really been built into it. https://t.co/i92bfadWLG
0
0
7
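A minimal sketch of what the tweet above describes: on a multi-GPU machine launched with plain `python`, Hugging Face's Trainer wraps the model in nn.DataParallel on its own. The model, dataset, and hyperparameters below are illustrative choices, not from the tweet:

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Small demo slice; tokenize to a fixed length so the default collator can batch it.
ds = load_dataset("imdb", split="train[:1000]")
ds = ds.map(lambda b: tok(b["text"], truncation=True, padding="max_length", max_length=128),
            batched=True)

args = TrainingArguments(output_dir="out", per_device_train_batch_size=8, num_train_epochs=1)

# No parallelism code here: with more than one visible GPU and a plain `python`
# launch, Trainer applies nn.DataParallel to the model by itself.
Trainer(model=model, args=args, train_dataset=ds).train()
```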
@mima_tea
みまティー
3 years
With pytorch's dataparallel, GPU:0 alone eats noticeably more memory, which is awkward. Is there some workaround for this?
0
0
1
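One common workaround for that imbalance (a sketch, not an official fix): compute the loss inside forward so DataParallel only gathers small per-replica scalars onto GPU 0 instead of full output tensors; switching to DistributedDataParallel avoids the imbalance entirely.

```python
import torch
import torch.nn as nn

class WithLoss(nn.Module):
    """Compute the loss inside forward, so each DataParallel replica returns a
    small scalar and GPU 0 never has to gather the full logits."""
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, x, target):
        logits = self.model(x)
        return nn.functional.cross_entropy(logits, target)

base = nn.Linear(512, 1000)                       # toy model for illustration
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.DataParallel(WithLoss(base)).to(device)

x = torch.randn(64, 512, device=device)
y = torch.randint(0, 1000, (64,), device=device)
loss = model(x, y).mean()                         # mean of the per-replica losses
loss.backward()
```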
@akshay_pachaar
Akshay 🚀
3 months
9) Use DistributedDataParallel, not DataParallel. 10) Use torch.rand(2, 2, device=...) to create a tensor on GPU. A .cuda() call creates a tensor on CPU and then transfers it to GPU, which is slow. 11) Use activation checkpointing under memory constraints👇
1
0
13
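Tip 11 above (activation checkpointing) as a hedged minimal sketch using torch.utils.checkpoint; the block and batch sizes are arbitrary:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

# Toy blocks standing in for a memory-hungry network.
blocks = nn.ModuleList(
    nn.Sequential(nn.Linear(1024, 1024), nn.ReLU()) for _ in range(16)
)

x = torch.randn(32, 1024, requires_grad=True)

# Activation checkpointing: skip storing each block's intermediate activations
# and recompute them during backward, trading extra compute for lower memory.
h = x
for block in blocks:
    h = checkpoint(block, h, use_reentrant=False)

h.sum().backward()
```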