Explore tweets tagged as #DataParallel
...and breathe xD The authors trained it with `DataParallel` and we obviously don't have more than 1 good GPU :P So to load the model we needed to change the weights' names and remove "module". But now we have a working PointNet! Next step: the global feat vector gn!! (20/N)
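A minimal sketch of the fix described above, using a stand-in `nn.Linear` instead of the thread's actual PointNet (not shown): weights saved from an `nn.DataParallel` model carry a "module." prefix that has to be stripped before a plain single-GPU model can load them.

```python
import torch.nn as nn

# Tiny stand-in model; the thread's actual PointNet definition is not shown.
model = nn.Linear(3, 64)

# A checkpoint saved from nn.DataParallel has keys prefixed with "module."
# (e.g. "module.weight", "module.bias").
dp_state = nn.DataParallel(model).state_dict()

# Strip the prefix so a plain single-GPU model can load the weights.
cleaned = {k.removeprefix("module."): v for k, v in dp_state.items()}
model.load_state_dict(cleaned)
```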
Sentence Transformers supports DataParallel as well as the superior DistributedDataParallel. The usage is as simple as running your normal training script with: `torchrun --nproc_per_node=4 train_script.py` (if 4 is your number of GPUs) instead of `python train_script.py` 🧵
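For illustration, a bare-bones PyTorch script of the kind `torchrun` launches: generic DistributedDataParallel boilerplate with a placeholder model and dummy data, not the Sentence Transformers API itself.

```python
# train_script.py -- launch with: torchrun --nproc_per_node=4 train_script.py
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK / LOCAL_RANK / WORLD_SIZE for each spawned process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = nn.Linear(10, 1).cuda(local_rank)       # placeholder model
    model = DDP(model, device_ids=[local_rank])
    opt = torch.optim.SGD(model.parameters(), lr=0.01)

    for _ in range(10):                             # dummy training loop
        x = torch.randn(32, 10, device=local_rank)
        loss = model(x).pow(2).mean()
        opt.zero_grad()
        loss.backward()                             # gradients all-reduced across ranks
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```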
Data Parallel C++, 2nd Edition: Programming Accelerated Systems Using C++ and SYCL - https://t.co/4IcxP0tLKP Learn how to accelerate C++ programs using Data Parallelism and SYCL. #DataParallel #cpp #cpplus #CppProgramming #ParallelProgramming #programming #programmer
(Open Access) Data Parallel C++: Programming Accelerated Systems Using C++ and SYCL - https://t.co/4IcxP0tLKP Look for "Read and Download Links" section to download. Follow me if you like this. #programming #cpp #cplusplus #DataParallel #ParallelProgramming #ConcurrentProgramming
At #Dyalog24, Brandon Wilson is demonstrating how APL can handle large mathematical databases like Metamath, offering a unique solution for efficient proof verification. But who will ask the questions at the end? 😱 #APL #ProofVerification #DataParallel #Metamath
9) Use DistributedDataParallel, not DataParallel. 10) Use activation checkpointing under memory constraints [check the visual] 11) Use torch.rand(2, 2, device=...) to create a tensor on GPU. • torch.rand(2, 2).cuda() creates the tensor on CPU first and then transfers it to GPU, which is slow.
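A minimal sketch of tip 11 above (direct device allocation vs. `.cuda()`); it assumes a CUDA device is available and the shapes are arbitrary.

```python
import torch

device = "cuda"  # assumes a CUDA GPU is present

# Fast: the tensor is allocated directly on the GPU.
x = torch.rand(2, 2, device=device)

# Slow: the tensor is first created on the CPU, then copied to the GPU.
y = torch.rand(2, 2).cuda()
```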
For now, I'm training qwen 3 14b on about 800 million tokens. The training loss keeps going down, so maybe it really has no r18 knowledge at all? Also, the h200 (140gb) can do distributed training with dataParallel, so it's efficient.
GPUs keep getting faster, and keeping even a single one busy from Python is often a challenge (hence torch.compile and friends). This made driving many GPUs from a single process slow, hence the DataParallel -> DistributedDataParallel switch
I've been brainstorming episodes for the next season of PyTorch Developer Podcast:
DTensor StridedShard, FSDP-TP order
Redistributing a DTensor
Prefetching vs Bucketing
History of FSDP in PyTorch
Multiprocessing: DataParallel versus DistributedDataParallel
Monarch Parallelism
Our third key insight is generalizing the linear-time prover algorithm from Libra [XZZ+19] to the dataparallel case, allowing 𝑅𝑒𝑚𝑎𝑖𝑛𝑑𝑒𝑟 to achieve a strictly linear proving time for even dataparallel, unstructured layers requiring wiring predicates ♎-🦒
🔥 Autograd magic! PyTorch computes gradients: x = torch.tensor([2.0], requires_grad=True) y = x ** 2 y.backward() print(x.grad) #PyTorch 🚀 Multi-GPU training with DataParallel: model = nn.DataParallel(model) model.to(device) Scale training! #PyTorch
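The same snippets reassembled as a runnable sketch; the wrapped model is a placeholder `nn.Linear` and the batch shape is made up, not from the tweet.

```python
import torch
import torch.nn as nn

# Autograd: y = x**2, so dy/dx = 2x = 4 at x = 2
x = torch.tensor([2.0], requires_grad=True)
y = x ** 2
y.backward()
print(x.grad)                                   # tensor([4.])

# DataParallel: replicates the model across all visible GPUs and
# splits the batch along dim 0 (falls back to single-device if none).
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.DataParallel(nn.Linear(4, 2))        # placeholder model
model.to(device)
out = model(torch.randn(8, 4, device=device))   # batch split across GPUs
```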
i like my DataParallel like I like my pants: Fully Sharted
Hugging Face's Trainer automatically handles DataParallel and the like, which impressed me, so I wrote it up in a note article (rough). To check how the parallelization is done, I was reading the Trainer class definition in the repository, and I ended up really grateful for how much work has gone into it. https://t.co/i92bfadWLG
9) Use DistributedDataParallel, not DataParallel. 10) Use torch.rand(2, 2, device=...) to create a tensor on GPU. A .cuda() call creates the tensor on CPU and then transfers it to GPU, which is slow. 11) Use activation checkpointing under memory constraints👇
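A minimal sketch of activation checkpointing (tip 11) with torch.utils.checkpoint; the two blocks are placeholders standing in for expensive sub-networks.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

# Placeholder blocks standing in for expensive sub-networks.
block1 = nn.Sequential(nn.Linear(128, 128), nn.ReLU())
block2 = nn.Sequential(nn.Linear(128, 128), nn.ReLU())

x = torch.randn(16, 128, requires_grad=True)

# block1's activations are not stored; they are recomputed during backward,
# trading extra compute for lower peak memory.
h = checkpoint(block1, x, use_reentrant=False)
out = block2(h).sum()
out.backward()
```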