Explore tweets tagged as #DataLoader
@AnkitMi14760422
Ankit Mishra
7 days
Built a skin cancer detection pipeline with deep learning 🧠📸. Used EfficientNetB0 for transfer learning, stratified splits, image augmentation, a custom Dataset & DataLoader, and device-aware training. Blending vision with structure: medical ML done right. 🩺⚙️ #PyTorch #AI
[Tweet media: 2 images]
1 reply · 0 reposts · 4 likes
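A minimal sketch of how those pieces (custom Dataset, DataLoader, device-aware loop, EfficientNetB0 transfer learning) typically fit together; the file path, labels, and hyperparameters below are hypothetical placeholders, not the author's actual pipeline:

```python
import torch
from torch.utils.data import Dataset, DataLoader
from torchvision import models, transforms
from PIL import Image

class SkinLesionDataset(Dataset):
    """Hypothetical custom Dataset over image paths + integer labels."""
    def __init__(self, paths, labels, transform):
        self.paths, self.labels, self.transform = paths, labels, transform

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        img = Image.open(self.paths[idx]).convert("RGB")
        return self.transform(img), self.labels[idx]

# image augmentation for the training split
train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# paths/labels would come from a stratified split of the real dataset
train_ds = SkinLesionDataset(paths=["lesion_0001.jpg"], labels=[0], transform=train_tf)
train_loader = DataLoader(train_ds, batch_size=32, shuffle=True, num_workers=4)

# device-aware transfer learning with EfficientNetB0
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = models.efficientnet_b0(weights="IMAGENET1K_V1")
model.classifier[1] = torch.nn.Linear(model.classifier[1].in_features, 2)  # new 2-class head
model = model.to(device)

for images, targets in train_loader:
    images, targets = images.to(device), targets.to(device)
    # ...forward pass, loss, backward, optimizer step...
```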
@Hesamation
ℏεsam
10 months
ML GRIND DAY 7
> finished the "Build a LLM from Scratch" book 💪
> LLM classification and instruction fine-tuning
> diving deep into some PyTorch basics
  - PyTorch Dataset and DataLoader
  - log-sum-exp trick for CrossEntropyLoss
One book down! Let's gooo.
[Tweet media: 4 images]
10 replies · 21 reposts · 391 likes
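For reference, the log-sum-exp trick mentioned above: subtract the max before exponentiating so CrossEntropyLoss-style computations don't overflow. A quick standalone demo with made-up logits:

```python
import torch

logits = torch.tensor([1000.0, 1001.0, 1002.0])  # large enough to overflow exp()

naive = torch.log(torch.exp(logits).sum())            # inf: exp(1000) overflows float32
m = logits.max()
stable = m + torch.log(torch.exp(logits - m).sum())   # log-sum-exp trick

print(naive)   # tensor(inf)
print(stable)  # tensor(1002.4076), matches torch.logsumexp(logits, dim=0)
```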
@YouJiacheng
You Jiacheng
8 months
New NanoGPT training speed record: 3.28 FineWeb val loss in 3.95 minutes. Previous record: 4.41 minutes.
Changelog:
- @leloykun arch optimization: ~17s
- remove "dead" code: ~1.5s
- re-implement dataloader: ~2.5s
- re-implement Muon: ~1s
- manual block_mask creation: ~5s
[Tweet media: 1 image]
13 replies · 24 reposts · 302 likes
@turtleclub420
🥀ZombieJellyTortise 👑🩸chartReaper
1 year
Building a VAE dataloader. Not bad for training/evaluating on only 10 images.
1 reply · 0 reposts · 0 likes
@ayushthakur0
Ayush Thakur
1 year
If you're unsure what `num_workers` to use with your PyTorch dataloader, it's a good idea to just profile it. I did this for my own dataloader and found 24 workers to be the most performant, whereas I had previously been using 29 workers.
[Tweet media: 1 image]
0 replies · 0 reposts · 3 likes
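A sketch of such a profile (the dataset, batch size, and candidate worker counts are placeholders to adapt): time one full pass over the data per setting.

```python
import time
from torch.utils.data import DataLoader

def profile_num_workers(dataset, batch_size=64, candidates=(0, 4, 8, 16, 24, 29)):
    """Time one full pass over the dataset for each num_workers setting."""
    for n in candidates:
        loader = DataLoader(dataset, batch_size=batch_size, num_workers=n)
        start = time.perf_counter()
        for _ in loader:
            pass                                   # iterate only; no training step
        print(f"num_workers={n}: {time.perf_counter() - start:.2f}s")
```

On platforms that spawn worker processes (macOS, Windows), call this under an `if __name__ == "__main__":` guard.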
@NVIDIAHealth
NVIDIA Healthcare
8 days
🚀 cuik-molmaker swaps Chemprop's Python/RDKit dataloader for lean C++: the step that used to hog ~50% of every training batch is now a blip. Ongoing work will generalize it to other GNNs. Try it out today as a pip installable or at
0 replies · 5 reposts · 22 likes
@wateriscoding
water
1 year
Wait!!! So this DataLoader thing existed to reduce GPU utilisation!!! And make my model train
[Tweet media: 1 image]
6 replies · 3 reposts · 54 likes
@ozeniy
YasunoriOzeni
1 month
An incident where a tampered DataLoader was used to exfiltrate data from Salesforce. When DataLoader stops working, asking strangers in community forums for a download link to a working version seems a bit dangerous. Not everyone with a Salesforce account has high security literacy, after all...
0 replies · 2 reposts · 25 likes
@RedBull_7584
ぶる🦬
1 year
Quietly handy for visualizing the image batches a dataloader outputs.
1 reply · 6 reposts · 109 likes
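One common recipe for this, assuming `loader` yields batches of CHW image tensors (torchvision's `make_grid` does the tiling):

```python
import matplotlib.pyplot as plt
from torchvision.utils import make_grid

images, labels = next(iter(loader))            # grab one batch from the dataloader
grid = make_grid(images, nrow=8, normalize=True)
plt.imshow(grid.permute(1, 2, 0))              # CHW -> HWC for matplotlib
plt.axis("off")
plt.show()
```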
@sumiren_t
sumiren
2 months
Really good. So Dataloader packs function calls and their arguments into a queue, debounces, fires them off at a reasonable batch size, and resolves each promise it can fulfill from what comes back. Interesting. It seems low-effort, so if you're doing RSC it looks worth adding even in zero-to-one development.
1 reply · 5 reposts · 33 likes
@iScienceLuvr
Tanishq Abraham is at ICML
3 months
Scalable and Performant Data Loading. "We present SPDL (Scalable and Performant Data Loading), an open-source, framework-agnostic library designed for efficiently loading array data to GPU." "Our benchmark shows that compared to the PyTorch DataLoader, SPDL can iterate through
[Tweet media: 1 image]
2 replies · 24 reposts · 146 likes
@RhysSullivan
Rhys
9 months
One of my favorite patterns that doesn't get enough love is dataloader. It lets you fetch data like normal, but it automatically batches & caches the calls, so you can fetch data from completely different locations in your app with the speed of one call.
[Tweet media: 1 image]
5 replies · 3 reposts · 28 likes
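The pattern is easy to sketch outside GraphQL too. A toy Python/asyncio rendition (the real graphql/dataloader is JavaScript; `fetch_users` here is a hypothetical batched endpoint): `load(key)` calls made during one event-loop tick are queued, flushed as a single batched call, and memoized.

```python
import asyncio

class DataLoaderLite:
    """Toy batch-and-cache loader in the style of graphql/dataloader."""

    def __init__(self, batch_fn):
        self.batch_fn = batch_fn   # async: list of keys -> list of values
        self.cache = {}            # key -> future (memoizes repeat loads)
        self.queue = []            # (key, future) pairs awaiting a flush

    def load(self, key):
        if key in self.cache:
            return self.cache[key]             # cached: reuse the same future
        fut = asyncio.get_running_loop().create_future()
        self.cache[key] = fut
        if not self.queue:                     # first key this tick: schedule a flush
            asyncio.get_running_loop().call_soon(
                lambda: asyncio.ensure_future(self._flush()))
        self.queue.append((key, fut))
        return fut

    async def _flush(self):
        batch, self.queue = self.queue, []     # drain everything queued this tick
        values = await self.batch_fn([k for k, _ in batch])
        for (_, fut), value in zip(batch, values):
            fut.set_result(value)

async def fetch_users(ids):                    # hypothetical batched endpoint
    print(f"one batched call for ids={ids}")
    return [{"id": i, "name": f"user{i}"} for i in ids]

async def main():
    users = DataLoaderLite(fetch_users)
    # two independent loads collapse into a single batched call
    a, b = await asyncio.gather(users.load(1), users.load(2))
    print(a, b)

asyncio.run(main())
```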
@vishal_learner
vishal
2 months
Visualizing how the labels (bottom) correspond to the "response" tokens in the input_ids (top) for a QA-dataset fine-tuning dataloader.
[Tweet media: 1 image]
0 replies · 0 reposts · 1 like
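A standard way that correspondence is constructed (a toy sketch with made-up token ids; -100 is the default ignore_index of PyTorch's CrossEntropyLoss, so only response tokens contribute to the loss):

```python
import torch

IGNORE_INDEX = -100   # CrossEntropyLoss skips these positions

# toy token ids: 5 prompt tokens followed by 4 response tokens
prompt_ids   = torch.tensor([101, 2054, 2003, 1996, 102])
response_ids = torch.tensor([1996, 3437, 2003, 102])

input_ids = torch.cat([prompt_ids, response_ids])
labels = input_ids.clone()
labels[: len(prompt_ids)] = IGNORE_INDEX   # only the response is supervised

print(input_ids)  # tensor([ 101, 2054, 2003, 1996,  102, 1996, 3437, 2003,  102])
print(labels)     # tensor([-100, -100, -100, -100, -100, 1996, 3437, 2003,  102])
```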
@_avichawla
Avi Chawla
3 days
15-16) Set num_workers and pin_memory in DataLoader. The PyTorch dataloader has two terrible default settings. Update them according to your config. The speedup is shown in the image below 👇
[Tweet media: 1 image]
1 reply · 0 reposts · 24 likes
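The defaults in question are `num_workers=0` and `pin_memory=False`. A sketch of the usual adjustment (toy dataset; the values are illustrative, not universal recommendations, and the `.cuda` copies assume a GPU):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(256, 3, 32, 32), torch.randint(0, 2, (256,)))

loader = DataLoader(
    dataset,
    batch_size=64,
    num_workers=8,        # default 0: loading happens in the main process
    pin_memory=True,      # default False: pinned host memory speeds up H2D copies
)

for x, y in loader:
    # non_blocking pairs with pin_memory to overlap the copy with compute
    x = x.cuda(non_blocking=True)
    y = y.cuda(non_blocking=True)
```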
@lusxvr
Luis
3 months
TIL: If you write a custom collate_fn, don’t forget to set pin_memory=True in your PyTorch DataLoader. It can halve your training time and actually make the GPU go brr (violet = before, yellow = after)
[Tweet media: 1 image]
0 replies · 0 reposts · 6 likes
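For context, `pin_memory=True` still applies to whatever tensors the custom `collate_fn` returns. A self-contained sketch with a hypothetical `pad_collate` for variable-length sequences:

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader

# toy dataset: (variable-length sequence, label) pairs
data = [(torch.randn(n), n % 2) for n in (5, 9, 7, 12)]

def pad_collate(batch):
    """Pad variable-length sequences into one dense batch tensor."""
    seqs, labels = zip(*batch)
    return pad_sequence(seqs, batch_first=True), torch.tensor(labels)

loader = DataLoader(
    data,                    # any indexable sequence works as a map-style dataset
    batch_size=2,
    collate_fn=pad_collate,
    pin_memory=True,         # easy to forget when you hand-roll collation
)

for padded, labels in loader:
    print(padded.shape, labels)   # e.g. torch.Size([2, 9]) tensor([1, 1])
```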
@pratyushmaini
Pratyush Maini
2 months
This is some seriously high-quality analysis of which open dataloader to use for your multimodal workflows. It's definitely going to save you months of time!
@HaoliYin
Haoli Yin
2 months
Ever wondered how large-scale multimodal training works? How ~petabytes of data are loaded from the cloud to accelerators? Here I benchmark 4 frameworks (WebDataset, Energon, MDS, LitData) on data-prep efficiency, cloud streaming performance, & fault tolerance. TL;DR: try out LitData
[Tweet media: 1 image]
0 replies · 1 repost · 18 likes
@al3rez
Alireza Bashiri
2 months
AI/ML rants: Python isn't a great language for machine learning. It doesn't support real parallel threading, so data loaders end up slow and awkward. There's also this PyTorch DataLoader problem: if you don't set things up exactly right, it looks like there's a memory leak. I
[Tweet media: 1 image]
2 replies · 0 reposts · 4 likes
@ysknsid25
Kanon
2 months
Watching this 👀 #kanon_feed_tech: "graphql/dataloader"
0 replies · 0 reposts · 0 likes
@jramapuram
Jason Ramapuram
3 months
Stop by poster #596 at 10A-12:30P tomorrow (Fri 25 April) at #ICLR2025 to hear more about Sigmoid Attention! We just pushed 8 trajectory checkpoints each for two 7B LLMs, for Sigmoid Attention and a 1:1 Softmax Attention (trained with a deterministic dataloader for 1T tokens): -
[Tweet media: 1 image]
@jramapuram
Jason Ramapuram
6 months
Small update on SigmoidAttn (arXiv incoming):
- 1B and 7B LLM results added and stabilized.
- Hybrid Norm [on the embed dim, not the seq dim], `x + norm(sigmoid(QK^T / sqrt(d_{qk}))V)`, stabilizes longer sequences (n=4096) and larger models (7B). H-norm is used with Grok-1, for example.
[Tweet media: 1 image]
1 reply · 14 reposts · 45 likes
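Read literally, the tweeted formula translates to code like this single-head sketch (random weights, LayerNorm over the embedding dim; anything beyond the formula as written is omitted):

```python
import math
import torch

def hybrid_norm_sigmoid_attn(x, wq, wk, wv, norm):
    """x + norm(sigmoid(QK^T / sqrt(d_qk)) V), with norm over the embed dim."""
    q, k, v = x @ wq, x @ wk, x @ wv
    d_qk = q.shape[-1]
    scores = (q @ k.transpose(-2, -1)) / math.sqrt(d_qk)
    attn_out = torch.sigmoid(scores) @ v     # sigmoid in place of softmax
    return x + norm(attn_out)                # residual + H-norm

d = 64
x = torch.randn(2, 16, d)                    # (batch, seq, embed)
wq, wk, wv = (torch.randn(d, d) / math.sqrt(d) for _ in range(3))
out = hybrid_norm_sigmoid_attn(x, wq, wk, wv, torch.nn.LayerNorm(d))
print(out.shape)                             # torch.Size([2, 16, 64])
```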