ZD1908
@ZDi____
Followers
231
Following
34K
Media
370
Statuses
3K
(mostly) Audio/TTS ML research & LSTM enjoyer; by myself | 🇦🇷 25M | DMs open
Latent space
Joined June 2024
Well, I can't improve my thing further, so I'm releasing it just to document the process. I tried making an efficient neural audio codec by combining a 16KHz STFT-VQGAN with a Wave U-Net that corrects artifacts and upsamples to 44.1KHz. (Substack link in replies)
1
0
5
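Not the released code, just a toy sketch of the two-stage idea: the 16KHz STFT-VQGAN does the compression, and a separate waveform network cleans artifacts after naive resampling to 44.1KHz. The stand-in below is a flat conv stack rather than a real Wave U-Net, and the codec itself is stubbed out with an identity.

```python
import torch
import torch.nn as nn
import torchaudio

class PostNet(nn.Module):
    """Toy stand-in for the artifact-correction / upsampling stage (not a real Wave U-Net)."""
    def __init__(self, channels=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size=15, padding=7),
            nn.GELU(),
            nn.Conv1d(channels, channels, kernel_size=15, padding=7),
            nn.GELU(),
            nn.Conv1d(channels, 1, kernel_size=15, padding=7),
        )

    def forward(self, wav_16k: torch.Tensor) -> torch.Tensor:
        # Naive resample to 44.1KHz, then predict a residual correction on top.
        wav_44k = torchaudio.functional.resample(wav_16k, 16_000, 44_100)
        return wav_44k + self.net(wav_44k)

codec = nn.Identity()                  # placeholder for the STFT-VQGAN encode/decode round trip
postnet = PostNet()
wav_16k = torch.randn(1, 1, 16_000)    # 1 s of fake 16KHz audio, (batch, channel, time)
recon_44k = postnet(codec(wav_16k))    # -> (1, 1, 44100)
```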
It's funny how the paradigm in seq2seq went from encoder <- cross-attention -> decoder to just tokenizing both the input and output sequence, concatenating them, and training a pure self-attention decoder, and it works. The decoder-only transformer is truly something.
0
0
0
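A minimal sketch of that recipe with stock PyTorch modules (toy vocab and shapes, not from any real model): source and target tokens become one sequence, a causal self-attention stack predicts next tokens, and the loss is masked so only the target half is scored.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, d_model = 1000, 128
embed = nn.Embedding(vocab_size, d_model)
layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True,
                                   norm_first=True)   # self-attention only, pre-norm
decoder = nn.TransformerEncoder(layer, num_layers=2)
head = nn.Linear(d_model, vocab_size)

src = torch.randint(0, vocab_size, (2, 16))   # "input" tokens
tgt = torch.randint(0, vocab_size, (2, 8))    # "output" tokens
seq = torch.cat([src, tgt], dim=1)            # one concatenated sequence, no cross-attention

causal = nn.Transformer.generate_square_subsequent_mask(seq.size(1))
h = decoder(embed(seq), mask=causal)          # the causal mask is what makes it a decoder
logits = head(h[:, :-1])                      # predict token t+1 from the prefix up to t
labels = seq[:, 1:].clone()
labels[:, : src.size(1) - 1] = -100           # ignore positions that predict input tokens
loss = F.cross_entropy(logits.reshape(-1, vocab_size), labels.reshape(-1),
                       ignore_index=-100)
```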
AI bros be like "(fire emoji) (fire emoji) Hollywood is FINISHED! AI X is the future!" and it's the sloppiest slop in the history of slop. Like, come on.
0
0
2
Seems to be hipBLASLt shitting the bed on a BF16 matmul. Turning mixed precision off removes the crash. Maybe it's the Tensile backend? Will have to retry with ROCBLAS_USE_HIPBLASLT=1. Thank God for TensorFloat32 tho.
0
0
1
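For anyone hitting the same thing, a hedged sketch of the two workarounds: keep the offending matmul out of autocast, and set ROCBLAS_USE_HIPBLASLT before the libraries load. No promises that either avoids the crash on other ROCm/hipBLASLt versions.

```python
import os
os.environ.setdefault("ROCBLAS_USE_HIPBLASLT", "1")  # set before torch initializes the BLAS libs

import torch

a = torch.randn(4096, 4096, device="cuda")  # ROCm GPUs still show up as "cuda" in PyTorch
b = torch.randn(4096, 4096, device="cuda")

with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    # ... rest of the mixed-precision forward pass ...
    with torch.autocast(device_type="cuda", enabled=False):
        c = a @ b  # keep this matmul in full precision instead of BF16
```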
@AnushElangovan TF32 seems stable enough on MI300X, why is it not on by default?
1
0
1
This has been preventing me from achieving anything in the last 2 days. It doesn't go away no matter what I try, and is completely random.
0
0
0
I was wondering why my decoder was miserably failing to reduce loss. I just realized I forgot to tell Qwen Code to make my transformer pre-norm.
0
0
0
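With stock PyTorch layers the fix is a single flag, nn.TransformerEncoderLayer(..., norm_first=True). A hand-rolled version of the pre-norm block looks roughly like this (illustrative names, not the actual decoder):

```python
import torch.nn as nn

class PreNormBlock(nn.Module):
    def __init__(self, d_model=512, nhead=8):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))

    def forward(self, x, attn_mask=None):
        # Pre-norm: normalize *before* each sublayer so the residual path stays clean,
        # which is what makes deep stacks trainable without a careful warmup schedule.
        h = self.norm1(x)
        x = x + self.attn(h, h, h, attn_mask=attn_mask, need_weights=False)[0]
        x = x + self.mlp(self.norm2(x))
        return x
```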
Lazy way to make a dataloader efficient: just load the entire dataset into CPU RAM.
0
0
1
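A sketch of what that looks like (path and tensor layout are made up): one torch.load up front, then __getitem__ is pure indexing, so num_workers can stay at 0 because there's no per-item I/O left to hide.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class InMemoryTokens(Dataset):
    def __init__(self, path="tokens.pt"):       # hypothetical pre-tokenized dataset file
        self.tokens = torch.load(path)           # e.g. (num_clips, seq_len) int64, all in RAM

    def __len__(self):
        return self.tokens.size(0)

    def __getitem__(self, idx):
        return self.tokens[idx]                  # no disk access, just slicing

# loader = DataLoader(InMemoryTokens(), batch_size=64, shuffle=True,
#                     pin_memory=True, num_workers=0)
```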
This October 19 we commemorate the 111th anniversary of the passing into immortality of Julio Argentino Roca, national hero, twice President of the Nation, and a key figure in the consolidation of the Argentine State. Under his leadership the Campaña del Desierto was carried out, a decisive milestone
423
2K
11K
I can't ever look at graphs like this the same again
3
1
54
Pretraining both encoder and decoder to build a rich prior over text and audio before fine-tuning. It also lets me take advantage of fixed-length training. Container is rocm/pytorch-training:v25.8; everything works out of the box.
0
0
1
Pretraining the decoder on unconditional AR modeling of 4B audio tokens and the encoder on char-level masked language modeling. 28% MFU on 355M params after fused AdamW, torch.compile, and Flash Attention 2 on 1x @HotAisle MI300X. Later I'll connect the two on a small amount of paired data for TTS.
1
0
1
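Roughly what those three knobs look like in plain PyTorch; the model and shapes below are placeholders, not the actual training script, and the flash backend will complain if it can't handle the given dtypes/shapes.

```python
import torch
import torch.nn as nn
from torch.nn.attention import SDPBackend, sdpa_kernel

model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=1024, nhead=16, batch_first=True, norm_first=True),
    num_layers=4,
).cuda()

# Fused AdamW: the parameter update runs as fused kernels instead of many small ops.
opt = torch.optim.AdamW(model.parameters(), lr=3e-4, fused=True)

# torch.compile: graph capture and kernel fusion around forward/backward.
model = torch.compile(model)

x = torch.randn(4, 1024, 1024, device="cuda")

# Force the Flash Attention backend of scaled_dot_product_attention;
# autocast to BF16 since the flash kernels want half-precision inputs.
with sdpa_kernel(SDPBackend.FLASH_ATTENTION), torch.autocast("cuda", dtype=torch.bfloat16):
    out = model(x)
loss = out.float().pow(2).mean()   # dummy loss just to exercise the backward pass
loss.backward()
opt.step()
```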
The official PyTorch documentation says TensorFloat32 is not available on ROCm, but this is a lie: it's disabled unless HIPBLASLT_ALLOW_TF32=1, hiding a 2.8x speedup in FP32 matmuls. HIPBLASLT_ALLOW_TF32 should be on by default.
1
0
1
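In one place for anyone who wants to try it: the env var has to be in the environment before the BLAS libraries initialize (so set it before importing torch, or in the shell), and the usual TF32 flags still need flipping on the PyTorch side.

```python
import os
os.environ.setdefault("HIPBLASLT_ALLOW_TF32", "1")   # the ROCm-side switch from the post

import torch

torch.backends.cuda.matmul.allow_tf32 = True   # let FP32 matmuls run as TF32
torch.backends.cudnn.allow_tf32 = True         # same for convolutions (MIOpen on ROCm)
# Equivalent newer knob: torch.set_float32_matmul_precision("high")

a = torch.randn(8192, 8192, device="cuda")
b = torch.randn(8192, 8192, device="cuda")
c = a @ b   # runs as TF32 if the backend honors the flags above
```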