 
            
              ZD1908
            
            @ZDi____
Followers
                231
              Following
                34K
              Media
                370
              Statuses
                3K
              (mostly) Audio/TTS ML research & LSTM enjoyer; by myself | 🇦🇷 25M | DMs open
              
              Latent space
            
            
              
              Joined June 2024
            
            
           Well, I can't improve my thing further, so I'm releasing it just to document the process. I tried making an efficient neural audio codec by combining a 16 kHz STFT-VQGAN with a Wave U-Net that corrects artifacts and upsamples to 44.1 kHz. (Substack link in replies) 
          
                
                1
              
              
                
                0
              
              
                
                5
              
             It's funny how the paradigm in seq2seq went from encoder<-cross attention->decoder to just tokenizing the input and output sequences together, concatenating them, and training a pure self-attention decoder, and it works. The decoder-only transformer is truly something. 
          
                
                0
              
              
                
                0
              
              
                
                0
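A minimal sketch of the concatenation idea from the post above, assuming hypothetical special-token ids (`BOS`, `SEP`, `EOS`) and assuming the loss is masked on the source part, which is one common choice rather than something the post specifies:

```python
# Hypothetical special-token ids; a real tokenizer would define its own.
BOS, SEP, EOS = 0, 1, 2

def build_example(src_ids, tgt_ids):
    """Concatenate source and target into one sequence for a decoder-only model.

    Returns (inputs, targets, loss_mask). The mask is 1 only where the token
    to predict belongs to the output sequence (including EOS).
    """
    seq = [BOS] + list(src_ids) + [SEP] + list(tgt_ids) + [EOS]
    inputs = seq[:-1]             # the model sees tokens 0..n-2
    targets = seq[1:]             # and predicts tokens 1..n-1
    n_prefix = len(src_ids) + 1   # targets that are source tokens or SEP
    loss_mask = [0] * n_prefix + [1] * (len(targets) - n_prefix)
    return inputs, targets, loss_mask
```

With causal self-attention, every output position can attend to the whole source prefix, which plays the role cross-attention had in the encoder-decoder setup.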
              
             AI bros be like "(fire emoji) (fire emoji) Hollywood is FINISHED! AI X is the future!" and it's the sloppiest slop in the history of slop. Like, come on. 
          
          
                
                0
              
              
                
                0
              
              
                
                2
              
             Seems to be hipBLASLt shitting the bed on a BF16 matmul. Turning mixed precision off removes the crash. Maybe it's the Tensile backend? Will have to retry with ROCBLAS_USE_HIPBLASLT=1. Thank God for TensorFloat32 tho. 
           This has been preventing me from achieving anything in the last 2 days. It doesn't go away no matter what I try, and is completely random. 
            
                
                0
              
              
                
                0
              
              
                
                1
              
            
            @AnushElangovan TF32 seems stable enough on MI300X, why is it not on by default?
          
          
                
                1
              
              
                
                0
              
              
                
                1
              
             This has been preventing me from achieving anything in the last 2 days. It doesn't go away no matter what I try, and is completely random. 
          
                
                0
              
              
                
                0
              
              
                
                0
              
             I was wondering why my decoder was miserably failing to reduce loss. I just realized I forgot to tell Qwen code to make my transformer pre-norm. 
          
                
                0
              
              
                
                0
              
              
                
                0
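For readers unfamiliar with the distinction, here is a rough pure-Python sketch of pre-norm versus post-norm residual blocks. It is an illustration on plain lists of floats, not the author's model; `sublayer` stands in for attention or an MLP, and the layer norm has no learned scale or shift:

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize a list of floats to zero mean and roughly unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def pre_norm_block(x, sublayer):
    """Pre-norm: normalize only the sublayer input; the residual path stays untouched."""
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]

def post_norm_block(x, sublayer):
    """Post-norm: add the residual first, then normalize the sum."""
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])
```

In the pre-norm form the input passes through the residual connection unchanged, which is generally credited with making deep transformers easier to optimize.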
              
             Lazy way to make a dataloader efficient: just load the entire dataset into CPU RAM. 
          
                
                0
              
              
                
                0
              
              
                
                1
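A sketch of the "load everything into RAM" approach, assuming the dataset is a directory of files that fits in memory; the class name and the `*.bin` pattern are hypothetical:

```python
from pathlib import Path

class InMemoryDataset:
    """Reads every matching file under `root` into RAM once; later accesses never touch disk."""

    def __init__(self, root, pattern="*.bin"):
        paths = sorted(Path(root).glob(pattern))
        # One-time I/O cost at startup; raw bytes are kept as-is.
        self.items = [p.read_bytes() for p in paths]

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        return self.items[idx]
```

The trade-off is a slower startup and a hard limit set by available RAM, in exchange for no disk reads during training.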
              
             This October 19 we commemorate the 111th anniversary of the Passing into Immortality of Julio Argentino Roca, national hero, twice President of the Nation and a key figure in the consolidation of the Argentine state. Under his leadership the Conquest of the Desert was carried out, a decisive milestone 
          
                
                423
              
              
                
                2K
              
              
                
                11K
              
             i cant ever look at graphs like this the same again 
          
          
                
                3
              
              
                
                1
              
              
                
                54
              
             Pretraining both encoder and decoder to build a rich prior for text and audio for later finetuning. It also allows me to take advantage of fixed-length training. Container is rocm/pytorch-training:v25.8, everything works out of the box. 
          
                
                0
              
              
                
                0
              
              
                
                1
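One way to read "fixed-length training" in the post above is cutting a long token stream into equal-sized blocks, so every batch has the same shape and needs no padding. A small sketch under that assumption (the function name and the drop-the-remainder policy are illustrative choices):

```python
def fixed_length_chunks(tokens, block_size):
    """Split a token stream into (input, target) pairs of exactly block_size tokens.

    Targets are the inputs shifted by one position. A trailing remainder too
    short to fill a whole block is dropped.
    """
    pairs = []
    for start in range(0, len(tokens) - block_size, block_size):
        chunk = tokens[start:start + block_size + 1]
        pairs.append((chunk[:-1], chunk[1:]))
    return pairs
```

Constant shapes also tend to suit `torch.compile`, since variable sequence lengths can trigger recompilation.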
              
             Pretraining the decoder on unconditional AR modeling of 4B audio tokens and the encoder on char-level masked language modeling. 28% MFU on 355M params after fused AdamW, torch.compile, and FlashAttention-2 on 1x @HotAisle MI300X. Later I'll connect the two on a small amount of paired data for TTS. 
          
                
                1
              
              
                
                0
              
              
                
                1
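For context on the MFU figure above: a common back-of-the-envelope estimate uses roughly 6 FLOPs per parameter per training token. The sketch below assumes that rule; it is not necessarily how the 28% in the post was measured, and the peak throughput must be supplied for the precision actually in use:

```python
def model_flops_utilization(tokens_per_second, n_params, peak_flops_per_second):
    """Approximate MFU with the 6 * N FLOPs-per-token training estimate.

    The 6N rule covers forward and backward matmuls over the parameters and
    ignores the attention-score term, so it slightly underestimates real cost.
    """
    achieved = 6 * n_params * tokens_per_second
    return achieved / peak_flops_per_second

# Hypothetical numbers, for illustration only (not from the post):
example = model_flops_utilization(tokens_per_second=1000,
                                  n_params=1_000_000,
                                  peak_flops_per_second=6e10)
```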
              
             The official PyTorch documentation says TensorFloat32 is not available on ROCm, but this is a lie: it's disabled unless HIPBLASLT_ALLOW_TF32=1, hiding a 2.8x speedup in FP32 matmuls. HIPBLASLT_ALLOW_TF32 should be on by default. 
          
                
                1
              
              
                
                0
              
              
                
                1
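A hedged sketch of how the flag from the post might be enabled from Python. The variable name `HIPBLASLT_ALLOW_TF32` is taken from the post; whether it takes effect depends on the ROCm and PyTorch build, and setting it before importing `torch` is a conservative assumption:

```python
import os

# Set the flag before the GPU libraries initialize, i.e. before importing torch.
os.environ["HIPBLASLT_ALLOW_TF32"] = "1"

try:
    import torch
    # Ask PyTorch to allow TF32-class precision for FP32 matmuls.
    torch.set_float32_matmul_precision("high")
except ImportError:
    torch = None  # PyTorch is not installed; only the environment variable is set
```

The same effect can be had by exporting the variable in the shell before launching training.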
              
             
               
               
               
               
             
             
              