miru
@miru_why
Followers
2K
Following
3K
Media
232
Statuses
646
3e-4x engineer, unswizzled wagmi. specialization is for warps
Joined January 2024
Scalable GANs with Transformers https://t.co/tUfsKKsVlK
https://t.co/N2KzfMc4aL authors train latent-space transformer GANs up to XL/2 scale, and report SotA 1-step class-conditional image generation results on ImageNet-256 after 40 epochs (*with REPA in discriminator)
3
34
215
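not the paper's recipe, just a minimal sketch of what "REPA in the discriminator" could look like: a hinge GAN loss plus a cosine-alignment term between projected discriminator features and frozen DINOv2 features of the real image. D's (logits, features) interface, proj, and the 0.5 weighting are all my assumptions.

    import torch
    import torch.nn.functional as F

    def discriminator_loss(D, proj, dinov2, real_images, fake_images, repa_weight=0.5):
        # assumed interface: D(x) -> (logits, intermediate_features); dinov2 is a frozen encoder
        logits_real, feats_real = D(real_images)
        logits_fake, _ = D(fake_images.detach())

        # standard hinge GAN loss for the discriminator
        adv = F.relu(1.0 - logits_real).mean() + F.relu(1.0 + logits_fake).mean()

        # REPA-style term: align projected discriminator features with frozen DINOv2 features
        with torch.no_grad():
            target = dinov2(real_images)      # assumed (B, N, C) patch embeddings
        aligned = proj(feats_real)            # learned projection into the DINOv2 dimension
        repa = 1.0 - F.cosine_similarity(aligned, target, dim=-1).mean()

        return adv + repa_weight * repa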
Weighted Conditional Flow Matching
the authors improve flow-matching training by downweighting poorly-matched noise/image pairs, and show that this cheap reweighting produces clean, straight flow paths (like what you'd get from optimal transport) https://t.co/ykj6k1fh4A
1
1
25
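rough sketch of the reweighting idea above, assuming the model takes (x_t, t) and "poorly matched" is approximated by the x0–x1 distance within the batch; the paper's actual weighting formula may differ.

    import torch

    def weighted_cfm_loss(model, x1, temperature=1.0):
        # x1: batch of images/latents; x0: independently sampled noise; model(x_t, t) -> velocity (assumed)
        x0 = torch.randn_like(x1)
        t = torch.rand(x1.shape[0], device=x1.device).view(-1, *([1] * (x1.dim() - 1)))

        xt = (1 - t) * x0 + t * x1                   # linear interpolation path
        target_v = x1 - x0                           # conditional flow-matching target
        pred_v = model(xt, t.flatten())

        per_pair = ((pred_v - target_v) ** 2).flatten(1).mean(dim=1)

        # downweight poorly matched pairs; here "poorly matched" is approximated by the
        # x0-x1 distance (an assumption), normalized so the weights average to ~1
        dist = ((x1 - x0) ** 2).flatten(1).mean(dim=1)
        w = torch.softmax(-dist / temperature, dim=0) * x1.shape[0]

        return (w.detach() * per_pair).mean()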
anyone know a paper where they do RLVR strictly on the next-token prediction task? like ilya's detective novel analogy, doing reasoning rollouts and rewarding the reasoning chains that correctly deduce the criminal's name?
2
0
10
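for concreteness, the kind of loop i have in mind (purely hypothetical, not from any paper): sample reasoning rollouts before the target token, reward the ones whose final answer equals the ground-truth next token, and reinforce them. sample_with_logprob and extract_answer_token are assumed helpers.

    import torch

    def rlvr_next_token_step(model, tokenizer, context_ids, gold_next_id, optimizer, n_rollouts=8):
        # hypothetical helpers: sample_with_logprob returns (rollout token ids, total logprob),
        # extract_answer_token pulls the single token the rollout commits to after e.g. "ANSWER:"
        rewards, logps = [], []
        for _ in range(n_rollouts):
            rollout_ids, logp = sample_with_logprob(model, context_ids, max_new_tokens=256)
            answer_id = extract_answer_token(tokenizer, rollout_ids)
            rewards.append(1.0 if answer_id == gold_next_id else 0.0)   # verifiable reward
            logps.append(logp)

        logps = torch.stack(logps)
        rewards = torch.tensor(rewards, device=logps.device)
        advantages = rewards - rewards.mean()          # group-mean baseline
        loss = -(advantages * logps).mean()            # REINFORCE-style update

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()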
english translation of huawei whistleblower's pangu writeup, 2/2. all translation thanks to gemini, with minor edits and [annotations] from discussion, hopefully we did it justice
1
0
23
anonymous whistleblower from noah's ark lab has posted a writeup detailing the sad saga of pangu
- they support honestagi's claim that pangu MoE 72B was plagiarized from qwen 2.5 14B
- they also witnessed similar plagiarism from qwen 1.5 110B, deepseek-v3 https://t.co/vE5bFjNfS6
9
20
219
one day in and @giffmana is already fixing annoyances in pytorch main
@giffmana @__kolesnikov__ @XiaohuaZhai Rumor has it you hated pytorch so much you joined meta to fix it from the source yourself, LOL
5
1
145
interesting paper on ‘any-subset’ auto-regressive modeling without the standard product-rule factorization https://t.co/XIkgVS1scu
https://t.co/c6nS16t5ci their model can sample from the true joint distribution with 10% fewer NFEs (i.e. speculative decoding with no extra model)
arxiv.org
In arbitrary-order language models, it is an open question how to sample tokens in parallel from the correct joint distribution. With discrete diffusion models, the more tokens they generate in...
0
0
9
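not the paper's algorithm, but a sketch of the generic "speculative decoding with no extra model" idea the parenthetical hints at: draft several tokens from the prefix-conditional, then accept/reject them against the conditionals given previously accepted tokens. next_token_dist is an assumed interface, and in practice the verification distributions would come from one batched pass, which this explicit loop doesn't show.

    import torch

    def self_speculative_step(next_token_dist, prefix, k=4):
        # next_token_dist(token_ids: list[int]) -> 1-D probability vector over the vocab (assumed interface)
        # draft: propose k tokens from the distribution conditioned only on the current prefix
        q = next_token_dist(prefix)
        draft = torch.multinomial(q, num_samples=k, replacement=True)

        accepted = []
        for tok in draft.tolist():
            p = next_token_dist(prefix + accepted)     # target conditional given tokens accepted so far
            if torch.rand(()).item() < min(1.0, (p[tok] / q[tok]).item()):
                accepted.append(tok)                   # accept the drafted token
            else:
                # speculative-sampling residual: sample from max(0, p - q), renormalized
                resid = torch.clamp(p - q, min=0.0)
                accepted.append(torch.multinomial(resid / resid.sum(), 1).item())
                break                                  # everything after the first rejection is discarded
        return accepted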
Boosting Generative Image Modeling via Joint Image-Feature Synthesis https://t.co/UCPZmp5KDC
the authors concatenate normal VAE latents with PCA’d DINOv2 embeddings, and find that diffusion models trained on this joint distribution achieve lower FID than VAE-only or VAE+REPA
4
23
169
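a minimal sketch of how the joint latent above could be assembled; the vae.encode interface, PCA projection shape, bilinear resize, and channel-wise concat are my assumptions about the details.

    import torch
    import torch.nn.functional as F

    def build_joint_latent(vae, dinov2, pca_matrix, images):
        # assumed shapes: vae latents (B, C_vae, H, W); dinov2 patch embeddings (B, N, C_dino);
        # pca_matrix (C_dino, C_pca) holds the top PCA directions of the DINOv2 features
        with torch.no_grad():
            z = vae.encode(images)                     # assumed to return the latent tensor directly
            feats = dinov2(images)

        feats = feats @ pca_matrix                     # keep only a few PCA components

        # fold the patch tokens back into a grid and match the VAE latent resolution
        B, N, C = feats.shape
        side = int(N ** 0.5)
        feats = feats.transpose(1, 2).reshape(B, C, side, side)
        feats = F.interpolate(feats, size=z.shape[-2:], mode="bilinear")

        # channel-wise concatenation; the diffusion model is then trained on this joint latent
        return torch.cat([z, feats], dim=1)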
if you were curious about the torch.sum bug discussed in the gpt-4.5 pretraining podcast (https://t.co/lroQtO20sD), here’s the original thread from last june
Crazy long-standing data race in PyTorch's reductions 😱 https://t.co/goQzBU2rhs… Credit to the OpenAI team for figuring it out (I hear it was weeks of debugging)
1
1
20
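for context, the bug class here is a racy reduction giving slightly different sums across runs. a toy check (not the original repro) is to reduce the same tensor repeatedly and compare bit-exactly:

    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = torch.randn(1 << 24, device=device)

    ref = x.sum()
    mismatches = sum(int(not torch.equal(x.sum(), ref)) for _ in range(100))
    print(f"{mismatches}/100 repeated sums differed bit-exactly from the first")
    # a racy reduction shows up as nonzero mismatches on affected builds;
    # zero mismatches here does not prove the original bug is absent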
PixelFlow: Pixel-Space Generative Models with Flow https://t.co/WWLoOYC7n8
https://t.co/zvBhzaTBBL the authors train a pixel space image generator with gradually-increasing spatial resolution across timesteps, and release 1B-scale class- and text-conditional checkpoints
0
22
80
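rough sketch of the "resolution grows with the timestep" idea, assuming a velocity-predicting model and a made-up stage schedule; PixelFlow's actual stage boundaries and update rule may differ.

    import torch
    import torch.nn.functional as F

    # made-up stage schedule: early (noisy) timesteps at low resolution, later ones at full resolution
    STAGES = [(0.25, 64), (0.50, 128), (0.75, 256), (1.00, 512)]

    def resolution_for(t: float) -> int:
        for t_max, size in STAGES:
            if t <= t_max:
                return size
        return STAGES[-1][1]

    def denoise_step(model, x, t, dt):
        size = resolution_for(t)
        if x.shape[-1] != size:
            x = F.interpolate(x, size=(size, size), mode="bilinear")   # enter this stage's resolution
        v = model(x, torch.full((x.shape[0],), t, device=x.device))    # assumed (x, t) -> velocity
        return x + dt * v                                              # plain Euler update in pixel space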
sakana is now working on a more comprehensive effort to fix all eval script exploits/loopholes discovered by the AI CUDA Engineer and reevaluate their technique. happy to see it and hope they succeed https://t.co/Q4LX6nbdBW
Update: Combining evolutionary optimization with LLMs is powerful but can also find ways to trick the verification sandbox. We are fortunate to have readers, like @main_horse, test our CUDA kernels, to identify that the system had found a way to “cheat”. For example, the system
0
1
43
sakana have updated their leaderboard to address the memory-reuse exploit https://t.co/HslEahIM0y
there is only one >100x speedup left, on task 23_Conv3d_GroupNorm_Mean
in this task, the AI CUDA Engineer forgot the entire conv part and the eval script didn’t catch it
6
13
253
notes:
- ‘hacking’ here means ‘bungling the code so tragically that the evaluation script malfunctioned’, not any planned exploit
- sakana did a good job following kernelbench eval procedure and publishing reproducible eval code, just (seemingly) didn’t hand-check outlier results
3
7
351
turns out the AI CUDA Engineer achieved 100x speedup by… hacking the eval script
@miru_why I believe there is something wrong with their kernel -- it seems to 'steal' the result of the eager impl (memory reuse somehow?), allowing it to bypass the correctness check. Here, I try executing impls in different order:
* torch, cuda
* cuda, torch
only the first order works!
76
242
3K
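to make the memory-reuse mechanism concrete, a toy illustration (my construction, not the actual kernel): free the reference output, then allocate an uninitialized buffer of the same size; the caching allocator can hand back the same block still holding the reference values, so a kernel that computes nothing may still pass an equality check that runs torch first.

    import torch

    x = torch.randn(1 << 20, device="cuda")

    ref = torch.sin(x)         # "eager" reference result
    expected = ref.clone()     # safe copy for the final comparison
    del ref                    # the reference buffer goes back to the caching allocator

    out = torch.empty_like(x)  # a "kernel" that allocates its output but never writes it
    print(torch.allclose(out, expected))   # can print True purely through memory reuse

    # running the candidate before the reference, or calling torch.cuda.empty_cache()
    # between them, breaks the coincidence and exposes the missing computation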