miru
@miru_why
Followers
2K
Following
3K
Media
232
Statuses
646
3e-4x engineer, unswizzled wagmi. specialization is for warps
Joined January 2024
Scalable GANs with Transformers https://t.co/tUfsKKsVlK
https://t.co/N2KzfMc4aL authors train latent-space transformer GANs up to XL/2 scale, and report SotA 1-step class-conditional image generation results on ImageNet-256 after 40 epochs (*with REPA in discriminator)
3
34
215
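not the paper's recipe, just a minimal sketch of what "REPA in the discriminator" could look like: a hinge GAN loss plus a cosine-alignment term between projected discriminator features and frozen DINOv2 features of the real image. D's (logits, features) interface, proj, and the 0.5 weighting are all my assumptions.

    import torch
    import torch.nn.functional as F

    def discriminator_loss(D, proj, dinov2, real_images, fake_images, repa_weight=0.5):
        # assumed interface: D(x) -> (logits, intermediate_features); dinov2 is a frozen encoder
        logits_real, feats_real = D(real_images)
        logits_fake, _ = D(fake_images.detach())

        # standard hinge GAN loss for the discriminator
        adv = F.relu(1.0 - logits_real).mean() + F.relu(1.0 + logits_fake).mean()

        # REPA-style term: align projected discriminator features with frozen DINOv2 features
        with torch.no_grad():
            target = dinov2(real_images)      # assumed (B, N, C) patch embeddings
        aligned = proj(feats_real)            # learned projection into the DINOv2 dimension
        repa = 1.0 - F.cosine_similarity(aligned, target, dim=-1).mean()

        return adv + repa_weight * repa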
Weighted Conditional Flow Matching
the authors improve flow-matching training by downweighting poorly-matched noise/image pairs, and show that this cheap reweighting produces clean, straight flow paths (like what you'd get from optimal transport) https://t.co/ykj6k1fh4A
1
1
25
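rough sketch of the reweighting idea above, assuming the model takes (x_t, t) and "poorly matched" is approximated by the x0–x1 distance within the batch; the paper's actual weighting formula may differ.

    import torch

    def weighted_cfm_loss(model, x1, temperature=1.0):
        # x1: batch of images/latents; x0: independently sampled noise; model(x_t, t) -> velocity (assumed)
        x0 = torch.randn_like(x1)
        t = torch.rand(x1.shape[0], device=x1.device).view(-1, *([1] * (x1.dim() - 1)))

        xt = (1 - t) * x0 + t * x1                   # linear interpolation path
        target_v = x1 - x0                           # conditional flow-matching target
        pred_v = model(xt, t.flatten())

        per_pair = ((pred_v - target_v) ** 2).flatten(1).mean(dim=1)

        # downweight poorly matched pairs; here "poorly matched" is approximated by the
        # x0-x1 distance (an assumption), normalized so the weights average to ~1
        dist = ((x1 - x0) ** 2).flatten(1).mean(dim=1)
        w = torch.softmax(-dist / temperature, dim=0) * x1.shape[0]

        return (w.detach() * per_pair).mean()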
anyone know a paper where they do RLVR strictly on the next-token prediction task? like ilya's detective novel analogy, doing reasoning rollouts and rewarding the reasoning chains that correctly deduce the criminal's name?
2
0
10
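for concreteness, the kind of loop i have in mind (purely hypothetical, not from any paper): sample reasoning rollouts before the target token, reward the ones whose final answer equals the ground-truth next token, and reinforce them. sample_with_logprob and extract_answer_token are assumed helpers.

    import torch

    def rlvr_next_token_step(model, tokenizer, context_ids, gold_next_id, optimizer, n_rollouts=8):
        # hypothetical helpers: sample_with_logprob returns (rollout token ids, total logprob),
        # extract_answer_token pulls the single token the rollout commits to after e.g. "ANSWER:"
        rewards, logps = [], []
        for _ in range(n_rollouts):
            rollout_ids, logp = sample_with_logprob(model, context_ids, max_new_tokens=256)
            answer_id = extract_answer_token(tokenizer, rollout_ids)
            rewards.append(1.0 if answer_id == gold_next_id else 0.0)   # verifiable reward
            logps.append(logp)

        logps = torch.stack(logps)
        rewards = torch.tensor(rewards, device=logps.device)
        advantages = rewards - rewards.mean()          # group-mean baseline
        loss = -(advantages * logps).mean()            # REINFORCE-style update

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()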
english translation of huawei whistleblower's pangu writeup, 2/2. all translation thanks to gemini, with minor edits and [annotations] from discussion, hopefully we did it justice
1
0
23
anonymous whistleblower from noah's ark lab has posted a writeup detailing the sad saga of pangu
- they support honestagi's claim that pangu MoE 72B was plagiarized from qwen 2.5 14B
- they also witnessed similar plagiarism from qwen 1.5 110B, deepseek-v3 https://t.co/vE5bFjNfS6
9
20
219
one day in and @giffmana is already fixing annoyances in pytorch main
@giffmana @__kolesnikov__ @XiaohuaZhai Rumor has it you hated pytorch so much you joined meta to fix it from the source yourself, LOL
5
1
145
interesting paper on ‘any-subset’ auto-regressive modeling without the standard product-rule factorization https://t.co/XIkgVS1scu
https://t.co/c6nS16t5ci their model can sample from the true joint distribution with 10% fewer NFEs (i.e. speculative decoding with no extra model)
arxiv.org
In arbitrary-order language models, it is an open question how to sample tokens in parallel from the correct joint distribution. With discrete diffusion models, the more tokens they generate in...
0
0
9
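not the paper's algorithm, but a sketch of the generic "speculative decoding with no extra model" idea the parenthetical hints at: draft several tokens from the prefix-conditional, then accept/reject them against the conditionals given previously accepted tokens. next_token_dist is an assumed interface, and in practice the verification distributions would come from one batched pass, which this explicit loop doesn't show.

    import torch

    def self_speculative_step(next_token_dist, prefix, k=4):
        # next_token_dist(token_ids: list[int]) -> 1-D probability vector over the vocab (assumed interface)
        # draft: propose k tokens from the distribution conditioned only on the current prefix
        q = next_token_dist(prefix)
        draft = torch.multinomial(q, num_samples=k, replacement=True)

        accepted = []
        for tok in draft.tolist():
            p = next_token_dist(prefix + accepted)     # target conditional given tokens accepted so far
            if torch.rand(()).item() < min(1.0, (p[tok] / q[tok]).item()):
                accepted.append(tok)                   # accept the drafted token
            else:
                # speculative-sampling residual: sample from max(0, p - q), renormalized
                resid = torch.clamp(p - q, min=0.0)
                accepted.append(torch.multinomial(resid / resid.sum(), 1).item())
                break                                  # everything after the first rejection is discarded
        return accepted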
Boosting Generative Image Modeling via Joint Image-Feature Synthesis https://t.co/UCPZmp5KDC
the authors concatenate normal VAE latents with PCA’d DINOv2 embeddings, and find that diffusion models trained on this joint distribution achieve lower FID than VAE-only or VAE+REPA
4
23
169
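a minimal sketch of how the joint latent above could be assembled; the vae.encode interface, PCA projection shape, bilinear resize, and channel-wise concat are my assumptions about the details.

    import torch
    import torch.nn.functional as F

    def build_joint_latent(vae, dinov2, pca_matrix, images):
        # assumed shapes: vae latents (B, C_vae, H, W); dinov2 patch embeddings (B, N, C_dino);
        # pca_matrix (C_dino, C_pca) holds the top PCA directions of the DINOv2 features
        with torch.no_grad():
            z = vae.encode(images)                     # assumed to return the latent tensor directly
            feats = dinov2(images)

        feats = feats @ pca_matrix                     # keep only a few PCA components

        # fold the patch tokens back into a grid and match the VAE latent resolution
        B, N, C = feats.shape
        side = int(N ** 0.5)
        feats = feats.transpose(1, 2).reshape(B, C, side, side)
        feats = F.interpolate(feats, size=z.shape[-2:], mode="bilinear")

        # channel-wise concatenation; the diffusion model is then trained on this joint latent
        return torch.cat([z, feats], dim=1)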
if you were curious about the torch.sum bug discussed in the gpt-4.5 pretraining podcast (https://t.co/lroQtO20sD), here’s the original thread from last june
Crazy long-standing data race in PyTorch's reductions 😱 https://t.co/goQzBU2rhs… Credit to the OpenAI team for figuring it out (I hear it was weeks of debugging)
1
1
20
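for context, the bug class here is a racy reduction giving slightly different sums across runs. a toy check (not the original repro) is to reduce the same tensor repeatedly and compare bit-exactly:

    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = torch.randn(1 << 24, device=device)

    ref = x.sum()
    mismatches = sum(int(not torch.equal(x.sum(), ref)) for _ in range(100))
    print(f"{mismatches}/100 repeated sums differed bit-exactly from the first")
    # a racy reduction shows up as nonzero mismatches on affected builds;
    # zero mismatches here does not prove the original bug is absent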
PixelFlow: Pixel-Space Generative Models with Flow https://t.co/WWLoOYC7n8
https://t.co/zvBhzaTBBL the authors train a pixel space image generator with gradually-increasing spatial resolution across timesteps, and release 1B-scale class- and text-conditional checkpoints
0
22
80
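rough sketch of the "resolution grows with the timestep" idea, assuming a velocity-predicting model and a made-up stage schedule; PixelFlow's actual stage boundaries and update rule may differ.

    import torch
    import torch.nn.functional as F

    # made-up stage schedule: early (noisy) timesteps at low resolution, later ones at full resolution
    STAGES = [(0.25, 64), (0.50, 128), (0.75, 256), (1.00, 512)]

    def resolution_for(t: float) -> int:
        for t_max, size in STAGES:
            if t <= t_max:
                return size
        return STAGES[-1][1]

    def denoise_step(model, x, t, dt):
        size = resolution_for(t)
        if x.shape[-1] != size:
            x = F.interpolate(x, size=(size, size), mode="bilinear")   # enter this stage's resolution
        v = model(x, torch.full((x.shape[0],), t, device=x.device))    # assumed (x, t) -> velocity
        return x + dt * v                                              # plain Euler update in pixel space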
sakana is now working on a more comprehensive effort to fix all eval script exploits/loopholes discovered by the AI CUDA Engineer and reevaluate their technique. happy to see it and hope they succeed https://t.co/Q4LX6nbdBW
Update: Combining evolutionary optimization with LLMs is powerful but can also find ways to trick the verification sandbox. We are fortunate to have readers, like @main_horse, test our CUDA kernels, to identify that the system had found a way to “cheat”. For example, the system
0
1
43
sakana have updated their leaderboard to address the memory-reuse exploit https://t.co/HslEahIM0y
there is only one >100x speedup left, on task 23_Conv3d_GroupNorm_Mean
in this task, the AI CUDA Engineer forgot the entire conv part and the eval script didn’t catch it
6
13
253
notes:
- ‘hacking’ here means ‘bungling the code so tragically that the evaluation script malfunctioned’, not any planned exploit
- sakana did a good job following kernelbench eval procedure and publishing reproducible eval code, just (seemingly) didn’t hand-check outlier results
3
7
351
turns out the AI CUDA Engineer achieved 100x speedup by… hacking the eval script
@miru_why I believe there is something wrong with their kernel -- it seems to 'steal' the result of the eager impl (memory reuse somehow?), allowing it to bypass the correctness check. Here, I try executing impls in different order:
* torch, cuda
* cuda, torch
only the first order works!
76
242
3K
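to make the memory-reuse mechanism concrete, a toy illustration (my construction, not the actual kernel): free the reference output, then allocate an uninitialized buffer of the same size; the caching allocator can hand back the same block still holding the reference values, so a kernel that computes nothing may still pass an equality check that runs torch first.

    import torch

    x = torch.randn(1 << 20, device="cuda")

    ref = torch.sin(x)         # "eager" reference result
    expected = ref.clone()     # safe copy for the final comparison
    del ref                    # the reference buffer goes back to the caching allocator

    out = torch.empty_like(x)  # a "kernel" that allocates its output but never writes it
    print(torch.allclose(out, expected))   # can print True purely through memory reuse

    # running the candidate before the reference, or calling torch.cuda.empty_cache()
    # between them, breaks the coincidence and exposes the missing computation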