miru

@miru_why

Followers 1K · Following 2K · Media 218 · Statuses 613

3e-4x engineer, unswizzled wagmi. specialization is for warps

Joined January 2024
@miru_why
miru
13 days
author thread
@s_calvoordonez
Sergio Calvo Ordoñez
14 days
We'd love our flow-based generative models to learn the optimal transport from noise to data, but they rarely do ❌. Mini-batch Optimal Transport methods aim to fix this, but they're costly and require large batch sizes to work well. Can we approximate this behaviour…
0
0
2
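For readers unfamiliar with the mini-batch OT idea the quoted thread refers to: instead of pairing noise and data samples at random, you solve an assignment problem within each batch so each noise sample is matched to a nearby data sample, then train the flow on those couplings. A minimal sketch; the `v_theta` velocity network and straight-line paths are generic conditional-flow-matching choices, not details from the thread:

```python
import torch
from scipy.optimize import linear_sum_assignment

def ot_pair(x0, x1):
    # solve the batch-level assignment problem so each noise sample
    # is matched to a nearby data sample under squared euclidean cost
    cost = torch.cdist(x0, x1).pow(2)                 # (B, B) pairwise costs
    row, col = linear_sum_assignment(cost.numpy())
    return x0[torch.as_tensor(row)], x1[torch.as_tensor(col)]

def minibatch_ot_cfm_loss(v_theta, x0, x1):
    # conditional flow matching on OT-coupled pairs, straight-line paths
    x0, x1 = ot_pair(x0, x1)
    t = torch.rand(x0.size(0), 1)
    xt = (1 - t) * x0 + t * x1                        # point on the path at time t
    target = x1 - x0                                  # constant velocity of that path
    return (v_theta(xt, t) - target).pow(2).mean()
```

The assignment step is what makes this costly at large batch sizes: the Hungarian solve is cubic in the batch dimension.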
@miru_why
miru
16 days
Weighted Conditional Flow Matching. the authors improve flow-matching training by downweighting poorly-matched noise/image pairs, and show that this cheap reweighting produces clean, straight flow paths (like what you'd get from optimal transport).
Tweet media one
Tweet media two
1
1
22
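The tweet doesn't spell out the weighting scheme, so the sketch below is one plausible reading, not the paper's actual rule: weight each pair's CFM loss by a softmax over negative pair distance, so badly matched pairs contribute almost nothing. The `tau` temperature and all names are assumptions:

```python
import torch

def weighted_cfm_loss(v_theta, x0, x1, tau=1.0):
    # hypothetical reweighting sketch: downweight noise/data pairs that
    # are far apart, since they induce curved, conflicting flow paths
    t = torch.rand(x0.size(0), 1)
    xt = (1 - t) * x0 + t * x1
    target = x1 - x0
    d2 = (x1 - x0).pow(2).sum(dim=1)                  # how badly matched each pair is
    w = torch.softmax(-d2 / tau, dim=0)               # far-apart pairs get ~0 weight
    per_pair = (v_theta(xt, t) - target).pow(2).mean(dim=1)
    return (w * per_pair).sum()                       # weights already sum to 1
```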
@miru_why
miru
1 month
anyone know a paper where they do RLVR strictly on the next-token prediction task? like ilya's detective novel analogy: doing reasoning rollouts and rewarding the reasoning chains that correctly deduce the criminal's name?
2
0
10
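For concreteness, the verifiable reward that setup implies would be something like the following; this is purely hypothetical, no paper is claimed to use this exact form:

```python
def next_token_rlvr_reward(final_answer: str, true_next_token: str) -> float:
    # hypothetical verifiable reward: roll out a reasoning chain over the
    # detective novel, parse out its final answer, and reward the chain
    # iff it names the token that actually comes next (the criminal's name)
    return 1.0 if final_answer.strip() == true_next_token.strip() else 0.0
```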
@miru_why
miru
1 month
english translation of huawei whistleblower's pangu writeup, 2/2. all translation thanks to gemini, with minor edits and [annotations] from discussion, hopefully we did it justice
Tweet media one
Tweet media two
Tweet media three
1
0
23
@miru_why
miru
1 month
english translation of huawei whistleblower's pangu writeup, 1/2
Tweet media one
Tweet media two
Tweet media three
Tweet media four
2
2
32
@miru_why
miru
1 month
anonymous whistleblower from noah's ark lab has posted a writeup detailing the sad saga of pangu.
- they support honestagi's claim that pangu MoE 72B was plagiarized from qwen 2.5 14B
- they also witnessed similar plagiarism from qwen 1.5 110B and deepseek-v3
Tweet media one
Tweet media two
Tweet media three
9
21
222
@miru_why
miru
2 months
one day in and @giffmana is already fixing annoyances in pytorch main
Tweet media one
@SeunghyunSEO7
Seunghyun Seo
2 months
@giffmana @__kolesnikov__ @XiaohuaZhai Rumor has it you hated pytorch so much you joined meta to fix it from the source yourself, LOL.
5
1
145
@miru_why
miru
3 months
pytorch transpose vs numpy transpose. baffling
Tweet media one
6
6
103
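The gotcha, for anyone who hasn't hit it: the two libraries give the same method name different semantics. numpy's `transpose` takes a full axis permutation (and reverses all axes when called with no arguments), while torch's `transpose` swaps exactly two dims, with `permute` covering the numpy behaviour:

```python
import numpy as np
import torch

a = np.zeros((2, 3, 4))
print(a.transpose().shape)         # (4, 3, 2): no args reverses ALL axes
print(a.transpose(1, 0, 2).shape)  # (3, 2, 4): args are a full permutation

t = torch.zeros(2, 3, 4)
print(t.transpose(0, 1).shape)     # torch.Size([3, 2, 4]): swaps exactly two dims
print(t.permute(2, 1, 0).shape)    # torch.Size([4, 3, 2]): numpy-style permutation
```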
@miru_why
miru
3 months
μTransfer in action
Tweet media one
0
2
22
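The core μTransfer rule being demonstrated, roughly: parameterize the model with μP so that hyperparameters tuned on a small proxy model transfer to a wide one; under μP with Adam, the learning rate of hidden weight matrices scales like 1/width. A toy sketch with made-up widths and lr:

```python
# μP-style lr transfer sketch: tune at base_width, rescale for the big run
base_width, width = 256, 4096
base_lr = 3e-4                             # tuned on the small proxy model
hidden_lr = base_lr * base_width / width   # μP + Adam: hidden lr ∝ 1/width
```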
@miru_why
miru
4 months
interesting paper on ‘any-subset’ auto-regressive modeling without the standard product-rule factorization. their model can sample from the true joint distribution with 10% fewer NFEs (i.e. speculative decoding with no extra model).
Tweet card summary image
arxiv.org
In arbitrary-order language models, it is an open question how to sample tokens in parallel from the correct joint distribution. With discrete diffusion models, the more tokens they generate in...
0
0
9
@miru_why
miru
4 months
ReDi author thread with more information
@ThKouz
Thodoris Kouzelis
4 months
1/n Introducing ReDi (Representation Diffusion): a new generative approach that leverages a diffusion model to jointly capture:
– Low-level image details (via VAE latents)
– High-level semantic features (via DINOv2) 🧵
Tweet media one
0
0
2
@miru_why
miru
4 months
Boosting Generative Image Modeling via Joint Image-Feature Synthesis. the authors concatenate normal VAE latents with PCA’d DINOv2 embeddings, and find that diffusion models trained on this joint distribution achieve lower FID than VAE-only or VAE+REPA
Tweet media one
Tweet media two
4
23
170
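A rough sketch of the joint target described above, assuming generic `vae`, `dinov2`, and `pca_proj` interfaces (none of these names are from the paper): channel-concatenate the VAE latent grid with PCA-reduced DINOv2 patch features, reshaped and resized to the same spatial resolution:

```python
import torch
import torch.nn.functional as F

def joint_latent(img, vae, dinov2, pca_proj):
    z = vae.encode(img)                    # (B, C, h, w) VAE latent grid (assumed API)
    f = dinov2(img)                        # (B, N, D) patch embeddings (assumed API)
    f = f @ pca_proj                       # (B, N, k): PCA projection to k channels
    B, N, k = f.shape
    s = int(N ** 0.5)                      # assume a square patch grid
    f = f.transpose(1, 2).reshape(B, k, s, s)   # patch tokens -> feature map
    f = F.interpolate(f, size=z.shape[-2:])     # match the latent's spatial size
    return torch.cat([z, f], dim=1)        # the diffusion model trains on this
```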
@miru_why
miru
4 months
if you were curious about the torch.sum bug discussed in the gpt-4.5 pretraining podcast, here's the original thread from last june.
@ezyang
Edward Z. Yang
1 year
Crazy long-standing data race in PyTorch's reductions 😱 Credit to the OpenAI team for figuring it out (I hear it was weeks of debugging).
1
1
20
@miru_why
miru
4 months
PixelFlow: Pixel-Space Generative Models with Flow. the authors train a pixel space image generator with gradually-increasing spatial resolution across timesteps, and release 1B-scale class- and text-conditional checkpoints
Tweet media one
0
22
80
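One plausible reading of the resolution schedule, not the paper's exact stages: early, noisy timesteps run at low resolution and later timesteps at progressively higher resolution:

```python
def stage_resolution(t, base=32, n_stages=4):
    # map flow time t in [0, 1) to a spatial resolution: coarse early, fine late
    stage = min(int(t * n_stages), n_stages - 1)
    return base * 2 ** stage               # e.g. 32 -> 64 -> 128 -> 256
```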
@miru_why
miru
6 months
sakana is now working on a more comprehensive effort to fix all eval script exploits/loopholes discovered by the AI CUDA Engineer and reevaluate their technique. happy to see it and hope they succeed.
@SakanaAILabs
Sakana AI
6 months
Update: Combining evolutionary optimization with LLMs is powerful but can also find ways to trick the verification sandbox. We are fortunate to have readers like @main_horse test our CUDA kernels, to identify that the system had found a way to “cheat”. For example, the system…
0
1
44
@miru_why
miru
6 months
sakana have updated their leaderboard to address the memory-reuse exploit. there is only one >100x speedup left, on task 23_Conv3d_GroupNorm_Mean. in this task, the AI CUDA Engineer forgot the entire conv part and the eval script didn’t catch it
Tweet media one
6
13
253
@miru_why
miru
6 months
notes:
- ‘hacking’ here means ‘bungling the code so tragically that the evaluation script malfunctioned’, not any planned exploit
- sakana did a good job following kernelbench eval procedure and publishing reproducible eval code, just (seemingly) didn’t hand-check outlier results
4
7
357
@miru_why
miru
6 months
turns out the AI CUDA Engineer achieved 100x speedup by… hacking the eval script
Tweet media one
@main_horse
main
6 months
@miru_why I believe there is something wrong with their kernel -- it seems to 'steal' the result of the eager impl (memory reuse somehow?), allowing it to bypass the correctness check. Here, I try executing impls in different order:
* torch, cuda
* cuda, torch
only the first order works!
Tweet media one
77
250
3K
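A toy reconstruction of the failure mode @main_horse describes, not sakana's actual harness: a "kernel" that computes nothing can pass an allclose check if the allocator hands it the block the reference output just freed, and simply swapping the execution order exposes it:

```python
import torch

def ref_impl(x):
    return torch.relu(x)

def cheating_kernel(x):
    # stand-in for the broken kernel: it computes nothing and returns
    # uninitialized memory. if the caching allocator hands back the block
    # the reference output just freed, the garbage IS the right answer
    return torch.empty_like(x)

x = torch.randn(1 << 20, device="cuda")

# order (torch, cuda): the freed reference output can be "stolen"
out_ref = ref_impl(x)
expected = out_ref.clone()            # keep a safe copy for the check
del out_ref                           # block returns to the caching allocator
out = cheating_kernel(x)              # empty_like may reuse that exact block
print(torch.allclose(out, expected))  # can print True for the wrong reason

# order (cuda, torch): nothing to steal, the check fails as it should
out = cheating_kernel(x)
print(torch.allclose(out, ref_impl(x)))  # False
```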
@miru_why
miru
7 months
the whale has spoken
Tweet media one
16
63
1K
@miru_why
miru
7 months
RT @sedielem: 📢PSA: #NeurIPS2024 recordings are now publicly available! The workshops always have tons of interesting things on at once, s…
0
40
0