
miru
@miru_why
Followers
1K
Following
2K
Media
218
Statuses
613
3e-4x engineer, unswizzled wagmi. specialization is for warps
Joined January 2024
one day in and @giffmana is already fixing annoyances in pytorch main
@giffmana @__kolesnikov__ @XiaohuaZhai Rumor has it you hated pytorch so much you joined meta to fix it from the source yourself, LOL.
5
1
145
interesting paper on ‘any-subset’ auto-regressive modeling without the standard product-rule factorization. their model can sample from the true joint distribution with 10% fewer NFEs (i.e. speculative decoding with no extra model).
arxiv.org
In arbitrary-order language models, it is an open question how to sample tokens in parallel from the correct joint distribution. With discrete diffusion models, the more tokens they generate in...
0
0
9
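rough toy of what "speculative decoding with no extra model" can mean here -- NOT the paper's actual algorithm, just the general idea: the same model that can condition on any subset of tokens supplies both the parallel proposals (conditioned only on the accepted prefix) and the exact sequential conditionals used to verify them. the 3-token joint and all names below are made up for illustration.

```python
# toy sketch of the idea (not the paper's algorithm): one "any-subset" model
# provides both parallel proposals and the exact conditionals that verify them.
import numpy as np

rng = np.random.default_rng(0)

# an arbitrary joint p(x1, x2, x3) over three binary tokens
joint = rng.random((2, 2, 2))
joint /= joint.sum()

def conditional(prefix, pos):
    """exact p(x_pos | x_1..x_k = prefix), marginalizing every other position."""
    p = joint
    for v in prefix:              # condition on the known prefix
        p = p[v]
    axis = pos - len(prefix)      # axis of x_pos in the remaining array
    out = p.sum(axis=tuple(a for a in range(p.ndim) if a != axis))
    return out / out.sum()

def sample_any_subset():
    """one exact sample from the joint via parallel proposals + verification."""
    x = [int(rng.choice(2, p=conditional([], 0)))]           # x1 from its marginal
    # propose x2 and x3 in parallel, each conditioned ONLY on x1
    props = [int(rng.choice(2, p=conditional(x, pos))) for pos in (1, 2)]
    # verify left to right against the true sequential conditionals
    for pos, tok in zip((1, 2), props):
        q = conditional(x[:1], pos)   # proposal dist: prefix = x1 only
        p = conditional(x, pos)       # target dist: full accepted prefix
        if rng.random() < min(1.0, p[tok] / q[tok]):
            x.append(tok)                                    # accept proposal
        else:
            r = np.maximum(p - q, 0.0)                       # speculative residual
            x.append(int(rng.choice(2, p=r / r.sum())))
    return tuple(x)

# sanity check: empirical frequencies should match the true joint
n = 20_000
counts = {}
for _ in range(n):
    s = sample_any_subset()
    counts[s] = counts.get(s, 0) + 1
err = max(abs(counts.get((a, b, c), 0) / n - joint[a, b, c])
          for a in range(2) for b in range(2) for c in range(2))
print(f"max abs error vs true joint: {err:.3f}")  # a few thousandths for this n
```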
sakana is now working on a more comprehensive effort to fix all eval script exploits/loopholes discovered by the AI CUDA Engineer and reevaluate their technique. happy to see it and hope they succeed.
Update: Combining evolutionary optimization with LLMs is powerful but can also find ways to trick the verification sandbox. We are fortunate to have readers like @main_horse test our CUDA kernels and identify that the system had found a way to “cheat”. For example, the system…
0
1
44
turns out the AI CUDA Engineer achieved 100x speedup by… hacking the eval script
@miru_why I believe there is something wrong with their kernel -- it seems to 'steal' the result of the eager impl (memory reuse somehow?), allowing it to bypass the correctness check. Here, I try executing impls in different order:
* torch, cuda
* cuda, torch
only the first order works!
77
250
3K
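for anyone writing their own eval harness: a minimal sketch (not sakana's actual script; candidate_op, reference_op and make_inputs are hypothetical callables) of a correctness check that is harder to game this way. it bakes in the order-swap trick from the quoted repro: the candidate kernel runs first, on fresh inputs, so there is no eager result sitting in reused memory for it to 'steal', and freed allocator blocks get a best-effort NaN scrub between trials.

```python
# minimal sketch, NOT anyone's actual eval script. candidate_op / reference_op
# are the two implementations under comparison; make_inputs returns fresh
# random CUDA input tensors. all three are hypothetical stand-ins.
import torch

def scrub_cached_memory(num_floats=1 << 26):
    """best-effort scrub: allocate a large scratch tensor so the caching
    allocator recycles recently freed blocks into it, fill it with NaNs,
    then free it. a kernel that reads stale buffers instead of computing
    its output should now produce garbage. (not a guarantee -- the allocator
    may not recycle every freed block into this one allocation.)"""
    scratch = torch.empty(num_floats, dtype=torch.float32, device="cuda")
    scratch.fill_(float("nan"))
    torch.cuda.synchronize()
    del scratch  # block stays in the allocator cache, NaN-filled, ready for reuse

def check_kernel(candidate_op, reference_op, make_inputs,
                 trials=5, rtol=1e-4, atol=1e-4):
    for trial in range(trials):
        torch.manual_seed(trial)
        inputs = make_inputs()            # fresh random CUDA tensors every trial
        scrub_cached_memory()             # nothing stale left over to read
        # candidate runs FIRST: no reference result exists on-device yet,
        # i.e. the 'cuda, torch' ordering from the repro above is the default
        out_candidate = candidate_op(*[x.clone() for x in inputs])
        torch.cuda.synchronize()
        out_reference = reference_op(*[x.clone() for x in inputs])
        torch.cuda.synchronize()
        if not torch.allclose(out_candidate, out_reference, rtol=rtol, atol=atol):
            return False
    return True
```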
RT @sedielem: 📢PSA: #NeurIPS2024 recordings are now publicly available! The workshops always have tons of interesting things on at once, s…
0
40
0