sway

@SwayStar123

Followers
1K
Following
9K
Media
282
Statuses
1K

Vegan btw, working on diffusion models. Can speak Japanese, learning Chinese.

Joined July 2015
@SwayStar123
sway
20 hours
Not only is guidance distillation almost a free lunch training wise (very cheap to distill), it also seems to improve the model to avoid some failure cases! 2x faster sampling + slightly better samples :)
7
2
46
@SwayStar123
sway
21 hours
Remember when people made fun of diffusion models for being unable to generate handshakes?
0
0
18
@SwayStar123
sway
2 days
researchers really wasting no time these days
2
6
143
@SwayStar123
sway
2 days
I don't think this is actually that big of a deal. MathArena Apex is basically picking out all the problems that the majority of the then-top LLMs could not solve. So GPT-5 was gonna get a 0 by definition, and any improvements are bound to look impressive in comparison.
@scaling01
Lisan al Gaib
2 days
HOLY SHIT OpenAI got smoked. MathArena Apex: Gemini 3.0 Pro - 23.4%, GPT-5.1 - 1%. ouch
1
0
3
@SwayStar123
sway
2 days
Trying out Gemini 3 in Cursor, it's a lot more methodical in finding bugs. After finding a suspect, it will make a test file with a minimal repro and useful logs, and only then continue to fix it. Whereas Claude/ChatGPT will assume they are correct and move on to their fix, which
1
0
9
@SwayStar123
sway
2 days
gemini 3 model card
0
0
0
@SwayStar123
sway
4 days
Windsurf is actually pretty good and possibly even better than cursor
0
0
8
@SwayStar123
sway
4 days
CFG/guidance distillation seems to be pretty much a free lunch. Why do no labs apart from BFL do it?
4
1
46
@SwayStar123
sway
4 days
ML is so funny cuz sometimes you have bugs and the model still learns, and sometimes you have no bugs and the model still doesn't learn
0
0
10
@SwayStar123
sway
6 days
gpt 5.1 codex - garbage, absolutely broken in Windsurf
gpt 5.1 - pretty good, free so you can use even high thinking unlimited. But I think the thinking is not well integrated (it doesn't remember its previous thoughts)
sonnet 4.5 - still the goat, need to use Claude to solve bugs
0
1
7
@SwayStar123
sway
8 days
Managed to single handedly revive a dead slack channel by sending the first message there in 3 years, and now suddenly everyone's using it
0
0
4
@SwayStar123
sway
9 days
Anyone wanna give me 8xH100 for a few months? Aim is to get XL@100k = 2 FID or less.
There's so much room for improvement I didn't have the compute/time to test:
-Alternative optimizer (Muon, SOAP/Shampoo, Prodigy)
-Newer techniques like SPRINT
-Custom VAE (Can you outdo RAE with an
1
2
41
@SwayStar123
sway
11 days
Boss arenas are beautiful too
0
0
2
@SwayStar123
sway
11 days
Black Myth Wukong is such a beautiful game I could make most of the random walkways my wallpapers
1
0
4
@SwayStar123
sway
12 days
ok nvm tiny ae is unstable to train, switching to invae from repa-e, modifying for my purposes
0
0
1
@SwayStar123
sway
12 days
Claude casually just adding its whole thinking process to my code... Why does Claude act like this when it's in Cursor
1
0
5
@SwayStar123
sway
12 days
Trying out a tinyae arch f32c256 with REPA-E, aligning the latents, encoder, and decoder. Also added structured latents loss and noise shifting based on channel size from RAE. Img is diffusion model samples at 4k steps. Will eval at 100k to see if this is close to RAE
@SwayStar123
sway
1 month
So RAE is really cool, but I'm thinking, even with their DDT head, scaling this to higher compression would be really hard (naively just increasing patch size and modelling 768*4+ dims). So autoencoders might not quite be dead yet for deep compression. I'm thinking of experimenting
4
1
78
@SwayStar123
sway
16 days
So REPA works with both SigLIP and DINO, but most people use DINO as it has the best performance (by a tiny margin). But if you were doing T2I tasks, wouldn't using SigLIP be better? You can use it both for text encoding and image representation alignment. Only have to use a
3
0
27
@SwayStar123
sway
17 days
Didn't realize @EMostaque started working at AMD
2
4
26
@SwayStar123
sway
17 days
FIBO/Bria4 will soon be 3x faster! Looking into h2 cache too (new paper claiming to be better than teacache)
3
11
71