Simo Ryu

@cloneofsimo

Followers: 14K
Following: 2K
Media: 1K
Statuses: 4K

I like cats, math and codes

Seoul, Korea
Joined May 2022
@cloneofsimo
Simo Ryu
3 months
10B parameter DiT trained on 80M images, all owned by @freepik. Model commercially usable, raw model without distillation, open sourced. Proud to demonstrate first model-training project with our client @freepik: "F-Lite", from @FAL
Tweet media one
@ivanprado
Iván de Prado
3 months
🚀 Excited to announce F Lite: a new open-source text-to-image model by @freepik and @FAL! The first at this scale that’s both open-source and trained exclusively on licensed, high-quality data. 🧵
19
66
475
@cloneofsimo
Simo Ryu
12 hours
Gm cats, and in some cases, humans
Tweet media one
6
0
40
@cloneofsimo
Simo Ryu
17 hours
Will be at sf for a week or so from july 22~. Hmu if u wana chat on whatever interesting.
0
0
28
@cloneofsimo
Simo Ryu
2 days
POV: you have access to sql database. UPDATE FAL_USER_CREDIT SET CREDIT = CREDIT + 10000 WHERE USERNAME = 'cloneofsimo';
Tweet media one
@noahgsolomon
noah
2 days
just gifted myself fal credits
1
1
29
@cloneofsimo
Simo Ryu
2 days
MuonClip. so many tricks to make maximum logits bounded during training. Gets me wondering why dont people try LASER (and maybe, z-loss ?).
@cloneofsimo
Simo Ryu
7 months
Very interesting, standard attention causes vanishing gradient due to most prob being very small after some training. LASER tackles this by pushing the attention operation on exponential space. i.e., exp_output = sm(QK^T) exp(V). They dont seem to exaggerate on the performance
Tweet media one
Tweet media two
Tweet media three
7
21
303
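The LASER idea in the quoted post can be sketched in plain Python. This is a minimal illustration, not the paper's implementation: it assumes the output is log(softmax(QK^T / sqrt(d)) exp(V)), and the function names, the 1/sqrt(d) scaling, and the per-column max shift on V (added here for numerical stability) are assumptions made for the example.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(Q, K):
    # softmax(Q K^T / sqrt(d)), one row of weights per query.
    scale = 1.0 / math.sqrt(len(Q[0]))
    return [
        softmax([scale * sum(qi * ki for qi, ki in zip(q, k)) for k in K])
        for q in Q
    ]

def standard_attention(Q, K, V):
    # Usual attention: weighted average of the value rows.
    A = attention_weights(Q, K)
    cols = range(len(V[0]))
    return [[sum(a * v[j] for a, v in zip(row, V)) for j in cols] for row in A]

def laser_attention(Q, K, V):
    # Attention taken in exponential space: log(A @ exp(V)).
    # Each column of V is shifted by its max before exponentiating,
    # then the shift is added back, so large values do not overflow.
    A = attention_weights(Q, K)
    cols = range(len(V[0]))
    col_max = [max(v[j] for v in V) for j in cols]
    return [
        [
            col_max[j]
            + math.log(sum(a * math.exp(v[j] - col_max[j]) for a, v in zip(row, V)))
            for j in cols
        ]
        for row in A
    ]

if __name__ == "__main__":
    Q = [[1.0, 0.0], [0.0, 1.0]]
    K = [[1.0, 0.0], [0.0, 1.0]]
    V = [[0.0, 1.0], [2.0, 3.0]]
    print(standard_attention(Q, K, V))
    print(laser_attention(Q, K, V))
```

By Jensen's inequality the log-of-weighted-exp output is never smaller than the plain weighted average, so each LASER output entry sits between the standard attention value and the largest value in that column.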
@cloneofsimo
Simo Ryu
2 days
My bro 50m up and wont flinch
Tweet media one
0
0
18
@cloneofsimo
Simo Ryu
3 days
Huge respect to Kimi for calling this optimizer after muon instead of re-branding into completely different name bullshit like all the other companies / academics do
Tweet media one
@Kimi_Moonshot
Kimi.ai
3 days
🚀 Hello, Kimi K2! Open-Source Agentic Model! 🔹 1T total / 32B active MoE model. 🔹 SOTA on SWE Bench Verified, Tau2 & AceBench among open models. 🔹 Strong in coding and agentic tasks. 🐤 Multimodal & thought-mode not supported for now. With Kimi K2, advanced agentic intelligence
Tweet media one
9
40
760
@cloneofsimo
Simo Ryu
3 days
Unironically best thing ive ever done in my life was poaching @hanchchch. Hes barely on twitter but if there is 10xEng he is the one. @FAL just keeps winning.
Tweet media one
3
0
49
@cloneofsimo
Simo Ryu
4 days
Hey you can't talk like that this model is unaligned
Tweet media one
1
0
26
@cloneofsimo
Simo Ryu
4 days
Tweet media one
1
0
17
@cloneofsimo
Simo Ryu
4 days
Tweet media one
11
3
239
@cloneofsimo
Simo Ryu
4 days
Gm
Tweet media one
2
0
32
@cloneofsimo
Simo Ryu
5 days
This is incredible! muP allows you to predict transfer across width beyond typical noise level.
@ShikaiQiu
Shikai Qiu
6 days
While scaling laws typically predict the final loss, we show in our ICML oral paper that good scaling rules enable accurate predictions of entire loss curves of larger models from smaller ones! w/ @Locchiu, @andrewgwils, J. Pennington, A. Agarwala: 1/10
Tweet media one
2
0
45
@cloneofsimo
Simo Ryu
5 days
Matformer and MoE is like mutually exclusive, but both somehow reasonable. How do we combine both idea into reality? Have someone tried matformer within each expert?
3
0
30
@cloneofsimo
Simo Ryu
6 days
RT @ezyang: Want to play around with code that uses PyTorch DTensors but annoyed at having to go multiprocess? Make a fake process group! T…
0
26
0
@cloneofsimo
Simo Ryu
6 days
all reasonable things sit on mass = radius^3 plot (as it should be lmao). except for black holes, and dying stars
Tweet media one
0
0
7
@cloneofsimo
Simo Ryu
6 days
"All objects and some questions" by Charles H. Lineweaver and Vihan M. Patel. We sit at the dead center of this plot, makes me feel so small but also so important
Tweet media one
1
0
14
@cloneofsimo
Simo Ryu
6 days
The very fact that this post went viral is the hint that the bro's X-Bible is very powerful indeed.
@noahgsolomon
noah
8 days
I broke down Twitter's entire algorithm repo w/ Claude Code. And built "The X Bible" tool that scrapes ur profile+feed and identifies profile misalignment and how to go viral in ur niche. comment "X Bible" and I'll give u access to the tool. only sharing with first 200 for rn
1
0
14
@cloneofsimo
Simo Ryu
7 days
if your LLM is trying to argue with you, just say these powerful words
Tweet media one
1
3
41
@cloneofsimo
Simo Ryu
7 days
this is true @jdchawla29
Tweet media one
1
0
66