琥珀青葉@LyCORIS

@KBlueleaf

Followers: 3K · Following: 21K · Media: 569 · Statuses: 3K

Undergraduate in Taiwan. Leader of LyCORIS

Taiwan
Joined May 2021
@KBlueleaf
琥珀青葉@LyCORIS
2 years
The paper of the LyCORIS project has been accepted at ICLR 2024!!! "Navigating Text-To-Image Customization: From LyCORIS Fine-Tuning to Model Evaluation". My first ever paper, as an undergraduate student. arXiv link here:
arxiv.org
Text-to-image generative models have garnered immense attention for their ability to produce high-fidelity images from text prompts. Among these, Stable Diffusion distinguishes itself as a leading...
5
22
137
@KBlueleaf
琥珀青葉@LyCORIS
2 days
I think the problem is that the UI/UX of the Gemini app is really, really bad; even AI Studio is far better. Gemini CLI is good, but if it had a GUI version with the same UX it would be awesome as well. The model is solid, works well, and has reasonable pricing. But Google never made a
@ai_for_success
AshutoshShrivastava
3 days
Google DeepMind is winning, they just don't hype it.
> Participated in IMO.
> IMO officials verified the solution.
> Awarded Gold medal.
> Quietly dropped the post on a Monday morning.
At this point, they're not even bothered about competing. Also this version of Gemini is coming
[image attached]
1
0
5
@KBlueleaf
琥珀青葉@LyCORIS
2 days
4× V100 16G reaches around 1.8~2× the speed of a 4090 on my ViT training task. Pretty impressive considering the price. If you are familiar with custom builds of the libraries you need, and you accept fp16-only matmul (or fp32 matmul on CUDA cores), this setup is quite good.
1
0
20
@KBlueleaf
琥珀青葉@LyCORIS
3 days
LyCORIS got 100 citations
[image attached]
0
0
19
@KBlueleaf
琥珀青葉@LyCORIS
5 days
It's fun that the Triton 2DGS kernel works well on V100 (with fp16 precision), and PyTorch SDPA works well on V100 (O(N) VRAM + decent speed), but torch.compile + FlexAttention and Triton FAv2 are basically a disaster: slower than doing mem-efficient attention on the V100's CUDA cores. Wut?
0
0
11
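The O(N) VRAM point about memory-efficient attention can be made concrete with a back-of-envelope: a tiled kernel never materializes the full score matrix. A minimal sketch, with the tile size and fp16 storage as my assumptions (illustrative, not measured on a V100):

```python
# Rough activation-memory comparison for one attention head, assuming
# fp16 storage (2 bytes/element). Tile size is an assumed value.
def naive_attn_bytes(n, d, bytes_per=2):
    # materializes the full (n x n) score matrix plus the (n x d) output
    return (n * n + n * d) * bytes_per

def mem_eff_attn_bytes(n, d, tile=128, bytes_per=2):
    # a tiled kernel keeps only a (tile x n) slab of scores at a time,
    # so the footprint grows linearly in n
    return (tile * n + n * d) * bytes_per

n, d = 16384, 64
print(naive_attn_bytes(n, d) // 2**20)    # 514 MiB, dominated by the scores
print(mem_eff_attn_bytes(n, d) // 2**20)  # 6 MiB
```

The quadratic term is what blows up on long sequences; the tiled version trades it for recomputation, which is why backend choice matters so much on older GPUs.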
@KBlueleaf
琥珀青葉@LyCORIS
6 days
V100 16G × 4. Only cost me 1000 bucks
[4 images attached]
17
7
250
@KBlueleaf
琥珀青葉@LyCORIS
7 days
I use 2048 gaussians with 10/7/8/6-bit quantization for position/log_scale/rotation/color, achieving 7.5 bits per param / 0.117 raw bpp at 1024×1024. Then I use LZMA to compress the custom-format state dict down to 0.108 bpp.
0
0
10
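The bit-budget arithmetic above checks out if one assumes the usual 2/2/1/3 component counts for position/log_scale/rotation/color (my assumption about the layout), and the LZMA step is plain stdlib. A sketch:

```python
import lzma
import struct

# Bits per gaussian under the tweet's 10/7/8/6 split, assuming
# 2/2/1/3 components for position/log_scale/rotation/color.
bits = 2 * 10 + 2 * 7 + 1 * 8 + 3 * 6    # = 60 bits per gaussian
params = 2 + 2 + 1 + 3                    # = 8 params per gaussian
print(bits / params)                      # 7.5 bits per param

n_gauss, side = 2048, 1024
raw_bpp = n_gauss * bits / (side * side)
print(round(raw_bpp, 3))                  # 0.117 raw bpp

# Packing the quantized ints and running LZMA over the custom-format
# state dict is what shaves 0.117 down to ~0.108 bpp. Placeholder data
# here, not real quantized gaussians:
payload = struct.pack(f"{n_gauss}Q", *range(n_gauss))
compressed = lzma.compress(payload)
print(len(compressed) < len(payload))     # LZMA finds residual redundancy
```

How much LZMA saves depends entirely on how much structure survives quantization; the ~8% gain reported above is consistent with mostly-decorrelated parameters.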
@KBlueleaf
琥珀青葉@LyCORIS
7 days
Successfully used 2DGS to beat JPEG at 0.1 bpp image compression (????)
[3 images attached]
2
1
22
@KBlueleaf
琥珀青葉@LyCORIS
8 days
2DGS image approximation test. 1024×1024, 8192 gaussians. First image is the GT. Second: fp32 (2 bits per pixel). Third: fp16 (1 bit per pixel). Last: video showing the training procedure. Maybe 16384 gaussians + fp16 would be a good idea.
[3 images + 1 video attached]
0
0
7
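For readers unfamiliar with 2DGS image approximation, a toy direct-evaluation renderer shows the idea: each gaussian contributes a weighted RGB blob, and the sum of blobs approximates the target image. A minimal sketch (no rotation term, additive blending, all names mine), not the author's implementation:

```python
import numpy as np

def render_2dgs(means, log_scales, colors, size=64):
    """Splat axis-aligned 2D gaussians onto an image by direct evaluation.
    Real implementations add a rotation parameter and proper blending."""
    ys, xs = np.mgrid[0:size, 0:size] / size          # pixel grid in [0, 1)
    img = np.zeros((size, size, 3))
    for mu, ls, c in zip(means, log_scales, colors):
        inv_var = np.exp(-2 * ls)                     # 1/sigma^2 per axis
        d2 = inv_var[0] * (xs - mu[0])**2 + inv_var[1] * (ys - mu[1])**2
        img += np.exp(-0.5 * d2)[..., None] * c       # gaussian weight * RGB
    return np.clip(img, 0, 1)

rng = np.random.default_rng(0)
n = 32
img = render_2dgs(rng.random((n, 2)),                 # means in [0, 1)^2
                  np.full((n, 2), np.log(0.05)),      # sigma = 0.05
                  rng.random((n, 3)))                 # RGB per gaussian
print(img.shape)  # (64, 64, 3)
```

Fitting an image then amounts to optimizing means, scales, and colors against a reconstruction loss, with parameter count (not pixel count) setting the bitrate.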
@KBlueleaf
琥珀青葉@LyCORIS
9 days
True dude.
@cloneofsimo
Simo Ryu
9 days
It's incredibly easy to say "I am an open source lover" when: 1. You never meaningfully contributed to open source. 2. You never financially supported any open source contributors. Hear me out: you don't love the open source community. You just love taking other people's work without
1
0
9
@KBlueleaf
琥珀青葉@LyCORIS
11 days
An improved version is coming. Up to 4× faster than the current impl. Fwd is done; bwd is implemented but has weird issues that need to be resolved.
[image attached]
@KBlueleaf
琥珀青葉@LyCORIS
12 days
The kernel and the whole impl are here. Remember to give it a star!!!
0
0
21
@KBlueleaf
琥珀青葉@LyCORIS
11 days
RT @gaunernst: I can confirm that mma.sync.aligned.m16n8k32.row.col.kind::mxf8f6f4.block_scale.f32.e4m3.e4m3.f32.ue8m0 is faster than mma.s….
0
2
0
@KBlueleaf
琥珀青葉@LyCORIS
11 days
RT @KBlueleaf: I write a triton kernel for 2DGS so I can achieve 16384 gaussians on 256x256 image with basically no vram consumption (since….
0
4
0
@KBlueleaf
琥珀青葉@LyCORIS
11 days
RT @KBlueleaf: The kernel and the whole impl are here. Remember to give it a star!!!
github.com
Image Gaussian Splatting. Contribute to KohakuBlueleaf/IGS development by creating an account on GitHub.
0
9
0
@KBlueleaf
琥珀青葉@LyCORIS
12 days
The kernel and the whole impl are here. Remember to give it a star!!!
github.com
Image Gaussian Splatting. Contribute to KohakuBlueleaf/IGS development by creating an account on GitHub.
@KBlueleaf
琥珀青葉@LyCORIS
13 days
I wrote a Triton kernel for 2DGS so I can fit 16384 gaussians on a 256×256 image with basically no VRAM consumption (since the largest intermediate state is the output image). Also, here is the PoC, which trains 256 token features on one image, where each token is a 1024-dim vector. I
2
9
75
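The "largest intermediate state is the output image" claim can be made concrete with a back-of-envelope. The per-gaussian weight maps are my assumption about what a naive autograd implementation would keep alive for the backward pass:

```python
# Why a fused kernel means "basically no VRAM": accumulate each gaussian's
# contribution directly into the output image instead of retaining a
# per-gaussian, per-pixel weight map in the autograd graph.
n_gauss, h, w, bytes_per = 16384, 256, 256, 4  # fp32

# Naive autograd graph: one (H x W) weight map retained per gaussian.
naive = n_gauss * h * w * bytes_per
# Fused kernel: largest live intermediate is the RGB output image itself.
fused = h * w * 3 * bytes_per

print(naive // 2**30)  # 4 GiB
print(fused // 2**10)  # 768 KiB
```

The fused version recomputes per-gaussian weights during the backward pass, trading FLOPs for the four-orders-of-magnitude memory reduction.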
@KBlueleaf
琥珀青葉@LyCORIS
13 days
BTW, if anyone is interested in my Triton kernel for 2DGS, I can open source it when I have time. Currently it is just "working correctly"; I haven't done any profiling of its real performance.
3
0
6
@KBlueleaf
琥珀青葉@LyCORIS
13 days
I wrote a Triton kernel for 2DGS so I can fit 16384 gaussians on a 256×256 image with basically no VRAM consumption (since the largest intermediate state is the output image). Also, here is the PoC, which trains 256 token features on one image, where each token is a 1024-dim vector. I
1
4
46
@KBlueleaf
琥珀青葉@LyCORIS
13 days
I think I understand how Gaussian splatting works now UwUb
1
0
12
@KBlueleaf
琥珀青葉@LyCORIS
14 days
Playing with 2D Gaussian splatting UwUb
[image attached]
0
0
8
@KBlueleaf
琥珀青葉@LyCORIS
14 days
DINOv2 also works! Both the DINO loss and the KoLeo regularization loss are descending now. (Without the iBOT loss, since the ViT backbone is well trained on AIM.)
[2 images attached]
0
0
11
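The KoLeo regularizer mentioned here is, per the DINOv2 paper, a differential-entropy estimator that spreads L2-normalized features apart by penalizing small nearest-neighbor distances. A NumPy sketch of the idea (not the official implementation):

```python
import numpy as np

def koleo_loss(x, eps=1e-8):
    """KoLeo regularizer sketch: L2-normalize features, then penalize
    the log of each point's nearest-neighbor distance. Clumped features
    produce tiny distances and thus a large loss."""
    x = x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)  # pairwise
    np.fill_diagonal(d, np.inf)                  # ignore self-distance
    nn = d.min(axis=1)                           # nearest-neighbor distance
    return -np.log(nn + eps).mean()

rng = np.random.default_rng(0)
spread = rng.standard_normal((64, 16))           # well-spread features
clumped = spread.copy()
clumped[1] = clumped[0] + 1e-3                   # two nearly identical rows
print(koleo_loss(clumped) > koleo_loss(spread))  # True: clumping penalized
```

During training it is typically added with a small weight alongside the main DINO objective, which matches the "koleo reg loss" descending in the screenshots.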
@KBlueleaf
琥珀青葉@LyCORIS
15 days
After joining Comfy-Org for about 3~4 months, I'm now the #10 contributor to ComfyUI (by additions). UwUb
[image attached]
0
0
21