琥珀青葉@LyCORIS

@KBlueleaf

Followers: 3K · Following: 21K · Media: 569 · Statuses: 3K

Undergraduate in Taiwan. Leader of LyCORIS

Taiwan
Joined May 2021
@KBlueleaf
琥珀青葉@LyCORIS
2 years
The paper of the LyCORIS project has been accepted at ICLR 2024!!! "Navigating Text-To-Image Customization: From LyCORIS Fine-Tuning to Model Evaluation". My first ever paper, as an undergraduate student. arXiv link here:
arxiv.org
Text-to-image generative models have garnered immense attention for their ability to produce high-fidelity images from text prompts. Among these, Stable Diffusion distinguishes itself as a leading...
5
22
137
@KBlueleaf
琥珀青葉@LyCORIS
2 days
I think the problem is that the UI/UX of the Gemini app is really, really bad; even AI Studio is far better. Gemini CLI is good, but if it had a GUI version with the same UX it would be awesome as well. The model is solid, works well, and has reasonable pricing. But Google never made a
@ai_for_success
AshutoshShrivastava
3 days
Google DeepMind is winning, they just don't hype it.
> Participated in IMO.
> IMO officials verified the solution.
> Awarded Gold medal.
> Quietly dropped the post on a Monday morning.
At this point, they're not even bothered about competing. Also this version of Gemini is coming
[image attached]
1
0
5
@KBlueleaf
琥珀青葉@LyCORIS
2 days
4× V100 16G reaches around 1.8~2× the speed of a 4090 on my ViT training task. Pretty impressive considering the price. If you are familiar with custom builds of the libraries you need, and you accept fp16-only matmul (or fp32 matmul on CUDA cores), this setup is quite good.
1
0
20
@KBlueleaf
琥珀青葉@LyCORIS
3 days
LyCORIS got 100 citations
[image attached]
0
0
19
@KBlueleaf
琥珀青葉@LyCORIS
5 days
It's fun that the Triton 2DGS kernel works well on V100 (with fp16 precision), and PyTorch SDPA works well on V100 (O(N) VRAM + decent speed), but torch.compile + FlexAttention and Triton FAv2 are basically a disaster: slower than doing mem-efficient attention on the V100's CUDA cores. Wut?
0
0
11
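The O(N) VRAM point about memory-efficient attention can be made concrete with a back-of-envelope: a tiled kernel never materializes the full score matrix. A minimal sketch, with the tile size and fp16 storage as my assumptions (illustrative, not measured on a V100):

```python
# Rough activation-memory comparison for one attention head, assuming
# fp16 storage (2 bytes/element). Tile size is an assumed value.
def naive_attn_bytes(n, d, bytes_per=2):
    # materializes the full (n x n) score matrix plus the (n x d) output
    return (n * n + n * d) * bytes_per

def mem_eff_attn_bytes(n, d, tile=128, bytes_per=2):
    # a tiled kernel keeps only a (tile x n) slab of scores at a time,
    # so the footprint grows linearly in n
    return (tile * n + n * d) * bytes_per

n, d = 16384, 64
print(naive_attn_bytes(n, d) // 2**20)    # 514 MiB, dominated by the scores
print(mem_eff_attn_bytes(n, d) // 2**20)  # 6 MiB
```

The quadratic term is what blows up on long sequences; the tiled version trades it for recomputation, which is why backend choice matters so much on older GPUs.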
@KBlueleaf
琥珀青葉@LyCORIS
6 days
V100 16G × 4. Only cost me 1000 bucks
[4 images attached]
17
7
250
@KBlueleaf
琥珀青葉@LyCORIS
7 days
I use 2048 gaussians with 10/7/8/6-bit quantization for position/log_scale/rotation/color, achieving 7.5 bits per param / 0.117 raw bpp at 1024×1024. Then I use LZMA to compress the custom-format state dict down to 0.108 bpp.
0
0
10
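The bit-budget arithmetic above checks out if one assumes the usual 2/2/1/3 component counts for position/log_scale/rotation/color (my assumption about the layout), and the LZMA step is plain stdlib. A sketch:

```python
import lzma
import struct

# Bits per gaussian under the tweet's 10/7/8/6 split, assuming
# 2/2/1/3 components for position/log_scale/rotation/color.
bits = 2 * 10 + 2 * 7 + 1 * 8 + 3 * 6    # = 60 bits per gaussian
params = 2 + 2 + 1 + 3                    # = 8 params per gaussian
print(bits / params)                      # 7.5 bits per param

n_gauss, side = 2048, 1024
raw_bpp = n_gauss * bits / (side * side)
print(round(raw_bpp, 3))                  # 0.117 raw bpp

# Packing the quantized ints and running LZMA over the custom-format
# state dict is what shaves 0.117 down to ~0.108 bpp. Placeholder data
# here, not real quantized gaussians:
payload = struct.pack(f"{n_gauss}Q", *range(n_gauss))
compressed = lzma.compress(payload)
print(len(compressed) < len(payload))     # LZMA finds residual redundancy
```

How much LZMA saves depends entirely on how much structure survives quantization; the ~8% gain reported above is consistent with mostly-decorrelated parameters.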
@KBlueleaf
琥珀青葉@LyCORIS
7 days
Successfully used 2DGS to beat JPEG at 0.1 bpp image compression (????)
[3 images attached]
2
1
22
@KBlueleaf
琥珀青葉@LyCORIS
8 days
2DGS image approximation test. 1024×1024, 8192 gaussians. First image is the GT. Second: fp32 (2 bits per pixel). Third: fp16 (1 bit per pixel). Last: video showing the training procedure. Maybe 16384 gaussians + fp16 would be a good idea.
[3 images + 1 video attached]
0
0
7
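For readers unfamiliar with 2DGS image approximation, a toy direct-evaluation renderer shows the idea: each gaussian contributes a weighted RGB blob, and the sum of blobs approximates the target image. A minimal sketch (no rotation term, additive blending, all names mine), not the author's implementation:

```python
import numpy as np

def render_2dgs(means, log_scales, colors, size=64):
    """Splat axis-aligned 2D gaussians onto an image by direct evaluation.
    Real implementations add a rotation parameter and proper blending."""
    ys, xs = np.mgrid[0:size, 0:size] / size          # pixel grid in [0, 1)
    img = np.zeros((size, size, 3))
    for mu, ls, c in zip(means, log_scales, colors):
        inv_var = np.exp(-2 * ls)                     # 1/sigma^2 per axis
        d2 = inv_var[0] * (xs - mu[0])**2 + inv_var[1] * (ys - mu[1])**2
        img += np.exp(-0.5 * d2)[..., None] * c       # gaussian weight * RGB
    return np.clip(img, 0, 1)

rng = np.random.default_rng(0)
n = 32
img = render_2dgs(rng.random((n, 2)),                 # means in [0, 1)^2
                  np.full((n, 2), np.log(0.05)),      # sigma = 0.05
                  rng.random((n, 3)))                 # RGB per gaussian
print(img.shape)  # (64, 64, 3)
```

Fitting an image then amounts to optimizing means, scales, and colors against a reconstruction loss, with parameter count (not pixel count) setting the bitrate.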
@KBlueleaf
琥珀青葉@LyCORIS
9 days
True dude.
@cloneofsimo
Simo Ryu
9 days
It's incredibly easy to say "I am an open source lover" when: 1. You never meaningfully contributed to open source. 2. You never financially supported any open source contributors. Hear me out: you don't love the open source community. You just love taking other people's work without
1
0
9
@KBlueleaf
琥珀青葉@LyCORIS
11 days
An improved version is coming. Up to 4× faster than the current impl. Fwd is done; bwd is implemented but has weird issues that need to be resolved.
[image attached]
@KBlueleaf
琥珀青葉@LyCORIS
12 days
The kernel and the whole impl are here. Remember to give it a star!!!
0
0
21
@KBlueleaf
琥珀青葉@LyCORIS
11 days
RT @gaunernst: I can confirm that mma.sync.aligned.m16n8k32.row.col.kind::mxf8f6f4.block_scale.f32.e4m3.e4m3.f32.ue8m0 is faster than mma.s….
0
2
0
@KBlueleaf
琥珀青葉@LyCORIS
11 days
RT @KBlueleaf: I write a triton kernel for 2DGS so I can achieve 16384 gaussians on 256x256 image with basically no vram consumption (since….
0
4
0
@KBlueleaf
琥珀青葉@LyCORIS
11 days
RT @KBlueleaf: The kernel and the whole impl are here. Remember to give it a star!!!
github.com
Image Gaussian Splatting. Contribute to KohakuBlueleaf/IGS development by creating an account on GitHub.
0
9
0
@KBlueleaf
琥珀青葉@LyCORIS
12 days
The kernel and the whole impl are here. Remember to give it a star!!!
github.com
Image Gaussian Splatting. Contribute to KohakuBlueleaf/IGS development by creating an account on GitHub.
@KBlueleaf
琥珀青葉@LyCORIS
13 days
I wrote a Triton kernel for 2DGS so I can fit 16384 gaussians on a 256×256 image with basically no VRAM consumption (since the largest intermediate state is the output image). Also, here is the PoC, which trains 256 token features on one image, where each token is a 1024-dim vector. I
2
9
75
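The "largest intermediate state is the output image" claim can be made concrete with a back-of-envelope. The per-gaussian weight maps are my assumption about what a naive autograd implementation would keep alive for the backward pass:

```python
# Why a fused kernel means "basically no VRAM": accumulate each gaussian's
# contribution directly into the output image instead of retaining a
# per-gaussian, per-pixel weight map in the autograd graph.
n_gauss, h, w, bytes_per = 16384, 256, 256, 4  # fp32

# Naive autograd graph: one (H x W) weight map retained per gaussian.
naive = n_gauss * h * w * bytes_per
# Fused kernel: largest live intermediate is the RGB output image itself.
fused = h * w * 3 * bytes_per

print(naive // 2**30)  # 4 GiB
print(fused // 2**10)  # 768 KiB
```

The fused version recomputes per-gaussian weights during the backward pass, trading FLOPs for the four-orders-of-magnitude memory reduction.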
@KBlueleaf
琥珀青葉@LyCORIS
13 days
BTW, if anyone is interested in my Triton kernel for 2DGS, I can open source it when I have time. Currently it is just "working correctly"; I haven't done any profiling of its real performance.
3
0
6
@KBlueleaf
琥珀青葉@LyCORIS
13 days
I wrote a Triton kernel for 2DGS so I can fit 16384 gaussians on a 256×256 image with basically no VRAM consumption (since the largest intermediate state is the output image). Also, here is the PoC, which trains 256 token features on one image, where each token is a 1024-dim vector. I
1
4
46
@KBlueleaf
琥珀青葉@LyCORIS
13 days
I think I understand how Gaussian splatting works now UwUb
1
0
12
@KBlueleaf
琥珀青葉@LyCORIS
14 days
Playing with 2D Gaussian splatting UwUb
[image attached]
0
0
8
@KBlueleaf
琥珀青葉@LyCORIS
14 days
DINOv2 also works! Both the DINO loss and the KoLeo regularization loss are descending now. (Without the iBOT loss, since the ViT backbone is well trained on AIM.)
[2 images attached]
0
0
11
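The KoLeo regularizer mentioned here is, per the DINOv2 paper, a differential-entropy estimator that spreads L2-normalized features apart by penalizing small nearest-neighbor distances. A NumPy sketch of the idea (not the official implementation):

```python
import numpy as np

def koleo_loss(x, eps=1e-8):
    """KoLeo regularizer sketch: L2-normalize features, then penalize
    the log of each point's nearest-neighbor distance. Clumped features
    produce tiny distances and thus a large loss."""
    x = x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)  # pairwise
    np.fill_diagonal(d, np.inf)                  # ignore self-distance
    nn = d.min(axis=1)                           # nearest-neighbor distance
    return -np.log(nn + eps).mean()

rng = np.random.default_rng(0)
spread = rng.standard_normal((64, 16))           # well-spread features
clumped = spread.copy()
clumped[1] = clumped[0] + 1e-3                   # two nearly identical rows
print(koleo_loss(clumped) > koleo_loss(spread))  # True: clumping penalized
```

During training it is typically added with a small weight alongside the main DINO objective, which matches the "koleo reg loss" descending in the screenshots.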
@KBlueleaf
琥珀青葉@LyCORIS
15 days
After joining Comfy-Org for about 3~4 months, I'm now the #10 contributor to ComfyUI (by additions). UwUb
[image attached]
0
0
21