Fabian Giesen Profile
Fabian Giesen

@rygorous

Followers
15K
Following
4K
Media
3K
Statuses
84K

Abstraction maker, abstraction breaker. @[email protected] he/him

Joined December 2009
@rygorous
Fabian Giesen
8 months
We just released Oodle 2.9.13. Significantly increased BC7 encoding speed (about 20-25% encode time reduction for non-RDO on typical content, 25-30% encode time reduction for RDO) at slightly increased quality. Also several bug fixes and experimental WASM 64-bit support.
2
6
70
@rygorous
Fabian Giesen
9 months
New blog post: "BC7 optimal solid-color blocks". Clearing out my "I should write this up" queue; this technique is from... *checks git logs* May 2017. Oh my. (I have quite the backlog.)
fgiesen.wordpress.com
That’s right, it’s another texture compression blog post! I’ll keep it short. By “solid-color block”, I mean a 4×4 block of pixels that all have the same color. A…
0
12
88
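The definition quoted in the preview above (a 4×4 block whose pixels all share one color) can be checked mechanically. The following is a minimal sketch of that check only, not the encoding technique from the blog post; the function name `is_solid_color_block` and the flat 64-byte RGBA8 layout are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of any library.
   rgba holds 16 pixels of a 4x4 block as 64 bytes (R, G, B, A per pixel).
   Returns true when every pixel equals the first one. */
bool is_solid_color_block(const uint8_t rgba[64])
{
    for (int i = 1; i < 16; ++i) {
        if (memcmp(rgba, rgba + 4 * i, 4) != 0)
            return false;
    }
    return true;
}
```

An encoder could use a test like this to route such blocks to a dedicated fast path; what that path does is covered by the linked post, not by this sketch.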
@rygorous
Fabian Giesen
9 months
New blog post: "Why those particular integer multiplies?" Some explanation and some speculation on the integer SIMD multiplies offered in x86, along with some history.
fgiesen.wordpress.com
The x86 instruction set has a somewhat peculiar set of SIMD integer multiply operations, and Intel’s particular implementation of several of these operations in their headline core designs ha…
3
10
96
@rygorous
Fabian Giesen
1 year
RT @manorlaboratory: Ok this paper is bonkers and I love it
0
408
0
@rygorous
Fabian Giesen
1 year
We released Oodle 2.9.12 last week: some SDK/compiler updates and bug fixes. Also, max texture size limit bumped from 16384x16384 to 2097152x2097152, which should be good for at least the next 4 months or so.
5
5
91
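For a sense of scale on the new limit, here is a rough sketch of the storage needed for a single mip level of a block-compressed texture. It assumes the standard BCn layout of 4×4-pixel blocks at 8 bytes (BC1, BC4) or 16 bytes (BC2, BC3, BC5, BC6H, BC7) per block; the function name `bcn_level_bytes` is made up for this example and is not an Oodle API.

```c
#include <stdint.h>

/* Hypothetical helper: bytes needed for one mip level stored as 4x4 blocks.
   Dimensions are rounded up to a whole number of blocks.
   bytes_per_block is 8 for BC1/BC4 and 16 for BC2/BC3/BC5/BC6H/BC7. */
uint64_t bcn_level_bytes(uint64_t width, uint64_t height,
                         uint64_t bytes_per_block)
{
    uint64_t blocks_x = (width + 3) / 4;
    uint64_t blocks_y = (height + 3) / 4;
    return blocks_x * blocks_y * bytes_per_block;
}
```

Under these assumptions a 16384x16384 BC7 level is 2^28 bytes (256 MiB), while a 2097152x2097152 BC7 level is 2^42 bytes (4 TiB).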
@rygorous
Fabian Giesen
2 years
* AVX-512 paths are significantly faster on AMD Zen 4 CPUs (avoid memory-destination forms on VPCOMPRESSD). This affects mainly BC1 and BC3. Plus a bunch of SDK/compiler version updates and smaller fixes!
1
0
2
@rygorous
Fabian Giesen
2 years
* Re-designed mode/partition selection logic in baseline BC7 encoder (no RDO). Roughly 2x faster at all encoding effort levels at typically same or better result quality. For BC7 RDO, works out to around a 1.2x speed-up typically.
1
0
0
@rygorous
Fabian Giesen
2 years
We just released Oodle 2.9.11 (website isn't updated yet, soon!). This one is focused on Oodle Texture improvements. * Faster end-to-end latency in multi-threaded encoding especially on 24+ core machines, most noticeable for BC[145].
1
1
29
@rygorous
Fabian Giesen
2 years
Workaround (VPCOMPRESSD to reg + separate store) will ship in the next release and is essentially perf-neutral on Intel. FWIW, even with the fix, the AVX-512 kernels are pretty much tied on speed with AVX2 on Zen 4 anyway. (Although I suspect they are a bit more power-efficient.)
2
0
8
@rygorous
Fabian Giesen
2 years
Oodle Texture PSA: if you're on a Zen 4 machine, in current releases, encode textures with OodleTex_BCNFlag_AvoidWideVectors (disables usage of AVX-512 instructions). Some of the hot AVX-512 loops heavily use the store forms of VPCOMPRESSD which are quite slow on Zen 4.
1
2
13
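The two forms mentioned in the posts above (VPCOMPRESSD with a memory destination versus compress to a register followed by a separate store) can be sketched with standard AVX-512F intrinsics. This is an illustrative sketch, not Oodle's code: all function names are hypothetical, the AVX-512 paths only compile when `__AVX512F__` is defined, and a compiler is free to pick a different instruction form than the intrinsic suggests. Note that the register variant stores a full 64-byte vector, so the output buffer must have room for 16 elements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Portable bit count for a 16-bit lane mask. */
static inline size_t popcount16(uint16_t m)
{
    size_t n = 0;
    for (; m != 0; m &= (uint16_t)(m - 1))
        ++n;
    return n;
}

/* Scalar reference: pack the selected elements of src[0..15] into out. */
static inline size_t compress16_scalar(const int32_t *src, uint16_t mask,
                                       int32_t *out)
{
    size_t n = 0;
    for (int i = 0; i < 16; ++i)
        if (mask & (1u << i))
            out[n++] = src[i];
    return n;
}

#if defined(__AVX512F__)
/* Memory-destination form: compress directly to memory. */
static inline size_t compress16_store_form(const int32_t *src, uint16_t mask,
                                           int32_t *out)
{
    __m512i v = _mm512_loadu_si512(src);
    _mm512_mask_compressstoreu_epi32(out, (__mmask16)mask, v);
    return popcount16(mask);
}

/* Register form plus a plain store; writes all 16 lanes of out. */
static inline size_t compress16_reg_form(const int32_t *src, uint16_t mask,
                                         int32_t *out)
{
    __m512i v = _mm512_loadu_si512(src);
    __m512i packed = _mm512_maskz_compress_epi32((__mmask16)mask, v);
    _mm512_storeu_si512(out, packed);
    return popcount16(mask);
}
#endif

/* Picks the register form when AVX-512F is enabled, else the scalar loop.
   out must have room for 16 elements in either case. */
size_t compress16(const int32_t *src, uint16_t mask, int32_t *out)
{
    assert(src != NULL && out != NULL);
#if defined(__AVX512F__)
    return compress16_reg_form(src, mask, out);
#else
    return compress16_scalar(src, mask, out);
#endif
}
```

Both forms produce the same packed prefix; the difference is only in which instruction form performs the write, which is what matters on Zen 4 according to the posts.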
@rygorous
Fabian Giesen
2 years
RT @revision_party: The world has dimmed for us. With sadness and his loved ones in our hearts, we say farewell to our friend and fellow ma…
0
14
0
@rygorous
Fabian Giesen
2 years
New blog post: "A very brief BitKnit retrospective". Small codec for a special-purpose application that was only interesting by itself for a relatively short time, but ended up influencing LZNA, Kraken, Mermaid and Leviathan.
fgiesen.wordpress.com
UPDATE May 7, 2023: I wrote this post yesterday somewhat in a huff (for reasons not worth going into) and the original post contained several inaccuracies. These have been corrected in this version…
0
10
40
@rygorous
Fabian Giesen
2 years
when encoding a single continuous stream and measuring latency. (Bottlenecked by critical path latency, not overall time spent in optimal parse portion of encoder.)
0
0
0
@rygorous
Fabian Giesen
2 years
Disclaimer on the Mermaid speedup: the ~2x speedup is for 256k chunked encoding in a throughput-bound scenario (e.g. going wide on many chunk encodes at once, our typical use case). Results will vary with larger chunks or no chunking (more bottlenecked on match finding) or
1
0
3
@rygorous
Fabian Giesen
2 years
2.9.10 (without the b) had a buffer overflow issue in Oodle Texture when encoding BC4/5 from 4x uint16 pixels. (Rarely used but it's an out-of-bounds write so potential memory stomp, don't risk it.)
1
0
0