Fabian Giesen @rygorous X Profile

Fabian Giesen

@rygorous

Followers

15K

Following

4K

Media

3K

Statuses

84K

Abstraction maker, abstraction breaker. @[email protected] he/him

Joined December 2009


Fabian Giesen

@rygorous

3 years

->

mastodon.gamedev.place

8.15K Posts, 72 Following, 3.86K Followers · Abstraction maker, abstraction breaker. FUN FACT: things I prefix with FUN FACT are sometimes fun and sometimes factual, but very rarely both.

1

0

23

Fabian Giesen

@rygorous

7 months

New blog post: "UNORM and SNORM to float, hardware edition"

fgiesen.wordpress.com

I mentioned in a previous post that doing exact UNORM or SNORM conversions to float in hardware was not particularly expensive, but didn’t go into detail how. Let’s rectify that! (If yo…

0

21

113
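For context, the SNORM conversion mentioned in the preview above can be sketched in software. This is a minimal reference sketch for the 8-bit case, not the hardware method the post describes; the function name `snorm8_to_float` is made up for illustration, and the result is a Python double rather than a 32-bit float.

```python
def snorm8_to_float(c: int) -> float:
    """Reference SNORM8 decode; c is a signed 8-bit integer in [-128, 127]."""
    if not -128 <= c <= 127:
        raise ValueError("expected a signed 8-bit value")
    # Divide by 127 (not 128) so that +127 maps to exactly 1.0.
    # Both -128 and -127 map to -1.0, hence the clamp.
    return max(c / 127.0, -1.0)
```

The clamp is what makes the representable range symmetric around zero: the one extra negative code is folded onto -1.0.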

Fabian Giesen

@rygorous

8 months

We just released Oodle 2.9.13. Significantly increased BC7 encoding speed (about 20-25% encode time reduction for non-RDO on typical content, 25-30% encode time reduction for RDO) at slightly increased quality. Also several bug fixes and experimental WASM 64-bit support.

2

6

70

Fabian Giesen

@rygorous

9 months

New blog post: "Exact UNORM8 to float", a satisfying solution to a problem that, quite possibly, nobody has.

fgiesen.wordpress.com

GPUs support UNORM formats that represent a number inside [0,1] as an 8-bit unsigned integer. In exact arithmetic, the conversion to a floating-point number is straightforward: take the integer and…

3

21

108
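The "exact arithmetic" conversion the preview above refers to is division by 255. A minimal sketch of that baseline, assuming the hypothetical names `unorm8_exact` and `unorm8_to_float`; it illustrates the definition only, not the technique from the post, and it rounds to a Python double rather than a 32-bit float.

```python
from fractions import Fraction


def unorm8_exact(x: int) -> Fraction:
    """Exact value of an 8-bit UNORM code x in [0, 255] as a rational number."""
    if not 0 <= x <= 255:
        raise ValueError("expected an unsigned 8-bit value")
    return Fraction(x, 255)


def unorm8_to_float(x: int) -> float:
    """Same conversion, rounded to the nearest double by a single division."""
    if not 0 <= x <= 255:
        raise ValueError("expected an unsigned 8-bit value")
    return x / 255.0
```

Most codes are not exactly representable in binary floating point (51/255 = 1/5, for example), which is why the rounding behaviour of the conversion matters at all.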

Fabian Giesen

@rygorous

9 months

New blog post: "BC7 optimal solid-color blocks". Clearing out my "I should write this up" queue; this technique is from *checks git logs* May 2017. Oh my. (I have quite the backlog.)

fgiesen.wordpress.com

That’s right, it’s another texture compression blog post! I’ll keep it short. By “solid-color block”, I mean a 4×4 block of pixels that all have the same color. A…

0

12

88

Fabian Giesen

@rygorous

9 months

New blog post: "Why those particular integer multiplies?" Some explanation and some speculation on the integer SIMD multiplies offered in x86, along with some history.

fgiesen.wordpress.com

The x86 instruction set has a somewhat peculiar set of SIMD integer multiply operations, and Intel’s particular implementation of several of these operations in their headline core designs ha…

3

10

96

Fabian Giesen

@rygorous

9 months

New blog post: "Inserting a 0 bit in the middle of a value". I guess it's 2-for-1 bit hacks week.

fgiesen.wordpress.com

This one originally came up for me in Oodle Texture’s BC7 decoder. In the BC7 format, each pixel within a 4×4 block can choose from a limited set of between 4 to 16 colors (ignoring some…

7

16

117
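As an illustration of the operation named in the post title above, here is one well-known way to insert a 0 bit at a given position. It is a sketch under assumptions: the function name `insert_zero_bit` and its parameters are hypothetical, and nothing here is taken from the Oodle Texture decoder.

```python
def insert_zero_bit(x: int, pos: int) -> int:
    """Insert a 0 bit at bit index pos of non-negative x.

    Bits below pos stay where they are; bits at pos and above move up by one.
    """
    if x < 0 or pos < 0:
        raise ValueError("x and pos must be non-negative")
    low_mask = (1 << pos) - 1
    # x = high + low. Adding the high part once more doubles it,
    # which is the same as shifting it left by one bit.
    return x + (x & ~low_mask)
```

The more obvious formulation is `((x & ~low_mask) << 1) | (x & low_mask)`; the addition form gives the same result with fewer operations.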

Fabian Giesen

@rygorous

9 months

New blog post: "Zero or sign extend"

fgiesen.wordpress.com

A while back I had to deal with a bit-packed format that contained a list of integer values encoded in one of a pre-defined sets of bit widths, where both the allowed bit widths and the signed-ness…

1

13

87
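For readers unfamiliar with the two operations in the post title above, here is a minimal sketch of zero and sign extension for a field of a given bit width. The names `zero_extend` and `sign_extend` are hypothetical, and this is the textbook XOR-and-subtract formulation, not necessarily what the post arrives at.

```python
def zero_extend(x: int, bits: int) -> int:
    """Interpret the low `bits` bits of x as an unsigned integer."""
    if bits < 1:
        raise ValueError("bits must be at least 1")
    return x & ((1 << bits) - 1)


def sign_extend(x: int, bits: int) -> int:
    """Interpret the low `bits` bits of x as a two's complement integer."""
    if bits < 1:
        raise ValueError("bits must be at least 1")
    sign = 1 << (bits - 1)
    x &= (1 << bits) - 1
    # Flipping the sign bit and then subtracting its weight maps
    # [sign, 2*sign) onto [-sign, 0) and leaves [0, sign) unchanged.
    return (x ^ sign) - sign
```

Because the two functions differ only in the final XOR and subtract, a format that mixes signed and unsigned fields can share the masking step.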

Fabian Giesen

@rygorous

1 year

RT @manorlaboratory: Ok this paper is bonkers and I love it

0

408

0

Fabian Giesen

@rygorous

1 year

We released Oodle 2.9.12 last week. Some SDK/compiler updates and bug fixes. Also, max texture size limit bumped from 16384x16384 to 2097152x2097152, which should be good for at least the next 4 months or so.

5

91

Fabian Giesen

@rygorous

2 years

New blog post: "Entropy decoding in Oodle Data: x86-64 6-stream Huffman decoders"

fgiesen.wordpress.com

It’s been a while! Last time, I went over how the 3-stream Huffman decoders in Oodle Data work. The 3-stream layout is what we originally went with. It gives near-ideal performance on the las…

1

20

74

Fabian Giesen

@rygorous

2 years

* AVX-512 paths are significantly faster on AMD Zen 4 CPUs (avoid memory-destination forms on VPCOMPRESSD). This affects mainly BC1 and 3. Plus a bunch of SDK/compiler version updates and smaller fixes!

1

0

2

Fabian Giesen

@rygorous

2 years

* Re-designed mode/partition selection logic in baseline BC7 encoder (no RDO). Roughly 2x faster at all encoding effort levels at typically same or better result quality. For BC7 RDO, works out to around a 1.2x speed-up typically.

1

0

Fabian Giesen

@rygorous

2 years

We just released Oodle 2.9.11 (website isn't updated yet, soon!). This one is focused on Oodle Texture improvements. * Faster end-to-end latency in multi-threaded encoding especially on 24+ core machines, most noticeable for BC[145].

1

29

Fabian Giesen

@rygorous

2 years

Workaround (VPCOMPRESSD to reg + separate store) will ship in the next release, is essentially perf-neutral on Intel. FWIW, even with the fix, the AVX-512 kernels are pretty much tied on speed with AVX2 on Zen4 anyway. (Although I suspect they are a bit more power-efficient.)

2

0

8

Fabian Giesen

@rygorous

2 years

Oodle Texture PSA: if you're on a Zen 4 machine, in current releases, encode textures with OodleTex_BCNFlag_AvoidWideVectors (disables usage of AVX-512 instructions). Some of the hot AVX-512 loops heavily use the store forms of VPCOMPRESSD which are quite slow on Zen 4.

1

2

13

Fabian Giesen

@rygorous

2 years

RT @revision_party: The world has dimmed for us. With sadness and his loved ones in our hearts, we say farewell to our friend and fellow ma…

0

14

0

Fabian Giesen

@rygorous

2 years

New blog post: "A very brief BitKnit retrospective". Small codec for a special-purpose application that was only interesting by itself for a relatively short time, but ended up influencing LZNA, Kraken, Mermaid and Leviathan.

fgiesen.wordpress.com

UPDATE May 7, 2023: I wrote this post yesterday somewhat in a huff (for reasons not worth going into) and the original post contained several inaccuracies. These have been corrected in this version…

0

10

40

Fabian Giesen

@rygorous

2 years

when encoding a single continuous stream and measuring latency. (Bottlenecked by critical path latency not overall time spent in optimal parse portion of encoder.)

0

Fabian Giesen

@rygorous

2 years

Disclaimer on the Mermaid speedup: the ~2x speedup is for 256k chunked encoding in a throughput-bound scenario (e.g. going wide on many chunk encodes at once, our typical use case). Results will vary with larger chunks or no chunking (more bottlenecked on match finding) or

1

0

3

Fabian Giesen

@rygorous

2 years

2.9.10 (without the b) had a buffer overflow issue in Oodle Texture when encoding BC4/5 from 4x uint16 pixels. (Rarely used but it's an out-of-bounds write so potential memory stomp, don't risk it.)

1

0