Christoph Lutz @chris_skyflier X Profile

Christoph Lutz

@chris_skyflier

Followers

178

Following

2K

Media

52

Statuses

888

Those who don‘t jump will never fly.

Joined August 2010

Don't wanna be here? Send us removal request.

Christoph Lutz

@chris_skyflier

15 hours

7/7.Fast Sync wait behavior can be observed with the following bpftrace scipt: All of the above was tested on 19.26 (with RAC on Exadata).

0

1

Christoph Lutz

@chris_skyflier

15 hours

6/7.Computation of the adaptive sleep duration is quite complex and based on additional runtime counters maintained by fg and bg processes in the sga struct kcrf_alfs_info_ (these are not exposed). Anyway, that's a rabbit hole for another day . 🕵️‍♂️

1

0

Christoph Lutz

@chris_skyflier

15 hours

5/7.Once the backoff sleep time reaches 160 us, it no longer increases (=> 10, 20, 40, 80, 160, 160, 160, . ). The min and max sleep times of 10 and 160 us are hardcoded and cannot be configured.

1

0

Christoph Lutz

@chris_skyflier

15 hours

4/7.Exponential backoff: the fg finds the on-disk scn still lower than its sync scn after sleeping and spinning. So it starts sleeping again using an exponential backoff scheme: sleep for 10 us and double the duration on every iteration until it reaches a max of 160 us.

1

0

Christoph Lutz

@chris_skyflier

15 hours

3/7.Spin: the fg sleeps, finds the on-disk scn lower than its sync scn when it wakes up, and starts to spin for 100 us (_fg_fast_sync_sleep_usecs). After the spin, it finds that lgwr has advanced the on-disk scn beyond its sync scn.

1

0

Christoph Lutz

@chris_skyflier

15 hours

2/7.Sleep: the fg sleeps and finds the on-disk scn greater than its sync scn when it wakes up. The sleep time is an adaptive moving avg, dynamically re-calculated and updated by ckpt every 3 sec (in kcrfw_alfs_con_job).

1

0

Christoph Lutz

@chris_skyflier

15 hours

1/7.On Exadata with pmemlog, the Fast Log File Sync dynamically tunes the log file sync sleep duration to balance responsiveness vs cpu ovherhead (spinning after wakeup). Oracle tracks three wait variants via different session stat counters:. 1. Sleep.2. Spin.3. Backoff Sleeps

1

0

1

Christoph Lutz

@chris_skyflier

2 days

If you've ever felt the need to manually control adaptive lgwr features, this gdb script's got you covered: 😎. It lets you enable and disable adaptive scalable lgwr, fast log file sync, and log parallelism. Highly experimental, of course!. Example 👇

1

6

Christoph Lutz

@chris_skyflier

14 days

RT @ScyllaDB: Learn how xCapture, an eBPF-based Linux thread activity management tool, can capture thread activity of all apps in your syst….

0

2

0

Christoph Lutz

@chris_skyflier

15 days

RT @bsdaemon: AMD still learning the basics about spec side-channels (the things that Intel forgot at this point). what a world.

0

3

0

Christoph Lutz

@chris_skyflier

1 month

11/10.For the curious, a bunch of script to simulate and analyze the behavior:.- Enable public redo strands using gdb: - Simulate RAL contention in a fg using gdb: - Trace lgwr strand deactivation (bpftrace):

0

2

Christoph Lutz

@chris_skyflier

1 month

10/10.All of the above was tested on Oracle 19.26 (Exadata with RAC and no private redo strands, and no in-memory undo). On a related note, the log parallelism feature is only available in Oracle Enterprise Edition.

1

0

Christoph Lutz

@chris_skyflier

1 month

9/10.lgwr uses async i/o, but if the storage subsystem can't handle the async i/o reqs in parallel, log file parallel write and log file sync times may increase. This effect is described in a great post by savvinov ( and in MOS doc id 1980199.1.

savvinov.com

Log parallelism is an optimization introduced in 9.2 that reduces latch contention due to redo copy to the log buffer by enabling multiple public redo buffers (or “strands”). In many ca…

1

0

1

Christoph Lutz

@chris_skyflier

1 month

8/10.Reason for this is that multiple active strands increase the number of lgwr i/o requests, as lgwr must write buffers from each active strand; in other words, more active strands result in more lgwr i/o requests.

1

0

Christoph Lutz

@chris_skyflier

1 month

7/10.When multiple strands are active, Oracle tends to deactivate them fairly aggressively: lgwr disables one strand on every 10th log write (hard-coded in kcrfw_redo_write_driver). Example:

1

0

Christoph Lutz

@chris_skyflier

1 month

6/10.rand_nr is a session-specific random number stored in thread local storage (tls) that is initialized at session creation (by ksurandinit - I haven't investigated this further).

1

0

Christoph Lutz

@chris_skyflier

1 month

5/10.With multiple active strands, the strand in step 1 is chosen randomly: .strand = rand_nr % active_strands. If the RAL get fails (in 1. or 2.), rand_nr is incremented to retry with the next strand (wrapping to strand 0 if needed). Examples:

1

0

Christoph Lutz

@chris_skyflier

1 month

4/10.This is how it works with one active strand:. 1. Attempt a no-wait RAL get on strand 0.2. If this fails, retry once.3. On second failure, activate a new strand.4. Randomly select one of the active strands and acquire its RAL in wait mode.

1

0

Christoph Lutz

@chris_skyflier

1 month

3/10.The number of strands is defined by _log_parallelism_max and they are pre-allocated at startup. Each strand has its own redo allocation latch (RAL) and at runtime, sessions activate and deactivate strands dynamically in response to RAL contention (_log_parallelism_dynamic).

1

0

Christoph Lutz

@chris_skyflier

1 month

2/10.The log parallelism feature divides the redo log buffer into multiple smaller buffers, called public redo strands. This reduces contention during redo buffer allocation by spreading activity across strands, allowing sessions to allocate space for redo entries in parallel.

1

0