From: Mathieu Desnoyers
Date: Thu, 23 Jun 2022 20:27:41 +0000 (-0400)
Subject: Fix: waiter: futex wait: handle spurious futex wakeups
X-Git-Url: https://git.lttng.org/?p=lttng-tools.git;a=commitdiff_plain;h=6e5438dc2a03af904456e5d0ff2e29cade75b253

Fix: waiter: futex wait: handle spurious futex wakeups

Observed issue
==============

The waiter lttng_waiter_wait() implements a futex wait/wakeup scheme
similar to the liburcu workqueue code, which has an issue with spurious
wakeups.

A spurious wakeup on lttng_waiter_wait can cause lttng_waiter_wait to
reach label skip_futex_wait with waiter->state still set to
WAITER_WAITING, which is unexpected. It would cause busy-waiting on
WAITER_TEARDOWN state to start early. The wait-teardown stage is done
with WAIT_ATTEMPTS active attempts, followed by attempts spaced by
10ms sleeps. I do not expect that these spurious wakeups will cause
user-observable effects other than being slightly less efficient than
it should be.

This issue will cause spurious, unexpectedly high CPU use, but will not
lead to data corruption.

Cause
=====

From futex(2):

       FUTEX_WAIT
              Returns 0 if the caller was woken up. Note that a wake-up
              can also be caused by common futex usage patterns in
              unrelated code that happened to have previously used the
              futex word's memory location (e.g., typical futex-based
              implementations of Pthreads mutexes can cause this under
              some conditions). Therefore, callers should always
              conservatively assume that a return value of 0 can mean a
              spurious wake-up, and use the futex word's value (i.e.,
              the user-space synchronization scheme) to decide whether
              to continue to block or not.

Solution
========

We therefore need to validate whether the value differs from
WAITER_WAITING in user-space after the call to FUTEX_WAIT returns 0.

Known drawbacks
===============

None.

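Illustration (not part of the patch): the difference between trusting the
FUTEX_WAIT return value and re-checking the futex word can be sketched
without any real futex. Everything below is a hypothetical simulation:
struct fake_futex, fake_futex_wait(), wait_trusting_return() and
wait_checking_value() are names made up for this sketch, and the fake only
mimics the return conventions quoted above (0 on a possibly spurious wakeup,
-1 with errno set to EAGAIN when the word no longer holds the expected
value). It is a minimal model of the reasoning, not the lttng-tools
implementation.

```c
#include <errno.h>

enum waiter_state { WAITER_WAITING = 0, WAITER_WOKEN_UP = 1 };

/* Hypothetical stand-in for a futex word and its kernel-side behaviour. */
struct fake_futex {
	int state;		/* The simulated futex word. */
	int spurious_left;	/* Spurious 0-returns to deliver first. */
	int calls;		/* Number of simulated FUTEX_WAIT calls. */
};

/*
 * Mimics the FUTEX_WAIT conventions described in the commit message:
 * returns -1 with errno == EAGAIN if the word differs from `expected`,
 * otherwise returns 0, either spuriously (word unchanged) or after a
 * simulated waker has updated the word.
 */
static int fake_futex_wait(struct fake_futex *f, int expected)
{
	f->calls++;
	if (f->state != expected) {
		errno = EAGAIN;
		return -1;
	}
	if (f->spurious_left > 0) {
		f->spurious_left--;
		return 0;	/* Spurious wakeup: word still `expected`. */
	}
	f->state = WAITER_WOKEN_UP;	/* Genuine wakeup. */
	return 0;
}

/* Old pattern: any 0 return is treated as a genuine wakeup. */
static int wait_trusting_return(struct fake_futex *f)
{
	while (fake_futex_wait(f, WAITER_WAITING)) {
		if (errno == EAGAIN)
			break;	/* Value already changed. */
		/* Any other error (e.g. EINTR): retry. */
	}
	return f->state;
}

/* New pattern: loop on the word itself and re-check after every return. */
static int wait_checking_value(struct fake_futex *f)
{
	while (f->state == WAITER_WAITING) {
		if (!fake_futex_wait(f, WAITER_WAITING))
			continue;	/* Possibly spurious: check the word again. */
		if (errno == EAGAIN)
			break;	/* Value already changed. */
		/* Any other error (e.g. EINTR): check the word again. */
	}
	return f->state;
}
```

With two simulated spurious wakeups queued, wait_trusting_return() returns
after a single call while the word is still WAITER_WAITING, which is the
unexpected state described under "Observed issue". wait_checking_value()
keeps waiting and only returns once the word reads WAITER_WOKEN_UP, after
three calls.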
Signed-off-by: Mathieu Desnoyers
Signed-off-by: Jérémie Galarneau
Change-Id: Ida9905d1f0b5d9543c8b85ecbd7d748a6f7c1c97
---

diff --git a/src/common/waiter.cpp b/src/common/waiter.cpp
index 3ddb68feb..2a1dded89 100644
--- a/src/common/waiter.cpp
+++ b/src/common/waiter.cpp
@@ -49,15 +49,25 @@ void lttng_waiter_wait(struct lttng_waiter *waiter)
 		}
 		caa_cpu_relax();
 	}
-	while (futex_noasync(&waiter->state, FUTEX_WAIT, WAITER_WAITING,
-			NULL, NULL, 0)) {
+	while (uatomic_read(&waiter->state) == WAITER_WAITING) {
+		if (!futex_noasync(&waiter->state, FUTEX_WAIT, WAITER_WAITING, NULL, NULL, 0)) {
+			/*
+			 * Prior queued wakeups queued by unrelated code
+			 * using the same address can cause futex wait to
+			 * return 0 even through the futex value is still
+			 * WAITER_WAITING (spurious wakeups). Check
+			 * the value again in user-space to validate
+			 * whether it really differs from WAITER_WAITING.
+			 */
+			continue;
+		}
 		switch (errno) {
-		case EWOULDBLOCK:
+		case EAGAIN:
 			/* Value already changed. */
 			goto skip_futex_wait;
 		case EINTR:
 			/* Retry if interrupted by signal. */
-			break;	/* Get out of switch. */
+			break;	/* Get out of switch. Check again. */
 		default:
 			/* Unexpected error. */
 			PERROR("futex_noasync");