Observed issue
==============
LTTng-UST scheme for letting listener threads wait on session daemon
to wake up a futex is similar to the liburcu workqueue code, which has
an issue with spurious wakeups.
This wait/wakeup scheme is only used after the LTTng-UST listener thread
has been unable to connect to the session daemon.
A spurious wakeup on wait_for_sessiond can cause wait_for_sessiond to
return with a sock_info->wait_shm_mmap state of 0, which is unexpected.
However, this should not cause any user-observable issues other than
using slightly more CPU time than strictly needed, because this spurious
wakeup will only cause an additional connection attempt to the session
daemon to fail.
Cause
=====
From futex(5):
FUTEX_WAIT
Returns 0 if the caller was woken up. Note that a wake-up can
also be caused by common futex usage patterns in unrelated code
that happened to have previously used the futex word's memory
location (e.g., typical futex-based implementations of Pthreads
mutexes can cause this under some conditions). Therefore, call‐
ers should always conservatively assume that a return value of 0
can mean a spurious wake-up, and use the futex word's value
(i.e., the user-space synchronization scheme) to decide whether
to continue to block or not.
Solution
========
We therefore need to validate whether the value differs from 0 in
user-space after the call to FUTEX_WAIT returns 0.
Known drawbacks
===============
None.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I468d8ff302f467ee9924e6edb04476fcb031b4b9
DBG("Waiting for %s apps sessiond", sock_info->name);
/* Wait for futex wakeup */
DBG("Waiting for %s apps sessiond", sock_info->name);
/* Wait for futex wakeup */
- if (uatomic_read((int32_t *) sock_info->wait_shm_mmap))
- goto end_wait;
-
- while (futex_async((int32_t *) sock_info->wait_shm_mmap,
- FUTEX_WAIT, 0, NULL, NULL, 0)) {
+ while (!uatomic_read((int32_t *) sock_info->wait_shm_mmap)) {
+ if (!futex_async((int32_t *) sock_info->wait_shm_mmap, FUTEX_WAIT, 0, NULL, NULL, 0)) {
+ /*
+ * Prior queued wakeups queued by unrelated code
+ * using the same address can cause futex wait to
+ * return 0 even through the futex value is still
+ * 0 (spurious wakeups). Check the value again
+ * in user-space to validate whether it really
+ * differs from 0.
+ */
+ continue;
+ }
/* Value already changed. */
goto end_wait;
case EINTR:
/* Retry if interrupted by signal. */
/* Value already changed. */
goto end_wait;
case EINTR:
/* Retry if interrupted by signal. */
- break; /* Get out of switch. */
+ break; /* Get out of switch. Check again. */
case EFAULT:
wait_poll_fallback = 1;
DBG(
case EFAULT:
wait_poll_fallback = 1;
DBG(