From: Mathieu Desnoyers
Date: Mon, 23 Jul 2012 18:00:42 +0000 (-0400)
Subject: Fix: Multiple health monitoring fixes
X-Git-Tag: v2.1.0-rc1~57
X-Git-Url: https://git.lttng.org/?p=lttng-tools.git;a=commitdiff_plain;h=139ac87245fd1ca18d60a0efca32b50e4c1d8730;hp=139ac87245fd1ca18d60a0efca32b50e4c1d8730

Fix: Multiple health monitoring fixes

* Fix modulo operation bug on
  #define HEALTH_IS_IN_CODE(x) (x % HEALTH_POLL_VALUE)
  which is causing the check to think it is never within code
  (x % 1 always equals 0). Simplify this by using a simple & on the
  poll value, and remove the IS_IN_CODE, using ! on IS_IN_POLL instead
  (which takes nothing away from clarity).
* Atomic operations should apply to at most "unsigned long" (32-bit on
  32-bit arch) rather than uint64_t.
* Separate the "error" condition from the counters. We clearly cannot
  use the "0" value as an error on 32-bit counters anymore, because
  they can easily wrap.
* Introduce "exit" condition, which will be useful for state tracking
  in the future. Error and exit conditions are implemented as flags.
* Add "APP_MANAGE" in addition to the "APP_REG" health check, to
  monitor the app registration thread (which was missing; only the app
  manager thread was checked, under the name "APP_REG", which was
  misleading).
* Remove bogus usage of uatomic_xchg() in health_check_state(): it is
  not needed to update the "last" value, since the last value is read
  and written to by a single thread. Moreover, this specific use of
  xchg was not exchanging anything: it was just setting the last value
  to the "current" one, and doing nothing with the return value.
  Whatever was expected to be achieved by using uatomic_xchg() clearly
  wasn't.
* Because the health check thread could still be answering a request
  concurrently with sessiond teardown, we need to ensure that all
  threads only set the "error" condition if they reach teardown paths
  due to an actual error, not on a "normal" teardown condition (thread
  quit pipe being closed). Flagging threads as being in error condition
  upon all exit paths would lead to false "errors" sent to the client,
  which we want to avoid, since the client could then think it needs to
  kill a sessiond when the sessiond might be in the process of
  gracefully restarting.

Signed-off-by: Mathieu Desnoyers
Signed-off-by: David Goulet
---
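
The scheme described in the list above can be sketched as follows. This is a minimal illustration, not the code from this patch: it uses C11 <stdatomic.h> instead of the uatomic_*() calls so that it compiles on its own. HEALTH_POLL_VALUE, HEALTH_IS_IN_POLL and health_check_state() are named in the message; HEALTH_CODE_VALUE, the HEALTH_EXIT and HEALTH_ERROR flag names, the struct layout and the other function names are assumptions made for the example.

```c
#include <assert.h>
#include <stdatomic.h>

/* Bit 0 of the counter toggles on every poll entry and exit. */
#define HEALTH_POLL_VALUE	(1UL << 0)
/* Assumed increment for progress made outside of poll. */
#define HEALTH_CODE_VALUE	(1UL << 1)
#define HEALTH_IS_IN_POLL(x)	((x) & HEALTH_POLL_VALUE)

/* Assumed flag names: conditions are kept outside the counter. */
enum health_flags {
	HEALTH_EXIT	= 1U << 0,
	HEALTH_ERROR	= 1U << 1,
};

struct health_state {
	unsigned long last;	/* only touched by the checking thread */
	atomic_uint flags;
	atomic_ulong current;	/* word-sized counter, free to wrap */
};

static void health_init(struct health_state *s)
{
	s->last = 0;
	atomic_init(&s->flags, 0);
	atomic_init(&s->current, 0);
}

/* Called by the monitored thread right before and right after poll. */
static void health_poll_update(struct health_state *s)
{
	atomic_fetch_add(&s->current, HEALTH_POLL_VALUE);
}

/* Called by the monitored thread while making progress in code. */
static void health_code_update(struct health_state *s)
{
	atomic_fetch_add(&s->current, HEALTH_CODE_VALUE);
}

/* Only for teardown paths reached because of an actual error. */
static void health_error(struct health_state *s)
{
	atomic_fetch_or(&s->flags, HEALTH_ERROR);
}

/* Normal teardown: recorded, but not reported as a failure. */
static void health_exit(struct health_state *s)
{
	atomic_fetch_or(&s->flags, HEALTH_EXIT);
}

/* Returns 1 when the thread looks healthy, 0 otherwise. */
static int health_check_state(struct health_state *s)
{
	unsigned long current, last;
	int healthy;

	assert(s);
	current = atomic_load(&s->current);
	last = s->last;

	if (atomic_load(&s->flags) & HEALTH_ERROR)
		healthy = 0;
	else
		healthy = current != last || HEALTH_IS_IN_POLL(current) != 0;

	s->last = current;	/* plain store: single reader and writer */
	return healthy;
}
```

In this sketch, a thread blocked in poll keeps bit 0 set, so repeated checks keep reporting it as healthy. A thread outside poll is reported as stalled as soon as two consecutive checks see the same counter value. Because the comparison is for inequality only, a wrapping counter is harmless, and the error condition never collides with a counter value.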