| 1 | .TH LTTNG_HEALTH_CHECK 3 2012-09-19 "LTTng" "LTTng Developer Manual" |
| 2 | .SH NAME |
| 3 | lttng_health_check \- Monitor health of the session daemon |
| 4 | .SH SYNOPSIS |
| 5 | .nf |
| 6 | .B #include <lttng/lttng.h> |
| 7 | .sp |
| 8 | .BI "int lttng_health_check(enum lttng_health_component " c ); |
| 9 | .fi |
| 10 | |
| 11 | Link with -llttng-ctl. |
| 12 | .SH DESCRIPTION |
| 13 | The |
| 14 | .BR lttng_health_check () |
| 15 | function checks the health of the session daemon, either for a specific component |
| 16 | .BR c |
| 17 | or for all of them. Each component represents a subsystem of the session daemon. |
| 18 | Each component has health counters that are atomically incremented each time |
| 19 | execution reaches them. An even value indicates progress in the execution of the |
| 20 | component. An odd value means that the code has entered a blocking state which |
| 21 | is not a poll(2) wait period. |
| 22 | |
| 23 | Health is considered bad when a fatal error code path has been reached, or when |
| 24 | any IPC used in the session daemon has been blocked for more than 20 seconds (the |
| 25 | default timeout). Bad health is detected when one or more of the counters are |
| 26 | odd. |
| 27 | |
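The parity convention described above can be illustrated with a minimal C11 sketch. It is not the session daemon's implementation. The type demo_health and the functions demo_code_update(), demo_enter_blocking(), demo_leave_blocking() and demo_counter_is_even() are hypothetical names, and the choice of adding 2 at a progress point and 1 around a blocking call is an assumption made only to keep the even/odd meaning stated above. The 20 second timeout and fatal error handling are left out.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-component health state; not an LTTng type. */
struct demo_health {
	atomic_ulong counter;
};

/* Assumed convention: a progress point adds 2, so parity is unchanged. */
static void demo_code_update(struct demo_health *h)
{
	atomic_fetch_add(&h->counter, 2);
}

/* Entering a blocking call adds 1, which makes the counter odd. */
static void demo_enter_blocking(struct demo_health *h)
{
	atomic_fetch_add(&h->counter, 1);
}

/* Returning from the blocking call adds 1 again, restoring an even value. */
static void demo_leave_blocking(struct demo_health *h)
{
	atomic_fetch_add(&h->counter, 1);
}

/* Even counter: the component is progressing. Odd: it is blocked. */
static bool demo_counter_is_even(struct demo_health *h)
{
	return (atomic_load(&h->counter) & 1UL) == 0;
}
```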
| 28 | The health check mechanism of the session daemon can only be reached through |
| 29 | the health socket, which is separate from the command and application |
| 30 | sockets. A dedicated thread serves this socket and only evaluates the health |
| 31 | counters across the code when asked by the LTTng control library (using this |
| 32 | call). Because of its simplicity, this subsystem is highly unlikely to fail. |
| 33 | |
| 34 | The |
| 35 | .BR c |
| 36 | argument can be one of the following values: |
| 37 | .TP |
| 38 | .BR LTTNG_HEALTH_CMD |
| 39 | The command subsystem, which handles user commands coming from liblttng-ctl or |
| 40 | the |
| 41 | .BR lttng (1) |
| 42 | command line interface. |
| 43 | .TP |
| 44 | .BR LTTNG_HEALTH_APP_MANAGE |
| 45 | The session daemon manages application sockets in order to route client commands |
| 46 | and to detect when a socket is closed, which indicates that the application has shut down. |
| 47 | .TP |
| 48 | .BR LTTNG_HEALTH_APP_REG |
| 49 | The application registration mechanism is a vital part of |
| 50 | user space tracing. Upon startup, applications instrumented with |
| 51 | .BR lttng-ust (3) |
| 52 | try to register with the session daemon through this subsystem. |
| 53 | .TP |
| 54 | .BR LTTNG_HEALTH_KERNEL |
| 55 | Monitors the kernel tracer streams and the main channel of communication |
| 56 | (/proc/lttng). If this component malfunctions, the kernel tracer can no longer be |
| 57 | used by lttng-tools. |
| 58 | .TP |
| 59 | .BR LTTNG_HEALTH_CONSUMER |
| 60 | The session daemon can spawn up to |
| 61 | .BR three |
| 62 | consumer daemons: one for the kernel, one for 32-bit user space and one for 64-bit |
| 63 | user space. This subsystem monitors the consumer daemon(s). A bad health state means |
| 64 | that the consumer(s) can no longer be used, which likely makes tracing unusable. |
| 65 | .TP |
| 66 | .BR LTTNG_HEALTH_ALL |
| 67 | Check all components. If any one of them is in a bad state, a health check |
| 68 | error is returned. |
| 69 | |
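Because LTTNG_HEALTH_ALL only reports an aggregated result, a caller that wants to know how many components are unhealthy can probe them one at a time. The sketch below shows that loop under stated assumptions: demo_check_fn, demo_count_bad() and demo_fake_check() are hypothetical names, components are passed as plain integers, and the checker is injected as a function pointer so the example builds without liblttng-ctl or a running session daemon.

```c
#include <stddef.h>

/* Checker signature: takes a component id, returns 0, 1 or -1. */
typedef int (*demo_check_fn)(int component);

/*
 * Probe each component in turn. Returns the number of components that
 * reported a bad state (1), or -1 as soon as the checker reports that
 * the health socket could not be reached.
 */
static int demo_count_bad(demo_check_fn check, const int *components,
		size_t count)
{
	int bad = 0;

	for (size_t i = 0; i < count; i++) {
		int ret = check(components[i]);

		if (ret < 0) {
			return -1;
		}
		if (ret > 0) {
			bad++;
		}
	}
	return bad;
}

/* Stand-in checker for illustration only: pretends component 3 is bad. */
static int demo_fake_check(int component)
{
	return component == 3 ? 1 : 0;
}
```

In a real program, the checker would presumably be a small wrapper that converts its argument to enum lttng_health_component and calls lttng_health_check(), and the array would hold the individual component values listed above rather than LTTNG_HEALTH_ALL.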
| 70 | .SH "RETURN VALUE" |
| 71 | Returns 0 if the health is OK, or 1 if it is in a bad state. A return code of \-1 |
| 72 | indicates that the control library was not able to connect to the session |
| 73 | daemon health socket. |
| 74 | |
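The three documented return values can be told apart with a small helper, sketched here. The enum demo_health_status and the function demo_classify() are hypothetical names introduced for illustration and are not part of liblttng-ctl. Any value other than 0, 1 or \-1 is mapped to an "unknown" label, since this page does not define one.

```c
/* Hypothetical labels for the return values documented above. */
enum demo_health_status {
	DEMO_HEALTH_OK,          /* 0: the health is OK */
	DEMO_HEALTH_BAD,         /* 1: the component is in a bad state */
	DEMO_HEALTH_UNREACHABLE, /* -1: could not connect to the health socket */
	DEMO_HEALTH_UNKNOWN      /* any other value: not documented here */
};

/* Map a return code of the health check call to one of the labels. */
static enum demo_health_status demo_classify(int ret)
{
	switch (ret) {
	case 0:
		return DEMO_HEALTH_OK;
	case 1:
		return DEMO_HEALTH_BAD;
	case -1:
		return DEMO_HEALTH_UNREACHABLE;
	default:
		return DEMO_HEALTH_UNKNOWN;
	}
}
```

A caller would typically pass the result of lttng_health_check(LTTNG_HEALTH_ALL) to demo_classify() and treat DEMO_HEALTH_UNREACHABLE separately from DEMO_HEALTH_BAD, because a failed connection says nothing about the state of the counters.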
| 75 | .SH "LIMITATIONS" |
| 76 | |
| 77 | For LTTNG_HEALTH_CONSUMER, you cannot tell which consumer daemon has |
| 78 | failed, only that either the consumer subsystem has failed or that an |
| 79 | lttng-consumerd process has died. |
| 80 | |
| 81 | .SH "AUTHORS" |
| 82 | Written and maintained by David Goulet <dgoulet@efficios.com>. |