bin: compile lttng-sessiond as C++

Same as commit 48a400056134 ("bin: compile lttng as C++"), but change
lttng-sessiond to be a C++ program. In addition to the categories of
changes already mentioned in that commit's message, here are some
interesting changes:

- Add an include in trigger.h, an exported header, to fix:

      CXX      notification-thread.lo
      In file included from /home/simark/src/lttng-tools/src/bin/lttng-sessiond/notification-thread.cpp:9:
      /home/simark/src/lttng-tools/include/lttng/trigger/trigger.h:142:13: error: use of enum ‘lttng_error_code’ without previous declaration
        142 | extern enum lttng_error_code lttng_register_trigger_with_name(
            |             ^~~~~~~~~~~~~~~~

- We get this with clang:

      CXX      lttng-conf.o
      In file included from /home/simark/src/lttng-tools/src/bin/lttng/conf.cpp:18:
      In file included from /home/simark/src/lttng-tools/src/common/common.h:14:
      In file included from /home/simark/src/lttng-tools/src/common/runas.h:17:
      In file included from /home/simark/src/lttng-tools/src/common/sessiond-comm/sessiond-comm.h:38:
      In file included from /home/simark/src/lttng-tools/src/common/unix.h:17:
      /home/simark/src/lttng-tools/src/common/payload-view.h:82:27: error: 'lttng_payload_view_from_payload' has C-linkage specified, but returns user-defined type 'struct lttng_payload_view' which is incompatible with C [-Werror,-Wreturn-type-c-linkage]
      struct lttng_payload_view lttng_payload_view_from_payload(
                                ^

  Turns out that because of the "const" field in lttng_payload_view,
  clang considers that type incompatible with C. I don't really want to
  remove the "const" for C code using that API, so conditionally remove
  it if we are compiling with clang in C++.

- clang gives:

      CXX      event.lo
      In file included from /home/simark/src/lttng-tools/src/bin/lttng-sessiond/event.cpp:19:
      /home/simark/src/lttng-tools/src/common/bytecode/bytecode.h:50:1: error: struct has size 0 in C, size 1 in C++ [-Werror,-Wextern-c-compat]
      struct literal_string {
      ^

  It looks like that type isn't even used? Remove it.

- It's not possible to initialize some union members, for example with
  lttcomm_consumer_msg, in consumer.cpp. Initialize it in a separate
  statement.

- It's not possible to use the transparent union trick when calling
  urcu functions, for example in thread_application_registration, in
  register.cpp. We need to instantiate a cds_wfcq_head_ptr_t object,
  assign the appropriate field, and pass that object to the function.

- The ALIGNED_CONST_PTR trick does not work in C++:

      CXX      consumer.lo
      In file included from /home/simark/src/lttng-tools/src/common/error.h:19,
                       from /home/simark/src/lttng-tools/src/common/common.h:12,
                       from /home/simark/src/lttng-tools/src/bin/lttng-sessiond/consumer.cpp:19:
      /home/simark/src/lttng-tools/src/bin/lttng-sessiond/consumer.cpp: In function ‘int consumer_send_relayd_socket(consumer_socket*, lttcomm_relayd_sock*, consumer_output*, lttng_stream_type, uint64_t, const char*, const char*, const char*, int, const uint64_t*, time_t, bool)’:
      /home/simark/src/lttng-tools/src/common/macros.h:116:58: error: expected primary-expression before ‘]’ token
        116 | #define ALIGNED_CONST_PTR(value) (((const typeof(value) []) { value }))
            |                                                          ^
      /home/simark/src/lttng-tools/src/bin/lttng-sessiond/consumer.cpp:1192:48: note: in expansion of macro ‘ALIGNED_CONST_PTR’
       1192 |         ret = consumer_send_fds(consumer_sock, ALIGNED_CONST_PTR(rsock->sock.fd), 1);
            |                                                ^~~~~~~~~~~~~~~~~

  Replace uses with copying the data in a local variable (which is
  properly aligned), and pass the address of that variable to the
  function.

- In consumer.h, an array field in a structure is defined using the max
  macro. It can't be replaced with std::max, since std::max isn't
  constexpr in C++11. Define a max_constexpr function locally and use
  it.

- g++ 7 doesn't support non-trivial designated initializers, leading to
  errors like:

      CXX      globals.lo
      /home/smarchi/src/lttng-tools/src/bin/lttng-sessiond/globals.cpp:44:1: sorry, unimplemented: non-trivial designated initializers not supported
      };
      ^

  Change consumer_data to have a constructor instead. Change
  initializations of some structures, such as lttcomm_lttng_msg, to
  initialize the fields separately from the variable declaration. This
  requires making these variables non-const, which is not ideal. But
  once everything is C++, these types could get a fancy constructor,
  and then they can be made const again.

- When compiling without UST support, the stub versions of functions
  ust_app_rotate_session & co, in ust-app.h, are used. Some of them
  have the return type "enum lttng_error_code", but return 0, an
  invalid value, causing:

      CXX      main.o
      In file included from /home/smarchi/src/lttng-tools/src/bin/lttng-sessiond/lttng-sessiond.h:22:0,
                       from /home/smarchi/src/lttng-tools/src/bin/lttng-sessiond/main.cpp:45:
      /home/smarchi/src/lttng-tools/src/bin/lttng-sessiond/ust-app.h: In function ‘lttng_error_code ust_app_snapshot_record(ltt_ust_session*, const consumer_output*, int, uint64_t)’:
      /home/smarchi/src/lttng-tools/src/bin/lttng-sessiond/ust-app.h:575:9: error: invalid conversion from ‘int’ to ‘lttng_error_code’ [-fpermissive]
              return 0;
              ^

  Change these functions to return LTTNG_ERR_UNK. These functions are
  not supposed to be called if UST support is not included. But even if
  they were: all their callers check that the return value is not
  LTTNG_OK. The value 0 would be considered an error, and so will
  LTTNG_ERR_UNK.

Change-Id: I2cdd34459a54b1943087b43843ef20b35b7bf7d8
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>

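Note on the max_constexpr item above: a minimal sketch of the idea,
assuming a plain template function is enough. This is not the code from
consumer.h; the two size constants and the structure name are
hypothetical and only serve the illustration.

```cpp
#include <cstddef>

/*
 * std::max is not constexpr in C++11, so it cannot size an array
 * member. A constexpr function made of a single return statement is
 * allowed in C++11 and can be used in a constant expression.
 */
template <typename T>
constexpr T max_constexpr(T a, T b)
{
	return a > b ? a : b;
}

/* Hypothetical sizes, for illustration only. */
constexpr std::size_t example_name_len = 64;
constexpr std::size_t example_path_len = 256;

/* Hypothetical structure whose array field is sized by the larger value. */
struct example_msg {
	char buf[max_constexpr(example_name_len, example_path_len)];
};

static_assert(sizeof(example_msg::buf) == 256,
		"buf is sized by the larger of the two constants");
```

The static_assert makes the compiler prove that the array bound was
computed at compile time.
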
Clean-up: sessiond: prepend `the_` to global variable names

This avoids name clashes between global variables and local variables
or function parameters (notification_thread_handle, for example). This
is a step towards enabling -Wshadow.

This also helps readability, in my opinion, as it helps quickly spot
that some code is using a global variable.

Change-Id: Ib0e35ad7efcc54fa88e1900cab3388b98a06b8d9
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>

Move to kernel style SPDX license identifiers

The SPDX identifier is a legally binding shorthand, which can be used
instead of the full boilerplate text.

See https://spdx.org/ids-how for details.

Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Change-Id: I62e7038e191a061286abcef5550b58f5ee67149d
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>

Fix: getgrnam is not MT-Safe, use getgrnam_r

Running the test suite under a Yocto musl build resulted in a musl
coredump due to double freeing.

We get the following backtraces:

    0  a_crash () at ./arch/x86_64/atomic_arch.h:108
    1  unmap_chunk (self=<optimized out>) at src/malloc/malloc.c:515
    2  free (p=<optimized out>) at src/malloc/malloc.c:526
    3  0x00007f46d9dc3849 in __getgrent_a (f=f@entry=0x7f46d9d1f7e0, gr=gr@entry=0x7f46d9e24460 <gr>, line=line@entry=0x7f46d9e26058 <line>, size=size@entry=0x7f46d92db550, mem=mem@entry=0x7f46d9e26050 <mem>, nmem=nmem@entry=0x7f46d92db558, res=0x7f46d92db548) at src/passwd/getgrent_a.c:45
    4  0x00007f46d9dc2e6b in __getgr_a (name=0x487242 "tracing", gid=gid@entry=0, gr=gr@entry=0x7f46d9e24460 <gr>, buf=buf@entry=0x7f46d9e26058 <line>, size=size@entry=0x7f46d92db550, mem=mem@entry=0x7f46d9e26050 <mem>, nmem=0x7f46d92db558, res=0x7f46d92db548) at src/passwd/getgr_a.c:30
    5  0x00007f46d9dc3733 in getgrnam (name=<optimized out>) at src/passwd/getgrent.c:37
    6  0x0000000000460b29 in utils_get_group_id (name=<optimized out>) at ../../../lttng-tools-2.10.6/src/common/utils.c:1241
    7  0x000000000044ee69 in thread_manage_health (data=<optimized out>) at ../../../../lttng-tools-2.10.6/src/bin/lttng-sessiond/main.c:4115
    8  0x00007f46d9de1541 in start (p=<optimized out>) at src/thread/pthread_create.c:195
    9  0x00007f46d9dee661 in __clone () at src/thread/x86_64/clone.s:22

From another run:

    0  a_crash () at ./arch/x86_64/atomic_arch.h:108
    1  unmap_chunk (self=<optimized out>) at src/malloc/malloc.c:515
    2  free (p=<optimized out>) at src/malloc/malloc.c:526
    3  0x00007f5abc210849 in __getgrent_a (f=f@entry=0x7f5abc2733e0, gr=gr@entry=0x7f5abc271460 <gr>, line=line@entry=0x7f5abc273058 <line>, size=size@entry=0x7f5abaef5510, mem=mem@entry=0x7f5abc273050 <mem>, nmem=nmem@entry=0x7f5abaef5518, res=0x7f5abaef5508) at src/passwd/getgrent_a.c:45
    4  0x00007f5abc20fe6b in __getgr_a (name=0x487242 "tracing", gid=gid@entry=0, gr=gr@entry=0x7f5abc271460 <gr>, buf=buf@entry=0x7f5abc273058 <line>, size=size@entry=0x7f5abaef5510, mem=mem@entry=0x7f5abc273050 <mem>, nmem=0x7f5abaef5518, res=0x7f5abaef5508) at src/passwd/getgr_a.c:30
    5  0x00007f5abc210733 in getgrnam (name=<optimized out>) at src/passwd/getgrent.c:37
    6  0x0000000000460b29 in utils_get_group_id (name=<optimized out>) at ../../../lttng-tools-2.10.6/src/common/utils.c:1241
    7  0x000000000042dee4 in notification_channel_socket_create () at ../../../../lttng-tools-2.10.6/src/bin/lttng-sessiond/notification-thread.c:238
    8  init_thread_state (state=0x7f5abaef5560, handle=0x7f5abbf9be40) at ../../../../lttng-tools-2.10.6/src/bin/lttng-sessiond/notification-thread.c:375
    9  thread_notification (data=0x7f5abbf9be40) at ../../../../lttng-tools-2.10.6/src/bin/lttng-sessiond/notification-thread.c:495
    10 0x00007f5abc22e541 in start (p=<optimized out>) at src/thread/pthread_create.c:195
    11 0x00007f5abc23b661 in __clone () at src/thread/x86_64/clone.s:22

The problem was easily reproducible (~6 crashes in ~300 runs).

A prototype fix using a mutex around the getgrnam call yielded no crash
in over 1000 runs. This patch yielded the same results as the prototype
fix.

Unfortunately, we cannot rely on a mutex in liblttng-ctl since we cannot
enforce the locking for the application using the lib.

Use getgrnam_r instead.

The previous implementation of utils_get_group_id returned the gid of
the root group (0) on error/not found. lttng_check_tracing_group needs
to know if an error/not found occurred; returning the root group is not
enough. We now return the gid via the passed parameter. The caller is
responsible for either defaulting to the root group or propagating the
error.

We also do not want to warn when used in the liblttng-ctl context. We
might want to move the warning elsewhere in the future. For now, pass a
bool to indicate whether we need to warn or not.

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>

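Note on the getgrnam_r approach above: a minimal sketch of a reentrant
lookup that reports "not found" separately from the gid. The helper
name lookup_group_id, its signature, and the 1 MiB buffer cap are
assumptions made for illustration; this is not the utils_get_group_id
implementation.

```cpp
#include <cerrno>
#include <cstddef>
#include <vector>

#include <grp.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: look up a group by name with the reentrant
 * getgrnam_r(). Returns 0 and sets *gid on success, -1 if the group is
 * not found or on error. The caller decides whether to fall back to the
 * root group or to propagate the failure.
 */
int lookup_group_id(const char *name, gid_t *gid)
{
	const std::size_t max_buf_size = 1024 * 1024; /* arbitrary cap */
	long size_hint = sysconf(_SC_GETGR_R_SIZE_MAX);
	/* sysconf() may return -1 when there is no fixed limit. */
	std::vector<char> buf(size_hint > 0 ? static_cast<std::size_t>(size_hint) : 1024);
	struct group grp;
	struct group *result = nullptr;
	int ret;

	/* All storage is caller-owned, so concurrent lookups do not share state. */
	for (;;) {
		ret = getgrnam_r(name, &grp, buf.data(), buf.size(), &result);
		if (ret != ERANGE || buf.size() >= max_buf_size) {
			break;
		}

		/* The entry did not fit: retry with a larger buffer. */
		buf.resize(buf.size() * 2);
	}

	if (ret != 0 || result == nullptr) {
		return -1;
	}

	*gid = result->gr_gid;
	return 0;
}
```

Unlike getgrnam, no static buffer inside the C library is involved,
which is what made concurrent calls from several threads unsafe.
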
Remove the sessiond "ready" counter mechanism

This commit replaces the sessiond "ready" counter scheme with the use
of the lttng_thread util.

The launch of the threads which need to be active before the sessiond
can signal its parents (when launched in daemon mode) is now blocking.
This means that their associated "launch" functions wait until the
threads mark themselves as ready (through the use of a "ready"
semaphore) before returning and allowing the initialization of the
sessiond to continue.

The threads which expose externally-visible resources (UNIX and TCP
sockets) which must be fully initialized before marking the session
daemon as ready are:

  - Health thread,
  - Agent thread,
  - Client thread.

Previously, the "load session" thread was part of this group. However,
it is no longer necessary to perform the loading of session
configurations in a dedicated thread. The main thread performs that
operation itself. It is safe to do so since it is performed after the
launch of the client thread. The client thread has to be fully
initialized as the session loading code "impersonates" a client to
initialize the loaded sessions.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>

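Note on the blocking launch described above: a minimal sketch of the
pattern, assuming standard C++11 threads. A mutex and condition
variable stand in for the "ready" semaphore; the class and function
names are hypothetical and do not reflect the lttng_thread API.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

/* Minimal stand-in for the "ready" semaphore mentioned above. */
class ready_semaphore {
public:
	void post()
	{
		std::lock_guard<std::mutex> lock(mutex_);
		ready_ = true;
		cond_.notify_all();
	}

	void wait()
	{
		std::unique_lock<std::mutex> lock(mutex_);
		cond_.wait(lock, [this] { return ready_; });
	}

private:
	std::mutex mutex_;
	std::condition_variable cond_;
	bool ready_ = false;
};

/* Hypothetical worker; the flag stands in for a socket being set up. */
struct example_worker {
	ready_semaphore ready;
	bool resources_initialized = false;
	std::thread thread;
};

/*
 * Hypothetical blocking launch: returns only once the worker has
 * initialized its externally-visible resources and posted "ready".
 */
void launch_example_worker(example_worker &worker)
{
	worker.thread = std::thread([&worker] {
		/* For example, create and bind a socket here. */
		worker.resources_initialized = true;
		worker.ready.post();
		/* The thread's main loop would run here. */
	});

	worker.ready.wait();
}
```

Because the launch function only returns after the post, code that runs
later in the initialization sequence can rely on the worker's resources
being available.
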
Fix: set the health delta tcp timeout aware

The health check subsystem now initializes the time delta using the TCP
timeout. It takes the maximum value between our default internal delta
and the TCP timeout fetched by the lttcomm inet subsystem.

Signed-off-by: David Goulet <dgoulet@efficios.com>

Fix: health subsystem issues with shared code

TLS memory is now used for the health state of each thread. This commit
probably fixes bug428 as well.

The health_init/exit functions are renamed to
health_register/unregister.

Fixes #411

Signed-off-by: David Goulet <dgoulet@efficios.com>

Add time validation to health check

The health check code does not have a notion of "time flow": therefore,
two consecutive calls to lttng_health_check() might end up returning a
bad state (0) just because there was too little time between the
invocations.

Add some time information to the "last" snapshot, so we can do a time
delta between the current and last snapshot to figure out if we need to
report the thread as stalled or not.

At this point, a thread is considered stalled with a wait time of over
20 seconds.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>

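Note on the time delta check above: a minimal sketch of the idea,
assuming timestamps in whole seconds are passed in by the caller. The
structure, the function name, and the exact comparison are assumptions
for illustration, not the health check implementation.

```cpp
#include <cstdint>

/* Hypothetical "last" snapshot: progress counter plus when it was taken. */
struct health_snapshot {
	unsigned long counter;
	std::int64_t time_s;
};

/* Stall threshold taken from the commit message: over 20 seconds. */
constexpr std::int64_t stall_delta_s = 20;

/*
 * Returns true if the thread looks healthy. Two checks made close
 * together no longer report a bad state: a stall is only reported once
 * the counter has stayed unchanged for more than stall_delta_s.
 */
bool check_progress(health_snapshot &last, unsigned long current_counter, std::int64_t now_s)
{
	if (current_counter != last.counter) {
		/* Progress was made: record a new reference point. */
		last.counter = current_counter;
		last.time_s = now_s;
		return true;
	}

	/* No progress yet: report a stall only after the delta has elapsed. */
	return (now_s - last.time_s) <= stall_delta_s;
}
```
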
Fix: Multiple health monitoring fixes

* Fix modulo operation bug on

      #define HEALTH_IS_IN_CODE(x) (x % HEALTH_POLL_VALUE)

  which is causing the check to think it is never within code
  (x % 1 always equals 0). Simplify this by using a simple & on the
  poll value, and remove the IS_IN_CODE, using ! on IS_IN_POLL instead
  (which takes nothing away from clarity).

* Atomic operations should apply to at most "unsigned long" (32-bit on
  32-bit arch) rather than uint64_t.

* Separate the "error" condition from the counters. We clearly cannot
  use the "0" value as an error on 32-bit counters anymore, because
  they can easily wrap.

* Introduce an "exit" condition, which will be useful for state
  tracking in the future. Error and exit conditions are implemented as
  flags.

* Add "APP_MANAGE" in addition to the "APP_REG" health check, to
  monitor the app registration thread (which was missing: only the app
  manager thread was checked, under the name "APP_REG", which was
  misleading).

* Remove bogus usage of uatomic_xchg() in health_check_state(): it is
  not needed to update the "last" value, since the last value is read
  and written to by a single thread. Moreover, this specific use of
  xchg was not exchanging anything: it was just setting the last value
  to the "current" one, and doing nothing with the return value.
  Whatever was expected to be achieved by using uatomic_xchg() clearly
  wasn't.

* Because the health check thread could still be answering a request
  concurrently with sessiond teardown, we need to ensure that all
  threads only set the "error" condition if they reach teardown paths
  due to an actual error, not on the "normal" teardown condition
  (thread quit pipe being closed). Flagging threads as being in error
  condition upon all exit paths would lead to false "errors" sent to
  the client, which we want to avoid, since the client could then think
  it needs to kill a sessiond when the sessiond might be in the process
  of gracefully restarting.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>

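Note on the modulo bug and the flags above: a minimal sketch showing
why the old check could never fire and how a bit test on the poll value
behaves. HEALTH_POLL_VALUE comes from the message above; the other
constants, their values, the structure, and the helper functions are
assumptions for illustration only.

```cpp
#include <atomic>

/* Low bit of the counter: the thread is blocked in poll. */
constexpr unsigned long HEALTH_POLL_VALUE = 1UL << 0;
/* Assumed progress step, chosen so it never touches the poll bit. */
constexpr unsigned long HEALTH_CODE_VALUE = 1UL << 1;

/* Error and exit conditions live in separate flags, not in the counter. */
constexpr unsigned long HEALTH_ERROR = 1UL << 0;
constexpr unsigned long HEALTH_EXIT = 1UL << 1;

/* The buggy check: x % 1 is 0 for every x, so this is always false. */
constexpr bool buggy_is_in_code(unsigned long x)
{
	return (x % HEALTH_POLL_VALUE) != 0;
}

/* The fixed check: test the poll bit directly; "in code" is its negation. */
constexpr bool health_is_in_poll(unsigned long x)
{
	return (x & HEALTH_POLL_VALUE) != 0;
}

/* Hypothetical per-thread state using "unsigned long" atomics. */
struct example_health_state {
	std::atomic<unsigned long> current{0};
	std::atomic<unsigned long> flags{0};
};

void example_poll_entry(example_health_state &state)
{
	state.current.fetch_or(HEALTH_POLL_VALUE);
}

void example_poll_exit(example_health_state &state)
{
	state.current.fetch_and(~HEALTH_POLL_VALUE);
}

void example_code_update(example_health_state &state)
{
	/* The counter may wrap; the error flag is unaffected by that. */
	state.current.fetch_add(HEALTH_CODE_VALUE);
}

void example_set_error(example_health_state &state)
{
	state.flags.fetch_or(HEALTH_ERROR);
}
```

Keeping the error and exit conditions in their own word means a counter
value of 0, which a 32-bit counter can reach by wrapping, is never
mistaken for an error.
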
Session daemon health check support

This is the first commit for the health check feature of the session
daemon.

Add an lttng_health_check(...) call to the public API, which returns 0
if everything is fine or 1 if a health problem was detected for a
component.

Using this API call, you can choose to test either a specific component
(such as the client command thread, the consumer thread(s), the kernel
thread, or the application registration thread) or all of them at the
same time.

This feature is NOT implemented in the lttng command line UI, and it is
intended to stay that way until a stable version is accepted by the
community.

NOTE: The API could change, so be aware of possible changes up to the
2.1-stable release.

Signed-off-by: David Goulet <dgoulet@efficios.com>