sessiond: lttng: Add command to check kernel tracer status Issue observed -------------- When `lttng enable-channel --kernel` fails, little feedback is available to user to help them to understand the cause. Eg. ``` Error: Channel asdf: Kernel tracer not available (session auto-20231123-092621) ``` Solution -------- The semantic status of the kernel tracer is tracked and persisted in the session daemon (through `init_kernel_tracer` and `cleanup_tracer_tracer`. A new client command `lttng_kernel_tracer_status` is added to request the current value of the `kernel_tracer_status`. The `lttng` client uses this command after enabling a kernel-domain channel fails to provide the user with a more specific cause of the failure. Eg. ``` Error: Channel asdf: Kernel tracer not available (session auto-20231123-092621) Missing one or more required kernel modules Consult lttng-sessiond logs for more information ``` The kernel tracer status is tracked with an enum defined in `include/lttng/kernel.h` to avoid passing potentially different errno values or locale-dependant strings between the LTTng client and session daemon. Loading modules and checking signatures can fail with a number of different errno values. For example: C.f. https://gitlab.com/linux-kernel/stable/-/blob/master/kernel/module/signing.c#L70 * `EKEYREJECTED` * Any other error code C.f. https://gitlab.com/linux-kernel/stable/-/blob/master/Documentation/security/keys/core.rst * `EKEYREVOKED` * `EKEYEXPIRED` * `ENOKEY` * Others, such as `ENOMEM` Known drawbacks --------------- None. Signed-off-by: Kienan Stewart <kstewart@efficios.com> Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com> Change-Id: I2ae4b188f0110a472200c2511439b9e3e600527d
Clean-up: run format-cpp on the tree The original re-format commit missed a number of files that were caught by format-cpp. Hopefully this is the last large reformat commit for a while. Change-Id: I493ee6d9fe6187e0bd087c68ed346af69c929c1e Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Build fix: POD static_assert check fails on lttng_event_exclusions A build error occurs when building using g++ 6.4.0 on PPC32: In file included from ../../src/common/buffer-view.hpp:11:0, from event.cpp:9: ../../src/common/macros.hpp: In instantiation of 'AllocatedType* zmalloc(size_t) [with AllocatedType = lttng_event_exclusion; size_t = unsigned int]': event.cpp:270:77: required from here ../../src/common/macros.hpp:102:2: error: static assertion failed: type can be malloc'ed static_assert(can_malloc<AllocatedType>::value, "type can be malloc'ed"); ^~~~~~~~~~~~~ A bug affecting gcc [6.1, 7.4] causes flexible array members to generate a destructor for compound types. In turn, this makes any type that contains a flexible array a non-POD object which is a problem under some use-case (e.g., being allocated using C-style memory management facilities). Explicitly specifying a length of zero works around this bug, see: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70932 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71147 The same bug affects re2 and they have fixed it in a similar way: https://github.com/google/re2/blob/9049cd28d7496e05e7b7beaec89291d8bc6a31ee/re2/dfa.cc#L123 Change-Id: I730cdeb86bb39cdbfdc5165f854ab5906aeb2192 Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com> Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
clang-tidy: add Chrome-inspired checks Add the checks used by the Chrome project. Most of these changes were produced and applied by clang-tidy. Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com> Change-Id: I4058dafbdef00e81ac04f202fdfe377384a99e6b
Run clang-format on the whole tree Generated by running: find -not \( -path "./src/vendor*" -prune \) -iname "*.h" -o -iname "*.hpp" -o -iname "*.c" -o -iname "*.cpp" -exec clang-format -i {} \; Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com> Change-Id: I9557e7d84e305187475ef88124857cf2f438bbb1
sessiond-comm: prefix lttcomm_sessiond_command entries Add LTTCOMM_SESSIOND_COMMAND as the prefix of sessiond client commands to honor the usual namespacing convention. Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com> Change-Id: Ie0f9c5282d442e27bcdbbf390ab23d6076c413de
Fix: sessiond: abort called on undefined client command Issue observed -------------- When running in verbose mode, the session daemon calls abort() when it receives an unknown client command: #1 0x00007f66ffd69958 in raise () from /usr/lib/libc.so.6 #2 0x00007f66ffd5353d in abort () from /usr/lib/libc.so.6 #3 0x000055a671a6f6bd in lttcomm_sessiond_command_str (cmd=1633771873) at ../../../src/common/sessiond-comm/sessiond-comm.hpp:199 #4 0x000055a671a73897 in process_client_msg (cmd_ctx=0x7f66f5ff6d10, sock=0x7f66f5ff6c34, sock_error=0x7f66f5ff6c38) at client.cpp:1006 #5 0x000055a671a777fc in thread_manage_clients (data=0x55a673956100) at client.cpp:2622 #6 0x000055a671a6d290 in launch_thread (data=0x55a673956170) at thread.cpp:68 Cause ----- process_client_msg() logs the client command on entry. While it previously logged the numerical value, it now provides the string-ified version of the command id (since 19f912db8). The lttcomm_sessiond_command_str() function aborts when it encounters an unknown command id. This is reasonable (in so far that it is how we handle these situations, typically). However, the validity of the command must be checked beforehand as it comes from an external (untrusted) source. Solution -------- Add lttcomm_sessiond_command_is_valid and tombstone command IDs to lttcomm_sessiond_command to ensure only valid command ids are passed to lttcomm_sessiond_command_str when logging. Drawbacks --------- None Reported-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com> Change-Id: Ibd36f1e69da984c7f27b55ec68e5e3fe06d7ac91
Fix: sessiond: size-based rotation threshold exceeded in per-pid tracing (2/2) For a complete description of the original problem, refer to the previous commit. This change implements the second part of the fix. Buffer statistic samples are augmented to include the channel's session id. Since a session can outlive its channels (on the session daemon side), the consumed size conditions are now bound to the session. This means that the "total consumed" state is now part of the session_info structure exclusively which, overall, is cleaner. A side-effect of this change is that consumed size conditions are now also evaluated when a trigger is registered or when a client subscribes to it via a notification channel instead of waiting until the next monitoring sample. The buffer statistics sample also expresses a "consumed size" that is relative to the last sample that was successfully sent. Finally, the consumer daemon sends a final buffer statistics sample when a channel is torn down. As explained in more detail in the previous commit, this makes the accounting of per-pid sessions more reliable when short-live applications are traced. Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com> Change-Id: I322b9f20977e59c63cf2a4254c97c4ee657e933e
Change backing type of lttng_uuid to std::array Changing the backing type of lttng_uuid to std::array allows us to return lttng_uuid from a function. This, in return, makes it possible to initialize const attributes from the return value of a function returning a UUID. Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com> Change-Id: Ie092eab4a848a41ddd9c63f779514f1e4ca2a441
Add type-checked versions of allocation and deallocations functions A common mistake when porting things from C to C++ is to use malloc for types that have a on-trivial constructor (and to not call the constructor explicitly either). For example: struct foo { std::vector<int> field; }; foo *f = (foo *) zmalloc(sizeof(*f)); This allocates a `foo` without calling the constructor, leaving it in an invalid state. Same idea when free-ing with free something that has a non-trivial destructor. To avoid this, I suggest adding templated allocation functions that we will use throughout, that verify if the given type is safe to malloc (and generate a compilation failure if not). The existing code barely needs changes. For example: - (foo *) zmalloc(sizeof(*f)) + zmalloc<foo>() For simplicity I propose that as soon as a type is non-POD (std::is_pod<T>::value is false), we prevent using malloc/free on it. It would be ok in theory to allocate such a type with malloc and free with free, but call the constructor (using placement-new) and destructor explicitly, but I don't see why we would want to do that. It might also be technically more correct to use a combination of std::is_trivially_constructible and std::is_trivially_destructible (std::is_pod being not fine-grained enough), but using std::is_pod just keeps things simpler. This patch introduces the following templated allocation functions: 1. zmalloc<T>() 2. zmalloc<T>(size) 3. malloc<T>() 4. malloc<T>(size) 5. calloc<T>(nmemb) 1. Allocate one T, zero-initialized 2. Allocate a buffer of size `size`, zero-initialized, this is used when the caller calculates the size to allocate, like when using flexible array members 3. Same as 1, but without the zero-initialization 4. Same as 2, but without the zero-initialization 5. Allocate an array of `nmemb` elements of type T, zero-initialized For the de-allocation side, add templated `free` function declaration that uses std::enable_if (SFINAE) to declare a deleted prototype if the type T isn't safe to free (causing a compilation error). There are a lot of places where we pass pointers to void to free. These can't be checked, as we don't know what type of object the pointer really points to. We could forbid that and fix all callers to pass a typed pointer, but that seems a bit too much to chew for the moment. So for now, simply accept that freeing pointers to void won't be checked. It's a best effort. As an example, if I add an explicit constructor to type ctf_trace (in src/bin/lttng-relayd/ctf-trace.h), I get the following errors with clang. For the allocation: /home/simark/src/lttng-tools/src/common/macros.h:57:2: error: static_assert failed due to requirement 'std::is_pod<ctf_trace>::value' "type is POD" static_assert (std::is_pod<T>::value, "type is POD"); ^ ~~~~~~~~~~~~~~~~~~~~~ /home/simark/src/lttng-tools/src/bin/lttng-relayd/ctf-trace.cpp:84:10: note: in instantiation of function template specialization 'zmalloc<ctf_trace>' requested here trace = zmalloc<ctf_trace>(); ^ For the de-allocation: /home/simark/src/lttng-tools/src/bin/lttng-relayd/ctf-trace.cpp:29:2: error: call to deleted function 'free' free(trace); ^~~~ /home/simark/src/lttng-tools/src/common/macros.h:125:6: note: candidate function [with T = ctf_trace, $1 = void] has been explicitly deleted void free(T *p) = delete; ^ /usr/include/stdlib.h:565:13: note: candidate function extern void free (void *__ptr) __THROW; ^ Change-Id: I246a9113d08fa36b81a49137f4e80a5e808de913 Signed-off-by: Simon Marchi <simon.marchi@efficios.com> Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Rename C++ header files to .hpp Rename all C++ header files (include/**/*-internal.h, src/**/*.h except argpar and msgpack, some headers in tests) to have the .hpp extension. Doing so highlights that we include some C++ header files in some test files still compiled as C. This is ok for now, as the files they include don't actually contain C++ code incompatible with C yet, but they could eventually. This is something we can fix later. Change-Id: I8bf326b6b2946a3e26704f3ef3ac5831bbe9bc26 Signed-off-by: Simon Marchi <simon.marchi@efficios.com> Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Clean-up: sessiond-comm: out of bounds access warning gcc 11.2 produces the two following warnings. In both case, setting an array's dimension to zero is used to express a variable length array of names that are LTTNG_SYMBOL_NAME_LEN bytes long. gcc doesn't know about this and correctly points out that an access is taking place outside of the array's bounds. Omit the '0' dimension to work around this warning. event.cpp: In function 'ssize_t lttng_event_create_from_payload(lttng_payload_view*, lttng_event**, lttng_event_exclusion**, char**, lttng_bytecode**)': event.cpp:320:62: warning: array subscript i is outside array bounds of 'char [0][256]' [-Warray-bounds] 320 | ret = lttng_strncpy(local_exclusions->names[i], | ~~~~~~~~~~~~~~~~~~~~~~~~~^ In file included from event.cpp:16: ../../src/common/sessiond-comm/sessiond-comm.h:569:14: note: while referencing 'lttng_event_exclusion::names' 569 | char names[0][LTTNG_SYMBOL_NAME_LEN]; | ^~~~~ event-rule/user-tracepoint.cpp: In function 'lttng_event_rule_generate_exclusions_status lttng_event_rule_user_tracepoint_generate_exclusions(const lttng_event_rule*, lttng_event_exclusion**)': event-rule/user-tracepoint.cpp:383:61: warning: array subscript i is outside array bounds of 'char [0][256]' [-Warray-bounds] 383 | copy_ret = lttng_strncpy(exclusions->names[i], exclusion_str, | ~~~~~~~~~~~~~~~~~~~^ In file included from ../../src/common/runas.h:17, from event-rule/user-tracepoint.cpp:17: ../../src/common/sessiond-comm/sessiond-comm.h:569:14: note: while referencing 'lttng_event_exclusion::names' 569 | char names[0][LTTNG_SYMBOL_NAME_LEN]; | ^~~~~ Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com> Change-Id: I260185f2baf085ca4486ce3b13696ee5fa55938a
Fix: liblttng-ctl comm: lttng_event_context is not packed Observed issue ============== The size of the struct differs between bitness for x86-64 and x86 leading to serialization/deserialization problem across client (liblttng-ctl) and lttng-sessiond. sizeof(struct lttng_event_context): x86: 308 x86-64: 312 The struct cannot be marked as LTTNG_PACKED since it is part of the API. Solution ======== Adopt a similar pattern to the new API with a "serialize" & "create_from_buffer" approach. Most of the complexity is moved to `src/common/event.c` Known drawbacks ========= None. Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com> Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com> Change-Id: Ieb400eab2a2df4070ff51cb2b44929d3ea945ce4
Fix: liblttng-ctl comm: lttng_event is not packed Observed issue ============== In `lttcomm_session_msg` the lttng_event struct is marked as LTTNG_PACKED. This statement have no effect as explained in commit [2]. Solution ======== Adopt a similar pattern to the new API with a "serialize" & "create_from_buffer" approach. Most of the complexity is moved to `src/common/event.c` Known drawbacks ========= None. Note ==== Jérémie Galarneau: This patch was extensively modified from the original patch applying against stable-2.12 to accomodate for the use of the lttng_payload utils throughout the liblttng-ctl <-> lttng-sessiond communication code. Some changes were also made to build as C++. Reference ======== [1] https://review.lttng.org/gitweb?p=lttng-tools.git;a=commit;h=7bd95aee4660c6419a4a65429fc27754481e7e90 Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com> Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com> Change-Id: I35d848519dacb2b119324e88f262aa95951e4ac6
Fix: liblttng-ctl comm: lttng_channel is not packed Observed issue ============== The size of the struct differs between bitness for x86-64 and x86 leading to serialization/deserialization problem across client (liblttng-ctl) and lttng-sessiond. sizeof(struct lttng_channel): x86: 608 x86-64: 624 The struct cannot be marked as LTTNG_PACKED since it is part of the API. Solution ======== Adopt a similar pattern to the new API with a "serialize" & "create_from_buffer" approach. The only particularity is that we need to flatten the channels on listing. Most of the complexity is moved to `src/common/channel.c` Known drawbacks ========= None. Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com> Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com> Change-Id: Id5c9aaf3cf8b3d739b71263c02cae8d4d2fedfe3
Fix: Unexpected payload size in cmd_recv_stream_2_11 Observed issue ============== For the following scenario: lttng-relayd: 64 bit lttng-sessiond: 64 bit lttng-consumerd: 32 bit application: 32 bit Commands lttng create --set-url=net://127.0.0.1 lttng enable-event -u -a lttng start ./application On application start the lttng-relayd reports this error: DEBUG1 - 14:16:38.216442600 [2004731/2004735]: Done receiving control command payload: fd = 19, payload size = 4376 bytes (in relay_process_control_receive_payload() at main.c:3456) DEBUG3 - 14:16:38.216469462 [2004731/2004735]: Processing "RELAYD_ADD_STREAM" command for socket 19 (in relay_process_control_command() at main.c:3327) Error: Unexpected payload size in "cmd_recv_stream_2_11": expected >= 3519925694 bytes, got 4376 bytes Cause ===== In `relayd_add_stream`, instead of taking the > 2.11 protocol path, the `relayd_add_stream_2_2` function is called. The value of the rsock version number are: major: 21845 minor: 2 Which is simply invalid since we know that the version should be 2.12. The relayd sock version numbers are set during the LTTNG_CONSUMER_ADD_RELAYD_SOCKET command between the lttng-sessiond and the lttng-consumerd process. It is important to note here that both processes do NOT have the same bitness. The serialization and deserialization of `struct lttcomm_relayd_sock` is the culprit. `struct lttcomm_relayd_sock` contains a `struct lttcomm_sock`: struct lttcomm_sock { int32_t fd; enum lttcomm_sock_proto proto; struct lttcomm_sockaddr sockaddr; const struct lttcomm_proto_ops *ops; } LTTNG_PACKED; Note that `ops` is a pointer and its size varies based on the bitness of the application. Hence the size of the `struct lttcomm_sock` differs across bitness. Since it is the first member of `struct lttcomm_relayd_sock`, the memory layout is simply invalid across bitness (amd64/x86). This results in invalid parsing for the overall "struct lttcomm_relayd_sock" when dealing with a lttng-consumerd with a different bitness than the lttng-sessiond. As far as I know local tracing scenarios are not affected since this is only relevant when dealing with a lttng-relayd. Solution ======== Pass the socket protocol type, relayd major, relayd minor in `lttcomm_consumer_msg`. On the receiver side, query the network stack to get the peer information to populate a basic `lttcomm_sock`. Leaving this work to the OS saves us from having to serialize the `sockaddr_in*` structs. Known drawbacks ========= We rely on `getpeername` for the first time. Compatibility might be a problem. This code path assumes a lot of thing that cannot be asserted against such as the fact that the socket from which we fetch the info must be `connected`. Still at this point, the socket is completely setup and the rest of the code depends on it already. From GETPEERNAME(2): ``` For stream sockets, once a connect(2) has been performed, either socket can call getpeername() to obtain the address of the peer socket. On the other hand, datagram sockets are connectionless. Calling connect(2) on a datagram socket merely sets the peer address for outgoing datagrams sent with write(2) or recv(2). The caller of connect(2) can use getpeername() to obtain the peer address that it earlier set for the socket. However, the peer socket is unaware of this information, and calling getpeername() on the peer socket will return no useful information (unless a connect(2) call was also executed on the peer). Note also that the receiver of a datagram can obtain the address of the sender when using recvfrom(2). ``` But here we are always "the caller of connect". Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com> Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com> Change-Id: Ic157c4137b2f20e394c907136687fcbd126f90a0
src/common: use single Makefile for parallel builds Use a single Makefile in 'src/common' as it contains multiple subdirectories with a small number of objects to compile. This allows faster parallel builds since parallelism in automake is applied per Makefile. There is anectodal evidence of a 25 seconds improvement to the build process on a 36 core machine. Change-Id: If2ce266050e345d58b00bf65b574ccf5168f28f1 Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Copyright ownership transfer Apply copyright ownership transfer from David Goulet, Julien Desfossez, and Simon Marchi to EfficiOS Inc. Link: https://lists.lttng.org/pipermail/lttng-dev/2022-January/030087.html Link: https://lists.lttng.org/pipermail/lttng-dev/2022-January/030092.html Link: https://lists.lttng.org/pipermail/lttng-dev/2022-January/030091.html Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com> Signed-off-by: David Goulet <dgoulet@ev0ke.net> Signed-off-by: Julien Desfossez <ju@klipix.org> Signed-off-by: Simon Marchi <simon.marchi@efficios.com> Change-Id: Id13575afd4a2a09bb91a8d2b7a12dc3db8dc329f