Clean-up: sessiond: use empty() instead of comparing size to 0 Harmonize the project's coding style a little by favoring the use of the 'empty()' method of containers rather than comparing their size to 0. Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com> Change-Id: I22e6b7fe4d94d8f43362fe119b4ca6d480587291
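As an illustration of the style change described above, here is a minimal sketch. The helper name and its parameter are hypothetical and are not taken from the session daemon sources.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper used only to illustrate the convention.
bool has_pending_names(const std::vector<std::string>& names)
{
	// Previous style: return names.size() != 0;
	return !names.empty();
}
```

Both forms are equivalent for standard containers; 'empty()' states the intent directly.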
Build fix: missing operator- for iterator on g++7 The project fails to build on 'g++ (SUSE Linux) 7.5.0' since its STL implementation assumes that operator- is available for random access iterators. The build fails with the following error: event_name.cpp:82:71: required from here /usr/include/c++/7/bits/stl_iterator_base_funcs.h:104:21: error: no match for ‘operator-’ (operand types are ‘lttng::utils::random_access_container_wrapper<const bt_value*, const char*, event_name_set_operations>::_iterator<const lttng::utils::random_access_container_wrapper<const bt_value*, const char*, event_name_set_operations>, const char* const>’ and ‘lttng::utils::random_access_container_wrapper<const bt_value*, const char*, event_name_set_operations>::_iterator<const lttng::utils::random_access_container_wrapper<const bt_value*, const char*, event_name_set_operations>, const char* const>’) A trivial implementation of that operator is provided and allows the build to succeed. Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com> Change-Id: Ib1637e81e5cdc42cd5a142dcee21150ced9fcc55
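The kind of operator involved can be sketched as follows. This is an assumption-laden illustration: 'index_iterator' and 'count_between' are hypothetical names over a std::vector<int> and do not mirror the actual random_access_container_wrapper code. An iterator that advertises std::random_access_iterator_tag lets std::distance() compute 'last - first', so the subtraction operator must exist.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical index-based iterator; illustrative only.
class index_iterator {
public:
	using iterator_category = std::random_access_iterator_tag;
	using value_type = int;
	using difference_type = std::ptrdiff_t;
	using pointer = const int *;
	using reference = const int&;

	index_iterator(const std::vector<int>& container, std::size_t index) :
		_container(&container), _index(index)
	{
	}

	reference operator*() const
	{
		return (*_container)[_index];
	}

	index_iterator& operator++()
	{
		++_index;
		return *this;
	}

	bool operator==(const index_iterator& other) const
	{
		return _index == other._index;
	}

	bool operator!=(const index_iterator& other) const
	{
		return !(*this == other);
	}

	// Trivial difference operator: the distance between two positions.
	difference_type operator-(const index_iterator& other) const
	{
		return static_cast<difference_type>(_index) -
			static_cast<difference_type>(other._index);
	}

private:
	const std::vector<int> *_container;
	std::size_t _index;
};

// std::distance() may evaluate 'last - first' for random access iterators.
std::ptrdiff_t count_between(const index_iterator& first, const index_iterator& last)
{
	return std::distance(first, last);
}
```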
Fix: relayd: live client not notified of inactive streams

Observed issue
--------------

Some LTTng-tools live test failures appear to show babeltrace2 hanging
(failing to print expected events). The problem is intermittent, but
Kienan was able to develop a test case that's reproducible for him.

The test case performs the following steps:
  - Start a ust application and leave it running
  - Configure and then start an lttng live session
  - Connect a live viewer (babeltrace)
  - Run a second ust application
  - Wait for the expected number of events
  - In the failing case, no events are seen by babeltrace

Using per-uid buffers, the test typically completes normally. With
per-pid buffers the test fails, hanging indefinitely if waiting for the
specified number of events. While "hanging", babeltrace2 is polling the
relayd.

This affects babeltrace2 stable-2.0 and master while using lttng-tools
master.

For more information, see the description of bug #1406[1].

Cause
-----

When consuming a live trace captured in per-PID mode, Babeltrace
periodically requests the index of the next packet it should consume.

As part of the reply, it gets a 'flags' field which is used to announce
that new streams, or new metadata, are available to the viewer.

Unfortunately, these 'flags' are only set when the relay daemon has new
tracing data to deliver. They are not set when the relay daemon
indicates that the stream is inactive (see LTTNG_VIEWER_INDEX_INACTIVE).

In the average case where an application is spawned while others are
actively emitting events, a request for new data will result in a reply
that returns an index entry (code LTTNG_VIEWER_INDEX_OK) for an
available packet accompanied by the LTTNG_VIEWER_FLAG_NEW_STREAM flag.

This flag indicates to the viewer that it should request new streams
(using the LTTNG_VIEWER_GET_NEW_STREAMS live protocol command) before
consuming the new data.

In the cases where we observe a hang, an application is running but not
emitting new events. As such, the relay daemon periodically emits "live
beacons" to indicate that the session's streams are inactive up to a
given time 'T'.

Since the existing application remains inactive and the viewer is never
notified that new streams are available, the viewer effectively remains
"stuck" and never notices the new application being traced.

The LTTNG_VIEWER_FLAG_NEW_METADATA flag communicates a similar semantic
with regard to the metadata. However, ignoring it for inactive streams
isn't as deleterious: the same information is made available to the
viewer the next time it successfully requests a new index from the
relay daemon.

This would only become a problem if the tracers start to express
non-layout data (like supplemental environment information, but I don't
see a real use-case) as part of the metadata stream that should be made
available downstream even during periods of inactivity.

Note that the same problem most likely affects the per-UID buffer
allocation mode when multiple users are being traced.

Solution
--------

On the producer end, LTTNG_VIEWER_FLAG_NEW_STREAM is set even when
returning an inactivity index.

Note that to preserve compatibility with older live consumers that
don't expect this flag in a non-OK response, the
LTTNG_VIEWER_FLAG_NEW_STREAM notification is repeated until the next
LTTNG_VIEWER_GET_NEW_STREAMS command that returns
LTTNG_VIEWER_INDEX_OK.

The 'new_streams' state is no longer cleared from relay sessions during
the processing of the LTTNG_VIEWER_GET_NEXT_INDEX commands. Instead, it
is cleared when the viewer requests new streams.

On Babeltrace's end, the handler of the LTTNG_VIEWER_GET_NEXT_INDEX
command (lttng_live_get_next_index) is modified to expect
LTTNG_VIEWER_FLAG_NEW_STREAM in the cases where the command returns:
  - LTTNG_VIEWER_INDEX_OK (as done previously),
  - LTTNG_VIEWER_INDEX_HUP (new),
  - LTTNG_VIEWER_INDEX_INACTIVE (new).

Drawbacks
---------

This is arguably a protocol change as none of the producers ever set
the NEW_METADATA/NEW_STREAM flags when indicating an inactive stream.

References
----------

[1] https://bugs.lttng.org/issues/1406

Fixes #1406
Change-Id: I84f53f089597ac7b22ce8bd0962d4b28112b7ab6
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
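The producer-side behaviour described in the "Solution" section of this commit can be sketched as follows. Everything in this block is a stand-in: the type names, the function names and the flag value are assumptions made for the sketch and are not the relay daemon's actual code or the live viewer ABI definitions.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative stand-ins for the index status codes of the live protocol.
enum class index_status { ok, inactive, hup };

// Flag value chosen for the sketch only.
constexpr std::uint32_t flag_new_stream = 1U << 1;

// Hypothetical per-session state.
struct relay_session_sketch {
	bool new_streams = false;
};

struct index_reply {
	index_status status;
	std::uint32_t flags;
};

// GET_NEXT_INDEX: announce new streams whatever the status, and leave
// the session state untouched so the notification is repeated.
index_reply get_next_index(const relay_session_sketch& session, index_status status)
{
	index_reply reply{ status, 0 };

	if (session.new_streams) {
		reply.flags |= flag_new_stream;
	}

	return reply;
}

// GET_NEW_STREAMS: the state is cleared once the viewer asks for the streams.
void get_new_streams(relay_session_sketch& session)
{
	session.new_streams = false;
}
```

Repeating the flag until the viewer requests the new streams means a viewer that ignores it on an inactive reply still sees it on a later reply.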
Clean-up: tests: bt2 plug-ins: modernize the plug-ins By virtue of their use of the C Babeltrace 2 APIs, the test plug-ins perform a fair amount of manual resource management. To make it possible to adopt a more modern C++ style in those plug-ins, a number of helpers are introduced. Introduce reference wrappers for the Babeltrace 2 interface: - value_ref: wraps a bt_value reference using std::unique_ptr - message_const_ref: wraps a constant message reference using a unique_ptr - message_iterator_ref: wraps a message iterator reference using a unique_ptr - event_class_const_ref: wraps a constant event class reference using a unique_ptr A random_access_container_wrapper is specialized to wrap bt_value arrays of strings. In doing so, it is possible to eliminate the use of gotos and manual reference management on error paths. Some structs/classes are renamed to eliminate ambiguities that arose during the refactoring. The changes also allow some simplifications of the code flow in places; those are applied directly. Change-Id: I25c148d7970cb89add55a86f2c162973d3d27e4a Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
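The wrapper pattern mentioned in this commit can be sketched as follows. The 'fake_value' type and its put-ref function are stand-ins for a reference-counted C object; they are assumptions made so the example is self-contained and are not the Babeltrace 2 API or the plug-ins' actual helpers.

```cpp
#include <cassert>
#include <memory>

// Stand-in for a reference-counted C object.
struct fake_value {
	int ref_count;
};

// Stand-in for the C API's "put reference" function.
void fake_value_put_ref(fake_value *value)
{
	--value->ref_count;
}

// Deleter that releases the reference instead of freeing memory.
struct fake_value_deleter {
	void operator()(fake_value *value) const
	{
		fake_value_put_ref(value);
	}
};

using fake_value_ref = std::unique_ptr<fake_value, fake_value_deleter>;

// Early returns replace 'goto end': the wrapper puts the reference on
// every exit path.
bool use_value(fake_value& storage, bool fail)
{
	++storage.ref_count; // Acquire a reference owned by the wrapper.
	fake_value_ref ref(&storage);

	if (fail) {
		return false;
	}

	return ref->ref_count > 0;
}
```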
Move the lttng::free util under the lttng::memory namespace Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com> Change-Id: I40bf5aefaa8f441f470c0866b71b2957a6c30154
Clean-up: run clang-format 14 on the tree Miscellaneous code style changes to correct little violations that slipped through the cracks. Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com> Change-Id: Id378ff3fa42cb69a8543b43c08d60b9a2f2c1c06
Fix: relayd: live: dispose of zombie viewer metadata stream

Issue observed
==============

In the CI, builds on SLES15SP5 frequently experience timeouts. From
prior inspections, there are hangs during
tests/regression/tools/clear/test_ust while waiting for babeltrace to
exit.

It is possible to reproduce the problem fairly easily:

  $ lttng create --live
  $ lttng enable-event --userspace --all
  $ lttng start

  # Launch an application that emits a couple of events
  $ ./my_app
  $ lttng stop

  # Clear the data; this eventually results in the deletion of all
  # trace files on the relay daemon's end.
  $ lttng clear

  # Attach to the live session from another terminal
  $ babeltrace -i lttng-live net://...

  # The 'destroy' command completes, but the viewer never exits.
  $ lttng destroy

Cause
=====

After the clear command completes, the relay daemon no longer has any
data to serve. We notice that the live client loops endlessly,
repeatedly sending GET_METADATA requests. In response, the relay daemon
replies with the NO_NEW_METADATA status.

In concrete terms, the viewer_get_metadata() function short-circuits to
send that reply when it sees that the metadata stream has no active
trace chunk (i.e., there are no backing files from which to read the
data at the moment).

This situation is not abnormal in itself: it is legitimate for a client
to wait for the metadata to become available again. For example, in the
reproducer above, it would be possible for the user to restart the
tracing (lttng start), which would create a new trace chunk and make
the metadata stream available. New events could also be emitted
following this restart.

However, when a session's connection is closed, there is no hope that
the metadata stream will ever transition back to an active trace chunk.

Solution
========

When the metadata stream has no active chunk and the corresponding
consumerd-side connection has been closed, there is no way the relay
daemon will be able to serve the metadata contents to the client.

As such, the viewer stream can be disposed of since it will no longer
be of any use to the client. Since some client implementations expect
at least one GET_METADATA command to result in NO_NEW_METADATA, that
status code is initially returned.

Later, when the client emits a follow-up GET_METADATA request for that
same stream, it will receive an "error" status indicating that the
stream no longer exists. This situation is not treated as an error by
the clients. For instance, babeltrace2 will simply close the
corresponding trace and indicate it ended.

The 'no_new_metadata_notified' flag doesn't appear to be necessary to
implement the behaviour expected by the clients (seeing at least one
NO_NEW_METADATA status reply for every metadata stream).

The viewer_get_metadata() function is refactored a bit to drop the
global reference to the viewer metadata stream as it exits, while still
returning the NO_NEW_METADATA status code.

Known drawbacks
===============

None.

Note
====

The commit message of e8b269fa provides more details behind the
intention of the 'no_new_metadata_notified' flag.

Change-Id: Ib1b80148d7f214f7aed221d3559e479b69aedd82
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
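The sequence of replies described in the "Solution" section of this commit can be sketched as follows. The types, the stream table and the function are hypothetical stand-ins; the real viewer_get_metadata() works on reference-counted relay daemon objects that are not modelled here.

```cpp
#include <cassert>
#include <map>

// Illustrative stand-ins for the GET_METADATA reply statuses.
enum class metadata_status { ok, no_new_metadata, err };

// Hypothetical view of a viewer metadata stream.
struct viewer_metadata_stream_sketch {
	bool has_active_trace_chunk;
	bool session_connection_closed;
};

// Hypothetical table of viewer metadata streams, keyed by stream id.
using stream_table = std::map<int, viewer_metadata_stream_sketch>;

metadata_status get_metadata(stream_table& streams, int stream_id)
{
	const auto it = streams.find(stream_id);

	if (it == streams.end()) {
		// The stream was disposed of by an earlier request.
		return metadata_status::err;
	}

	if (!it->second.has_active_trace_chunk) {
		if (it->second.session_connection_closed) {
			// No chunk can ever become active again: drop the
			// stream, but still report NO_NEW_METADATA this time.
			streams.erase(it);
		}

		return metadata_status::no_new_metadata;
	}

	return metadata_status::ok;
}
```

A stream whose session connection is still open keeps answering NO_NEW_METADATA, which preserves the legitimate "wait for a restart" case.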
Docs: relayd: received metadata position is reset on clear Correct a comment in the relayd documentation that incorrectly mentioned the 'sent' position being reset by the 'clear' command. The correct behavior resets the metadata stream's 'received' position to '0', not the 'sent' position. The relay daemon expects to re-receive metadata contents that match the previous contents up to the previous 'received' position. The client, however, does not expect to receive the original contents of the metadata stream a second time. Note that from the relay daemon's perspective, a "clear" command does not exist per se. It is implemented as a stream rotation that moves the streams from a trace chunk that has an associated 'DELETE' close command to a new one (which may also be a 'nil' chunk). Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com> Change-Id: I598fe736c57ab3e934ff0207674d0ecff2bf3e74
lttng: enable-event: print kernel tracer status on error Use the new kernel status query API to present a more descriptive error when a kernel event rule fails to be enabled. Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com> Change-Id: Icad2518bacec1a9ab3103a44052c0085eadda1a7
lttng: enable-event: use the terminology of the documentation Rework most of the human-readable messages of the enable-event command to use the terminology used throughout the online documentation and the man pages. Some clean-ups are also done to follow the rest of the project's conventions, such as quoting user input with back-ticks, not ending messages with a period, etc. Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com> Change-Id: I00d89d6e3c32ccbde60081ef427a099fb8cd206e
lttng: enable-event: treat 'all' case as a regular pattern The cmd_enable_events function is essentially duplicated to handle the "all events" case, but the duplicate simply substitutes '*' for the event name. The case can be eliminated if we simply add '*' as one of the patterns to enable when the '--all' option is used. Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com> Change-Id: If4235c391c2ce38a67208184c97bbe0f5c40c97d
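The idea of folding the '--all' case into the regular pattern handling can be sketched as follows. The helper name and its parameters are hypothetical and do not reflect the actual signature of the command's code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: builds the list of patterns to enable so that a
// single code path can process them.
std::vector<std::string> patterns_to_enable(std::vector<std::string> patterns, bool all_option)
{
	if (all_option) {
		// '--all' becomes one more pattern rather than a separate case.
		patterns.emplace_back("*");
	}

	return patterns;
}
```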
lttng: enable-event: remove gotos from cmd_enable_event The use of automated resource management makes it possible to remove the numerous uses of gotos in cmd_enable_event. Replace them by simple return statements. Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com> Change-Id: Ic9f3207d8b1e9c5b044506e0233468230db1acd0
lttng: enable-event: wrap mi_writer use in a unique_ptr To allow further clean-ups and simplify the use of STL containers, wrap the manually managed mi_writer instance. Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com> Change-Id: I8a0b21f0647460333bae1c0a2afeb5d2193a2c9b