Fix: baddr-statedump: use $(LIBTOOL) --mode=execute GNU libtool inconsistently places the compiled executable either in the source directory or in the .libs directory, in which case a libtool wrapper script is placed in the source directory. slibtool, by contrast, always places the compiled executable in the .libs directory and a wrapper script in the source directory. This results in a build error when using slibtool since objcopy needs the executable and not the shell wrapper script, but this can be solved for both implementations by using $(LIBTOOL) --mode=execute on all commands that operate on the libtool compiled executables. Gentoo issue: https://bugs.gentoo.org/858095 The GNU libtool --mode=execute is documented upstream. https://www.gnu.org/software/libtool/manual/html_node/Execute-mode.html https://www.gnu.org/software/libtool/manual/html_node/Debugging-executables.html And the GNU libtool behavior of when to create a wrapper script is documented in the 'Linking Executables' section. "Notice that the executable, hell, was actually created in the .libs subdirectory. Then, a wrapper script (or, on certain platforms, a wrapper executable see Wrapper executables) was created in the current directory. Since libtool created a wrapper script, you should use libtool to install it and debug it too. However, since the program does not depend on any uninstalled libtool library, it is probably usable even without the wrapper script." https://www.gnu.org/software/libtool/manual/html_node/Linking-executables.html And the inconsistency between GNU libtool and slibtool is documented at the Gentoo wiki. "One difference between GNU libtool and slibtool is that the former will conditionally place the compiled executable or a shell wrapper script in the build directory, depending on whether or not the executable depends on a build-local libtool library (e.g. libfoo.la). 
Where slibtool will always place a compatible wrapper script in the build directory where GNU libtool would have conditionally placed the executable. When the wrapper script is created both GNU libtool and slibtool will place the executable in the .libs directory within the build directory. Consequently build systems, ebuilds, and other users should take care to avoid scenarios like installing the wrapper script to the system instead of the executable. In these cases ideally the executable would be installed by the same libtool implementation that compiled it." https://wiki.gentoo.org/wiki/Slibtool#Installing_or_using_binaries_created_by_libtool_manually Signed-off-by: orbea <orbea@riseup.net> Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com> Change-Id: I03102ed78af835daa9b9a5836c2979a5f5d4bd8c
Clean-up: sessiond: use empty() instead of comparing size to 0 Harmonize the project's coding style a little by favoring the use of the 'empty()' method of containers rather than comparing their size to 0. Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com> Change-Id: I22e6b7fe4d94d8f43362fe119b4ca6d480587291
Build fix: missing operator- for iterator on g++7 The project fails to build on 'g++ (SUSE Linux) 7.5.0' since its STL implementation assumes that operator- is available for random access iterators. The build fails with the following error: event_name.cpp:82:71: required from here /usr/include/c++/7/bits/stl_iterator_base_funcs.h:104:21: error: no match for ‘operator-’ (operand types are ‘lttng::utils::random_access_container_wrapper<const bt_value*, const char*, event_name_set_operations>::_iterator<const lttng::utils::random_access_container_wrapper<const bt_value*, const char*, event_name_set_operations>, const char* const>’ and ‘lttng::utils::random_access_container_wrapper<const bt_value*, const char*, event_name_set_operations>::_iterator<const lttng::utils::random_access_container_wrapper<const bt_value*, const char*, event_name_set_operations>, const char* const>’) A trivial implementation of that operator is provided and allows the build to succeed. Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com> Change-Id: Ib1637e81e5cdc42cd5a142dcee21150ced9fcc55
Fix: relayd: live client not notified of inactive streams Observed issue -------------- Some LTTng-tools live test failures appear to show babeltrace2 hanging (failing to print expected events). The problem is intermittent, but Kienan was able to develop a test case that's reproducible for him. The test case performs the following steps: - Start a ust application and leave it running - Configure and then start an lttng live session - Connect a live viewer (babeltrace) - Run a second ust application - Wait for the expected number of events - In the failing case, no events are seen by babeltrace Using per-uid buffers, the test typically completes normally. With per-pid buffers the test fails, hanging indefinitely if waiting for the specified number of events. While "hanging", babeltrace2 is polling the relayd. This affects babeltrace2 stable-2.0 and master while using lttng-tools master. For more information, see the description of bug #1406 [1]. Cause ----- When consuming a live trace captured in per-PID mode, Babeltrace periodically requests the index of the next packet it should consume. As part of the reply, it gets a 'flags' field which is used to announce that new streams, or new metadata, are available to the viewer. Unfortunately, these 'flags' are only set when the relay daemon has new tracing data to deliver. They are not set when the relay daemon indicates that the stream is inactive (see LTTNG_VIEWER_INDEX_INACTIVE). In the average case where an application is spawned while others are actively emitting events, a request for new data will result in a reply that returns an index entry (code LTTNG_VIEWER_INDEX_OK) for an available packet accompanied by the LTTNG_VIEWER_FLAG_NEW_STREAM flag. This flag indicates to the viewer that it should request new streams (using the LTTNG_VIEWER_GET_NEW_STREAMS live protocol command) before consuming the new data. In the cases where we observe a hang, an application is running but not emitting new events. 
As such, the relay daemon periodically emits "live beacons" to indicate that the session's streams are inactive up to a given time 'T'. Since the existing application remains inactive and the viewer is never notified that new streams are available, the viewer effectively remains "stuck" and never notices the new application being traced. The LTTNG_VIEWER_FLAG_NEW_METADATA flag communicates a similar semantic with regard to the metadata. However, ignoring it for inactive streams isn't as deleterious: the same information is made available to the viewer the next time it successfully requests a new index from the relay daemon. This would only become a problem if the tracers start to express non-layout data (like supplemental environment information, but I don't see a real use-case) as part of the metadata stream that should be made available downstream even during periods of inactivity. Note that the same problem most likely affects the per-UID buffer allocation mode when multiple users are being traced. Solution -------- On the producer end, LTTNG_VIEWER_FLAG_NEW_STREAM is set even when returning an inactivity index. Note that to preserve compatibility with older live consumers that don't expect this flag in non-OK response, the LTTNG_VIEWER_FLAG_NEW_STREAM notification is repeated until the next LTTNG_VIEWER_GET_NEW_STREAMS command that returns LTTNG_VIEWER_INDEX_OK. The 'new_streams' state is no longer cleared from relay sessions during the processing of the LTTNG_VIEWER_GET_NEXT_INDEX commands. Instead, it is cleared when the viewer requests new streams. On Babeltrace's end, the handler of the LTTNG_VIEWER_GET_NEXT_INDEX command (lttng_live_get_next_index) is modified to expect LTTNG_VIEWER_FLAG_NEW_STREAM in the cases where the command returns: - LTTNG_VIEWER_INDEX_OK (as done previously), - LTTNG_VIEWER_INDEX_HUP (new), - LTTNG_VIEWER_INDEX_INACTIVE (new). 
Drawbacks --------- This is arguably a protocol change as none of the producers ever set the NEW_METADATA/NEW_STREAM flags when indicating an inactive stream. References ---------- [1] https://bugs.lttng.org/issues/1406 Fixes #1406 Change-Id: I84f53f089597ac7b22ce8bd0962d4b28112b7ab6 Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
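The flag handling described in the relayd fix above can be modelled with a small sketch. This is not the relay daemon's code: the class and method names are hypothetical, the enum values are illustrative rather than the wire values, and only the behaviour stated in the commit message is reproduced (the new-stream flag accompanies inactive replies too, and the 'new_streams' state is cleared only when the viewer requests new streams).

```python
from enum import Enum, Flag


class IndexStatus(Enum):
    # Named after the reply codes in the commit message; values are illustrative.
    OK = 1
    INACTIVE = 2
    HUP = 3


class IndexFlag(Flag):
    # Named after the LTTNG_VIEWER_FLAG_* flags; values are illustrative.
    NEW_METADATA = 1
    NEW_STREAM = 2


class RelaySessionModel:
    """Hypothetical model of a relay session's 'new_streams' state."""

    def __init__(self):
        self.new_streams = False
        self.pending_packets = 0

    def add_stream(self):
        # A new application started being traced.
        self.new_streams = True

    def get_next_index(self):
        # Models LTTNG_VIEWER_GET_NEXT_INDEX: the flag is reported whatever
        # the status is, and the state is NOT cleared here.
        if self.pending_packets > 0:
            self.pending_packets -= 1
            status = IndexStatus.OK
        else:
            status = IndexStatus.INACTIVE
        flags = IndexFlag.NEW_STREAM if self.new_streams else IndexFlag(0)
        return status, flags

    def get_new_streams(self):
        # Models LTTNG_VIEWER_GET_NEW_STREAMS: this is where the state is cleared.
        self.new_streams = False
```

With this model, a viewer polling an idle session keeps seeing the new-stream flag on every inactive reply until it issues the new-streams request, which is the property that unblocks the hanging viewer.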
Clean-up: tests: bt2 plug-ins: modernize the plug-ins By virtue of their use of the C Babeltrace 2 APIs, the test plug-ins perform a fair amount of manual resource management. To make it possible to adopt a more modern C++ style in those plug-ins, a number of helpers are introduced. Introduce reference wrappers for the Babeltrace 2 interface: - value_ref: wraps a bt_value reference using std::unique_ptr - message_const_ref: wraps a constant message reference using a unique_ptr - message_iterator_ref: wraps a message iterator reference using a unique_ptr - event_class_const_ref: wraps a constant event class reference using a unique_ptr A random_access_container_wrapper is specialized to wrap bt_value arrays of strings. In doing so, it is possible to eliminate the use of gotos and manual reference management on error paths. Some structs/classes are renamed to eliminate ambiguities that arose during the refactoring. The changes allow some simplifications of the code flow in places, which are applied directly. Change-Id: I25c148d7970cb89add55a86f2c162973d3d27e4a Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Move the lttng::free util under the lttng::memory namespace Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com> Change-Id: I40bf5aefaa8f441f470c0866b71b2957a6c30154
tests: Replace babelstats.pl with bt2 plugins

Observed Issue
==============

`tests/regression/tools/filtering/test_valid_filters` is a long running test, especially when running as root and exercising the tests across the kernel domain.

I observed that a sizable amount of time was being spent in the analysis of the results using `babelstats.pl`.

Solution
========

Instead of using a script to parse the pretty output of babeltrace2, I decided to write two C++ plugins to replicate the behaviour of the `babelstats.pl` script.

I measured the time using `sudo -E time ./path/to/test`

| Test                 | Time with `babelstats.pl` | Time with bt2 plugins |
|----------------------|---------------------------|-----------------------|
| test_tracefile_count | 13.04s                    | 11.73s                |
| test_exclusion       | 22.75s                    | 22.07s                |
| test_valid_filter    | 301.04s                   | 144.41s               |

The switch to using babeltrace2 plugins reduces the runtime of the `test_valid_filter` test (when running with kernel tests) by half. The runtime changes to the other tests that were modified are not significant.

Known drawbacks
===============

The field_stats plugin behaviour differs from `babelstats.pl` with regards to enumeration fields ("container" in `babelstats.pl`). However, no tests depend on that behaviour to pass.

The field_stats sink plugin doesn't perform a lot of run-time error-checking of functions it invokes, and doesn't fully clean up all the references it allocates through the babeltrace2 API. As the intended usage is for short lived invocations with relatively small traces, the principal drawback of this approach is that errors in the plugin may be harder to debug.

Building tests of lttng-tools will now depend on having the babeltrace2 development headers and libraries available.

Change-Id: Ie8ebdd255b6901a7d0d7c4cd584a02096cccd4fb
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
tests: Run relayd-grouping tests by grouping type Observed issue ============== The `relayd-grouping/test_ust` test takes ~2 minutes to run. A significant amount of that time is spent starting and stopping the relay and session daemons. Solution ======== Each test function is run with a different grouping setup for the relayd. Rather than iterating over each test and then grouping variations, the iteration can be changed to organize the tests run by grouping setup. This allows us to start the relay and session daemons once per grouping setup, rather than twice for each test function. Furthermore, each test function is run twice: once with auto-generated session names, once with user-defined session names. This behaviour can be cut out to reduce the runtime of the test further. On my development machine, the test went from running in 113s to 18s. Known drawbacks =============== This no longer exercises the automatic session naming. I don't think that the automatic session naming paths are pertinent with regards to the grouping settings; however it appears it can impact output directories (eg. in `test_ust_uid_streaming_snapshot_add_output_custom_name`). Change-Id: I89d8cb224e594dd68b7e8f3367d1907ecfa2bf13 Signed-off-by: Kienan Stewart <kstewart@efficios.com> Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
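The reordering of the iteration in the relayd-grouping change above can be illustrated with a rough sketch. The actual test is not written this way; the function names, the `start_daemons` callback, and the grouping labels below are all hypothetical and only show why moving the grouping loop outward reduces the number of daemon start-ups.

```python
def run_by_test(tests, groupings, start_daemons):
    # Previous ordering: the daemons are restarted for every
    # (test, grouping) pair.
    for test in tests:
        for grouping in groupings:
            start_daemons(grouping)
            test(grouping)


def run_by_grouping(tests, groupings, start_daemons):
    # New ordering: the daemons are started once per grouping setup and
    # every test function runs against that same setup.
    for grouping in groupings:
        start_daemons(grouping)
        for test in tests:
            test(grouping)
```

With two grouping setups and N test functions, the first ordering starts the daemons 2N times while the second starts them twice, whatever N is.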
tests: Split test_ust_constructor into several tests Observed issue ============== TAP parsers fail when parsing a single executable that contains several plans. Eg., ``` ok 44 - Found no unexpected events PASS: ust/ust-constructor/test_ust_constructor.py 44 - Found no unexpected events 1..44 ERROR: ust/ust-constructor/test_ust_constructor.py - multiple test plans ok 1 - Create a session ERROR: ust/ust-constructor/test_ust_constructor.py 1 - Create a session # UNPLANNED ``` and ``` 14:03:23 org.tap4j.parser.ParserException: Error parsing TAP Stream: Duplicated TAP Plan found. 14:03:23 at org.tap4j.parser.Tap13Parser.parseTapStream(Tap13Parser.java:257) 14:03:23 at org.tap4j.parser.Tap13Parser.parseFile(Tap13Parser.java:231) 14:03:23 at org.tap4j.plugin.TapParser.parse(TapParser.java:172) 14:03:23 at org.tap4j.plugin.TapPublisher.loadResults(TapPublisher.java:475) 14:03:23 at org.tap4j.plugin.TapPublisher.performImpl(TapPublisher.java:352) 14:03:23 at org.tap4j.plugin.TapPublisher.perform(TapPublisher.java:312) 14:03:23 at jenkins.tasks.SimpleBuildStep.perform(SimpleBuildStep.java:123) 14:03:23 at hudson.tasks.BuildStepCompatibilityLayer.perform(BuildStepCompatibilityLayer.java:80) 14:03:23 at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20) 14:03:23 at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:818) 14:03:23 at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:767) 14:03:23 at hudson.model.Build$BuildExecution.post2(Build.java:179) 14:03:23 at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:711) 14:03:23 at hudson.model.Run.execute(Run.java:1918) 14:03:23 at hudson.matrix.MatrixRun.run(MatrixRun.java:153) 14:03:23 at hudson.model.ResourceController.execute(ResourceController.java:101) 14:03:23 at hudson.model.Executor.run(Executor.java:442) 14:03:23 Caused by: org.tap4j.parser.ParserException: Duplicated TAP Plan found. 
14:03:23 at org.tap4j.parser.Tap13Parser.parseLine(Tap13Parser.java:354) 14:03:23 at org.tap4j.parser.Tap13Parser.parseTapStream(Tap13Parser.java:252) 14:03:23 ... 16 more ``` Cause ===== 09a872ef0b4e1432329aa42fecc61f50e9baa367 introduced multiple plans into test_ust_constructor. Solution ======== Split the script into several smaller test scripts sharing a common import for data and the bulk of execution. Known drawbacks =============== None. Signed-off-by: Kienan Stewart <kstewart@efficios.com> Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com> Change-Id: I81649d714afe0e325996b730d5c72cfd5b28d1f8
tests: Add diagnostic info for kernel bug, warning, and oops When test_select_poll_epoll fails with an error due to hitting one or more new WARNING, OOPS, or BUG statements in dmesg, the user must go and read the logs themselves to try and find the matching statements. Providing the previous and new messages in diagnostic output will allow a person reading the test results to more quickly ascertain if the messages are pertinent to lttng-modules or not. That being said, there is no guarantee that there are not other WARNINGs, OOPs, or BUGs in dmesg between before and after that are pertinent. Change-Id: Ida026dfe852cafdcc55979089c92995949e2ef0d Signed-off-by: Kienan Stewart <kstewart@efficios.com> Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
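The idea of reporting the previous and new kernel messages as diagnostics can be sketched as follows. This is only an illustration under assumptions: the function names are hypothetical, the dmesg content is passed in as lists of lines captured before and after the test, the keywords are matched exactly as spelled in the commit message, and lines are assumed to be unique (for instance because they carry timestamps). The `# ` prefix is the TAP convention for diagnostic lines.

```python
import re

# Keywords taken from the commit message; matching them case-sensitively
# is an assumption of this sketch.
KERNEL_ISSUE = re.compile(r"WARNING|OOPS|BUG")


def matching_messages(dmesg_lines):
    """Return the lines that mention one of the keywords."""
    return [line for line in dmesg_lines if KERNEL_ISSUE.search(line)]


def new_messages(before, after):
    """Return the matching lines of 'after' that are absent from 'before'."""
    seen = set(matching_messages(before))
    return [line for line in matching_messages(after) if line not in seen]


def as_tap_diagnostics(title, lines):
    """Format lines as TAP diagnostics (lines starting with '# ')."""
    return ["# " + title] + ["#   " + line for line in lines]
```

A test could then print `as_tap_diagnostics("New kernel messages:", new_messages(before, after))` when the before and after counts differ.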
Clean-up: run clang-format 14 on the tree Miscellaneous code style changes to correct little violations that slipped through the cracks. Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com> Change-Id: Id378ff3fa42cb69a8543b43c08d60b9a2f2c1c06
tests: Add C versions of gen-ust-events-constructor Observed issue ============== The constructor tests exercise only the case where C++ applications are built. Solution ======== Adding C test applications allows us to reuse the existing test infrastructure to cover these cases. Known drawbacks =============== None. Change-Id: Ib178dfd33cce0f1d0aa125aaee078c2dcb84ecb9 Signed-off-by: Kienan Stewart <kstewart@efficios.com> Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
tests: test_ust_constructor: Split test_ust_constructor binary Observed issue ============== The single test executable gen-ust-events-constructor covers a lot of different cases in a single executable. This decreases the legibility of the test results and debuggability of the test application as many different pieces are in play. Solution ======== The test functionality covered by the executable is split into two main parts: one using a dynamically loaded shared object, and the second using a statically linked archive. Known drawbacks =============== Rather than creating a second test script, the same script is re-used to run multiple TapGenerator sequentially. This could hamper future efforts to parallelize python-based tests. Change-Id: I86d247780ce5412570eada6ebadb83a01547f2b0 Signed-off-by: Kienan Stewart <kstewart@efficios.com> Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
tests: Ensure `_process` is set in _TraceTestApplications Observed issue ============== An exception is thrown when deleting a _TraceTestApplication object that has thrown an exception during its `__init__` method. Eg. ``` Exception ignored in: <function _TraceTestApplication.__del__ at 0x7fcbc9a21620> Traceback (most recent call last): File "/home/kstewart/src/efficios/lttng/master/src/lttng-tools/tests/utils/lttngtest/environment.py", line 348, in __del__ self._process.kill() ^^^^^^^^^^^^^ AttributeError: '_TraceTestApplication' object has no attribute '_process' ``` Similarly, this can happen to _WaitTraceTestApplication objects. Cause ===== The object's `_process` attribute is set during `__init__`; however, if an exception is thrown during `subprocess.Popen` a value is never assigned to the attribute. Solution ======== A default value for the `_process` attribute is set and checked as part of the condition when executing the `__del__` method. Known drawbacks =============== None. Change-Id: I2220ae764be49fafb3b977a5e723931421485d63 Signed-off-by: Kienan Stewart <kstewart@efficios.com> Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
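The pattern described in the `_process` fix above looks roughly like the sketch below. The class name is hypothetical and this is not the code from `environment.py`; it only demonstrates assigning a default before calling `subprocess.Popen` and checking it in `__del__`, so that a failed construction does not trigger an `AttributeError` during finalization.

```python
import subprocess


class TraceTestApplicationSketch:
    """Hypothetical stand-in for a test application wrapper."""

    def __init__(self, argv):
        # Default assigned first: if Popen raises below, __del__ still
        # finds the attribute.
        self._process = None
        self._process = subprocess.Popen(argv)

    def __del__(self):
        # Only kill and reap the child when it was actually spawned.
        if self._process is not None:
            self._process.kill()
            self._process.wait()
```

Python still calls `__del__` on an object whose `__init__` raised, which is why the default has to be in place before any statement that can fail.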
tests: Correct tap_generator skip() when count is greater than 1 Issue observed ============== When skipping multiple test cases, the output contained incorrect test case numbers, eg. ``` ok 3 - Start session `session_ldr8cxix` 41 ok 4 # Skip: Test application 'gen-ust-events-constructor/gen-ust-events-constructor-so' not found ok 6 # Skip: Test application 'gen-ust-events-constructor/gen-ust-events-constructor-so' not found ok 8 # Skip: Test application 'gen-ust-events-constructor/gen-ust-events-constructor-so' not found ``` Cause ===== The `test_number` was adding the current index to the already modified `self._last_test_case_id`. Solution ======== Use `self._last_test_case_id` with no changes. Known drawbacks =============== None. Change-Id: I8ff16b83619cf6e6db2636eeccd58725cc03d0f8 Signed-off-by: Kienan Stewart <kstewart@efficios.com> Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
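The numbering issue in the skip() fix above can be reproduced with a small model. The class below is a hypothetical reduction, not the project's tap_generator; only `_last_test_case_id` and the `ok N # Skip: reason` output shape come from the commit message. Lines are collected in a list rather than printed, which is a choice made for this sketch.

```python
class TapGeneratorSketch:
    """Hypothetical, reduced model of a TAP generator."""

    def __init__(self):
        self._last_test_case_id = 0
        self.lines = []

    def skip_buggy(self, reason, count=1):
        # Reproduces the reported problem: the loop index is added to a
        # counter that has already been advanced.
        for i in range(count):
            self._last_test_case_id += 1
            test_number = self._last_test_case_id + i
            self.lines.append("ok {} # Skip: {}".format(test_number, reason))

    def skip(self, reason, count=1):
        # Corrected form: the advanced counter is used with no changes.
        for _ in range(count):
            self._last_test_case_id += 1
            self.lines.append(
                "ok {} # Skip: {}".format(self._last_test_case_id, reason)
            )
```

Starting from a counter of 3, `skip_buggy("x", 3)` numbers the skipped cases 4, 6, 8 as in the output quoted above, while `skip("x", 3)` numbers them 4, 5, 6.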
tests: test_ust_constructor: Use a C-compiled shared object Similar to the previous change, this change splits the C-style constructors for the shared object into a separate object which can be compiled with gcc instead of g++. This makes it possible to test that the constructors are traced even if LTTng-UST uses the LTTNG_UST_ALLOCATE_COMPOUND_LITERAL_ON_HEAP build configuration. Change-Id: Icd96cb30cedc1615951a6fec3c72731776f95d81 Signed-off-by: Kienan Stewart <kstewart@efficios.com> Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>