From 65702b8f172b8d2156ab1889f7e7c1b134114ec1 Mon Sep 17 00:00:00 2001
From: Jonathan Rajotte
Date: Mon, 15 Mar 2021 11:25:07 -0400
Subject: [PATCH] tests: perf: UNHALTED_REFERENCE_CYCLES might not be actionable on a host
MIME-Version: 1.0
Content-Type: text/plain; charset=utf8
Content-Transfer-Encoding: 8bit

This patch does NOT address the root problem and only addresses the
validation of the context to be added during the test suite.

Observed issue
==============

The system_tests jobs for master hang on the perf event test suites.

Cause
=====

The hang is caused by a cleanup problem (reference counting of the trace
chunk on session destroy/rotation) when the activation of a context
fails on a ust app channel.

Failed context activation needs to be handled properly in all cases, but
for this test we need to validate that the context can be added and skip
the tests as necessary based on the host.

The perf tests depend on the presence and accessibility of the
UNHALTED_REFERENCE_CYCLES PMU counter. This test suite was previously
run "manually" since it requires the presence of, and access to, that
PMU counter.

Since the perf test suite is now run on `make check` when libpfm is
present, we need to automate the discovery of UNHALTED_REFERENCE_CYCLES
and validate that we can access it.

There are three major scenarios where we want to skip the tests.

  1) UNHALTED_REFERENCE_CYCLES is simply not present in the PMU sets for
     that host.
  2) UNHALTED_REFERENCE_CYCLES is present in the PMU sets but not
     actionable. This can happen on qemu guests.
  3) UNHALTED_REFERENCE_CYCLES is present but not accessible. This can
     happen if the `/proc/sys/kernel/perf_event_paranoid` setting
     prevents the usage of the PMU.

Solution
========

Two problems were found with `find_event.c`.

  1) It took the first event matching the passed name even if it was in
     a PMU not supported by the host. In our use case it worked since
     the only platform that does not use `r300` is not currently in our
     testing set.

     ->  = PMU set currently chosen
     -*  = The correct PMU set

     e.g.:
     ->  Intel Core                   r300
         Intel Atom                   r300
         Intel Nehalem                r300
         Intel Nehalem EX             r300
         Intel X86 architectural PMU  r13c
         ...
     -*  Intel Skylake                r300

     On my system only the following are "detected" as per the libpfm
     example found here [1].

       [18, ix86arch, "Intel X86 architectural PMU"]
       [51, perf, "perf_events generic PMU"]
       [110, rapl, "Intel RAPL"]
       [114, perf_raw, "perf_events raw PMU"]
       [200, skl, "Intel Skylake"]

     Hence the `skl` PMU set should be used.

  2) libpfm does not validate whether the event is actually usable.

To fix those two problems, we use pfm_get_os_event_encoding and
perf_event_open.

pfm_get_os_event_encoding [2] is responsible for performing the query
across valid PMU sets and encoding the event to the perf struct format.

perf_event_open [3] is used to validate that the event can be used. It
tests the availability on the running host and the accessibility of the
PMU.

Based on the result of `find_event`, the tests are skipped or failed as
necessary.

Known drawbacks
===============

The only drawback is that the tests, albeit having libpfm as a
dependency, are not guaranteed to run on all hosts. There is not much we
can do here. We can only validate that they are indeed run on our CI,
most probably using lava hardware-based workers.

References
==========

[1] https://sourceforge.net/p/perfmon2/libpfm4/ci/288483932c3eb83202b0d8762aa0ed8534982c3f/tree/examples/check_events.c
[2] https://man7.org/linux/man-pages/man3/pfm_get_os_event_encoding.3.html
[3] https://man7.org/linux/man-pages/man2/perf_event_open.2.html

Signed-off-by: Jonathan Rajotte
Signed-off-by: Jérémie Galarneau
Change-Id: Iea7794dc28d019953930992a2237a1b606368d1f
---
 tests/perf/find_event.c     | 113 ++++++++++++++++++++++++------------
 tests/perf/test_perf_raw.in |  63 +++++++++++++++-----
 2 files changed, 122 insertions(+), 54 deletions(-)

diff --git a/tests/perf/find_event.c b/tests/perf/find_event.c
index 38ac6c139..aa1c964c3 100644
--- a/tests/perf/find_event.c
+++ b/tests/perf/find_event.c
@@ -5,69 +5,106 @@
  *
  */
 
+#include
 #include
-#include
 #include
+#include

+#include
+#include
+
 int main(int argc, char **argv)
 {
-	int ret, i;
-	unsigned int j;
-	pfm_pmu_info_t pinfo;
+	int ret, fd;
+
+	/* pfm query objects */
+	pfm_perf_encode_arg_t pencoder;
+	pfm_event_info_t info;
+
+	/* Perf event object to be populated by libpfm */
+	struct perf_event_attr attr;
 
 	if (argc != 2) {
 		fprintf(stderr, "Usage: %s \n"
 				"ex: %s UNHALTED_REFERENCE_CYCLES\n"
-				"Returns the first occurence it finds with "
+				"Returns the event raw number if found and actionable with"
 				"return code 0.\n"
-				"If not found returns 1, on error returns -1\n",
+				"If not found returns 1,"
+				"If not actionable return 2,"
+				"on error returns 255\n",
 				argv[0], argv[0]);
 		ret = -1;
 		goto end;
 	}
 
-	memset(&pinfo, 0, sizeof(pinfo));
-	pinfo.size = sizeof(pinfo);
+	/* Initialize perf_event_attr. */
+	memset(&attr, 0, sizeof(struct perf_event_attr));
+
+	/* Initialize libpfm encoder structure. */
+	memset(&pencoder, 0, sizeof(pencoder));
+	pencoder.size = sizeof(pfm_perf_encode_arg_t);
+
+	/* Initialize libpfm event info structure. */
+	memset(&info, 0, sizeof(info));
+	info.size = sizeof(info);
+
+	/* Prepare the encoder for query. */
+	pencoder.attr = &attr; /* Set the perf_event_attr pointer. */
+	pencoder.fstr = NULL; /* Not interested by the fully qualified event string. */
 
 	ret = pfm_initialize();
 	if (ret != PFM_SUCCESS) {
 		fprintf(stderr, "Failed to initialise libpfm: %s",
 				pfm_strerror(ret));
-		ret = -1;
+		ret = 255;
+		goto end;
+	}
+
+	ret = pfm_get_os_event_encoding(argv[1],
+			PFM_PLM0 | PFM_PLM1 | PFM_PLM2 | PFM_PLM3,
+			PFM_OS_PERF_EVENT, &pencoder);
+	if (ret != PFM_SUCCESS) {
+		fprintf(stderr, "libpfm: error pfm_get_os_event_encoding: %s\n",
+				pfm_strerror(ret));
+		ret = 1;
+		goto end;
+	}
+
+	/*
+	 * Query the raw code for later use. Do it now to simplify error
+	 * management.
+	 */
+	ret = pfm_get_event_info(pencoder.idx, PFM_OS_NONE, &info);
+	if (ret != PFM_SUCCESS) {
+		fprintf(stderr, "libpfm: error pfm_get_event_info: %s\n", pfm_strerror(ret));
+		ret = 1;
 		goto end;
 	}
 
-	pfm_for_all_pmus(j) {
-		ret = pfm_get_pmu_info(j, &pinfo);
-		if (ret != PFM_SUCCESS) {
-			continue;
-		}
-
-		for (i = pinfo.first_event; i != -1; i = pfm_get_event_next(i)) {
-			pfm_event_info_t info =
-					{ .size = sizeof(pfm_event_info_t) };
-
-			ret = pfm_get_event_info(i, PFM_OS_NONE, &info);
-			if (ret != PFM_SUCCESS) {
-				fprintf(stderr, "Cannot get event info: %s\n",
-						pfm_strerror(ret));
-				ret = -1;
-				goto end;
-			}
-
-			if (info.pmu != j) {
-				continue;
-			}
-
-			if (strcmp(info.name, argv[1]) == 0) {
-				fprintf(stdout, "r%" PRIx64 "\n", info.code);
-				ret = 0;
-				goto end;
-			}
-		}
+	/*
+	 * Now that the event is found, try to use it to validate that
+	 * the current user has access to it and that it can be used on that
+	 * host.
+	 */
+
+	/* Set the event to disabled to prevent unnecessary side effects. */
+	pencoder.attr->disabled = 1;
+
+	/* perf_event_open is provided by perfmon/perf_event.h. */
+	fd = perf_event_open(pencoder.attr, 0, -1, -1, 0);
+	if (fd == -1) {
+		fprintf(stderr, "perf: error perf_event_open: %d: %s\n", errno,
+				strerror(errno));
+		ret = 2;
+		goto end;
 	}
 
-	ret = 1;
+	/* We close the fd immediately since the event is actionable. */
+	close(fd);
+
+	/* Output the raw code for the event */
+	fprintf(stdout, "r%" PRIx64 "\n", info.code);
+	ret = 0;
 
 end:
 	return ret;
diff --git a/tests/perf/test_perf_raw.in b/tests/perf/test_perf_raw.in
index 550c0e9a3..8138c25b4 100644
--- a/tests/perf/test_perf_raw.in
+++ b/tests/perf/test_perf_raw.in
@@ -40,14 +40,29 @@ function have_libpfm()
 
 function test_ust_raw()
 {
-	TRACE_PATH=$(mktemp -d)
-	SESSION_NAME="ust_perf"
-	CHAN_NAME="mychan"
-	EVENT_NAME="tp:tptest"
-	PMU="UNHALTED_REFERENCE_CYCLES"
-	PERFID=$($CURDIR/find_event $PMU)
-	test $? -eq "0"
-	ok $? "Find PMU $PMU"
+	local TRACE_PATH=$(mktemp -d)
+	local SESSION_NAME="ust_perf"
+	local CHAN_NAME="mychan"
+	local EVENT_NAME="tp:tptest"
+	local PMU="UNHALTED_REFERENCE_CYCLES"
+	local tests_to_skip=9
+	local ret
+
+	# Find the raw perf id of the event.
+	PERFID=$("$CURDIR/find_event" "$PMU")
+	ret=$?
+	if [ "$ret" -eq "0" ]; then
+		pass "Find PMU $PMU"
+	elif [ "$ret" -eq "1" ]; then
+		skip 0 "PMU event not found." $tests_to_skip
+		return
+	elif [ "$ret" -eq "2" ]; then
+		skip 0 "PMU event not actionable." $tests_to_skip
+		return
+	else
+		fail "find_event returned $ret."
+		return
+	fi
 
 	create_lttng_session_ok $SESSION_NAME $TRACE_PATH
 
@@ -72,14 +87,30 @@ function test_ust_raw()
 
 function test_kernel_raw()
 {
-	TRACE_PATH=$(mktemp -d)
-	SESSION_NAME="kernel_perf"
-	CHAN_NAME="mychan"
-	EVENT_NAME="lttng_test_filter_event"
-	PMU="UNHALTED_REFERENCE_CYCLES"
-	PERFID=$($CURDIR/find_event $PMU)
-	test $? -eq "0"
-	ok $? "Find PMU $PMU"
+	local TRACE_PATH=$(mktemp -d)
+	local SESSION_NAME="kernel_perf"
+	local CHAN_NAME="mychan"
+	local EVENT_NAME="lttng_test_filter_event"
+	local PMU="UNHALTED_REFERENCE_CYCLES"
+	local PERFID=""
+	local tests_to_skip=9
+	local ret
+
+	# Find the raw perf id of the event.
+	PERFID=$("$CURDIR/find_event" "$PMU")
+	ret=$?
+	if [ "$ret" -eq "0" ]; then
+		pass "Find PMU $PMU"
+	elif [ "$ret" -eq "1" ]; then
+		skip 0 "PMU event not found." $tests_to_skip
+		return
+	elif [ "$ret" -eq "2" ]; then
+		skip 0 "PMU event not actionable." $tests_to_skip
+		return
+	else
+		fail "find_event returned $ret."
+		return
+	fi
 
 	create_lttng_session_ok $SESSION_NAME $TRACE_PATH
 
-- 
2.34.1