LTTng calibrate command documentation Mathieu Desnoyers, August 6, 2011 The LTTng calibrate command can be used to find out the combined average overhead of the LTTng tracer and the instrumentation mechanisms used. This overhead can be calibrated in terms of time or using any of the PMU performance counter available on the system. For now, the only calibration implemented is that of the kernel function instrumentation (kretprobes). * Calibrate kernel function instrumentation Let's use an example to show this calibration. We use an i7 processor with 4 general-purpose PMU registers. This information is available by issuing dmesg, looking for "generic registers". This sequence of commands will gather a trace executing a kretprobe hooked on an empty function, gathering PMU counters LLC (Last Level Cache) misses information (see lttng add-context --help to see the list of available PMU counters). (as root) lttng create calibrate-function lttng enable-event calibrate --kernel --function lttng_calibrate_kretprobe lttng add-context --kernel -t perf:LLC-load-misses -t perf:LLC-store-misses \ -t perf:LLC-prefetch-misses lttng start for a in $(seq 1 10); do \ lttng calibrate --kernel --function; done lttng destroy babeltrace $(ls -1drt ~/lttng-traces/calibrate-function-* | tail -n 1) The output from babeltrace can be saved to a text file and opened in a spreadsheet (e.g. oocalc) to focus on the per-PMU counter delta between consecutive "calibrate_entry" and "calibrate_return" events. Note that these counters are per-CPU, so scheduling events would need to be present to account for migration between CPU. Therefore, for calibration purposes, only events staying on the same CPU must be considered. The average result, for the i7, on 10 samples: Average Std.Dev. perf_LLC_load_misses: 5.0 0.577 perf_LLC_store_misses: 1.6 0.516 perf_LLC_prefetch_misses: 9.0 14.742 As we can notice, the load and store misses are relatively stable across runs (their standard deviation is relatively low) compared to the prefetch misses. We can conclude from this information that LLC load and store misses can be accounted for quite precisely, but prefetches within a function seems to behave too erratically (not much causality link between the code executed and the CPU prefetch activity) to be accounted for.