doc/calibrate.txt

LTTng calibrate command documentation
Mathieu Desnoyers, August 6, 2011

The LTTng calibrate command can be used to find out the combined average
overhead of the LTTng tracer and the instrumentation mechanisms used.
This overhead can be calibrated in terms of time or using any of the PMU
performance counters available on the system.

For now, the only calibration implemented is that of the kernel function
instrumentation (kretprobes).


* Calibrate kernel function instrumentation

Let's use an example to show this calibration. We use an i7 processor
with 4 general-purpose PMU registers. This information is available by
running dmesg and looking for "generic registers".

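As a minimal sketch of that lookup, the helper below pulls the register
count out of kernel log text. The function name generic_pmu_registers is
hypothetical, and the sketch assumes the kernel prints a line of the
form "... generic registers: 4"; the exact wording may differ between
kernel versions.

```shell
# Hypothetical helper: read kernel log text on stdin and print the number
# that follows "generic registers:" on the first matching line.
# Assumes a log line such as "... generic registers:      4".
generic_pmu_registers() {
    sed -n 's/.*generic registers:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}
```

Typical use (reading the kernel log may require root):
dmesg | generic_pmu_registers
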
This sequence of commands gathers a trace of a kretprobe hooked on an
empty function, recording LLC (Last Level Cache) miss PMU counters with
each event (see lttng add-context --help for the list of available PMU
counters).

(as root)
lttng create calibrate-function
lttng enable-event calibrate --kernel --function lttng_calibrate_kretprobe
lttng add-context --kernel -t perf:LLC-load-misses -t perf:LLC-store-misses \
                -t perf:LLC-prefetch-misses
lttng start
for a in $(seq 1 10); do \
        lttng calibrate --kernel --function;
done
lttng destroy
babeltrace $(ls -1drt ~/lttng-traces/calibrate-function-* | tail -n 1)

The output from babeltrace can be saved to a text file and opened in a
spreadsheet (e.g. oocalc) to focus on the per-PMU counter delta between
consecutive "calibrate_entry" and "calibrate_return" events. Note that
these counters are per-CPU, so scheduling events would need to be
present to account for migration between CPUs. Therefore, for
calibration purposes, only entry/return pairs that stay on the same CPU
must be considered.

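The same pairing can be done without a spreadsheet. The following is a
minimal sketch, not part of LTTng: the function name counter_deltas is
hypothetical, and it assumes that the babeltrace text output prints one
event per line, with fields written as "cpu_id = N" and
"perf_LLC_load_misses = N". Adjust the patterns if your babeltrace
version formats events differently.

```shell
# Hypothetical helper: print one counter delta per calibrate_entry /
# calibrate_return pair that stayed on the same CPU.
# Usage: counter_deltas COUNTER_FIELD < babeltrace-output.txt
# Assumes one event per line with "name = value" integer fields.
counter_deltas() {
    awk -v counter="$1" '
    # Return the integer following "name = " in line, or "" if absent.
    function field(line, name,    s) {
        if (!match(line, name " = [0-9]+")) return ""
        s = substr(line, RSTART, RLENGTH)
        sub(/^.* = /, "", s)
        return s
    }
    # Remember the CPU and counter value seen at function entry.
    /calibrate_entry/ {
        entry_cpu = field($0, "cpu_id")
        entry_val = field($0, counter)
        have_entry = (entry_cpu != "" && entry_val != "")
        next
    }
    # On return, emit the delta only if the CPU did not change.
    /calibrate_return/ && have_entry {
        cpu = field($0, "cpu_id")
        val = field($0, counter)
        if (cpu == entry_cpu && val != "") print val - entry_val
        have_entry = 0
    }
    '
}
```

For example, "counter_deltas perf_LLC_load_misses < trace.txt" would
print one load-miss delta per line, skipping any pair whose entry and
return were recorded on different CPUs.
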
The average result, for the i7, on 10 samples:

                             Average     Std.Dev.
perf_LLC_load_misses:           5.0       0.577
perf_LLC_store_misses:          1.6       0.516
perf_LLC_prefetch_misses:       9.0      14.742

As we can see, the load and store misses are relatively stable across
runs (their standard deviation is relatively low) compared to the
prefetch misses. We can conclude from this information that LLC load and
store misses can be accounted for quite precisely, but prefetches within
a function seem to behave too erratically (there is little causal link
between the code executed and the CPU prefetch activity) to be accounted
for.
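
To make the stability comparison concrete, the sketch below computes the
mean, the standard deviation, and their ratio from a list of deltas, one
number per line. The function name summarize is hypothetical. It uses
the sample standard deviation (n - 1 denominator), which is an
assumption: this document does not say which form the spreadsheet used.

```shell
# Hypothetical helper: read one number per line on stdin and print
# "mean stddev stddev/mean", using the sample standard deviation.
# Exits with status 1 if fewer than two samples are given.
summarize() {
    awk '
    NF { n++; sum += $1; sumsq += $1 * $1 }
    END {
        if (n < 2) exit 1
        mean = sum / n
        var = (sumsq - n * mean * mean) / (n - 1)
        if (var < 0) var = 0    # guard against rounding error
        sd = sqrt(var)
        printf "%.3f %.3f %.3f\n", mean, sd, (mean != 0 ? sd / mean : 0)
    }'
}
```

Applying the same ratio to the table above gives about 0.12 for load
misses (0.577 / 5.0), 0.32 for store misses (0.516 / 1.6) and 1.64 for
prefetch misses (14.742 / 9.0): only the prefetch counter has a spread
larger than its own average.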