| 1 | LTTng modules design |
| 2 | --------------------- |
| 3 | |
| 4 | by Mathieu Desnoyers |
| 5 | June 30, 2020 |
| 6 | |
| 7 | This document covers the high-level design of lttng-modules. |
| 8 | |
| 9 | LTTng modules is a kernel tracer for the Linux kernel. It can either be |
| 10 | loaded as a set of kernel modules or built into a Linux kernel. |
| 11 | |
| 12 | Here are its key components: |
| 13 | |
| 14 | * LTTng modules ABI |
| 15 | |
| 16 | Files: |
| 17 | - src/lttng-abi.c |
| 18 | - include/lttng/abi.h |
| 19 | |
| 20 | This ABI consists of ioctls with code 0xF6. It extensively uses |
| 21 | anonymous file descriptors to represent the tracer "objects". Only |
| 22 | root is allowed to interact with those ioctls. |
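
As an illustration of what "ioctls with code 0xF6" means, the sketch below
builds two command numbers with the standard Linux _IO()/_IOW() macros and
checks which family a command belongs to. Only the 0xF6 code comes from this
document. The command numbers (0x01, 0x02), the argument structure and every
example_* name are hypothetical and do not reproduce include/lttng/abi.h.

```c
#include <linux/ioctl.h>	/* _IO, _IOW, _IOC_TYPE (Linux only) */

/* ioctl code stated in this document. */
#define EXAMPLE_LTTNG_IOC_MAGIC	0xF6

/* Hypothetical argument structure, for illustration only. */
struct example_channel_args {
	unsigned long long subbuf_size;
	unsigned long long num_subbuf;
};

/* Hypothetical commands: one without argument, one passing a structure. */
#define EXAMPLE_IOC_SESSION	_IO(EXAMPLE_LTTNG_IOC_MAGIC, 0x01)
#define EXAMPLE_IOC_CHANNEL	\
	_IOW(EXAMPLE_LTTNG_IOC_MAGIC, 0x02, struct example_channel_args)

/* Return nonzero when a command number belongs to the 0xF6 family. */
static inline int example_is_lttng_ioctl(unsigned long cmd)
{
	return _IOC_TYPE(cmd) == EXAMPLE_LTTNG_IOC_MAGIC;
}
```

The code (also called the "type" or "magic" number) is encoded in every
command number, which lets a handler reject commands belonging to another
driver before looking at the command itself.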
| 23 | |
| 24 | |
| 25 | * LTTng session, channels, contexts and events management |
| 26 | - src/lttng-events.c |
| 27 | - include/lttng/lttng-events.h |
| 28 | |
| 29 | Holds the current state of configured tracing sessions, channels, |
| 30 | contexts and events. The session, channel, context and event state |
| 31 | is manipulated through the LTTng modules ABI. A session contains 0 or |
| 32 | more channels, through which data is traced. A channel is associated |
| 33 | with an instance of a lib ring buffer client. Channels have 0 or more |
| 34 | events, which are associated with kernel instrumentation as event |
| 35 | sources. |
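
The containment relationship described above (session, then channels, then
events) can be sketched with plain C structures. This is a minimal
illustration only: the example_* names, the fixed-size arrays and the string
fields are assumptions, not the layout found in include/lttng/lttng-events.h.

```c
#include <stddef.h>

#define EXAMPLE_MAX_CHANNELS	4
#define EXAMPLE_MAX_EVENTS	8

struct example_event {
	const char *source;	/* name of the kernel instrumentation source */
};

struct example_channel {
	const char *client;	/* name of the ring buffer client in use */
	struct example_event events[EXAMPLE_MAX_EVENTS];
	size_t nr_events;	/* a channel has 0 or more events */
};

struct example_session {
	struct example_channel channels[EXAMPLE_MAX_CHANNELS];
	size_t nr_channels;	/* a session has 0 or more channels */
};

/* Add a channel to a session. Returns NULL when the session is full. */
struct example_channel *example_channel_create(struct example_session *session,
					       const char *client)
{
	struct example_channel *chan;

	if (session->nr_channels == EXAMPLE_MAX_CHANNELS)
		return NULL;
	chan = &session->channels[session->nr_channels++];
	chan->client = client;
	chan->nr_events = 0;
	return chan;
}

/* Add an event to a channel. Returns NULL when the channel is full. */
struct example_event *example_event_create(struct example_channel *chan,
					   const char *source)
{
	struct example_event *event;

	if (chan->nr_events == EXAMPLE_MAX_EVENTS)
		return NULL;
	event = &chan->events[chan->nr_events++];
	event->source = source;
	return event;
}
```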
| 36 | |
| 37 | |
| 38 | * lib ring buffer |
| 39 | |
| 40 | Generic ring buffer library (kernel implementation). Note that there |
| 41 | is a very similar copy of this implementation within the lttng-ust |
| 42 | user-space tracer. The overall goal of this library is to support |
| 43 | both kernel and user-space tracing. |
| 44 | |
| 45 | Files: |
| 46 | - src/lib/ringbuffer/* |
| 47 | - include/ringbuffer/* |
| 48 | |
| 49 | These include the ring buffer ABI meant for consuming the buffer data |
| 50 | from user-space. It is implemented in: |
| 51 | |
| 52 | - src/lib/ringbuffer/ring_buffer_vfs.c (open, release, poll, ioctl) |
| 53 | - src/lib/ringbuffer/ring_buffer_mmap.c (mmap) |
| 54 | - src/lib/ringbuffer/ring_buffer_splice.c (splice) |
| 55 | - include/ringbuffer/vfs.h: lib ring buffer ioctl commands (code 0xF6). |
| 56 | |
| 57 | The ring buffer library can be configured for various use-cases by |
| 58 | creating a specialized ring buffer "client" (template). |
| 59 | include/ringbuffer/config.h details the various configuration |
| 60 | parameters which are supported. |
| 61 | |
| 62 | |
| 63 | * LTTng modules ring buffer clients |
| 64 | |
| 65 | Files: |
| 66 | - src/lttng-ring-buffer-client-discard.c |
| 67 | - src/lttng-ring-buffer-client-mmap-discard.c |
| 68 | - src/lttng-ring-buffer-client-mmap-overwrite.c |
| 69 | - src/lttng-ring-buffer-client-overwrite.c |
| 70 | - src/lttng-ring-buffer-metadata-client.c |
| 71 | - src/lttng-ring-buffer-metadata-mmap-client.c |
| 72 | - src/lttng-ring-buffer-client.h |
| 73 | - src/lttng-ring-buffer-metadata-client.h |
| 74 | |
| 75 | These are the users of lib ring buffer: specialized instances of the |
| 76 | ring buffer for each use-case supported by LTTng. They are |
| 77 | hand-crafted templates in C. The fast paths are inlined within each |
| 78 | client, and the slow paths are kept in the common library to minimize |
| 79 | code memory usage. |
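
The "template in C" idea can be sketched as follows: a generic inline write
path takes a constant configuration, and each client wraps it with its own
configuration so the compiler can specialize the code per client. This is a
toy model under stated assumptions: it stores single records rather than
sub-buffers, handles no concurrency, and all example_* names are
hypothetical. "Discard" drops the new record when the buffer is full, while
"overwrite" replaces the oldest one.

```c
#include <stddef.h>
#include <stdint.h>

enum example_mode { EXAMPLE_DISCARD, EXAMPLE_OVERWRITE };

/* Per-client configuration, meant to be a compile-time constant. */
struct example_config {
	enum example_mode mode;
};

#define EXAMPLE_CAPACITY 4

struct example_ring {
	uint32_t slots[EXAMPLE_CAPACITY];
	size_t head;	/* index of the oldest record */
	size_t count;	/* number of records currently stored */
	size_t lost;	/* records discarded or overwritten */
};

/* Generic write path, inlined into each client wrapper below. */
static inline int example_write(const struct example_config *config,
				struct example_ring *rb, uint32_t value)
{
	if (rb->count == EXAMPLE_CAPACITY) {
		rb->lost++;
		if (config->mode == EXAMPLE_DISCARD)
			return -1;	/* full: drop the new record */
		/* Overwrite mode: drop the oldest record instead. */
		rb->head = (rb->head + 1) % EXAMPLE_CAPACITY;
		rb->count--;
	}
	rb->slots[(rb->head + rb->count) % EXAMPLE_CAPACITY] = value;
	rb->count++;
	return 0;
}

static const struct example_config example_discard_config = {
	.mode = EXAMPLE_DISCARD,
};

static const struct example_config example_overwrite_config = {
	.mode = EXAMPLE_OVERWRITE,
};

/* "Clients": one specialized entry point per configuration. */
int example_discard_write(struct example_ring *rb, uint32_t value)
{
	return example_write(&example_discard_config, rb, value);
}

int example_overwrite_write(struct example_ring *rb, uint32_t value)
{
	return example_write(&example_overwrite_config, rb, value);
}
```

Because each wrapper passes a constant configuration to an inline function,
the compiler is free to remove the mode test from each specialized copy.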
| 80 | |
| 81 | |
| 82 | * LTTng filter |
| 83 | |
| 84 | The filter in lttng-modules is meant to quickly discard events which |
| 85 | do not match an expression. The expression parsing is all done in |
| 86 | user-space within lttng-tools. The filter is received by lttng-modules |
| 87 | as bytecode. The filter is optimized for the frequent case in which |
| 88 | most events are discarded. It operates on input arguments received on |
| 89 | the stack, before the ring buffer is touched. |
| 90 | |
| 91 | Files: |
| 92 | - include/lttng/filter-bytecode.h: LTTng filter bytecode. |
| 93 | - src/lttng-filter-validator.c: Validation pass on bytecode reception. |
| 94 | - src/lttng-filter.c: Filter linker code: link a bytecode onto a given |
| 95 | event (knowing its field offsets). |
| 96 | - src/lttng-filter-specialize.c: Specialize the bytecode, |
| 97 | transforming generic instructions into type-specific (faster) |
| 98 | instructions. |
| 99 | - src/lttng-filter-interpreter.c: Bytecode interpreter, called by |
| 100 | instrumentation to filter events. |
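
To make the notion of a bytecode filter concrete, here is a toy stack-based
interpreter that decides whether an event is kept or discarded from its
input fields. It is a sketch under stated assumptions: the opcodes, the
instruction layout and all example_*/EX_* names are hypothetical and do not
match include/lttng/filter-bytecode.h. Unlike the design described above,
which validates bytecode in a separate pass, this toy interpreter performs
its own bounds checks at run time.

```c
#include <stddef.h>
#include <stdint.h>

enum example_op {
	EX_OP_LOAD_FIELD,	/* push fields[arg] */
	EX_OP_LOAD_CONST,	/* push arg */
	EX_OP_EQ,		/* pop b, pop a, push (a == b) */
	EX_OP_GT,		/* pop b, pop a, push (a > b) */
	EX_OP_AND,		/* pop b, pop a, push (a && b) */
	EX_OP_RETURN,		/* stop: nonzero top of stack keeps the event */
};

struct example_insn {
	enum example_op op;
	int64_t arg;
};

#define EX_STACK_MAX 8

/* Return 1 to record the event, 0 to discard it (also on malformed code). */
int example_filter_run(const struct example_insn *code, size_t len,
		       const int64_t *fields, size_t nr_fields)
{
	int64_t stack[EX_STACK_MAX];
	size_t sp = 0;

	for (size_t pc = 0; pc < len; pc++) {
		const struct example_insn *insn = &code[pc];

		switch (insn->op) {
		case EX_OP_LOAD_FIELD:
			if (sp == EX_STACK_MAX || insn->arg < 0 ||
			    (uint64_t) insn->arg >= nr_fields)
				return 0;
			stack[sp++] = fields[insn->arg];
			break;
		case EX_OP_LOAD_CONST:
			if (sp == EX_STACK_MAX)
				return 0;
			stack[sp++] = insn->arg;
			break;
		case EX_OP_EQ:
		case EX_OP_GT:
		case EX_OP_AND: {
			int64_t a, b, result;

			if (sp < 2)
				return 0;
			b = stack[--sp];
			a = stack[--sp];
			if (insn->op == EX_OP_EQ)
				result = (a == b);
			else if (insn->op == EX_OP_GT)
				result = (a > b);
			else
				result = (a && b);
			stack[sp++] = result;
			break;
		}
		case EX_OP_RETURN:
			return sp >= 1 && stack[sp - 1] != 0;
		default:
			return 0;
		}
	}
	return 0;	/* no RETURN instruction reached: discard */
}
```

For instance, the expression "field0 == 42 && field1 > 10" would be encoded
as: LOAD_FIELD 0, LOAD_CONST 42, EQ, LOAD_FIELD 1, LOAD_CONST 10, GT, AND,
RETURN.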
| 101 | |
| 102 | * LTTng contexts |
| 103 | |
| 104 | LTTng-modules supports the notion of "contexts" which can be attached either |
| 105 | to specific events or to all events in a channel. Those are additional |
| 106 | data which can be saved prior to the event payload, e.g. current |
| 107 | thread ID, process name, performance counters, and more. |
| 108 | |
| 109 | Files: |
| 110 | - src/lttng-context.c: Context state associated to a channel or event, |
| 111 | and helpers. |
| 112 | - src/lttng-context-*.c: Implementation of all supported contexts: |
| 113 | callstack, cgroup-ns, cpu-id, egid, euid, gid, hostname, |
| 114 | interruptible, ipc-ns, migratable, mnt-ns, need-reschedule, net-ns, |
| 115 | nice, perf-counters, pid, pid-ns, ppid, preemptible, prio, procname, |
| 116 | sgid, suid, tid, uid, user-ns, uts-ns, vegid, veuid, vgid, vpid, vppid, |
| 117 | vsgid, vtid, vuid. |
| 118 | |
| 119 | |
| 120 | * LTTng tracepoint instrumentation |
| 121 | |
| 122 | The LTTng tracer attaches "probes" to kernel subsystems. A probe is a |
| 123 | set of tracepoint callbacks matching the tracepoint instrumentation |
| 124 | for a kernel subsystem. Each probe can be loaded separately. |
| 125 | |
| 126 | Due to limitations in the kernel TRACE_EVENT macros, LTTng |
| 127 | implements its own LTTNG_TRACEPOINT_EVENT macros. It uses the |
| 128 | upstream kernel TRACE_EVENT macros only to validate the prototype |
| 129 | of its callbacks. Also, the event field semantics LTTng exposes in |
| 130 | its traces match what is exposed to user-space through /proc, which |
| 131 | requires a different field layout implementation than what the |
| 132 | upstream kernel exposes to user-space. |
| 133 | |
| 134 | Files: |
| 135 | - src/lttng-tracepoint.c: Mapping between tracepoint instrumentation |
| 136 | and LTTng events. |
| 137 | - src/lttng-probes.c: LTTng probes registry. |
| 138 | - include/instrumentation/events/*: LTTng tracepoint instrumentation |
| 139 | headers for all kernel subsystems. |
| 140 | |
| 141 | |
| 142 | * LTTng system call instrumentation |
| 143 | |
| 144 | The LTTng tracer gathers both input and output arguments from each |
| 145 | system call, for all supported architectures. This means the system |
| 146 | call probe callbacks read from user-space memory when needed. |
| 147 | |
| 148 | Files: |
| 149 | - src/lttng-syscalls.c: LTTng system call instrumentation callbacks and |
| 150 | tables. |
| 151 | - include/instrumentation/syscall/*: generated and override system |
| 152 | call instrumentation headers. |
| 153 | |
| 154 | |
| 155 | * LTTng statedump |
| 156 | |
| 157 | Dump kernel state at trace start or when an explicit "statedump" is |
| 158 | requested. Useful to reconstruct the entire kernel state at |
| 159 | post-processing. Dumps: thread scheduling state, file |
| 160 | descriptor tables, interrupt handlers, network interfaces, block |
| 161 | devices, CPU topology. Also performs a "fence" so that all CPUs reach |
| 162 | a quiescent state before the start and the end of the statedump. |
| 163 | |
| 164 | Files: |
| 165 | - src/lttng-statedump-impl.c |
| 166 | |
| 167 | |
| 168 | * LTTng tracker |
| 169 | |
| 170 | User ID, group ID and process ID trackers, which filter entire |
| 171 | sessions based on UID, GID and PID. |
| 172 | |
| 173 | Files: |
| 174 | - src/lttng-tracker-id.c |
| 175 | |
| 176 | |
| 177 | * LTTng clock |
| 178 | |
| 179 | Clock plugin registration. The clock used by the LTTng modules kernel |
| 180 | tracer can be overridden by a plugin module. |
| 181 | |
| 182 | Files: |
| 183 | - src/lttng-clock.c |
| 184 | - include/lttng/clock.h |
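
The clock override mechanism can be sketched as a registry holding one
optional plugin, with the tracer falling back to a default clock when no
plugin is registered. This is a user-space illustration only: the
example_* names and the structure layout are hypothetical and are not the
interface declared in include/lttng/clock.h. The default clock here is a
simple counter standing in for a real time source.

```c
#include <stddef.h>
#include <stdint.h>

struct example_trace_clock {
	const char *name;
	uint64_t (*read64)(void);	/* return the current clock value */
};

/* Stand-in default clock: a counter incremented on each read. */
static uint64_t example_default_counter;

static uint64_t example_default_read64(void)
{
	return ++example_default_counter;
}

static const struct example_trace_clock example_default_clock = {
	.name = "default",
	.read64 = example_default_read64,
};

/* At most one override plugin can be registered at a time. */
static const struct example_trace_clock *example_override;

int example_clock_register_plugin(const struct example_trace_clock *clock)
{
	if (!clock || !clock->read64 || example_override)
		return -1;
	example_override = clock;
	return 0;
}

void example_clock_unregister_plugin(const struct example_trace_clock *clock)
{
	if (example_override == clock)
		example_override = NULL;
}

/* Clock read used by the tracer: plugin if registered, default otherwise. */
uint64_t example_trace_clock_read64(void)
{
	const struct example_trace_clock *clock =
		example_override ? example_override : &example_default_clock;

	return clock->read64();
}

/* Hypothetical plugin returning a fixed value, for demonstration. */
static uint64_t example_fixed_read64(void)
{
	return 1234;
}

const struct example_trace_clock example_fixed_clock = {
	.name = "fixed",
	.read64 = example_fixed_read64,
};
```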