Linux Trace Toolkit Status

Last updated July 1, 2003.

During the 2002 Ottawa Linux Symposium tracing BOF, a list of desirable features for LTT was collected by Richard Moore. Since then, a lot of infrastructure work on LTT has been taking place. This status report aims to track ongoing development efforts and the current status of the various features. This status page is most certainly incomplete; please send any additions and corrections to Michel Dagenais (michel.dagenais at polymtl.ca).

As of this writing, the most active LTT contributors include Karim Yaghmour, LTT's author and maintainer, from opersys.com; Tom Zanussi, Robert Wisniewski, Richard J Moore and others from IBM, mainly at the Linux Technology Center; XiangXiu Yang, Mathieu Desnoyers, Benoit des Ligneris and Michel Dagenais, from the department of Computer Engineering at Ecole Polytechnique de Montreal; and Frank Rowand, from MontaVista.

Work recently performed

Lockless per-CPU buffers: Tom Zanussi of IBM has implemented lockless per-CPU buffering with low-overhead, very fine-grained timestamping, and has updated the kernel patch and the trace visualizer accordingly, except for viewing multiple per-CPU traces simultaneously.
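
To illustrate the underlying idea, here is a minimal user-space sketch, not the actual LTT kernel code; the buffer size, names and the x86 cycle-counter read are assumptions made for the example. A writer reserves space in its CPU's buffer by advancing a write offset with an atomic compare-and-swap, so no lock is needed even when an interrupt handler logs an event in the middle of another write.

  /*
   * Minimal user-space sketch of the lockless reservation idea (not the
   * actual LTT kernel code; buffer size and names are invented).  Each
   * CPU owns one buffer; a writer reserves space by advancing the write
   * offset with an atomic compare-and-swap, so no lock is needed even if
   * an interrupt handler logs an event while another write is in progress.
   */
  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>

  #define BUF_SIZE 4096

  struct percpu_buf {
      volatile uint32_t write_off;   /* next free byte, updated atomically */
      uint8_t data[BUF_SIZE];
  };

  /* Read the x86 cycle counter (TSC) for fine-grained timestamps. */
  static inline uint64_t read_cycle_counter(void)
  {
      uint32_t lo, hi;
      __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
      return ((uint64_t)hi << 32) | lo;
  }

  /* Reserve 'len' bytes; return the reserved offset or -1 if full. */
  static int reserve_slot(struct percpu_buf *buf, uint32_t len)
  {
      uint32_t old_off, new_off;

      do {
          old_off = buf->write_off;
          new_off = old_off + len;
          if (new_off > BUF_SIZE)
              return -1;             /* buffer full: drop or switch buffers */
      } while (!__sync_bool_compare_and_swap(&buf->write_off, old_off, new_off));

      return (int)old_off;
  }

  /* Log one event: cycle-counter timestamp followed by the payload. */
  static int log_event(struct percpu_buf *buf, const void *payload, uint32_t len)
  {
      uint64_t tsc = read_cycle_counter();
      int off = reserve_slot(buf, sizeof(tsc) + len);

      if (off < 0)
          return -1;
      memcpy(buf->data + off, &tsc, sizeof(tsc));
      memcpy(buf->data + off + sizeof(tsc), payload, len);
      return 0;
  }

  int main(void)
  {
      static struct percpu_buf buf;  /* one such buffer per CPU in practice */
      const char msg[] = "example event";

      if (log_event(&buf, msg, sizeof(msg)) == 0)
          printf("logged %zu payload bytes\n", sizeof(msg));
      return 0;
  }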

RelayFS: Tom Zanussi has implemented RelayFS, a separate, simple and efficient component for moving data between the kernel and user space applications. This component is reusable by other projects (printk, evlog, lustre...) and removes a sizeable chunk from the current LTT, making each piece (relayfs and relayfs-based LTT) simpler, more modular and possibly more palatable for inclusion in the standard Linux kernel. Besides porting LTT to RelayFS, he has implemented printk over RelayFS with an automatically resizable printk buffer.
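
Since RelayFS exposes each kernel buffer as an ordinary file, a user-space trace daemon only needs plain file operations to drain it. The sketch below shows the shape of such a daemon; the mount point and file names are hypothetical, and this is not the actual LTT daemon code.

  /*
   * Sketch of a user-space daemon draining one relayfs channel buffer to
   * disk.  The mount point and file names are hypothetical; the point is
   * that the kernel buffer appears as an ordinary file, so plain read()
   * and write() are enough to move the trace data out.
   */
  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  int main(void)
  {
      char buf[65536];
      ssize_t n;
      /* Hypothetical paths: one relay file per CPU, one output trace file. */
      int in  = open("/mnt/relay/ltt/cpu0", O_RDONLY);
      int out = open("trace-cpu0", O_WRONLY | O_CREAT | O_TRUNC, 0644);

      if (in < 0 || out < 0) {
          perror("open");
          return 1;
      }
      /* Copy whatever the kernel side has produced; a real daemon would
         poll() or block until more sub-buffers are ready. */
      while ((n = read(in, buf, sizeof(buf))) > 0) {
          if (write(out, buf, n) != n)
              break;                /* short write: give up for simplicity */
      }
      close(in);
      close(out);
      return 0;
  }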

New trace format: Karim Yaghmour and Michel Dagenais, with input from several LTT contributors, have designed a new trace format to accommodate per-buffer tracefiles and dynamically defined event types. The new format includes both the binary trace format and the event type description format. XiangXiu Yang has developed a simple parser for the event type description format. This parser is used to generate the tracing macros in the kernel (genevent) and to support reading tracefiles in the trace reading library (libltt).
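
As a purely hypothetical illustration of this pipeline (the description syntax, event name, id and structure below are invented, not the real LTT ones), describing an event type once can yield a generated record layout and logging macro along these lines:

  /*
   * Purely illustrative sketch of the genevent idea: an event type is
   * described once and the kernel-side record layout and logging macro
   * (and, symmetrically, the user-side decoding tables) are generated
   * from that description.  The description syntax, event name, id and
   * structure below are invented, not the real LTT ones.
   *
   *   event fs_open { int fd; string name[32]; }
   *
   * A generated logging macro could look roughly like this:
   */
  #include <stdio.h>
  #include <string.h>

  #define LTT_EVENT_FS_OPEN 42            /* id assigned from the description */

  struct ltt_fs_open {                    /* generated record layout */
      int fd;
      char name[32];
  };

  /* Stand-in for the tracer's low-level writer (here it only reports). */
  static void ltt_log_raw(unsigned id, const void *data, unsigned len)
  {
      (void)data;
      printf("event %u, %u bytes\n", id, len);
  }

  /* Generated wrapper: builds the record and hands it to the tracer. */
  #define trace_fs_open(_fd, _name)                               \
      do {                                                        \
          struct ltt_fs_open _ev = { .fd = (_fd) };               \
          strncpy(_ev.name, (_name), sizeof(_ev.name) - 1);       \
          ltt_log_raw(LTT_EVENT_FS_OPEN, &_ev, sizeof(_ev));      \
      } while (0)

  int main(void)
  {
      trace_fs_open(3, "/etc/passwd");
      return 0;
  }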

Ongoing work

Libltt: XiangXiu Yang is finishing an event-reading library and API that parses event descriptions and reads and decodes trace events accordingly.
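
The following self-contained sketch illustrates the principle of description-driven decoding; the structures and names are invented and are not the libltt API. Because each event type's layout comes from a description table, supporting a new event type only requires a new description, not new decoding code.

  /*
   * Self-contained illustration of description-driven decoding; the
   * structures and names are invented and are not the libltt API.  The
   * layout of each event type comes from a description table, so the
   * same decoding loop handles any event type that has a description.
   */
  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>

  enum field_type { FIELD_U32, FIELD_U64 };

  struct field_desc { const char *name; enum field_type type; };

  struct event_desc {
      const char *name;
      const struct field_desc *fields;
      unsigned nr_fields;
  };

  /* Description of one invented event type: fs_open(fd, inode). */
  static const struct field_desc fs_open_fields[] = {
      { "fd",    FIELD_U32 },
      { "inode", FIELD_U64 },
  };
  static const struct event_desc fs_open_desc = { "fs_open", fs_open_fields, 2 };

  /* Decode one packed record according to its description. */
  static void decode(const struct event_desc *d, const uint8_t *rec)
  {
      unsigned i;

      printf("%s:", d->name);
      for (i = 0; i < d->nr_fields; i++) {
          if (d->fields[i].type == FIELD_U32) {
              uint32_t v;
              memcpy(&v, rec, sizeof(v));
              rec += sizeof(v);
              printf(" %s=%u", d->fields[i].name, v);
          } else {
              uint64_t v;
              memcpy(&v, rec, sizeof(v));
              rec += sizeof(v);
              printf(" %s=%llu", d->fields[i].name, (unsigned long long)v);
          }
      }
      printf("\n");
  }

  int main(void)
  {
      uint8_t rec[12];               /* fake binary record: fd=3, inode=1234 */
      uint32_t fd = 3;
      uint64_t inode = 1234;

      memcpy(rec, &fd, sizeof(fd));
      memcpy(rec + sizeof(fd), &inode, sizeof(inode));
      decode(&fs_open_desc, rec);
      return 0;
  }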

lttv: XiangXiu Yang, Mathieu Desnoyers and Michel Dagenais are remodeling the trace visualizer to use the new trace format and libltt API, and to allow compiled and scripted plugins, which can dynamically add new custom trace analysis functions.
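
The plugin idea can be illustrated with a small, invented registry (this is not the lttv plugin interface): a compiled plugin registers callbacks, and the core dispatches every decoded event to the registered hooks.

  /*
   * Small, invented sketch of the plugin pattern (not the lttv plugin
   * interface): compiled plugins register callbacks for decoded events,
   * and the core dispatches every event to the registered hooks.
   */
  #include <stdio.h>
  #include <string.h>

  struct event { unsigned long long timestamp; int cpu; const char *name; };

  typedef void (*event_hook)(const struct event *ev);

  #define MAX_HOOKS 16
  static event_hook hooks[MAX_HOOKS];
  static int nr_hooks;

  static int register_hook(event_hook h)
  {
      if (nr_hooks >= MAX_HOOKS)
          return -1;
      hooks[nr_hooks++] = h;
      return 0;
  }

  /* Example "plugin": a custom analysis that counts scheduling changes. */
  static unsigned long sched_count;
  static void count_sched(const struct event *ev)
  {
      if (strcmp(ev->name, "sched_change") == 0)
          sched_count++;
  }

  /* Core loop: hand each decoded event to every registered hook. */
  static void dispatch(const struct event *ev)
  {
      int i;
      for (i = 0; i < nr_hooks; i++)
          hooks[i](ev);
  }

  int main(void)
  {
      struct event ev1 = { 100, 0, "sched_change" };
      struct event ev2 = { 200, 1, "fs_open" };

      register_hook(count_sched);
      dispatch(&ev1);
      dispatch(&ev2);
      printf("sched_change events: %lu\n", sched_count);
      return 0;
  }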

Planned work

LTT already interfaces with Dynamic Probes. This feature will need to be updated for the new LTT version.

The Kernel Crash Dump utilities are another very interesting complementary project. Interfacing them with RelayFS will help implement useful flight-recorder-like tracing for post-mortem analysis.

User-level tracing is available in the current LTT version but requires one system call per event. With the new RelayFS-based infrastructure, it would be interesting to use a shared memory buffer directly accessible from user space. Having one RelayFS channel per user would allow an extremely efficient, yet secure, user-level tracing mechanism.
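
A rough sketch of what syscall-free logging could look like from user space follows. The buffer path and layout are assumptions made for the example; the point is that once the per-user buffer is mapped, recording an event amounts to ordinary memory stores.

  /*
   * Sketch of syscall-free user-level tracing: the per-user buffer is
   * assumed to be exposed as a file that can be mmap()ed (the path and
   * the layout, a write offset followed by data, are invented).  Once
   * mapped, recording an event is just a couple of memory stores.
   */
  #include <fcntl.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>
  #include <sys/mman.h>
  #include <unistd.h>

  #define BUF_SIZE 4096

  struct user_buf {
      volatile uint32_t write_off;      /* next free byte in data[] */
      uint8_t data[BUF_SIZE];
  };

  static struct user_buf *map_buffer(const char *path)
  {
      int fd = open(path, O_RDWR);
      void *p;

      if (fd < 0)
          return NULL;
      p = mmap(NULL, sizeof(struct user_buf),
               PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
      close(fd);                        /* the mapping stays valid */
      return p == MAP_FAILED ? NULL : p;
  }

  /* Log one event without entering the kernel: copy it into the mapping. */
  static int log_user_event(struct user_buf *b, const void *ev, uint32_t len)
  {
      uint32_t off = b->write_off;

      if (off + len > BUF_SIZE)
          return -1;                    /* full: the kernel side must drain it */
      memcpy(b->data + off, ev, len);
      b->write_off = off + len;
      return 0;
  }

  int main(void)
  {
      /* Hypothetical per-user relay buffer path. */
      struct user_buf *b = map_buffer("/mnt/relay/ltt/user-1000");
      const char ev[] = "user event";

      if (!b) {
          perror("map_buffer");
          return 1;
      }
      if (log_user_event(b, ev, sizeof(ev)) == 0)
          printf("event logged without a system call\n");
      return 0;
  }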

Sending important events (process creation, event type/facility definitions) to a separate channel would make interactive trace browsing more efficient. Only this concise trace of important events would need to be processed in its entirety; other, larger, gigabyte-size traces could then be accessed randomly without requiring a first preprocessing pass. A separate channel would also be required for incomplete traces, such as when tracing to a circular buffer in "flight recorder" mode: all the important events would be kept, while only the last buffers of ordinary events would remain.

Once the visualizer is able to read and display several traces, it will be interesting to produce side-by-side synchronized views (events from two interacting machines A and B, one above the other) or even merged views (combined events from several CPUs in a single graph). Time differences between interacting systems will need to be estimated and at least partially compensated for.
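
One common way to obtain such an estimate, shown here only as an example and not necessarily what LTT will adopt, is the NTP-style computation from a request/reply pair of network events visible in both traces.

  /*
   * Example only (the classic NTP-style estimate, not necessarily what
   * LTT will use): a request/reply pair of network events gives four
   * timestamps, from which the offset of machine B's clock relative to
   * machine A's is approximately ((t2 - t1) + (t3 - t4)) / 2, where
   * t1 = request sent on A, t2 = request received on B,
   * t3 = reply sent on B,   t4 = reply received on A.
   */
  #include <stdio.h>

  static double estimate_offset(double t1, double t2, double t3, double t4)
  {
      return ((t2 - t1) + (t3 - t4)) / 2.0;
  }

  int main(void)
  {
      /* Made-up timestamps (seconds): B's clock runs about 0.5 s ahead. */
      double t1 = 10.000;   /* A sends request    */
      double t2 = 10.501;   /* B receives request */
      double t3 = 10.502;   /* B sends reply      */
      double t4 = 10.003;   /* A receives reply   */

      printf("estimated offset of B relative to A: %.3f s\n",
             estimate_offset(t1, t2, t3, t4));
      return 0;
  }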

LTT currently writes a proc file at trace start time. This file contains only minimal information about process and interrupt names. More information would be desirable for several applications (process maps, open file descriptors, content of the buffer cache). Furthermore, this information may be more conveniently gathered from within the kernel and simply written to the trace as events at start time.

New features already implemented since LTT 0.9.5

  1. Per-CPU Buffering scheme.
  2. Logging without locking.
  3. Minimal latency - minimal or no serialisation. (Lockless tracing using read_cycle_counter instead of gettimeofday.)
  4. Fine granularity time stamping - min=o(CPU cycle time), max=.05 Gb Ethernet interrupt rate. (Cycle counter being used).
  5. Random access to trace event stream. (Random access reading of events in the trace is already available in LibLTT. However, a first pass through the trace is required to find all the process creation events; the cost of this pass may be reduced in the future if process creation events are sent to a separate, much smaller trace.)

Features being worked on

  1. Simple wrapper macros for trace instrumentation. (GenEvent)
  2. Easily expandable with new trace types. (GenEvent)
  3. Multiple buffering schemes - switchable globally or selectable by trace client. (Will be simpler to obtain with RelayFS.)
  4. Global buffer scheme. (Will be simpler to obtain with RelayFS.)
  5. Per-process buffer scheme. (Will be simpler to obtain with RelayFS.)
  6. Per-NGPT thread buffer scheme. (Will be simpler to obtain with RelayFS.)
  7. Per-component buffer scheme. (Will be simpler to obtain with RelayFS.)
  8. A set of extensible and modular performance analysis post-processing programs. (Lttv)
  9. Filtering and selection mechanisms within formatting utility. (Lttv)
  10. Variable size event records. (GenEvent, LibEvent, Lttv)
  11. Data reduction facilities able to logically combine traces from more than one system. (LibEvent, Lttv)
  12. Data presentation utilities able to present data from multiple trace instances in a logically combined form. (LibEvent, Lttv)
  13. Major/minor code means of identification/registration/assignment. (GenEvent)
  14. A flexible formatting mechanism that will cater for structures and arrays of structures with recursion. (GenEvent)

Features already planned for

  1. Init-time tracing. (To be part of RelayFS.)
  2. Updated interface for Dynamic Probes. (As soon as things stabilize.)
  3. Support "flight recorder" always on tracing with minimal resource consumption. (To be part of RelayFS and interfaced to the Kernel crash dump facilities.)
  4. Fine grained dynamic trace instrumentation for kernel space and user subsystems. (Dynamic Probes, more efficient user level tracing.)
  5. System information logged at trace start. (New special events to add.)
  6. Collection of process memory map information at trace start/restart and updates of that information at fork/exec/exit. This allows address-to-name resolution for user space.
  7. Include the facility to write system snapshots (total memory layout for kernel, drivers, and all processes) to a file. This is required for trace post-processing on a system other than the one producing the trace. Perhaps some of this is already implemented in the Kernel Crash Dump.
  8. Even more efficient tracing from user space.
  9. Better integration with tools to define static trace hooks.
  10. Better integration with tools to dynamically activate tracing statements.

Features not currently planned

  1. POSIX Tracing API compliance.
  2. Ability to do function entry/exit tracing. (Probably a totally orthogonal mechanism, using either Dynamic Probes hooks or static code instrumentation with the suitable GCC options for basic-block instrumentation.)
  3. Processor performance counter (which most modern CPUs have) sampling and recording. (These counters can be read and their value sent in traced events. Some support to collect these automatically at specific state change times and to visualize the results would be nice.)
  4. Suspend & Resume capability. (Why not simply stop the trace and start a new one later? Otherwise, important information such as process creations while suspended would have to be obtained in some other way.)
  5. Per-packet send/receive event. (New event types will be easily added as needed.)