From 584db1461022b2a0092ebecc1c9b0c54d73dae9d Mon Sep 17 00:00:00 2001 From: dagenais Date: Fri, 4 Jul 2003 14:52:52 +0000 Subject: [PATCH] git-svn-id: http://ltt.polymtl.ca/svn@100 04897980-b3bd-0310-b5e0-8ef037075253 --- ltt/branches/poly/doc/developer/format.html | 377 ++++++++++++++++ .../poly/doc/developer/ltt-to-do.html | 204 +++++++++ ltt/branches/poly/doc/developer/lttv.html | 417 ++++++++++++++++++ 3 files changed, 998 insertions(+) create mode 100644 ltt/branches/poly/doc/developer/format.html create mode 100644 ltt/branches/poly/doc/developer/ltt-to-do.html create mode 100644 ltt/branches/poly/doc/developer/lttv.html diff --git a/ltt/branches/poly/doc/developer/format.html b/ltt/branches/poly/doc/developer/format.html new file mode 100644 index 00000000..7bb1a123 --- /dev/null +++ b/ltt/branches/poly/doc/developer/format.html @@ -0,0 +1,377 @@ + + + + The new LTT trace format + + + +

The new LTT trace format

+ +

+A trace is contained in a directory tree. To send a trace remotely, +the directory tree may be tar-gzipped. Trace foo, placed in the home +directory of user john, /home/john, would have the following content: + +


+$ cd /home/john
+$ tree foo
+foo/
+|-- eventdefs
+|   |-- core.xml
+|   |-- net.xml
+|   |-- ipv4.xml
+|   `-- ide.xml
+|-- info
+|   |-- bookmarks.xml
+|   `-- system.xml
+|-- control
+|   |-- facilities
+|   |-- interrupts
+|   `-- processes
+`-- cpu
+    |-- 0
+    |-- 1
+    |-- 2
+    `-- 3
+
+ +

+The eventdefs directory contains the event descriptions for all the facilities used. The syntax is a simple subset of XML; XML is widely known and easily parsed or hand edited. Each file contains one or more facility elements. Indeed, several facilities may have the same name but different content (and thus will generate a different checksum), typically when the event descriptions for a given facility change from one version to the next, for instance if a module is recompiled and reloaded during a trace.

+A small number of events are predefined, part of the "builtin" facility, +and are not present there. These "builtin" events include "facility_load", +"block_start", "block_end" and "time_heartbeat". + +

+The cpu directory contains a tracefile for each cpu, numbered from 0, in .trace format. A uniprocessor thus only contains the file cpu/0. A multi-processor with some unused (possibly hotplug) CPU slots may have some unused CPU numbers. For instance, an 8-way SMP board with 6 CPUs randomly installed may produce tracefiles named 0, 1, 2, 4, 6, 7.

+The files in the control directory also follow the .trace format. The "facilities" file only contains "builtin" facility_load events and is used to determine the facilities used and the code range assigned to each facility. The other control files contain the initial system state and various subsequent important events, for example process creations and exits. The advantage of placing such subsequent events in control trace files, instead of (or in addition to) the per cpu trace files, is that they may be accessed more quickly and conveniently, and that they may be kept even when the per cpu files are overwritten in "flight recorder mode".

+The info directory contains, in system.xml, a description of the system on which the trace was created, as well as user annotations in bookmarks.xml. This directory may also contain various information about the trace, generated during trace analysis (statistics, index...).

Trace format

+ +

+Each tracefile is divided into equal size blocks, with a uint32 at the end of each block giving the offset of the last event in the block. Events are packed sequentially in the block, starting at offset 0 with a "block_start" event and ending, at the offset stored in the last 4 bytes of the block, with a block_end event. Both the block_start and block_end events contain the kernel timestamp (timespec binary structure, uint32 seconds, uint32 nanoseconds), the cycle counter (uint64 cycles), and the buffer id (uint64).
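As a rough illustration, the per-block bookkeeping described above could be mapped to C structures along the following lines; this is a hedged sketch, and the struct and field names are illustrative rather than taken from the LTT headers.

#include <stdint.h>

/* Kernel timestamp as described above: uint32 seconds, uint32 nanoseconds. */
struct ltt_timespec {
        uint32_t seconds;
        uint32_t nanoseconds;
};

/* Payload common to the block_start and block_end builtin events. */
struct ltt_block_boundary {
        struct ltt_timespec time;   /* kernel timestamp                     */
        uint64_t cycle_count;       /* cycle counter at the block boundary  */
        uint64_t buffer_id;         /* id of the buffer holding the block   */
};

/* In each fixed size block, the last 4 bytes hold a uint32 giving the
 * offset, from the start of the block, of the last event (block_end). */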

+Each event consists of an event type id (a uint16 which is the event type id within the facility plus the facility base id), a time delta (a uint32, in cycles or nanoseconds depending on configuration, since the last time value found in the block header or in a "time_heartbeat" event) and the event type specific data. All values are packed in native byte order binary format.
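To make the packing concrete, the sketch below walks the events of one block already read into memory. It assumes native byte order, as stated above, and the helper computing the type specific payload size is a hypothetical stub, since that size comes from the event type descriptions presented later.

#include <stdint.h>
#include <string.h>

/* Hypothetical helper: the size of the type specific data is derived from
 * the event type description for event_id (parsing not shown here). */
static uint32_t event_payload_size(uint16_t event_id, const uint8_t *payload)
{
        (void)event_id;
        (void)payload;
        return 0;
}

static void walk_block(const uint8_t *block, uint32_t block_size)
{
        uint32_t last_event_offset;
        uint32_t offset = 0;

        /* The last 4 bytes of the block give the offset of its last event. */
        memcpy(&last_event_offset, block + block_size - sizeof(uint32_t),
               sizeof(uint32_t));

        while (offset <= last_event_offset) {
                uint16_t event_id;   /* facility base id + id within the facility */
                uint32_t time_delta; /* cycles or nanoseconds since last time value */

                memcpy(&event_id, block + offset, sizeof(event_id));
                memcpy(&time_delta, block + offset + sizeof(event_id),
                       sizeof(time_delta));
                offset += sizeof(event_id) + sizeof(time_delta);
                offset += event_payload_size(event_id, block + offset);
        }
}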

System description

+ +

+The system type description, in system.xml, looks like: + +


+<system 
+ node_name="vaucluse"
+ domainname="polymtl.ca" 
+ cpu="4"
+ arch_size="ILP32" 
+ endian="little" 
+ kernel_name="Linux" 
+ kernel_release="2.4.18-686-smp" 
+ kernel_version="#1 SMP Sun Apr 14 12:07:19 EST 2002"
+ machine="i686" 
+ processor="unknown" 
+ hardware_platform="unknown"
+ operating_system="Linux" 
+ ltt_major_version="2"
+ ltt_minor_version="0"
+ ltt_block_size="100000"
+>
+Some comments about the system
+</system>
+
+ +

+The system attributes kernel_name, node_name, kernel_release, + kernel_version, machine, processor, hardware_platform and operating_system +come from the uname(1) program. The domainname attribute is obtained from +the "hostname --domain" command. The arch_size attribute is one of +LP32, ILP32, LP64 or ILP64 and specifies the length in bits of integers (I), +long (L) and pointers (P). The endian attribute is "little" or "big". +While the arch_size and endian attributes could be deduced from the platform +type, having these explicit allows analysing traces from yet unknown +platforms. The cpu attribute specifies the maximum number of processors in +the system; only tracefiles 0 to this maximum - 1 may exist in the cpu +directory. + +

+Within the system element, the enclosed text may further describe the system traced.

Event type descriptions

+ +

+A facility contains the descriptions of several event types. When a structure +is reused in several event types, a named type is defined and may be referenced +by several other event types or named types. + +


+<facility name=facility_name>
+  <description>Some text</description>
+  <event name=eventtype_name>
+    <description>Some text</description>
+    --type structure--
+  </event>
+  ...
+  <type name=type_name>
+    --type structure--
+  </type>
+</facility>
+
+ +

+The type structure may be one of the following primitive type elements. +Whenever the keyword isize is used, the allowed values are +short, medium, long, 1, 2, 4, 8, indicating the size in bytes. +The fsize keyword represents one of medium, long, 4 and 8 bytes. + +


+<int size=isize format="printf format"/>
+
+<uint size=isize format="printf format"/>
+
+<float size=fsize format="printf format"/>
+
+<string format="printf format"/>
+
+<enum size=isize format="printf format">label1 label2 ...</enum>
+
+ +

+The string is null terminated. For the enumeration, the size of the integer +used for its representation is specified. + +

+The type structure may also be a compound type. + +


+<array size=n> --type structure-- </array>
+
+<sequence lengthsize=isize> --type structure-- </sequence>
+
+<struct>
+  <field name=field_name>
+    <description>Some text</description>
+    --type structure--
+  </field>
+  ...
+</struct>
+
+<union typecodesize=isize>
+  <field name=field_name>
+    <description>Some text</description>
+    --type structure--
+  </field>
+  ...
+</union>
+
+ +

+An array is a fixed size array of length size. A sequence is a variable size array with its length stored as a prepended uint of length lengthsize. A structure is simply an aggregation of fields. A union is one of its n fields (variant record), as indicated by a preceding code (0 to n - 1) of the specified size typecodesize.
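As an illustration of the resulting packing, a sequence declared with lengthsize 2 and uint size=4 elements is stored as a uint16 element count immediately followed by that many uint32 values. A minimal decoding sketch follows; the helper name is invented, and native byte order is assumed as elsewhere in this format.

#include <stdint.h>
#include <string.h>

/* Decode a <sequence lengthsize=2><uint size=4/></sequence> field from a raw
 * event payload: a prepended uint16 count, then count uint32 elements.
 * Returns the number of bytes consumed by the field. */
static size_t decode_u32_sequence(const uint8_t *p, uint32_t *out, size_t max)
{
        uint16_t count;
        size_t i;

        memcpy(&count, p, sizeof(count));
        for (i = 0; i < count && i < max; i++)
                memcpy(&out[i], p + sizeof(count) + i * sizeof(uint32_t),
                       sizeof(uint32_t));
        return sizeof(count) + (size_t)count * sizeof(uint32_t);
}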

+Finally the type structure may be defined by referencing a named type. + +


+<typeref name=type_name/>
+
+ +

Builtin events

+ +

+The facility named "builtin" is always present and contains at least the +following event types. + +


+<event name=facility_load>
+  <description>Facility used in the trace</description>
+  <struct>
+    <field name="name"><string/></field>
+    <field name="checksum"><uint size=4/></field>
+    <field name="base_code"><uint size=4/></field>
+  </struct>
+</event>
+
+<event name=block_start>
+  <description>Block start timestamp</description>
+  <typeref name=block_timestamp/>
+</event>
+
+<event name=block_end>
+  <description>Block end timestamp</description>
+  <typeref name=block_timestamp/>
+</event>
+
+<event name=time_heartbeat>
+  <description>System time values sent periodically to minimize cycle counter 
+    drift with respect to real time clock and to detect cycle counter
+    rollovers
+  </description>
+  <typeref name=timestamp/>
+</event>
+
+<type name=block_timestamp>
+  <struct>
+    <field name=timestamp><typeref name=timestamp/></field>
+    <field name=block_id><uint size=4/></field>
+  </struct>
+</type>
+
+<type name=timestamp>
+  <struct>
+    <field name=time><typeref name=timespec/></field>
+    <field name="cycle_count"><uint size=8/></field>
+  </struct>
+</type>
+
+<type name=timespec>
+  <struct>
+    <field name="seconds"><uint size=4/></field>
+    <field name="nanoseconds"><uint size=4/></field>
+  </struct>
+</type>
+
+ +

Control files

+ +

+The interrupts file reflects the content of the /proc/interrupts system file. +It contains one event describing each interrupt. At trace start, events are +generated describing all the current interrupts. If the assignment of +interrupts changes later, due to devices or device drivers being activated or +deactivated, additional events may be added to the file. Each interrupt +event has the following structure. + +


+<event name=interrupt>
+  <description>Interrupt request number assignment</description>
+  <struct>
+    <field name="number"><uint size=4/></field>
+    <field name="count"><uint size=4/></field>
+    <field name="controller"><string/></field>
+    <field name="name"><string/></field>
+  </struct>
+</event>
+
+ +

+The processes file contains the list of processes already created when the trace starts. Each process-describing event is modeled after the /proc/self/status system file. The number of fields in this event is expected to be expanded in the future to include groups, signal masks, opened file descriptors and address maps.


+<event name=process>
+  <description>Existing process</description>
+  <struct>
+    <field name="name"><string/></field>
+    <field name="pid"><uint size=4/></field>
+    <field name="ppid"><uint size=4/></field>
+    <field name="tracer_pid"><uint size=4/></field>
+    <field name="uid"><uint size=4/></field>
+    <field name="euid"><uint size=4/></field>
+    <field name="suid"><uint size=4/></field>
+    <field name="fsuid"><uint size=4/></field>
+    <field name="gid"><uint size=4/></field>
+    <field name="egid"><uint size=4/></field>
+    <field name="sgid"><uint size=4/></field>
+    <field name="fsgid"><uint size=4/></field>
+    <field name="state"><enum size=4>
+        Running WaitInterruptible WaitUninterruptible Zombie Traced Paging
+    </enum></field>
+  </struct>
+</event>
+
+ +

Facilities

+ +

+Facilities define the granularity at which events are grouped for filtering, activation and compilation. Each facility costs a table entry in the kernel (name, checksum, event type code range), or somewhere between 20 and 30 bytes. Having one facility per tracing statement in the kernel would be too much (assuming that tracing statements eventually are routinely inserted in the kernel code and replace the 80000+ printk statements in some proportion). However, having a few facilities, up to a few tens, would make sense.

+The "builtin" facility contains a small number of predefined events which must +always exist. The "core" facility contains a small subset of OS events which +are almost always of interest (scheduling, interrupts, faults, system calls). +Then, specialized facilities may exist for each subsystem (network, disks, +USB, SCSI...). + + +

Bookmarks

+ +

+Bookmarks are user supplied information added to a trace. They contain user +annotations attached to a time interval. + +


+<bookmarks>
+  <location name=name cpu=n start_time=t end_time=t>Some text</location>
+  ...
+</bookmarks>
+
+ +

+The interval is defined using either "time=", or "start_time=" and "end_time=", or "cycle=", or "start_cycle=" and "end_cycle=". The time is in seconds, with decimals down to nanoseconds, and cycle counts are unsigned integers with a 64-bit range. The cpu attribute is optional. + + + + + + + diff --git a/ltt/branches/poly/doc/developer/ltt-to-do.html b/ltt/branches/poly/doc/developer/ltt-to-do.html new file mode 100644 index 00000000..0fda1170 --- /dev/null +++ b/ltt/branches/poly/doc/developer/ltt-to-do.html @@ -0,0 +1,204 @@ + + + + Linux Trace Toolkit Status + + + +

Linux Trace Toolkit Status

+ +

Last updated July 1, 2003.

+ +

During the 2002 Ottawa Linux Symposium tracing BOF, a list of desirable features for LTT was collected by Richard Moore. Since then, a lot of infrastructure work on LTT has been taking place. This status report aims to track current development efforts and the current status of the various features. This status page is most certainly incomplete; please send any additions and corrections to Michel Dagenais (michel.dagenais at polymtl.ca).

+ +

As of this writing, the most active LTT contributors include Karim Yaghmour, author and maintainer, from opersys.com; Tom Zanussi, Robert Wisniewski, Richard J Moore and others from IBM, mainly at the Linux Technology Center; XiangXiu Yang, Mathieu Desnoyers, Benoit des Ligneris and Michel Dagenais, from the department of Computer Engineering at Ecole Polytechnique de Montreal; and Frank Rowand, from MontaVista.

+ +

Work recently performed

+ +

Lockless per cpu buffers: Tom Zanussi of IBM has implemented per-CPU lockless buffering with low overhead, very fine grained timestamping, and has updated the kernel patch and the trace visualizer accordingly, except for viewing multiple per-CPU traces simultaneously.

+ +

RelayFS: Tom Zanussi has implemented RelayFS, a separate, simple and efficient component for moving data between the kernel and user space applications. This component is reusable by other projects (printk, evlog, lustre...) and removes a sizeable chunk from the current LTT, making each piece (relayfs and relayfs-based LTT) simpler, more modular and possibly more palatable for inclusion in the standard Linux kernel. Besides LTT on RelayFS, he has implemented printk over RelayFS with an automatically resizeable printk buffer.

+ +

New trace format: Karim Yaghmour and Michel Dagenais, with input from several LTT contributors, have designed a new trace format to accommodate per buffer tracefiles and dynamically defined event types. The new format includes both the binary trace format and the event type description format. XiangXiu Yang has developed a simple parser for the event type description format. This parser is used to generate the tracing macros in the kernel (genevent) and to support reading tracefiles in the trace reading library (libltt).

Ongoing work

+ +

Libltt: XiangXiu Yang is finishing up an event reading library +and API which parses event descriptions and accordingly reads traces and +decodes events.

+ +

lttv: XiangXiu Yang, Mathieu Desnoyers and Michel Dagenais are +remodeling the trace visualizer to use the new trace format and libltt API, +and to allow compiled and scripted plugins, which can dynamically +add new custom trace analysis functions.

+ +

Planned work

+ +

LTT already interfaces with Dynamic Probes. This feature will need to +be updated for the new LTT version.

+ +

The Kernel Crash Dump utilities are another very interesting complementary project. Interfacing them with RelayFS will help implement useful flight-recorder like tracing for post-mortem analysis.

+ +

User level tracing is available in the current LTT version but requires +one system call per event. With the new RelayFS based infrastructure, it +would be interesting to use a shared memory buffer directly accessible from +user space. Having one RelayFS channel per user would allow an extremely +efficient, yet secure, user level tracing mechanism.

+ +

Sending important events (process creation, event types/facilities definitions) to a separate channel could be used to browse traces interactively more efficiently. Only this concise trace of important events would need to be processed in its entirety; other, larger gigabyte size traces could be accessed randomly without requiring a first preprocessing pass. A separate channel would also be required in case of incomplete traces, such as when tracing to a circular buffer in "flight recorder" mode; the important events would all be kept while only the last buffers of ordinary events would be kept.

+ +

Once the visualizer is able to read and display several traces, it + will be interesting to produce side by side synchronized views + (events from two interacting machines A and B one above the other) + or even merged views (combined events from several CPUs in a single + merged graph). Time differences between interacting systems will + need to be estimated and somewhat compensated for.

+ +

LTT currently writes a proc file at trace start time. This + file only contains minimal information about processes and + interrupts names. More information would be desirable for several + applications (process maps, opened descriptors, content of buffer + cache). Furthermore, this information may be more conveniently + gathered from within the kernel and simply written to the trace as + events at start time.

+ +

New features already implemented since LTT 0.9.5

+ +
    +
  1. Per-CPU Buffering scheme.
  2. Logging without locking.
  3. Minimal latency - minimal or no serialisation. (Lockless tracing using read_cycle_counter instead of gettimeofday.)
  4. Fine granularity time stamping - min=o(CPU cycle time), max=.05 Gb Ethernet interrupt rate. (Cycle counter being used).
  5. Random access to trace event stream. (Random access reading of events in the trace is already available in LibLTT. However, one first pass is required through the trace to find all the process creation events; the cost of this first pass may be reduced in the future if process creation events are sent to a separate, much smaller trace.)
+ +

Features being worked on

+ +
    +
  1. Simple wrapper macros for trace instrumentation. (GenEvent)
  2. Easily expandable with new trace types. (GenEvent)
  3. Multiple buffering schemes - switchable globally or selectable by trace client. (Will be simpler to obtain with RelayFS.)
  4. Global buffer scheme. (Will be simpler to obtain with RelayFS.)
  5. Per-process buffer scheme. (Will be simpler to obtain with RelayFS.)
  6. Per-NGPT thread buffer scheme. (Will be simpler to obtain with RelayFS.)
  7. Per-component buffer scheme. (Will be simpler to obtain with RelayFS.)
  8. A set of extensible and modular performance analysis post-processing programs. (Lttv)
  9. Filtering and selection mechanisms within formatting utility. (Lttv)
  10. Variable size event records. (GenEvent, LibEvent, Lttv)
  11. Data reduction facilities able to logically combine traces from more than one system. (LibEvent, Lttv)
  12. Data presentation utilities to be able to present data from multiple trace instances in a logically combined form. (LibEvent, Lttv)
  13. Major/minor code means of identification/registration/assignment. (GenEvent)
  14. A flexible formatting mechanism that will cater for structures and arrays of structures with recursion. (GenEvent)
+ +

Features already planned for

+ +
    +
  1. Init-time tracing. (To be part of RelayFS.)
  2. Updated interface for Dynamic Probes. (As soon as things stabilize.)
  3. Support "flight recorder" always on tracing with minimal resource consumption. (To be part of RelayFS and interfaced to the Kernel crash dump facilities.)
  4. Fine grained dynamic trace instrumentation for kernel space and user subsystems. (Dynamic Probes, more efficient user level tracing.)
  5. System information logged at trace start. (New special events to add.)
  6. Collection of process memory map information at trace start/restart and updates of that information at fork/exec/exit. This allows address-to-name resolution for user space.
  7. Include the facility to write system snapshots (total memory layout for kernel, drivers, and all processes) to a file. This is required for trace post-processing on a system other than the one producing the trace. Perhaps some of this is already implemented in the Kernel Crash Dump.
  8. Even more efficient tracing from user space.
  9. Better integration with tools to define static trace hooks.
  10. Better integration with tools to dynamically activate tracing statements.
+ +

Features not currently planned

+ +
    +
  1. POSIX Tracing API compliance.
  2. Ability to do function entry/exit tracing. (Probably a totally orthogonal mechanism using either Dynamic Probes hooks or static code instrumentation using the suitable GCC options for basic block instrumentation.)
  3. Processor performance counter (which most modern CPUs have) sampling and recording. (These counters can be read and their value sent in traced events. Some support to collect these automatically at specific state change times and to visualize the results would be nice.)
  4. Suspend & Resume capability. (Why not simply stop the trace and start a new one later? Otherwise, important information like process creations while suspended must be obtained in some other way.)
  5. Per-packet send/receive event. (New event types will be easily added as needed.)
+
+
+ + + + + + diff --git a/ltt/branches/poly/doc/developer/lttv.html b/ltt/branches/poly/doc/developer/lttv.html new file mode 100644 index 00000000..3dc192dd --- /dev/null +++ b/ltt/branches/poly/doc/developer/lttv.html @@ -0,0 +1,417 @@ + + + + Linux Trace Toolkit User tools + + + +

Linux Trace Toolkit User tools

+ +

The Linux Trace Toolkit Visualizer, lttv, is a modular and extensible +tool to read, analyze, annotate and display traces. It accesses traces through +the libltt API and produces either textual output or graphical output using +the GTK library. This document describes the architecture of lttv for +developers. + +

Lttv is a small executable which links to the trace reading API, libltt, +and to the glib and gobject base libraries. +By itself it contains just enough code to +convert a trace to a textual format and to load modules. +The public +functions defined in the main program are available to all modules. +A number of +text modules may be dynamically loaded to extend the capabilities of +lttv, for instance to compute and print various statistics. + +

A more elaborate module, traceView, dynamically links to the GTK library +and to a support library, libgtklttv. When loaded, it displays graphical +windows in which one or more viewers in subwindows may be used to browse +details of events in traces. A number of other graphical modules may be +dynamically loaded to offer a choice of different viewers (e.g., process, +CPU or block devices state versus time). + +

Main program: main.c

+ +

The main program parses the command line options, loads the requested +modules and executes the hooks registered in the global attributes +(/hooks/main/before, /hooks/main/core, /hooks/main/after). + +

Hooks for callbacks: hook.h (hook.c)

+ +

+In a modular extensible application, each module registers callbacks to ensure that it gets called at appropriate times (e.g., after command line options processing, at each event to compute statistics...). Hooks and lists of hooks are defined for this purpose and are normally stored in the global attributes under /hooks/*.
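As a rough illustration of the mechanism (the type and function names below are assumptions for this sketch, not the actual hook.h declarations), a hook pairs a callback with the data supplied at registration time, and a hook list is simply called in order with call time data such as the current context.

#include <glib.h>

/* A hook: a callback plus the data registered with it. It also receives
 * call time data, e.g. a context containing the current event. */
typedef gboolean (*LttvHookFunc)(void *hook_data, void *call_data);

typedef struct {
        LttvHookFunc f;
        void *hook_data;
} LttvHook;

/* Call every hook of a list, in order, with the same call time data. */
static void hooks_call(GArray *hooks, void *call_data)
{
        guint i;

        for (i = 0; i < hooks->len; i++) {
                LttvHook *h = &g_array_index(hooks, LttvHook, i);

                h->f(h->hook_data, call_data);
        }
}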

Browsable data structures: iattribute.h (iattribute.c)

+ +

+In several places, functions should operate on data structures for which the list of members is extensible. For example, the statistics printing module should not need to be modified each time new statistics are added by other modules. For this purpose, a gobject interface is defined in iattribute.h to enumerate and access members in a data structure. Even if new modules define custom data structures for efficiently storing statistics while they are being computed, these remain generically accessible to the printing routine as long as they implement the iattribute interface.
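A hedged sketch of what such an interface amounts to, written here as a plain C vtable rather than the actual gobject interface declaration; the names and value types are illustrative only.

#include <glib.h>

typedef enum { ATTR_INT, ATTR_UINT, ATTR_DOUBLE, ATTR_STRING } AttrType;

typedef union {
        gint64 v_int;
        guint64 v_uint;
        gdouble v_double;
        const char *v_string;
} AttrValue;

/* Any object able to report its number of members and give typed access to
 * each of them can be enumerated generically, for example by a statistics
 * printing routine that knows nothing about the concrete structure. */
typedef struct {
        guint (*get_number)(void *self);
        const char *(*get_name)(void *self, guint i);
        AttrType (*get_type)(void *self, guint i);
        AttrValue (*get_value)(void *self, guint i);
} IAttributeVTable;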

Extensible data structures: attribute.h (attribute.c)

+ +

To allow each module to add its needed members to important data structures, +for instance new statistics for processes, the LttvAttributes type is +a container for named typed values. Each attribute has a textual key (name) +and an associated typed value. +It is similar to a C data structure except that the +number and type of the members can change dynamically. It may be accessed +either directly or through the iattribute interface. + +

Some members may be LttvAttributes objects, thus forming a tree of +attributes, not unlike hierarchical file systems or registries. This is used +for the global attributes, used to exchange information between modules. +Attributes are also attached to trace sets, traces and contexts to allow +storing arbitrary attributes. + +
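The toy container below illustrates the principle, using a GHashTable keyed by attribute name; the real attribute.h API, value types and subtree handling differ, so this is only a sketch of the idea.

#include <glib.h>

/* A named, typed value; a subtree value turns the container into a tree. */
typedef struct {
        enum { ATTR_UINT64, ATTR_DOUBLE64, ATTR_SUBTREE } type;
        union {
                guint64 u;
                gdouble d;
                GHashTable *subtree;
        } v;
} Attr;

static void attr_set_uint(GHashTable *attrs, const char *key, guint64 val)
{
        Attr *a = g_new0(Attr, 1);

        a->type = ATTR_UINT64;
        a->v.u = val;
        g_hash_table_insert(attrs, g_strdup(key), a);
}

/* Usage sketch: a module attaches a new counter to a process node.
 *   GHashTable *proc = g_hash_table_new(g_str_hash, g_str_equal);
 *   attr_set_uint(proc, "BytesWritten", 4096);
 */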

Modules: module.h (module.c)

+ +

+The benefit of modules is to avoid recompiling the whole application when adding new functionality. It also helps ensure that only the needed code is loaded in memory.

+Modules are loaded explicitly, being on the list of default modules or requested by a command line option, with g_module_open. The functions in the module are not directly accessible. Indeed, direct, compiled-in references to their functions would be dangerous since they would exist even before (if ever) the module is loaded. Each module contains a function named init. Its handle is obtained by the main program using g_module_symbol and the function is called. The init function of the module then calls everything it needs from the main program or from libraries, typically registering callbacks in hook lists stored in the global attributes. No module function other than init is directly called. Modules cannot see the functions from other modules since they may or may not be loaded at the same time.
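On the main program side, the loading sequence can be sketched with the glib GModule calls mentioned above (g_module_open, g_module_symbol); the init signature and the error handling are simplifying assumptions.

#include <gmodule.h>

typedef void (*ModuleInitFunc)(void);

/* Load one module and run its init function, which is expected to register
 * its hooks in the global attributes. Returns TRUE on success. */
static gboolean load_module(const char *path)
{
        GModule *m = g_module_open(path, G_MODULE_BIND_LAZY);
        gpointer sym;

        if (m == NULL)
                return FALSE;
        if (!g_module_symbol(m, "init", &sym)) {
                g_module_close(m);
                return FALSE;
        }
        ((ModuleInitFunc)sym)();
        return TRUE;
}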

+The modules must see the declarations for the functions used, from the main program and from libraries, by including the associated .h files. The list of libraries used must be provided as an argument when a module is linked. This ensures that these libraries get loaded automatically when that module is loaded.

Libraries contain a number of functions available to modules and to the main +program. They are loaded automatically at start time if linked by the main +program or at module load time if linked by that module. Libraries are +useful to contain functions needed by several modules. Indeed, functions +used by a single module could be simply part of that module. + +

+A list of loaded modules is maintained. When a module is requested, it is first checked whether the module is already loaded. A module may request other modules at the beginning of its init function. This ensures that these modules get loaded and initialized before the init function of the current module proceeds. Circular dependencies are obviously to be avoided, as the initialization order among mutually dependent modules would be arbitrary.

Command line options: option.h (option.c)

+ +

+Command line options are added as needed by the main program and by modules as they are loaded. Thus, while options are scanned and acted upon (e.g., options to load modules), the list of options to recognize continues to grow. The options module registers to get called by /hooks/main/before. It offers hooks /hooks/option/before and /hooks/option/after which are called just before and just after processing the options. Many modules register in their init function to be called in /hooks/option/after to verify the options specified and register further hooks accordingly.

Trace Analysis

+ +

The main purpose of the lttv application is to process trace sets, +calling registered hooks for each event in the traces and maintaining +a context (system state, accumulated statistics). + +

Trace Sets: traceSet.h (traceSet.c)

+ +

+Trace sets are defined such that several traces can be analyzed together. Traces may be added to and removed from a trace set as needed. The main program stores a trace set in /trace_set/default. The content of the trace set is defined by command line options and it is used by analysis modules (batch or interactive).

Trace Set Analysis: processTrace.h (processTrace.c)

+ +

+The function lttv_process_trace_set loops over all the events in the specified trace set for the specified time interval. Before hooks are first called for the trace set and for each trace and tracefile (one per cpu, plus control tracefiles) in the trace set. Then, event hooks are called for each event in sorted time order. Finally, after hooks are called for the trace set and for each trace and tracefile in it.

To call all the event hooks in sorted time order, a priority queue +(or sorted tree) is used. The first event from each tracefile is read and its +time used as key in the sorted tree. The event with the lowest key is removed +from the tree, the next event from that tracefile is read and reinserted in +the tree. + +
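The sketch below shows the same merge flow, with a plain linear search for the earliest pending event standing in for the priority queue; the cursor structure and the deliver callback are hypothetical stand-ins for the tracefile reading code.

#include <glib.h>

/* One cursor per tracefile: the time of its next unprocessed event,
 * or has_event == FALSE once the tracefile is exhausted. */
typedef struct {
        gboolean has_event;
        guint64 next_time;
} Cursor;

/* Deliver events from all tracefiles in sorted time order. The deliver
 * callback calls the event hooks, then reads the next event from that
 * tracefile and updates the cursor. */
static void process_in_time_order(Cursor *cursors, guint n,
                                  void (*deliver)(Cursor *c))
{
        for (;;) {
                Cursor *earliest = NULL;
                guint i;

                for (i = 0; i < n; i++)
                        if (cursors[i].has_event &&
                            (earliest == NULL ||
                             cursors[i].next_time < earliest->next_time))
                                earliest = &cursors[i];
                if (earliest == NULL)
                        break;          /* all tracefiles exhausted */
                deliver(earliest);
        }
}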

Each hook is called with a LttvContext gobject as call data. The LttvContext +object for the trace set before/after hooks is provided in the call to +lttv_process_trace_set. Shallow copies of this context are made for each +trace in the trace set for the trace before/after hooks. Again, shallow +copies of each trace context are made for each tracefile in a trace. +The context for each tracefile is used both for the tracefile before/after +hooks and when calling the hooks for the contained events. + +

+The lttv_process_trace_set function sets the fields in the context appropriately before calling a hook. For example, when calling an event hook, the context contains:

+
trace_set_context: context for the trace set.
trace_context: context for the trace.
ts: trace set.
t: trace.
tf: tracefile.
e: event.
+ +

The cost of providing all this information in the context is relatively +low. When calling a hook from one event to the next, in the same tracefile, +only the event field needs to be changed. +The contexts used when processing traces are key to extensibility and +performance. New modules may need additional data members in the context to +store intermediate results. For this purpose, it is possible to derive +subtypes of LttvContext in order to add new data members. + + +

Reconstructing the system state from the trace: state.h (state.c)

+ +

The events in a trace often represent state transitions in the traced +system. When the trace is processed, and events accessed in time sorted +order, it is thus possible to reconstruct in part the state of the +traced system: state of each CPU, process, disk queue. The state of each +process may contain detailed information such as opened file descriptors +and memory map if needed by the analysis and if sufficient information is +available in the trace. This incrementally updated state information may be +used to display state graphs, or simply to compute state dependent +statistics (time spent in user or system mode, waiting for a file...). + +

+When tracing starts, at T0, no state is available. The OS state may be +obtained through "initial state" events which enumerate the important OS data +structures. Unless the state is obtained atomically, other events +describing state changes may be interleaved in the trace and must be +processed in the correct order. Once all the special initial state +events are obtained, at Ts, the complete state is available. From there the +system state can be deduced incrementally from the events in the trace. + +

+Analysis tools must be prepared for missing state information. In some cases +only a subset of events is traced, in others the trace may be truncated +in flight recorder mode. + +

+In interactive processing, the interval for which processing is required varies. After scrolling a viewer, the events in the new interval to display need to be processed in order to redraw the view. To avoid restarting the processing at the trace start to reconstruct incrementally the system state, the computed state may be memorized at regular intervals, for example every 100 000 events, in a time indexed database associated with a trace. To conserve space, it may be possible in some cases to only store state differences.

To process a specific time interval, the state at the beginning of the interval would be obtained by copying the last preceding saved state and processing the events since then to update the state.

A new subtype of LttvContext, LttvStateContext, is defined to add storage for the state information. It defines a trace set state as a set of trace states. Each trace state is composed of processes, CPUs and block devices. Each CPU has a currently executing process, and each process state keeps track of the interrupt stack frames (faults, interrupts, system calls), the executable file name and other information such as opened file descriptors. Each frame stores the process status, entry time and last status change time.

File state.c provides state updating hooks to be called when the trace is processed. When a scheduling change event is delivered to the hook, for instance, the current process for the CPU is changed and the state of the incoming and outgoing processes is changed. The state updating hooks are stored in the global attributes under /hooks/state/core/trace_set/before, after, /hooks/state/core/trace/before, after... to be used by processing functions requiring state updating (batch and interactive analysis, computing the state at time T by updating a preceding saved state...).
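A hedged sketch of what a scheduling change hook does to this state; the structures, field names and event parameters below are illustrative, not the actual state.c definitions.

#include <glib.h>

enum { PROC_RUNNING, PROC_READY };

typedef struct {
        guint pid;
        guint status;
        guint64 last_change;            /* time of the last status change */
} ProcState;

typedef struct {
        ProcState **cpu_current;        /* current process, per cpu         */
        GHashTable *processes;          /* pid -> ProcState (g_direct_hash) */
} TraceState;

/* On a scheduling change: the outgoing process leaves the CPU and becomes
 * ready, the incoming process becomes the currently running one. */
static void schedchange_hook(TraceState *s, guint cpu,
                             guint pid_out, guint pid_in, guint64 t)
{
        ProcState *out = g_hash_table_lookup(s->processes,
                                             GUINT_TO_POINTER(pid_out));
        ProcState *in = g_hash_table_lookup(s->processes,
                                            GUINT_TO_POINTER(pid_in));

        if (out != NULL) {
                out->status = PROC_READY;
                out->last_change = t;
        }
        if (in != NULL) {
                in->status = PROC_RUNNING;
                in->last_change = t;
        }
        s->cpu_current[cpu] = in;
}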

Computing Statistics: stats.h (stats.c)

+ +

This file defines a subtype of LttvStateContext, LttvStatsContext, +to store statistics on various aspects of a trace set. The LttvTraceSetStats +structure contains a set of LttvTraceStats structures. Each such structure +contains structures for CPUs, processes, interrupt types (IRQ, system call, +fault), subtypes (individual system calls, IRQs or faults) and +block devices. The CPUs also contain structures for processes, interrupt types, +subtypes and block devices. Process structures similarly contain +structures for interrupt types, subtypes and block devices. At each level +(trace set, trace, cpu, process, interrupt stack frames) +attributes are used to store statistics. + +

File stats.c provides statistics computing hooks to be called when the +trace is processed. For example, when a write event is processed, +the attribute BytesWritten in the corresponding system, cpu, process, +interrupt type (e.g. system call) and subtype (e.g. write) is incremented +by the number of bytes stored in the event. When the processing is finished, +perhaps in the after hooks, the number of bytes written and other statistics +may be summed over all CPUs for a given process, over all processes for a +given CPU or over all traces. + +
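For instance, the write event bookkeeping described above could look roughly like this, with a toy counter store standing in for the real attribute objects; the function and attribute names are assumptions.

#include <glib.h>

/* Toy counter store: attribute name -> guint64 counter. */
static void counter_add(GHashTable *attrs, const char *name, guint64 n)
{
        guint64 *c = g_hash_table_lookup(attrs, name);

        if (c == NULL) {
                c = g_new0(guint64, 1);
                g_hash_table_insert(attrs, g_strdup(name), c);
        }
        *c += n;
}

/* On a write event, add the byte count at every level of interest: the
 * trace, the cpu, the process, the interrupt type (system call) and the
 * subtype (write). */
static void write_event_hook(GHashTable *trace, GHashTable *cpu,
                             GHashTable *process, GHashTable *syscall_type,
                             GHashTable *write_subtype, guint64 bytes)
{
        counter_add(trace, "BytesWritten", bytes);
        counter_add(cpu, "BytesWritten", bytes);
        counter_add(process, "BytesWritten", bytes);
        counter_add(syscall_type, "BytesWritten", bytes);
        counter_add(write_subtype, "BytesWritten", bytes);
}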

The basic set of statistics computed by stats.c includes, for the whole trace set:

+ +

The structure to store statistics differs from the state storage structure +in several ways. Statistics are maintained in different ways (per CPU all +processes, per process all CPUs, per process on a given CPU...). Furthermore, +statistics are maintained for all processes which existed during the trace +while the state at time T only stores information about current processes. + +

The hooks defined by stats.c are stored in the global attributes under +/hooks/stats/core/trace_set/before, after, +/hooks/stats/core/trace/before, after to be used by processing functions +interested in statistics. + +

Filtering events: filter.h (filter.c)

+ +

+Filters are used to select which events in a trace are shown in a viewer or are used in a computation. The filtering rules are based on the values of event fields. The filter module receives a filter expression and computes a compiled filter. The compiled filter then serves as hook data for check event filter hooks which, given a context containing an event, return TRUE or FALSE to indicate whether the event satisfies the filter. Trace and tracefile check filter hooks may be used to determine if a system and CPU satisfy the filter. Finally, the filter module has a function to return the time bounds, if any, imposed by a filter.

For some applications, the hooks provided by the filter module may not +be sufficient, since they are based on simple boolean combinations +of comparisons between fields and constants. In that case, custom code may be +used for check hooks during the processing. An example of complex +filtering could be to only show events belonging to processes which consumed +more than 10% of the CPU in the last 10 seconds. + +

In module filter.c, filters are specified using textual expressions +with AND, OR, NOT operations on +nested subexpressions. Primitive expressions compare an event field to +a constant. In the graphical user interface, a filter editor is provided. + +


+tokens: ( ! && || == <= >= > < != name [ ] int float string )
+
+expression = ( expression ) OR ! expression OR
+     expression && expression OR expression || expression OR 
+     simple_expression
+
+simple_expression = field_selector OP value
+
+value = int OR float OR string OR enum
+
+field_selector = component OR component . field_selector
+
+component = name OR name [ int ]
+
+ + +
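Under this grammar, a filter keeping only write events from a single process might read as follows; the expression is illustrative and the exact field selector names depend on the loaded event type descriptions.

event.name == "write" && process.pid == 1234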

Batch Analysis: batchAnalysis.h (batchAnalysis.c)

+ +

This module registers to be called by the main program (/hooks/main/core). When called, it gets the current trace set (/trace_set/default), the state updating hooks (/hooks/state/*), the statistics hooks (/hooks/stats/*) and other analysis hooks (/hooks/batch/*), and runs lttv_process_trace_set for the entire trace set time interval. This simple processing of the complete trace set is normally sufficient for batch operations such as converting a trace to text and computing various statistics.

Text output for events and statistics: textDump.h (textDump.c)

+ +

+This module registers hooks (/hooks/batch) +to print a textual representation of each event +(event hooks) and to print the content of the statistics accumulated in the +context (after trace set hook). + +

Trace Set Viewers

+ +

+A library, libgtklttv, is defined to provide utility functions for the second set of modules, which compose the interactive graphical user interface. It offers functions to create and interact with top level trace viewing windows, and to insert specialized embedded viewer modules. The libgtklttv library requires the gtk library. The viewer modules include a detailed event list, eventsTableView, a process state graph, processStateView, and a CPU state graph, cpuStateView.

+The top level gtkTraceSet window, defined in libgtklttv, has the usual FILE EDIT... menu and a toolbar. It has an associated trace set (and filter) and contains several tabs, each containing several vertically stacked, time synchronized trace set viewers. It manages the space allocated to each contained viewer, the menu items and tools registered by each contained viewer, and the current time and current time interval.

+When viewers change the current time or time interval, the gtkTraceSet +window notifies all contained viewers. When one or more viewers need +redrawing, the gtkTraceSet window calls the lttv_process_trace_set +function for the needed time interval, after computing the system state +for the interval start time. While events are processed, drawing hooks +from the viewers are called. + +

+TO COMPLETE; description and motivation for the gtkTraceSet widget structure +and interaction with viewers. Description and motivation for the detailed +event view and process state view. + + + -- 2.34.1