<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>The new LTT trace format</title>
</head>
<body>

<h1>The new LTT trace format</h1>

<P>
A trace is contained in a directory tree. To send a trace remotely,
the directory tree may be tar-gzipped. Trace foo, placed in the home
directory of user john, /home/john, would have the following content:

<PRE><TT>
$ cd /home/john
$ tree foo
foo/
|-- eventdefs
|   |-- core.xml
|   |-- net.xml
|   |-- ipv4.xml
|   `-- ide.xml
|-- info
|   |-- bookmarks.xml
|   `-- system.xml
|-- control
|   |-- facilities
|   |-- interrupts
|   `-- processes
`-- cpu
    |-- 0
    |-- 1
    |-- 2
    `-- 3
</TT></PRE>

<P>
The eventdefs directory contains the event descriptions for all the
facilities used. The syntax is a simple subset of XML; XML is widely
known and easily parsed or hand edited. Each file contains one or more
&lt;facility name=name&gt;...&lt;/facility&gt; elements. Several
facilities may have the same name but different content (and thus will
generate a different checksum). This typically happens when, while tracing
is enabled, a module using the named facility is unloaded, modified
(along with the description of some events), recompiled and reloaded.
The trace will then contain events from two different, similarly named,
facility versions.

<P>
A small number of events are predefined, part of the "builtin" facility,
and are not described in the eventdefs directory. These "builtin" events
include "facility_load", "block_start", "block_end" and "time_heartbeat".

<P>
The cpu directory contains a tracefile for each cpu, numbered from 0,
in .trace format. A trace from a uniprocessor thus only contains the file
cpu/0. A multiprocessor with some unused (possibly hotplug) CPU slots may
have some unused CPU numbers. For instance, an 8-way SMP board with 6 CPUs
installed in arbitrary slots may produce tracefiles named 0, 1, 2, 4, 6, 7.

<P>
The files in the control directory also follow the .trace format.
The "facilities" file only contains "builtin" facility_load events
and is used to determine the facilities used and the code range assigned
to each facility. The other control files contain the initial system
state and various subsequent important events, for example process
creation and exit. The benefit of placing such subsequent events
in control trace files instead of (or in addition to) the per cpu
trace files is that they may be accessed more quickly and conveniently,
and that they may be kept even when the per cpu files are overwritten
in "flight recorder mode".
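
<P>
As an illustration of how the code ranges from the "facilities" file might be
used, the following Python sketch maps a global event id back to a facility.
It assumes that an event id is the facility base code plus the id within the
facility, and that the (base_code, name) pairs were already collected from
facility_load events. The function name and the sample values are hypothetical.

```python
from bisect import bisect_right

def facility_for_event(event_id, facilities):
    """Map a global event id to (facility name, id within the facility).

    `facilities` is a list of (base_code, name) pairs, assumed to have been
    collected from facility_load events and sorted by base_code.
    """
    bases = [base for base, _ in facilities]
    index = bisect_right(bases, event_id) - 1
    if index == -1:
        raise ValueError("event id below the first facility base code")
    base, name = facilities[index]
    return name, event_id - base

# Hypothetical code ranges; real values come from the trace itself.
loaded = [(0, "builtin"), (4, "core"), (40, "net")]
print(facility_for_event(42, loaded))
```

<P>
The sketch does not check the upper end of a range, since the number of
event types per facility is only known from the event descriptions.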

<P>
The info directory contains, in system.xml, a description of the system on
which the trace was created, as well as user annotations in bookmarks.xml.
This directory may also contain various information about the trace, generated
during trace analysis (statistics, index...).


<H2>Trace format</H2>

<P>
Each tracefile is divided into equal size blocks with a uint32 at the block
end giving the offset to the last event in the block. Events are packed
sequentially in the block starting at offset 0 with a "block_start" event
and ending, at the offset stored in the last 4 bytes of the block, with a
"block_end" event. Both the block_start and block_end events
contain the kernel timestamp (timespec binary structure,
uint32 seconds, uint32 nanoseconds), the cycle counter (uint64 cycles),
and the buffer id (uint64).
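
<P>
The block layout can be illustrated with a minimal Python sketch that reads
the trailing uint32 to locate the block_end event. It assumes the block is
already in memory as a bytes object and that the analysis machine has the
same byte order as the traced machine; the function name is hypothetical.

```python
import struct

def block_end_offset(block):
    """Return the offset of the last event (block_end) in one block.

    Assumes `block` holds exactly one block and that the trailing uint32
    is stored in the byte order of the machine running this code.
    """
    (offset,) = struct.unpack_from("=I", block, len(block) - 4)
    return offset

# Hypothetical 32-byte block: the last 4 bytes say block_end starts at 20.
block = bytes(28) + struct.pack("=I", 20)
print(block_end_offset(block))
```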

<P>
Each event consists of an event type id (uint16, which is the event type id
within the facility + the facility base id), a time delta (uint32 in cycles
or nanoseconds, depending on configuration, since the last time value, in the
block header or in a "time_heartbeat" event) and the event type specific data.
All values are packed in native byte order binary format.
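
<P>
A possible way to decode this event header is sketched below in Python.
It assumes that "packed" means no padding between the uint16 and the uint32,
and that the reader runs with the same byte order as the traced machine.
The names EVENT_HEADER and read_event_header are hypothetical.

```python
import struct

# Event header as described above: uint16 event id, then uint32 time delta.
# "=" selects native byte order without padding (an assumption about "packed").
EVENT_HEADER = struct.Struct("=HI")

def read_event_header(block, offset):
    """Return (event_id, time_delta, data_offset) for the event at `offset`."""
    event_id, time_delta = EVENT_HEADER.unpack_from(block, offset)
    return event_id, time_delta, offset + EVENT_HEADER.size

# Hypothetical event: id 17, time delta 1200, followed by its data.
sample = struct.pack("=HI", 17, 1200) + b"payload"
print(read_event_header(sample, 0))
```

<P>
The returned data_offset is where the event type specific data starts; its
length depends on the event description.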


<H2>System description</H2>

<P>
The system type description, in system.xml, looks like:

<PRE><TT>
&lt;system
  node_name="vaucluse"
  domainname="polymtl.ca"
  cpu="4"
  arch_size="ILP32"
  endian="little"
  kernel_name="Linux"
  kernel_release="2.4.18-686-smp"
  kernel_version="#1 SMP Sun Apr 14 12:07:19 EST 2002"
  machine="i686"
  processor="unknown"
  hardware_platform="unknown"
  operating_system="Linux"
  ltt_major_version="2"
  ltt_minor_version="0"
  ltt_block_size="100000"
&gt;
  Some comments about the system
&lt;/system&gt;
</TT></PRE>

<P>
The system attributes kernel_name, node_name, kernel_release,
kernel_version, machine, processor, hardware_platform and operating_system
come from the uname(1) program. The domainname attribute is obtained from
the "hostname --domain" command. The arch_size attribute is one of
LP32, ILP32, LP64 or ILP64 and specifies the length in bits of integers (I),
longs (L) and pointers (P). The endian attribute is "little" or "big".
While the arch_size and endian attributes could be deduced from the platform
type, having them explicit allows analysing traces from as yet unknown
platforms. The cpu attribute specifies the maximum number of processors in
the system; only tracefiles 0 to this maximum - 1 may exist in the cpu
directory.
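
<P>
To make the arch_size naming concrete, the following Python sketch splits a
value into the types it names and their common width in bits. It only
reports what the attribute states and makes no assumption about the size of
types that are not named; the function name is hypothetical.

```python
def parse_arch_size(arch_size):
    """Split e.g. "ILP32" into ({"int", "long", "pointer"}, 32)."""
    names = {"I": "int", "L": "long", "P": "pointer"}
    letters = arch_size.rstrip("0123456789")
    bits = int(arch_size[len(letters):])
    return {names[letter] for letter in letters}, bits

for model in ("LP32", "ILP32", "LP64", "ILP64"):
    types, bits = parse_arch_size(model)
    print(model, sorted(types), bits)
```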

<P>
Within the system element, the enclosed text may further describe the
system traced.


<H2>Event type descriptions</H2>

<P>
A facility contains the descriptions of several event types. When a structure
is reused in several event types, a named type is defined and may be referenced
by several other event types or named types.

<PRE><TT>
&lt;facility name=facility_name&gt;
  &lt;description&gt;Some text&lt;/description&gt;
  &lt;event name=eventtype_name&gt;
    &lt;description&gt;Some text&lt;/description&gt;
    --type structure--
  &lt;/event&gt;
  ...
  &lt;type name=type_name&gt;
    --type structure--
  &lt;/type&gt;
&lt;/facility&gt;
</TT></PRE>

<P>
The type structure may be one of the following primitive type elements.
Whenever the keyword isize is used, the allowed values are
short, medium, long, 1, 2, 4, 8, indicating the size in bytes.
The fsize keyword represents one of medium, long, 4 and 8 bytes.

<PRE><TT>
&lt;int size=isize format="printf format"/&gt;

&lt;uint size=isize format="printf format"/&gt;

&lt;float size=fsize format="printf format"/&gt;

&lt;string format="printf format"/&gt;

&lt;enum size=isize format="printf format"&gt;label1 label2 ...&lt;/enum&gt;
</TT></PRE>

<P>
The string is null terminated. For the enumeration, the size of the integer
used for its representation is specified.

<P>
The type structure may also be a compound type.

<PRE><TT>
&lt;array size=n&gt; --type structure-- &lt;/array&gt;

&lt;sequence lengthsize=isize&gt; --type structure-- &lt;/sequence&gt;

&lt;struct&gt;
  &lt;field name=field_name&gt;
    &lt;description&gt;Some text&lt;/description&gt;
    --type structure--
  &lt;/field&gt;
  ...
&lt;/struct&gt;

&lt;union typecodesize=isize&gt;
  &lt;field name=field_name&gt;
    &lt;description&gt;Some text&lt;/description&gt;
    --type structure--
  &lt;/field&gt;
  ...
&lt;/union&gt;
</TT></PRE>

<P>
Array is a fixed size array of length size. Sequence is a variable size
array with its length stored as a prepended uint of length lengthsize.
A structure is simply an aggregation of fields. A union is one of its n
fields (variant record), as indicated by a preceding code (0 to n - 1)
of the specified size typecodesize.
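
<P>
As a sketch of how a sequence might be decoded, the Python code below reads
the prepended length and then that many unsigned integers. It assumes that
the length counts elements rather than bytes, that values are stored without
padding in the reader's native byte order, and that sizes are given
numerically (the symbolic sizes short, medium and long are not handled).
All names are hypothetical.

```python
import struct

# struct codes for unsigned integers of 1, 2, 4 and 8 bytes.
UINT_CODES = {1: "B", 2: "H", 4: "I", 8: "Q"}

def read_uint(data, offset, size):
    """Read one native-order unsigned integer; return (value, next offset)."""
    (value,) = struct.unpack_from("=" + UINT_CODES[size], data, offset)
    return value, offset + size

def read_uint_sequence(data, offset, lengthsize, elemsize):
    """Decode a sequence of uint elements: a length prefix, then the items."""
    count, offset = read_uint(data, offset, lengthsize)
    items = []
    for _ in range(count):
        value, offset = read_uint(data, offset, elemsize)
        items.append(value)
    return items, offset

# Hypothetical payload: lengthsize=2, three uint elements of 4 bytes each.
payload = struct.pack("=HIII", 3, 10, 20, 30)
print(read_uint_sequence(payload, 0, 2, 4))
```

<P>
A union could be decoded in the same spirit: read the type code with
read_uint using typecodesize, then decode the field selected by that code.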

<P>
Finally, the type structure may be defined by referencing a named type.

<PRE><TT>
&lt;typeref name=type_name/&gt;
</TT></PRE>

<H2>Builtin events</H2>

<P>
The facility named "builtin" is always present and contains at least the
following event types.

<PRE><TT>
&lt;event name=facility_load&gt;
  &lt;description&gt;Facility used in the trace&lt;/description&gt;
  &lt;struct&gt;
    &lt;field name="name"&gt;&lt;string/&gt;&lt;/field&gt;
    &lt;field name="checksum"&gt;&lt;uint size=4/&gt;&lt;/field&gt;
    &lt;field name="base_code"&gt;&lt;uint size=4/&gt;&lt;/field&gt;
  &lt;/struct&gt;
&lt;/event&gt;

&lt;event name=block_start&gt;
  &lt;description&gt;Block start timestamp&lt;/description&gt;
  &lt;typeref name=block_timestamp/&gt;
&lt;/event&gt;

&lt;event name=block_end&gt;
  &lt;description&gt;Block end timestamp&lt;/description&gt;
  &lt;typeref name=block_timestamp/&gt;
&lt;/event&gt;

&lt;event name=time_heartbeat&gt;
  &lt;description&gt;System time values sent periodically to minimize cycle counter
    drift with respect to real time clock and to detect cycle counter
    rollovers
  &lt;/description&gt;
  &lt;typeref name=timestamp/&gt;
&lt;/event&gt;

&lt;type name=block_timestamp&gt;
  &lt;struct&gt;
    &lt;field name=timestamp&gt;&lt;typeref name=timestamp/&gt;&lt;/field&gt;
    &lt;field name=block_id&gt;&lt;uint size=4/&gt;&lt;/field&gt;
  &lt;/struct&gt;
&lt;/type&gt;

&lt;type name=timestamp&gt;
  &lt;struct&gt;
    &lt;field name=time&gt;&lt;typeref name=timespec/&gt;&lt;/field&gt;
    &lt;field name="cycle_count"&gt;&lt;uint size=8/&gt;&lt;/field&gt;
  &lt;/struct&gt;
&lt;/type&gt;

&lt;type name=timespec&gt;
  &lt;struct&gt;
    &lt;field name="seconds"&gt;&lt;uint size=4/&gt;&lt;/field&gt;
    &lt;field name="nanoseconds"&gt;&lt;uint size=4/&gt;&lt;/field&gt;
  &lt;/struct&gt;
&lt;/type&gt;
</TT></PRE>
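
<P>
For illustration, the data of a facility_load event, as described above,
could be decoded along the following lines in Python. The sketch assumes
that the fields follow each other without alignment padding, that the name
is ASCII (the format does not state a character encoding), and that the
reader shares the byte order of the traced machine. The function name and
the sample values are hypothetical.

```python
import struct

def read_facility_load(data, offset=0):
    """Decode name (null-terminated string), checksum and base_code."""
    end = data.index(b"\x00", offset)
    # The character encoding is not stated in the format; ASCII is assumed.
    name = data[offset:end].decode("ascii")
    # Two uint size=4 fields follow the terminating null byte.
    checksum, base_code = struct.unpack_from("=II", data, end + 1)
    return {"name": name, "checksum": checksum, "base_code": base_code}

# Hypothetical event data for a facility named "core".
payload = b"core\x00" + struct.pack("=II", 0xCAFE, 4)
print(read_facility_load(payload))
```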

<H2>Control files</H2>

<P>
The interrupts file reflects the content of the /proc/interrupts system file.
It contains one event describing each interrupt. At trace start, events are
generated describing all the current interrupts. If the assignment of
interrupts changes later, due to devices or device drivers being activated or
deactivated, additional events may be added to the file. Each interrupt
event has the following structure.

<PRE><TT>
&lt;event name=interrupt&gt;
  &lt;description&gt;Interrupt request number assignment&lt;/description&gt;
  &lt;struct&gt;
    &lt;field name="number"&gt;&lt;uint size=4/&gt;&lt;/field&gt;
    &lt;field name="count"&gt;&lt;uint size=4/&gt;&lt;/field&gt;
    &lt;field name="controller"&gt;&lt;string/&gt;&lt;/field&gt;
    &lt;field name="name"&gt;&lt;string/&gt;&lt;/field&gt;
  &lt;/struct&gt;
&lt;/event&gt;
</TT></PRE>

<P>
The processes file contains the list of processes already created when the
trace starts. Each event describing a process is modeled after the
/proc/self/status system file. The number of fields in this event is
expected to be expanded in the future to include groups, signal masks,
open file descriptors and address maps.

<PRE><TT>
&lt;event name=process&gt;
  &lt;description&gt;Existing process&lt;/description&gt;
  &lt;struct&gt;
    &lt;field name="name"&gt;&lt;string/&gt;&lt;/field&gt;
    &lt;field name="pid"&gt;&lt;uint size=4/&gt;&lt;/field&gt;
    &lt;field name="ppid"&gt;&lt;uint size=4/&gt;&lt;/field&gt;
    &lt;field name="tracer_pid"&gt;&lt;uint size=4/&gt;&lt;/field&gt;
    &lt;field name="uid"&gt;&lt;uint size=4/&gt;&lt;/field&gt;
    &lt;field name="euid"&gt;&lt;uint size=4/&gt;&lt;/field&gt;
    &lt;field name="suid"&gt;&lt;uint size=4/&gt;&lt;/field&gt;
    &lt;field name="fsuid"&gt;&lt;uint size=4/&gt;&lt;/field&gt;
    &lt;field name="gid"&gt;&lt;uint size=4/&gt;&lt;/field&gt;
    &lt;field name="egid"&gt;&lt;uint size=4/&gt;&lt;/field&gt;
    &lt;field name="sgid"&gt;&lt;uint size=4/&gt;&lt;/field&gt;
    &lt;field name="fsgid"&gt;&lt;uint size=4/&gt;&lt;/field&gt;
    &lt;field name="state"&gt;&lt;enum size=4&gt;
      Running WaitInterruptible WaitUninterruptible Zombie Traced Paging
    &lt;/enum&gt;&lt;/field&gt;
  &lt;/struct&gt;
&lt;/event&gt;
</TT></PRE>

<H2>Facilities</H2>

<P>
Facilities define a granularity of event grouping for filtering, activation
and compilation. Each facility costs a table entry in the kernel (name,
checksum, event type code range), somewhere between 20 and 30 bytes. Having
one facility per tracing statement in the kernel would be too much (assuming
that tracing statements are eventually routinely inserted in the kernel code
and replace the 80000+ printk statements in some proportion). However, having
a few facilities, up to a few tens, would make sense.

<P>
The "builtin" facility contains a small number of predefined events which must
always exist. The "core" facility contains a small subset of OS events which
are almost always of interest (scheduling, interrupts, faults, system calls).
Then, specialized facilities may exist for each subsystem (network, disks,
USB, SCSI...).


<H2>Bookmarks</H2>

<P>
Bookmarks are user supplied information added to a trace. They contain user
annotations attached to a time interval.

<PRE><TT>
&lt;bookmarks&gt;
  &lt;location name=name cpu=n start_time=t end_time=t&gt;Some text&lt;/location&gt;
  ...
&lt;/bookmarks&gt;
</TT></PRE>

<P>
The interval is defined using either "time=", or "start_time=" and
"end_time=", or "cycle=", or "start_cycle=" and "end_cycle=".
The time is in seconds with decimals up to nanoseconds, and cycle counts
are unsigned integers with a 64-bit range. The cpu attribute is optional.
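
<P>
Since a binary floating point number cannot hold every nanosecond value
exactly, a reader may prefer to parse bookmark times as text. The Python
sketch below shows one way to do so, assuming a plain decimal notation such
as "12.5" with no sign or exponent; the function name is hypothetical.

```python
def parse_time(text):
    """Convert "12.5" style seconds into (seconds, nanoseconds) integers."""
    whole, _, fraction = text.partition(".")
    if len(fraction) not in range(10):
        raise ValueError("more than 9 decimals: " + text)
    # Pad the fraction on the right so "5" becomes 500000000 nanoseconds.
    return int(whole or "0"), int(fraction.ljust(9, "0"))

print(parse_time("1023456789.000000250"))
```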

</BODY>
</HTML>