Benjamin Poirier
benjamin.poirier@polymtl.ca
2009

+ About time synchronization
This framework performs offline time synchronization. This means that the
synchronization is done after tracing is over. It is not the same as online
synchronization like what is done by NTP, nor is it directly influenced by it.

Event timestamps are adjusted according to a clock correction function that
compensates for initial offset and rate offset (ie. clocks that don't start
out at the same value and clocks that don't run at the same speed). It can
work on two or more traces.

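The correction described above can be sketched in C. This is a minimal
illustration using hypothetical names (ClockCorrection, correctTime); the
actual LTTV identifiers differ, but the arithmetic is the linear model the
framework computes: t' = drift * t + offset.

```c
/* Hypothetical sketch of a linear clock correction; "drift" compensates
 * for rate offset (a1) and "offset" for initial offset (a0). These are
 * not the real LTTV type or function names. */
typedef struct {
	double drift;  /* a1: relative clock rate */
	double offset; /* a0: initial offset, in cycles */
} ClockCorrection;

/* Map a raw timestamp from one trace into the reference timebase. */
static double correctTime(const ClockCorrection* correction, double rawTime)
{
	return correction->drift * rawTime + correction->offset;
}
```

For instance, with drift= 1 and offset= 1.33641e+08 (the factors shown in the
example statistics later in this README), every raw timestamp of trace 1 is
shifted by 1.33641e+08 cycles.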
The synchronization is based on relations identified in network traffic
between nodes. So, for it to work, there must be traffic exchanged between the
nodes. At the moment, this must be TCP traffic. Any kind will do (ssh, http,
...).

For scientific information about the algorithms used, see:
* Duda, A., Harrus, G., Haddad, Y., and Bernard, G.: Estimating global time in
  distributed systems, Proc. 7th Int. Conf. on Distributed Computing Systems,
  Berlin, volume 18, 1987
* Ashton, P.: Algorithms for Off-line Clock Synchronisation, University of
  Canterbury, December 1995
  http://www.cosc.canterbury.ac.nz/research/reports/TechReps/1995/tr_9512.pdf

+ Using time synchronization
++ Recording traces
To use time synchronization you have to record traces on multiple nodes
simultaneously with lttng (the tracer). While recording the traces, you have
to make sure the following markers are enabled:
* dev_receive
* dev_xmit_extended
* tcpv4_rcv_extended
* udpv4_rcv_extended
You can use the 'ltt-armall' and 'ltt-armnetsync' scripts for this.

You also have to make sure there is some TCP traffic between the traced nodes.

++ Viewing traces
Afterwards, you have to make sure all the traces are accessible from a single
machine, where lttv (the viewer) is run.

Time synchronization is enabled and controlled via the following lttv options,
as seen with "-h":
--sync
        synchronize the time between the traces
--sync-stats
        print statistics about the time synchronization
--sync-null
        read the events but do not perform any processing, this
        is mostly for performance evaluation
--sync-analysis - argument: chull, linreg
        specify the algorithm to use for event analysis
--sync-graphs
        output gnuplot graph showing synchronization points
--sync-graphs-dir - argument: DIRECTORY
        specify the directory where to store the graphs, by
        default in "graphs-<lttv-pid>"

To enable synchronization, start lttv with the "--sync" option. It can be
used in text mode or in GUI mode. You can add the traces one by one in the
GUI, but this will recompute the synchronization after every trace that is
added. Instead, you can save some time by specifying all your traces on the
command line (using -t).

Example:
lttv-gui -t traces/node1 -t traces/node2 --sync

++ Statistics
The --sync-stats option is useful to make sure the synchronization algorithms
worked. Here is an example output (with added comments) from a successful
chull (one of the synchronization algorithms) run of two traces:
    LTTV processing stats:
        received frames: 452
        received frames that are IP: 452
        received and processed packets that are TCP: 268
        sent packets that are TCP: 275
    TCP matching stats:
        total input and output events matched together to form a packet: 240
    Message traffic:
        0 - 1 : sent 60 received 60
# Note that 60 + 60 < 240; this is because there was loopback traffic, which
# is discarded.
    Convex hull analysis stats:
        out of order packets dropped from analysis: 0
        Number of points in convex hulls:
            0 - 1 : lower half-hull 7 upper half-hull 9
        Individual synchronization factors:
            0 - 1 : Middle a0= -1.33641e+08 a1= 1 - 4.5276e-08 accuracy 1.35355e-05
                a0: -1.34095e+08 to -1.33187e+08 (delta= 907388)
                a1: 1 -6.81298e-06 to +6.72248e-06 (delta= 1.35355e-05)
    Resulting synchronization factors:
        trace 0 drift= 1 offset= 0 (0.000000) start time= 18.799023588
        trace 1 drift= 1 offset= 1.33641e+08 (0.066818) start time= 19.090688494
    Synchronization time:
        real time: 0.113308
        user time: 0.112007
        system time: 0.000000

++ Algorithms
The synchronization framework is extensible and already includes two
algorithms: chull and linreg. You can choose which analysis algorithm to use
with the --sync-analysis option.

+ Design
This part describes the design of the synchronization framework. This is to
help programmers interested in:
* adding new synchronization algorithms (analysis part)
  There are already two analysis algorithms available: chull and linreg
* using new types of events (processing and matching parts)
* using time synchronization with another data source/tracer (processing part)
  There are already two data sources available: lttng and unittest

++ Sync chain
This part is specific to the framework in use: the program doing the
synchronization, the executable linking to the event_*.o files,
eg. LTTV, unittest.

It reads the parameters, creates the SyncState and calls the processing init
function. The "sync chain" is the set of event_* modules. At the moment there
is only one module at each stage. However, as more modules are added, it will
become relevant to have many modules at the same stage simultaneously. This
will require some modifications. I've kept this possibility at the back of my
mind while designing.

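The driver's job can be sketched as follows. The types and names here are
simplified stand-ins for illustration, not the real LTTV structs; only the
flow (create a SyncState, pick one module per stage, call the processing init
function) comes from the description above.

```c
/* Hypothetical sketch of a sync chain driver; not the real LTTV types. */
struct SyncState;

typedef struct {
	const char* name;
	int (*init)(struct SyncState* state);
} ProcessingModule;

typedef struct SyncState {
	const ProcessingModule* processing;
	/* matching and analysis modules would follow the same pattern */
} SyncState;

static int lttngInit(struct SyncState* state)
{
	(void) state;
	/* here a real module would open the traces and start reading events */
	return 0;
}

static const ProcessingModule processingLttng = { "LTTng", lttngInit };

/* The sync chain: after reading parameters, build the state and hand
 * control to the processing stage. */
int runSyncChain(void)
{
	SyncState state = { .processing = &processingLttng };
	return state.processing->init(&state);
}
```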
++ Stage 1: Event processing
Specific to the tracing data source.
eg. LTTng, LTT userspace, libpcap

Read the events from the trace and stuff them in an appropriate Event object.

++ Communication between stages 1 and 2: events
Communication is done via objects specialized from Event. At the moment, all
*Event structs are in data_structures.h. Specific event structures and
functions could be in separate files. This way, adding a new set of modules
would require shipping extra data_structures* files instead of modifying the
existing one. For this to work, Event.type could not be an enum; it could be
an int and use #defines or constants defined in the specialized
data_structures* files. Event.event could be a void*.

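The alternative layout described above could look like this. Only the field
names type, event, time and traceNb come from this README; the type codes and
the remaining layout are assumptions for illustration.

```c
#include <stddef.h>

/* Hypothetical type codes; each would live in its own
 * data_structures_* header shipped with the matching modules. */
#define EVENT_TYPE_TCP 1
#define EVENT_TYPE_UDP 2

/* Sketch of an Event with an open-ended type field and payload. */
typedef struct {
	unsigned int traceNb;    /* trace this event came from */
	unsigned long long time; /* placeholder for the real timestamp type */
	int type;                /* one of the EVENT_TYPE_* constants */
	void* event;             /* points to a type-specific struct */
} Event;
```

Because type is a plain int and event a void*, a new module set can add its
own codes and payload structs without touching data_structures.h.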
++ Stage 2: Event matching
This stage and its modules are specific to the type of event. Event processing
feeds the events one at a time but event analysis works on groups of events.
Event matching is responsible for forming these groups. Generally speaking,
these can have different types of relation ("one to one", "one to many", or a
mix) and this will influence the overall behavior of the module.
eg. TCP, UDP, MPI

matchEvent() takes an Event pointer. An actual matching module doesn't have to
be able to process every type of event. It will only be passed events of a
type it can process (according to the .canMatch field of its MatchingModule
struct).

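The dispatch just described can be sketched like this. Only matchEvent() and
the .canMatch field are named in this README; the Event layout, type codes
and the dispatch helper are assumptions for illustration.

```c
/* Minimal stand-in for the real Event struct. */
struct Event { int type; };

#define TYPE_COUNT 2 /* hypothetical number of event types */

/* Sketch of a matching module: canMatch is indexed by event type. */
typedef struct {
	const char* name;
	int canMatch[TYPE_COUNT];
	void (*matchEvent)(struct Event* event);
} MatchingModule;

static int tcpEventsSeen = 0;

static void tcpMatchEvent(struct Event* event)
{
	(void) event;
	tcpEventsSeen++; /* a real module would pair events into packets here */
}

static const MatchingModule matchingTcp = {
	.name = "TCP",
	.canMatch = { 1, 0 }, /* handles type 0 only */
	.matchEvent = tcpMatchEvent,
};

/* The framework only forwards events whose type the module accepts. */
static void dispatchEvent(const MatchingModule* module, struct Event* event)
{
	if (module->canMatch[event->type])
		module->matchEvent(event);
}
```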
++ Communication between stages 2 and 3: event groups
Communication consists of events grouped in Message, Exchange or Broadcast
structs.

About exchanges:
If one event pair is a packet (more generally, something representable as a
Message), an exchange is composed of at least two packets, one in each
direction. There should be a non-negative minimum "round trip time" (RTT)
between the first and last event of the exchange. This RTT should be as small
as possible, so these packets should be closely related in time, like a data
packet and an acknowledgement packet. If the events analyzed are such that the
minimum RTT can be zero, there's nothing gained in analyzing exchanges beyond
what can already be figured out by analyzing packets.

An exchange can also consist of more than two packets, in case one packet
single-handedly acknowledges many data packets. In this case, it is best to
use the last acknowledged packet. Assuming a linear clock, an acknowledged
packet is as good as any other. However, since the linear clock assumption is
further from reality as the interval grows longer, it is best to keep the
interval between the two packets as short as possible.

++ Stage 3: Event analysis
This stage and its modules are specific to the algorithm that analyzes events
to deduce synchronization factors.
eg. convex hull, linear regression, broadcast Maximum Likelihood Estimator

Instead of having one analyzeEvents() function that can receive any sort of
grouping of events, there are three prototypes: analyzeMessage(),
analyzeExchange() and analyzeBroadcast(). A module implements only the
relevant one(s) and sets the other function pointers to NULL in its
AnalysisModule struct.

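An AnalysisModule along these lines might look as follows. The function names
analyzeMessage(), analyzeExchange() and analyzeBroadcast() come from this
README; the struct layout and the chull-like example module are assumptions
for illustration, with the group types left opaque.

```c
#include <stddef.h>

/* Opaque event groupings; the real structs are in data_structures.h. */
struct Message;
struct Exchange;
struct Broadcast;

/* Sketch of an analysis module: one callback per event grouping. */
typedef struct {
	const char* name;
	void (*analyzeMessage)(struct Message* message);
	void (*analyzeExchange)(struct Exchange* exchange);
	void (*analyzeBroadcast)(struct Broadcast* broadcast);
} AnalysisModule;

static void chullAnalyzeMessage(struct Message* message)
{
	(void) message;
	/* a real module would update its half-hulls here */
}

/* A module that only analyzes messages leaves the other pointers NULL;
 * the framework can then deduce which groupings it supports. */
static const AnalysisModule analysisChull = {
	.name = "chull",
	.analyzeMessage = chullAnalyzeMessage,
	.analyzeExchange = NULL,
	.analyzeBroadcast = NULL,
};
```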
The approach is different from matchEvent(), where there is one point of entry
no matter the type of event. The analyze*() approach has the advantage that
there is no casting or type detection to do. It is also possible to deduce
from the function pointers which groupings of events a module can analyze.
However, it means each analysis module will have to be modified if there is
ever a new type of event grouping.

I chose this approach because:
1) I thought it likely that there will be new types of events but not so
   likely that there will be new types of event groups.
2) All events share some members (time, traceNb, ...) but event groups share
   none.
3) We'll see which one of the two approaches works best and we can adapt
   later.

++ Data flow
Data from traces flows "down" from processing to matching to analysis. Factors
come back up.

++ Evolution and adaptation
It is possible to change or add another sync chain and to add other event_*
modules; it has been done. New types of events may need to be added to
data_structures.h. This is only to link between event_* modules. If the data
does not have to be shared, data_structures.h does not have to be modified.

At the moment there is some code duplication in the last steps of the linreg
and chull analyses: the code to propagate the factors when there are more than
two nodes. Maybe there could be a Stage 4 that does that?