Update the README file to mark the project as unmaintained
[lttv.git] / lttv / lttv / sync / README
CommitLineData
add19043
BP
1Benjamin Poirier
2benjamin.poirier@polymtl.ca
277e5b53 32009, 2010
add19043
BP
4
5+ About time synchronization
6This framework performs offline time synchronization. This means that the
7synchronization is done after tracing is over. It is not the same as online
8synchronization like what is done by NTP. Nor is it directly influenced by it.
9
10Event timestamps are adjusted according to a clock correction function that
11palliates for initial offset and rate offset (ie. clocks that don't start out
12at the same value and clocks that don't run at the same speed). It can work on
13two or more traces.
14
15The synchronization is based on relations identified in network traffic
16between nodes. So, for it to work, there must be traffic exchanged between the
17nodes. At the moment, this must be TCP traffic. Any kind will do (ssh, http,
18...)
19
20For scientific information about the algorithms used, see:
21* Duda, A., Harrus, G., Haddad, Y., and Bernard, G.: Estimating global time in
22distributed systems, Proc. 7th Int. Conf. on Distributed Computing Systems,
23Berlin, volume 18, 1987
24* Ashton, P.: Algorithms for Off-line Clock Synchronisation, University of
25Canterbury, December 1995
26http://www.cosc.canterbury.ac.nz/research/reports/TechReps/1995/tr_9512.pdf
27
28+ Using time synchronization
29++ Recording traces
30To use time synchronization you have to record traces on multiple nodes
31simultaneously with lttng (the tracer). While recording the traces, you have
32to make sure the following markers are enabled:
33* dev_receive
34* dev_xmit_extended
35* tcpv4_rcv_extended
36* udpv4_rcv_extended
bc7c054d 37You can use 'ltt-armall -n' for this.
9a9ca632 38
add19043
BP
39You also have to make sure there is some TCP traffic between the traced nodes.
40
41++ Viewing traces
42Afterwards, you have to make sure all the traces are accessible from a single
43machine, where lttv (the viewer) is run.
44
45Time synchronization is enabled and controlled via the following lttv options,
46as seen with "-h":
47--sync
48 synchronize the time between the traces
49--sync-stats
50 print statistics about the time synchronization
bc7c054d 51 See the section "Statistics" for more information.
add19043
BP
52--sync-null
53 read the events but do not perform any processing, this
54 is mostly for performance evaluation
bc7c054d
BP
55--sync-analysis - argument: chull, linreg, eval
56 specify the algorithm to use for event analysis. See the
b2da0724
BP
57 section "Synchronization Alogrithms".
58--sync-reduction - argument: accuracy
59 specify the algorithm to use for factor reduction. See
60 the section "Reduction Algorithms".
add19043
BP
61--sync-graphs
62 output gnuplot graph showing synchronization points
63--sync-graphs-dir - argument: DIRECTORY
64 specify the directory where to store the graphs, by
65 default in "graphs-<lttv-pid>"
66
67To enable synchronization, start lttv with the "--sync" option. It can be
68used in text mode or in GUI mode. You can add the traces one by one in the GUI
69but this will recompute the synchronization after every trace that is added.
70Instead, you can save some time by specifying all your traces on the command
71line (using -t).
72
73Example:
74lttv-gui -t traces/node1 -t traces/node2 --sync
75
76++ Statistics
bc7c054d
BP
77The --sync-stats option is useful to know how well the synchronization
78algorithms worked. Here is an example output (with added comments) from a
79successful chull (one of the synchronization algorithms) run of two traces:
add19043
BP
80 LTTV processing stats:
81 received frames: 452
82 received frames that are IP: 452
83 received and processed packets that are TCP: 268
84 sent packets that are TCP: 275
85 TCP matching stats:
86 total input and output events matched together to form a packet: 240
87 Message traffic:
88 0 - 1 : sent 60 received 60
89# Note that 60 + 60 < 240, this is because there was loopback traffic, which is
90# discarded.
91 Convex hull analysis stats:
92 out of order packets dropped from analysis: 0
93 Number of points in convex hulls:
94 0 - 1 : lower half-hull 7 upper half-hull 9
95 Individual synchronization factors:
96 0 - 1 : Middle a0= -1.33641e+08 a1= 1 - 4.5276e-08 accuracy 1.35355e-05
97 a0: -1.34095e+08 to -1.33187e+08 (delta= 907388)
98 a1: 1 -6.81298e-06 to +6.72248e-06 (delta= 1.35355e-05)
bc7c054d
BP
99# "Middle" is the best type of synchronization for chull. See the section
100# "Convex Hull" below.
add19043
BP
101 Resulting synchronization factors:
102 trace 0 drift= 1 offset= 0 (0.000000) start time= 18.799023588
103 trace 1 drift= 1 offset= 1.33641e+08 (0.066818) start time= 19.090688494
104 Synchronization time:
105 real time: 0.113308
106 user time: 0.112007
107 system time: 0.000000
108
b2da0724 109++ Synchronization Algorithms
add19043 110The synchronization framework is extensible and already includes two
b2da0724
BP
111algorithms: chull and linreg. (There is also a special "eval" module
112available.) You can choose which analysis algorithm to use with the
113--sync-analysis option.
add19043 114
bc7c054d
BP
115+++ Convex Hull
116chull, the default analysis module, can provide a garantee that there are no
117message inversions after synchronization. When printing the statistics, it
118will print, for each trace, the type of factors found:
119* "Middle", all went according to assumptions and there will be no message
120 inversions
121* "Fallback", it was not possible to garantee no message inversion so
122 approximate factors were given instead. This may happen during long running
123 traces where the non-linearity of the clocks was notable. If you can, try to
124 reduce the duration of the trace. (Sometimes this may happen during a trace
125 as short as 120s. but sometimes traces 30 mins. or longer are ok, your
126 milleage may vary). It would also be to improve the algorithms to avoid
127 this, see the "Todo" section. In any case, you may get better results (but
128 still no garantee) by choosing the linreg algorithm instead.
129* "Absent", the trace pair does not contain common communication events. Are
130 you sure the nodes exchanged TCP traffic during the trace?
131
132There are also other, less common, types. See the enum ApproxType in
133event_analysis_chull.h.
134
135+++ Linear Regression
29dd991d 136linreg sometimes gives more precise results than chull but it provides no
bc7c054d
BP
137garantee
138
139+++ Synchronization evaluation
140eval is a special module, it doesn't really perform synchronization, instead
141it calculates and prints different metrics about how well traces are
142synchronized. Although it can be run like other analysis modules, it is most
143useful when run in a postprocessing step, after another synchronization module
29dd991d 144has been run. Eval is most commonly run in text mode. To do this, run:
4378bba0
BP
145lttv -m sync_chain_batch [usual options, ex: -t traces/node1 -t traces/node2
146--sync ...]
147It can also be run from the lttv source tree via runlttv:
148./runlttv -m eval [usual runlttv options, ex: traces/node1 traces/node2]
bc7c054d
BP
149
150eval provides a few more options:
151--eval-rtt-file - argument: FILE
152 specify the file containing RTT information
153--eval-graphs - argument: none
154 output gnuplot graph showing synchronization points
155--eval-graphs-dir - argument: eval-graphs-<lttv pid>
156 specify the directory where to store the graphs
157
158The RTT file should contain information on the minimum round-trip time between
159nodes involved in the trace. This information is used (optionally) in the
160evaluation displayed and in the histogram graphs produced. The file should
161contain a series of lines of the form:
162192.168.112.56 192.168.112.57 0.100
163The first two fields are the IP addresses of the source and destination hosts.
164(hostnames are not supported). The last field is the minimum rtt in ms. The
165fields are separated by whitespace. '#' comments a line.
166
167Many commands can be used to measure the RTT, for example:
168ping -s 8 -A -c 8000 -w 10 192.168.112.57
169
29dd991d
BP
170Note that this must be repeated in both directions in the file, that is:
171192.168.112.49 192.168.112.50 0.057
172192.168.112.50 192.168.112.49 0.050
bc7c054d
BP
173
174++++ Linear Programming and GLPK
175The synchronization evaluation can optionally perform an analysis similar to
176chull but by using a linear program in one of the steps. This can be used to
177validate a part of the chull algorithm but it can also be used to provide a
178measure of the accuracy of the synchronization in any point (this is seen in
179the graph output).
180
181This is enabled by default at configure time (--with-glpk) if the GNU Linear
29dd991d
BP
182Programming Kit is available (libglpk). On Debian-like systems (ex. Ubuntu),
183install the package "libglpk-dev".
184
185To see the output of this mode, run:
186lttv -m sync_chain_batch --eval-graphs [usual options, ex: -t traces/node1 -t
187traces/node2 --sync ...]
bc7c054d 188
b2da0724
BP
189+ Reduction Algorithms
190Event analysis yields time correction factors between trace pairs. For groups
191of more than two traces, an extra step is necessary to identify a reference
192trace and calculate correction factors for each trace relative to this
193reference. There are usually many possibilities and so this step is called
194"factor reduction".
195
196++ Accuracy
197At the moment, only one algorithm is available to do this, the "accuracy"
198algorithm. This algorithm tries to choose the reference and the factors that
199yield the best accuracy. See the function header comments in
200factor_reduction_accuracy.c for more details.
201
add19043
BP
202+ Design
203This part describes the design of the synchronization framework. This is to
204help programmers interested in:
205* adding new synchronization algorithms (analysis part)
206 There are already two analysis algorithms available: chull and linreg
207* using new types of events (processing and matching parts)
bc7c054d
BP
208 There are already two types of events supported: tcp messages and udp
209 broadcasts
add19043
BP
210* using time synchronization with another data source/tracer (processing part)
211 There are already two data sources available: lttng and unittest
212
213++ Sync chain
214This part is specific to the framework in use: the program doing
215synchronization, the executable linking to the event_*.o
216eg. LTTV, unittest
217
b2da0724
BP
218This reads parameters, creates SyncState and calls the init functions of the
219modules to be used. The "sync chain" is this set of modules. At the moment
220there is only one module at each stage. However, as more modules are added, it
221will become relevant to have many modules at the same stage simultaneously.
222This will require some modifications. It is already partly supported at the
b670bb7c
BP
223matching stage through encapsulation of other matching modules.
224
48b641c1 225sync_chain_unitest:main() provides a fairly simple example of sync chain
b670bb7c 226implementation.
add19043
BP
227
228++ Stage 1: Event processing
229Specific to the tracing data source.
230eg. LTTng, LTT userspace, libpcap
231
232Read the events from the trace and stuff them in an appropriate Event object.
233
234++ Communication between stages 1 and 2: events
235Communication is done via objects specialized from Event. At the moment, all
236*Event are in data_structures.h. Specific event structures and functions could
237be in separate files. This way, adding a new set of modules would require
238shipping extra data_structures* files instead of modifying the existing one.
239For this to work, Event.type couldn't be an enum, it could be an int and use
f6691532 240#defines or constants defined in the specialized data_structures* files.
add19043
BP
241Event.event could be a void*.
242
243++ Stage 2: Event matching
244This stage and its modules are specific to the type of event. Event processing
245feeds the events one at a time but event analysis works on groups of events.
246Event matching is responsible for forming these groups. Generally speaking,
247these can have different types of relation ("one to one", "one to many", or a
248mix) and it will influence the overall behavior of the module.
249eg. TCP, UDP, MPI
250
f6691532
BP
251matchEvent() takes an Event pointer. An actual matching module doesn't have to
252be able to process every type of event. It will only be passed events of a
253type it can process (according to the .canMatch field of its MatchingModule
254struct).
add19043
BP
255
256++ Communication between stages 2 and 3: event groups
257Communication consists of events grouped in Message, Exchange or Broadcast
258structs.
259
260About exchanges:
261If one event pair is a packet (more generally, something representable as a
262Message), an exchange is composed of at least two packets, one in each
263direction. There should be a non-negative minimum "round trip time" (RTT)
264between the first and last event of the exchange. This RTT should be as small
265as possible so these packets should be closely related in time like a data
266packet and an acknowledgement packet. If the events analyzed are such that the
267minimum RTT can be zero, there's nothing gained in analyzing exchanges beyond
268what can already be figured out by analyzing packets.
269
270An exchange can also consist of more than two packets, in case one packet
271single handedly acknowledges many data packets. In this case, it is best to
bc7c054d 272use the last data packet. Assuming a linear clock, an acknowledged
add19043
BP
273packet is as good as any other. However, since the linear clock assumption is
274further from reality as the interval grows longer, it is best to keep the
275interval between the two packets as short as possible.
276
277++ Stage 3: Event analysis
278This stage and its modules are specific to the algorithm that analyzes events
279to deduce synchronization factors.
280eg. convex hull, linear regression, broadcast Maximum Likelihood Estimator
281
eb8e0e6f
BP
282This module should return a set of synchronization factors for each trace
283pair. Some trace pairs may have no factors, their approxType should be set to
284ABSENT.
285
add19043
BP
286Instead of having one analyzeEvents() function that can receive any sort of
287grouping of events, there are three prototypes: analyzeMessage(),
288analyzeExchange() and analyzeBroadcast(). A module implements only the
bc7c054d 289relevant one(s) and the other function pointers are NULL.
add19043
BP
290
291The approach is different from matchEvent() where there is one point of entry
292no mather the type of event. The analyze*() approach has the advantage that
293there is no casting or type detection to do. It is also possible to deduce
294from the functions pointers which groupings of events a module can analyze.
295However, it means each analysis module will have to be modified if there is
296ever a new type of event grouping.
297
298I chose this approach because:
2991) I thought it likely that there will be new types of events but not so
300 likely that there will be new types of event groups.
3012) all events share some members (time, traceNb, ...) but not event groups
3023) we'll see which one of the two approaches works best and we can adapt
303 later.
304
b2da0724 305++ Stage 4: Factor reduction
eb8e0e6f
BP
306This stage reduces the pair-wise synchronization factors obtained in step 3 to
307time correction factors for each trace. It is most useful when synchronizing
308more than two traces.
b2da0724 309
add19043 310++ Evolution and adaptation
b2da0724
BP
311It is possible to change/add another sync chain and to add other modules. It
312has been done. New types of events may need to be added to data_structures.h.
313This is only to link between Event-* modules. If the data does not have to be
314shared, data_structures.h does not have to be modified.
This page took 0.043136 seconds and 4 git commands to generate.