1 Adding support for "compact" 32 bits events.
2
3 Mathieu Desnoyers
4 March 12, 2007
5
6 Use a separate channel for compact events
7
8 Mux those events into this channel and magically they are "compact". Isn't it
9 beautiful?
10
11 event header
12
13 ### COMPACT EVENTS
14
15 32 bits header
16 Aligned on 32 bits
17 5 bits event ID
18 32 events
19 27 bits TSC (cut MSB)
20 wraps about 32 times per second at 4GHz
21 wraps are spaced about 0.03125s apart
22 100HZ clock : tick each 0.01s
23 must detect a wrap at least every 3 jiffies (dangerous, may miss)
24 granularity : 2^0 = 1 cycle : 0.25ns @4GHz
25 payload size known by facility
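
As an illustration of the layout above, here is a minimal Python sketch that packs a 5-bit event ID and the low 27 bits of the TSC into one 32-bit word, then rebuilds a full TSC from an earlier full reference value (for instance one carried by a heartbeat). Placing the ID in the top bits and the names `pack_compact`, `unpack_compact` and `rebuild_tsc` are assumptions made for this example; the notes do not specify them.

```python
ID_BITS = 5
TSC_BITS = 27
TSC_MASK = (1 << TSC_BITS) - 1


def pack_compact(event_id, tsc):
    """Build a 32-bit header: event ID in the top 5 bits (assumed), low 27 TSC bits below."""
    assert 0 <= event_id < (1 << ID_BITS)
    return (event_id << TSC_BITS) | (tsc & TSC_MASK)


def unpack_compact(header):
    """Return (event_id, truncated_tsc)."""
    return header >> TSC_BITS, header & TSC_MASK


def rebuild_tsc(truncated, ref_tsc):
    """Rebuild the full TSC from a full reference taken less than 2**27 cycles earlier."""
    # Cycles elapsed since the reference, modulo one wrap of the 27-bit field.
    delta = (truncated - ref_tsc) & TSC_MASK
    return ref_tsc + delta
```

If more than 2^27 cycles separate the reference from the event, the rebuilt value is off by a multiple of 2^27 cycles, which is why a wrap must not go undetected.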
26
27 32 bits header
28 Aligned on 32 bits
29 5 bits event ID
30 32 events
31 27 bits TSC (cut LSB)
32 wraps each second at 4GHz
33 100HZ clock : tick each 0.01s
34 granularity : 2^5 = 32 cycles : 8ns @4GHz
35 payload size known by facility
36
37 32 bits header
38 Aligned on 32 bits
39 6 bits event ID
40 64 events
41 26 bits TSC (cut LSB)
42 wraps each second at 4GHz
43 100HZ clock : tick each 0.01s
44 granularity : 2^6 = 64 cycles : 16ns @4GHz
45 payload size known by facility
46
47 32 bits header
48 Aligned on 32 bits
49 7 bits event ID
50 128 events
51 25 bits TSC (cut LSB)
52 wraps each second at 4GHz
53 100HZ clock : tick each 0.01s
54 granularity : 2^7 = 128 cycles : 32ns @4GHz
55 payload size known by facility
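
The granularity and wrap figures of the compact layouts can be recomputed with a short Python sketch. The helper names are hypothetical. The exact results differ slightly from the rounded figures in the notes, which appear to treat 4GHz as 2^32 cycles per second.

```python
CPU_HZ = 4_000_000_000  # 4 GHz, the clock rate used throughout these notes


def granularity_ns(lsb_cut, cpu_hz=CPU_HZ):
    """Time resolution left after dropping lsb_cut low-order TSC bits."""
    return (1 << lsb_cut) * 1e9 / cpu_hz


def wrap_seconds(tsc_bits, lsb_cut, cpu_hz=CPU_HZ):
    """Time before a tsc_bits-wide field counting units of 2**lsb_cut cycles wraps."""
    return (1 << (tsc_bits + lsb_cut)) / cpu_hz


# (event ID bits, TSC bits kept, LSBs dropped) for the four layouts above.
for id_bits, tsc_bits, lsb_cut in [(5, 27, 0), (5, 27, 5), (6, 26, 6), (7, 25, 7)]:
    print(id_bits, tsc_bits, lsb_cut,
          granularity_ns(lsb_cut), round(wrap_seconds(tsc_bits, lsb_cut), 4))
```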
56
57
58
59 ### NORMAL EVENTS
60
61 64 bits header
62 Aligned on 64 bits
63 32 bits TSC
64 wraps each second at 4GHz
65 100HZ clock : tick each 0.01s
66 16 bits event id, (major 8 minor 8)
67 65536 events
68 16 bits event size (extra)
69
70 96 bits header (full 64 bits TSC, useful when no heartbeat available)
71 Aligned on 64 bits
72 64 bits TSC
73 wraps each 146.14 years at 4GHz
74 16 bits event id, (major 8 minor 8)
75 65536 events
76 16 bits event size (extra)
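
A possible packing of the 64-bit normal header, sketched in Python, together with a check of the 146.14 years figure. The field order (event ID first, then size, then TSC) is an assumption for this example, as are the function names.

```python
def pack_normal(major, minor, size, tsc):
    """64-bit header: 8-bit major, 8-bit minor, 16-bit size, 32-bit truncated TSC.

    The field order is an assumption made for this sketch.
    """
    assert 0 <= major < 256 and 0 <= minor < 256 and 0 <= size < 65536
    event_id = (major << 8) | minor
    return (event_id << 48) | (size << 32) | (tsc & 0xFFFFFFFF)


def unpack_normal(header):
    """Return (major, minor, size, truncated_tsc)."""
    return ((header >> 56) & 0xFF, (header >> 48) & 0xFF,
            (header >> 32) & 0xFFFF, header & 0xFFFFFFFF)


def wrap_years(tsc_bits, cpu_hz=4_000_000_000):
    """Years (of 365.25 days) before a tsc_bits-wide cycle counter wraps."""
    return (1 << tsc_bits) / cpu_hz / (365.25 * 24 * 3600)
```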
77
78
79 ## Discussion of compact events
80
81 Must put the event ID fields first in the large (64, 96-128 bits) event headers
82 What is the minimum granularity required ? (so we know how many LSBs to cut)
83 - How much can synchronized CPU TSCs drift apart from one another ?
84 PLL
85 http://en.wikipedia.org/wiki/Phase-locked_loop
86 static phase offset -> tracking jitter
87 25 MHz oscillator on motherboard for CPU
88 jitter : expressed in ±picoseconds (should therefore be lower than 0.25ns)
89 http://www.eetasia.com/ART_8800082274_480600_683c4e6b200103.HTM
90 NEED MORE INFO.
91 - What is the cacheline synchronization latency between the CPUs ?
92 Worst case : Intel Core 2, Intel Xeon 5100, Intel Core Solo, Intel Core Duo
93 Unified L2 cache. http://www.intel.com/design/processor/manuals/253668.pdf
94 Intel Core 2, Intel Xeon 5100
95 http://www.intel.com/design/processor/manuals/253665.pdf
96 Up to 10.7 GB/s FSB
97 http://www.xbitlabs.com/articles/mobile/display/core2duo_2.html
98                   Intel Core Duo   Intel Core 2 Duo
99 L2 cache latency  14 cycles        14 cycles
100 (round-trip : 28 cycles) 7ns @4GHz
101 sparc64 : between threads : shares L1 cache.
102 suspected to be ~2 cycles total (1+1) (to check)
103 - How close (cycle-wise) can two consecutive recorded events in the same
104 buffer be ? (~200ns, time for logging an event) (~800 cycles @4GHz)
105 - Tracing code itself : if it's at a subbuffer boundary, more checks to do.
106 Must see the maximum duration of a non interrupted probe.
107 Worst case (with NMIs enabled) : 6997 cycles. 1749 ns @4GHz.
108 TODO : test with NMIs disabled and HT disabled.
109 Ordering can be changed if an interrupt comes between the memory operation
110 and the tracer call. Therefore, we cannot rely on more precision than the
111 expected interrupt handler duration. (guess : ~10000cycles, 2500ns@4GHz)
112 - If there is a faster interconnect between the CPUs, it can be a problem, but
113 these seem to be only proprietary interconnects, not in general use.
114 - IPIs are expected to take much more than 28 cycles.
115 What is the minimum wrap-around interval ? (must be safe for timer interrupt
116 miss and multiple timer HZ (configurable) and CPU MHZ frequencies)
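
The cycle and nanosecond figures quoted in the discussion above all follow from the 4GHz clock assumption. A small Python sketch of the conversion (the helper names are hypothetical):

```python
def cycles_to_ns(cycles, cpu_hz=4_000_000_000):
    """Convert a cycle count to nanoseconds at the given clock rate."""
    return cycles * 1e9 / cpu_hz


def ns_to_cycles(ns, cpu_hz=4_000_000_000):
    """Convert nanoseconds to a cycle count at the given clock rate."""
    return ns * cpu_hz / 1e9
```

For example, `cycles_to_ns(28)` gives the 7ns L2 round trip and `ns_to_cycles(200)` gives the 800 cycles needed to log an event.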
117
118 Granularity : 200ns (800 cycles@4GHz) : 2^9 = 512 (remove 9 LSB)
119 Probe never takes 1 cycle.
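120 Number of LSB skipped : max(0, (long)find_first_bit(probe_duration_in_cycles)-1)
    (find_first_bit is taken here as the 1-based position of the highest set
    bit, as fls() returns : 800 cycles -> 10, so 9 LSB)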
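
A minimal Python sketch of the LSB formula above. It assumes that `find_first_bit` is meant as the 1-based position of the highest set bit (what the kernel's `fls()` returns), since that reading reproduces the 9 LSB result for an 800-cycle probe. Python's `int.bit_length()` gives the same position. The function name is hypothetical.

```python
def lsb_to_skip(probe_duration_in_cycles):
    """max(0, highest_set_bit_position - 1), with positions counted from 1."""
    return max(0, probe_duration_in_cycles.bit_length() - 1)
```

With an 800-cycle probe this gives 9, that is a granularity of 2^9 = 512 cycles, the largest power of two not exceeding the probe duration.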
121
122 Min wrap : 100HZ system, each 3 timer ticks : 0.03s (32-4 MSB for 4 GHZ : 0.067s)
123 (heartbeat each 100HZ, to be safe)
124 Number of MSB to skip :
125 32 - find_first_bit(( (expected_longest_interrupt_latency()[ms] +
126 max_timer_interval[ms]) * cpu_khz[kcycles/s] )) - 1
127 (the last -1 makes sure we remove at most the exact number of bits : round
128 toward 0, not up).
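
The MSB formula above can be sketched in Python the same way, again reading `find_first_bit` as the 1-based position of the highest set bit (an assumption). The function name and the 20ms + 10ms split in the usage example are hypothetical. Only the 30ms total matches the "3 timer ticks" case.

```python
def msb_to_skip(interrupt_latency_ms, max_timer_interval_ms, cpu_khz):
    """Number of high-order bits of a 32-bit TSC window that can be dropped."""
    # ms * kcycles/s = cycles that may elapse between two wrap detections.
    cycles = (interrupt_latency_ms + max_timer_interval_ms) * cpu_khz
    return 32 - cycles.bit_length() - 1
```

For 30ms at 4GHz (`cpu_khz = 4_000_000`) this yields 4: the remaining 28 bits cover 2^28 cycles, which is more than the 120,000,000 cycles that can elapse in 30ms.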
129
130 Heartbeat timer :
131 Each timer interrupt
132 Event : 32 bytes in size
133 each timer tick : 100HZ
134 3.2kB/s
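
Since the timer frequency is configurable, the heartbeat cost scales with HZ. A one-line Python sketch of the arithmetic above (hypothetical helper name):

```python
def heartbeat_bytes_per_second(event_size_bytes=32, timer_hz=100):
    """Trace bandwidth used by one heartbeat event per timer tick."""
    return event_size_bytes * timer_hz
```

The defaults give 3200 bytes per second. A 1000HZ system would use ten times that.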
135
136 9LSB + 4MSB = 13 bits total. 13 bits for event IDs : 8192 events.
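
Putting the two cuts together, a 32-bit header keeps 19 TSC bits (bits 9 to 27) and leaves 13 bits for the event ID. The Python sketch below shows one way to pack it. The ID-in-top-bits order and the names `pack_final` and `unpack_final` are assumptions for this example.

```python
LSB_CUT = 9
MSB_CUT = 4
FINAL_TSC_BITS = 32 - LSB_CUT - MSB_CUT   # 19 bits of TSC kept
FINAL_ID_BITS = 32 - FINAL_TSC_BITS       # 13 bits left for the event ID
FINAL_TSC_MASK = (1 << FINAL_TSC_BITS) - 1


def pack_final(event_id, tsc):
    """32-bit header: 13-bit event ID (assumed on top), TSC bits 9..27 below."""
    assert 0 <= event_id < (1 << FINAL_ID_BITS)
    return (event_id << FINAL_TSC_BITS) | ((tsc >> LSB_CUT) & FINAL_TSC_MASK)


def unpack_final(header):
    """Return (event_id, tsc_field); tsc_field counts units of 512 cycles."""
    return header >> FINAL_TSC_BITS, header & FINAL_TSC_MASK
```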