move everything out of trunk
[lttv.git] / tests / markers / markers-microbench-0.1.txt
1
2
3 * Microbenchmarks
4
5 Use timestamp counter to calculate the time spent, with interrupts disabled.
6 Machine : Pentium 4 3GHz
7 Fully preemptible kernel
8 marker : MARK(subsys_mark1, "%d %p", 1, NULL);
9 Linux Kernel Markers 0.19
10
11 * Execute an empty loop
12 NR_LOOPS : 10000000
13 time delta (cycles): 15026497
14 cycles per loop : 1.50
15 - i386 "optimized" : immediate value, test and predicted branch
16 (non connected marker)
17 NR_LOOPS : 10000000
18 time delta (cycles): 40031640
19 cycles per loop : 4.00
20 cycles per loop for marker : 2.50
21 - i386 "generic" : load, test and predicted branch
22 (non connected marker)
23 NR_LOOPS : 10000000
24 time delta (cycles): 26697878
25 cycles per loop : 2.67
26 cycles per loop for marker : 1.17
27
28 * Execute a loop of memcpy 4096 bytes
29 - Without marker
30 NR_LOOPS : 10000
31 time delta (cycles): 12981555
32 cycles per loop : 1298.16
33 - i386 "optimized" : immediate value, test and predicted branch
34 (non connected marker)
35 NR_LOOPS : 10000
36 time delta (cycles): 12982290
37 cycles per loop : 1298.23
38 cycles per loop for marker : 0.074
39 - i386 "generic" : load, test and predicted branch
40 (non connected marker)
41 NR_LOOPS : 10000
42 time delta (cycles): 13002788
43 cycles per loop : 1300.28
44 cycles per loop for marker : 2.123
45
46
47 The following tests are done with the "optimized" markers only
48
49 Execute a loop with a marker enabled, with an empty probe.
50 NR_LOOPS : 100000
51 time delta (cycles): 5210587
52 cycles per loop : 52.11
53 cycles per loop for empty probe : 52.11-4.00=48.11
54
55 Execute a loop with marker enabled, with i386 direct argument passing.
56 NR_LOOPS : 100000
57 time delta (cycles): 5299837
58 cycles per loop : 53.00
59 cycles per loop to get arguments in probe (from stack) on x86 : 53.00-52.11=0.89
60
61 Execute a loop with marker enabled, with var args probe.
62 NR_LOOPS : 100000
63 time delta (cycles): 5574300
64 cycles per loop : 55.74
65 cycles per loop to get expected variable arguments on x86 : 55.74-53.00=2.74
66
67 Execute a loop with marker enabled, with var args probe, format string
68 processing.
69 NR_LOOPS : 100000
70 time delta (cycles): 9622117
71 cycles per loop : 96.22
72 cycles per loop to dynamically parse arguments
73 with format string : 96.22-55.74=40.48
74
75
76 * Assembly code
77
78
79 - Optimized
80
81 static int my_open(struct inode *inode, struct file *file)
82 {
83 0: 55 push %ebp
84 1: 89 e5 mov %esp,%ebp
85 3: 83 ec 0c sub $0xc,%esp
86 MARK(subsys_mark1, "%d %p", 1, NULL);
87 6: b0 00 mov $0x0,%al
88 8: 84 c0 test %al,%al
89 a: 75 07 jne 13 <my_open+0x13>
90
91 return -EPERM;
92 }
93 c: b8 ff ff ff ff mov $0xffffffff,%eax
94 11: c9 leave
95 12: c3 ret
96 13: b8 01 00 00 00 mov $0x1,%eax
97 18: e8 fc ff ff ff call 19 <my_open+0x19>
98 1d: c7 44 24 08 00 00 00 movl $0x0,0x8(%esp)
99 24: 00
100 25: c7 44 24 04 01 00 00 movl $0x1,0x4(%esp)
101 2c: 00
102 2d: c7 04 24 0d 00 00 00 movl $0xd,(%esp)
103 34: ff 15 74 10 00 00 call *0x1074
104 3a: b8 01 00 00 00 mov $0x1,%eax
105 3f: e8 fc ff ff ff call 40 <my_open+0x40>
106 44: eb c6 jmp c <my_open+0xc>
107
108
109 - Generic
110
111 static int my_open(struct inode *inode, struct file *file)
112 {
113 0: 55 push %ebp
114 1: 89 e5 mov %esp,%ebp
115 3: 83 ec 0c sub $0xc,%esp
116 MARK(subsys_mark1, "%d %p", 1, NULL);
117 6: 0f b6 05 20 10 00 00 movzbl 0x1020,%eax
118 d: 84 c0 test %al,%al
119 f: 75 07 jne 18 <my_open+0x18>
120
121 return -EPERM;
122 }
123 11: b8 ff ff ff ff mov $0xffffffff,%eax
124 16: c9 leave
125 17: c3 ret
126 18: b8 01 00 00 00 mov $0x1,%eax
127 1d: e8 fc ff ff ff call 1e <my_open+0x1e>
128 22: c7 44 24 08 00 00 00 movl $0x0,0x8(%esp)
129 29: 00
130 2a: c7 44 24 04 01 00 00 movl $0x1,0x4(%esp)
131 31: 00
132 32: c7 04 24 0d 00 00 00 movl $0xd,(%esp)
133 39: ff 15 74 10 00 00 call *0x1074
134 3f: b8 01 00 00 00 mov $0x1,%eax
135 44: e8 fc ff ff ff call 45 <my_open+0x45>
136 49: eb c6 jmp 11 <my_open+0x11>
137
138 * Size (x86)
139
140 - Optimized
141
142 Adds 6 bytes in the "likely" path.
143 Adds 32 bytes in the "unlikely" path.
144
145 - Generic
146
147 Adds 11 bytes in the "likely" path.
148 Adds 32 bytes in the "unlikely" path.
149
150
151
152 Conclusion
153
154 In an empty loop, the generic marker is faster than the optimized marker. This
155 may be due to better performances of the movzbl instruction over the movb on the
156 Pentium 4 architecture. However, when we execute a loop of 4kB copy, the impact
157 of the movzbl becomes greater because it uses the memory bandwidth.
158
159 The preemption disabling and call to a probe itself costs 48.11 cycles, almost
160 as much as dynamically parsing the format string to get the variable arguments
161 (40.48 cycles).
162
163 There is almost no difference, on x86, between passing the arguments directly on
164 the stack and using a variable argument list when its layout is known
165 statically (0.89 cycles vs 2.74 cycles).
166
167
168
This page took 0.032685 seconds and 4 git commands to generate.