convert from svn repository: remove tags directory
[lttv.git] / trunk / tests / markers / markers-microbench-0.1.txt
CommitLineData
bc82195a 1
2
3* Microbenchmarks
4
5Use timestamp counter to calculate the time spent, with interrupts disabled.
6Machine : Pentium 4 3GHz
7Fully preemptible kernel
8marker : MARK(subsys_mark1, "%d %p", 1, NULL);
9Linux Kernel Markers 0.19
10
11* Execute an empty loop
12NR_LOOPS : 10000000
13time delta (cycles): 15026497
14cycles per loop : 1.50
15- i386 "optimized" : immediate value, test and predicted branch
16 (non connected marker)
17NR_LOOPS : 10000000
18time delta (cycles): 40031640
19cycles per loop : 4.00
20cycles per loop for marker : 2.50
21- i386 "generic" : load, test and predicted branch
22 (non connected marker)
23NR_LOOPS : 10000000
24time delta (cycles): 26697878
25cycles per loop : 2.67
26cycles per loop for marker : 1.17
27
28* Execute a loop of memcpy 4096 bytes
29- Without marker
30NR_LOOPS : 10000
31time delta (cycles): 12981555
32cycles per loop : 1298.16
33- i386 "optimized" : immediate value, test and predicted branch
34 (non connected marker)
35NR_LOOPS : 10000
36time delta (cycles): 12982290
37cycles per loop : 1298.23
38cycles per loop for marker : 0.074
39- i386 "generic" : load, test and predicted branch
40 (non connected marker)
41NR_LOOPS : 10000
42time delta (cycles): 13002788
43cycles per loop : 1300.28
44cycles per loop for marker : 2.123
45
46
47The following tests are done with the "optimized" markers only
48
49Execute a loop with a marker enabled, with an empty probe.
50NR_LOOPS : 100000
51time delta (cycles): 5210587
52cycles per loop : 52.11
53cycles per loop for empty probe : 52.11-4.00=48.11
54
55Execute a loop with marker enabled, with i386 direct argument passing.
56NR_LOOPS : 100000
57time delta (cycles): 5299837
58cycles per loop : 53.00
59cycles per loop to get arguments in probe (from stack) on x86 : 53.00-52.11=0.89
60
61Execute a loop with marker enabled, with var args probe.
62NR_LOOPS : 100000
63time delta (cycles): 5574300
64cycles per loop : 55.74
65cycles per loop to get expected variable arguments on x86 : 55.74-53.00=2.74
66
67Execute a loop with marker enabled, with var args probe, format string
68processing.
69NR_LOOPS : 100000
70time delta (cycles): 9622117
71cycles per loop : 96.22
72cycles per loop to dynamically parse arguments
73 with format string : 96.22-55.74=40.48
74
75
76* Assembly code
77
78
79- Optimized
80
81static int my_open(struct inode *inode, struct file *file)
82{
83 0: 55 push %ebp
84 1: 89 e5 mov %esp,%ebp
85 3: 83 ec 0c sub $0xc,%esp
86 MARK(subsys_mark1, "%d %p", 1, NULL);
87 6: b0 00 mov $0x0,%al
88 8: 84 c0 test %al,%al
89 a: 75 07 jne 13 <my_open+0x13>
90
91 return -EPERM;
92}
93 c: b8 ff ff ff ff mov $0xffffffff,%eax
94 11: c9 leave
95 12: c3 ret
96 13: b8 01 00 00 00 mov $0x1,%eax
97 18: e8 fc ff ff ff call 19 <my_open+0x19>
98 1d: c7 44 24 08 00 00 00 movl $0x0,0x8(%esp)
99 24: 00
100 25: c7 44 24 04 01 00 00 movl $0x1,0x4(%esp)
101 2c: 00
102 2d: c7 04 24 0d 00 00 00 movl $0xd,(%esp)
103 34: ff 15 74 10 00 00 call *0x1074
104 3a: b8 01 00 00 00 mov $0x1,%eax
105 3f: e8 fc ff ff ff call 40 <my_open+0x40>
106 44: eb c6 jmp c <my_open+0xc>
107
108
109- Generic
110
111static int my_open(struct inode *inode, struct file *file)
112{
113 0: 55 push %ebp
114 1: 89 e5 mov %esp,%ebp
115 3: 83 ec 0c sub $0xc,%esp
116 MARK(subsys_mark1, "%d %p", 1, NULL);
117 6: 0f b6 05 20 10 00 00 movzbl 0x1020,%eax
118 d: 84 c0 test %al,%al
119 f: 75 07 jne 18 <my_open+0x18>
120
121 return -EPERM;
122}
123 11: b8 ff ff ff ff mov $0xffffffff,%eax
124 16: c9 leave
125 17: c3 ret
126 18: b8 01 00 00 00 mov $0x1,%eax
127 1d: e8 fc ff ff ff call 1e <my_open+0x1e>
128 22: c7 44 24 08 00 00 00 movl $0x0,0x8(%esp)
129 29: 00
130 2a: c7 44 24 04 01 00 00 movl $0x1,0x4(%esp)
131 31: 00
132 32: c7 04 24 0d 00 00 00 movl $0xd,(%esp)
133 39: ff 15 74 10 00 00 call *0x1074
134 3f: b8 01 00 00 00 mov $0x1,%eax
135 44: e8 fc ff ff ff call 45 <my_open+0x45>
136 49: eb c6 jmp 11 <my_open+0x11>
137
138* Size (x86)
139
140- Optimized
141
142Adds 6 bytes in the "likely" path.
143Adds 32 bytes in the "unlikely" path.
144
145- Generic
146
147Adds 11 bytes in the "likely" path.
148Adds 32 bytes in the "unlikely" path.
149
150
151
152Conclusion
153
154In an empty loop, the generic marker is faster than the optimized marker. This
155may be due to better performances of the movzbl instruction over the movb on the
156Pentium 4 architecture. However, when we execute a loop of 4kB copy, the impact
157of the movzbl becomes greater because it uses the memory bandwidth.
158
159The preemption disabling and call to a probe itself costs 48.11 cycles, almost
160as much as dynamically parsing the format string to get the variable arguments
161(40.48 cycles).
162
163There is almost no difference, on x86, between passing the arguments directly on
164the stack and using a variable argument list when its layout is known
165statically (0.89 cycles vs 2.74 cycles).
166
167
168
This page took 0.042553 seconds and 4 git commands to generate.