Atomic UP test results.


using test-time-probe2.ko

Clock speed : cpu MHz         : 3000.077

Tracing inactive

[  125.787229] test init
[  125.787303] test results : time per probe
[  125.787306] number of loops : 20000
[  125.787309] total time : 204413
[  125.787312] test end
[  175.660402] test init
[  175.660475] test results : time per probe
[  175.660479] number of loops : 20000
[  175.660482] total time : 203468
[  175.660484] test end
[  179.337362] test init
[  179.337436] test results : time per probe
[  179.337440] number of loops : 20000
[  179.337443] total time : 204757
[  179.337446] test end

Res : 10.21 cycles per loop
3.40 ns per loop

Atomic UP, one trace, flight recorder.

[  357.983971] test init
[  357.988837] test results : time per probe
[  357.988843] number of loops : 20000
[  357.988846] total time : 12349013
[  357.988849] test end
[  358.718896] test init
[  358.723049] test results : time per probe
[  358.723053] number of loops : 20000
[  358.723057] total time : 12332497
[  358.723059] test end
[  359.422038] test init
[  359.426173] test results : time per probe
[  359.426179] number of loops : 20000
[  359.426182] total time : 12332535
[  359.426185] test end

Res : 616.90 cycles per loop.
205.63 ns per loop

Atomic SMP, one trace, flight recorder.


[  111.694180] test init
[  111.700191] test results : time per probe
[  111.700198] number of loops : 20000
[  111.700201] total time : 16925670
[  111.700204] test end
[  112.285716] test init
[  112.291321] test results : time per probe
[  112.291326] number of loops : 20000
[  112.291329] total time : 16766633
[  112.291332] test end
[  112.880602] test init
[  112.884739] test results : time per probe
[  112.884743] number of loops : 20000
[  112.884746] total time : 12358237
[  112.884748] test end

Res : 767.51 cycles per loop
255.83 ns per loop

Atomic UP is faster than atomic SMP by :
(255.83-205.63)/255.83 * 100% = 19.62 %
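
The "Res" figures above can be re-derived from the raw "total time" values.
The sketch below assumes that "total time" is a cycle count for the whole run
and that the three runs of each configuration are simply averaged; the function
names are illustrative and do not come from test-time-probe2.ko.

```c
#include <assert.h>
#include <stddef.h>

/* Average cycles per probe over several runs with the same loop count. */
double cycles_per_loop(const unsigned long long *totals, size_t runs,
                       unsigned long loops)
{
    unsigned long long sum = 0;
    size_t i;

    assert(runs > 0 && loops > 0);
    for (i = 0; i < runs; i++)
        sum += totals[i];
    return (double)sum / (double)runs / (double)loops;
}

/* One cycle lasts 1000/MHz nanoseconds. */
double cycles_to_ns(double cycles, double cpu_mhz)
{
    assert(cpu_mhz > 0.0);
    return cycles * 1000.0 / cpu_mhz;
}

/* Relative gain of the fast variant over the slow one, in percent. */
double gain_percent(double slow_ns, double fast_ns)
{
    assert(slow_ns > 0.0);
    return (slow_ns - fast_ns) / slow_ns * 100.0;
}
```

With the UP totals (12349013, 12332497, 12332535), 20000 loops and 3000.077
MHz, this gives about 616.90 cycles and 205.63 ns per loop.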


Difference between cmpxchg, cmpxchg-up and irq save/restore (20000 loops each) :
cmpxchg           2967855/20000 = 148.39 cycles or  49.46 ns
cmpxchg-up         540577/20000 =  27.03 cycles or   9.01 ns
irq save/restore 12636562/20000 = 631.83 cycles or 210.60 ns


* Memory ordering

offset
written by local CPU
read by local CPU and other CPUs (reader)

commit_count
written by local CPU
read by local CPU and other CPUs (reader)

consumed
written by any CPU
read by any CPU

data
written by local CPU
read by any CPU


test done in the reader :
if ( consumed < offset )
  if ( subbuf.commit_count == multiple of SUBBUFSIZE)
    read data
    inc consumed
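
The reader test above can be sketched in user-space C11 as follows. This is
only an illustration of the two conditions as written here, not the LTTng
code: struct trace_buf, SUBBUF_SIZE, N_SUBBUF and reader_try_consume() are
hypothetical names, "inc consumed" is assumed to advance by one sub-buffer,
and the acquire load stands in for the read-side ordering a kernel reader
would need.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

#define SUBBUF_SIZE 64UL /* hypothetical sub-buffer size, in bytes */
#define N_SUBBUF    2UL  /* hypothetical number of sub-buffers */

struct trace_buf {
    atomic_ulong offset;                 /* written by the local CPU */
    atomic_ulong consumed;               /* written by any CPU */
    atomic_ulong commit_count[N_SUBBUF]; /* written by the local CPU */
    char data[N_SUBBUF * SUBBUF_SIZE];   /* written by the local CPU */
};

/*
 * Copy the next sub-buffer into out and return true if it is ready,
 * otherwise return false and leave the buffer state untouched.
 */
bool reader_try_consume(struct trace_buf *b, char *out)
{
    unsigned long consumed, offset, idx, commit;

    assert(b != NULL && out != NULL);
    consumed = atomic_load(&b->consumed);
    offset = atomic_load(&b->offset);
    idx = (consumed / SUBBUF_SIZE) % N_SUBBUF;

    /* if ( consumed < offset ) */
    if (!(consumed < offset))
        return false;

    /* if ( subbuf.commit_count == multiple of SUBBUFSIZE ) */
    commit = atomic_load_explicit(&b->commit_count[idx],
                                  memory_order_acquire);
    if (commit % SUBBUF_SIZE != 0)
        return false;

    /* read data */
    memcpy(out, &b->data[idx * SUBBUF_SIZE], SUBBUF_SIZE);

    /* inc consumed (assumed here to advance by one sub-buffer) */
    atomic_fetch_add(&b->consumed, SUBBUF_SIZE);
    return true;
}
```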


We must guarantee the following ordering :
* offset
Seen from the local CPU :
offset must always be incremented before the data is written (already
consistent)

Seen from other CPUs :
offset and data can be written out of order. offset only ever increases, so in
an out-of-order case a remote CPU sees an offset lower than the data actually
written. This is safe because the reader also requires the commit_count to have
been incremented before it reads the data, and that increment is preceded by a
store fence.

* commit_count
commit_count must always be seen by other CPUs after the data has been written.
Therefore, we must put a store fence before the commit_count write. (smp_wmb)
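
A minimal user-space sketch of this rule, assuming C11 atomics in place of the
kernel primitives: the release fence below is used as a stand-in for
smp_wmb() (it is at least as strong, since it also orders earlier loads), and
struct slot and writer_commit() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <string.h>

struct slot {
    char data[32];
    atomic_ulong commit_count;
};

/* Write the payload, then publish it by bumping commit_count. */
void writer_commit(struct slot *s, const char *payload, unsigned long len)
{
    assert(len <= sizeof(s->data));

    /* 1. Write the event data. */
    memcpy(s->data, payload, len);

    /* 2. Store fence: the data stores above must be visible to other
     *    CPUs before the commit_count store below (smp_wmb in the
     *    kernel). */
    atomic_thread_fence(memory_order_release);

    /* 3. Only now make the commit visible. */
    atomic_fetch_add_explicit(&s->commit_count, len, memory_order_relaxed);
}
```

A reader that loads commit_count with acquire semantics (or issues a read
barrier after loading it) and sees the new value is then guaranteed to see the
data as well.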

* consumed
Rarely updated, so use a LOCK-prefixed instruction, which also acts as a full
memory barrier.


Mathieu Desnoyers, November 2006