
Some thoughts about userspace tracing

Mathieu Desnoyers, January 2006



* Goals

Fast and secure user space tracing.

Fast :

- 5000ns for a system call is too long. Writing an event directly to memory
  takes 220ns.
- Still, we can afford a system call for a buffer switch, which occurs less
  often.
- No locking, no signal disabling. Disabling signals requires 2 system calls.
  Mutexes are implemented with a short spin lock, followed by a yield : yet
  another system call. In addition, we have no way to know on which CPU we are
  running when in user mode, as we can be preempted anywhere.
- No contention.
- No interrupt disabling : it doesn't exist in user mode.

Secure :

- A process shouldn't be able to corrupt the system's trace or another
  process's trace. It should be limited to its own memory space.



* Solution

- Signal handler concurrency

Using atomic space reservation in the buffer(s) will remove the requirement for
locking. This is the fast and safe way to deal with concurrency coming from
signal handlers.
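
As an illustration, here is a minimal sketch of such an atomic reservation,
written with C11 atomics for brevity (a 2006 implementation would use an
architecture-specific compare-and-swap; the structure and function names are
invented for this example) :

#include <stdatomic.h>
#include <stddef.h>

struct trace_buf {
    char data[4096];
    _Atomic size_t write_offset;    /* next free byte in data[] */
};

/*
 * Reserve "len" bytes with a compare-and-swap loop. If a signal
 * handler interrupts us and reserves its own slot, our CAS fails
 * and we simply retry : no locking, no signal disabling.
 */
static void *reserve_slot(struct trace_buf *buf, size_t len)
{
    size_t old = atomic_load(&buf->write_offset);
    size_t new;

    do {
        new = old + len;
        if (new > sizeof(buf->data))
            return NULL;    /* full : caller must buffer switch */
    } while (!atomic_compare_exchange_weak(&buf->write_offset, &old, new));

    return buf->data + old;
}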

- Start/stop tracing

Two possible solutions :

Either we export a read-only memory page from kernel to user space. That would
somewhat be seen as a hack, as I have never seen such an interface anywhere
else. It may lead to problems related to exported types. The proper, but slow,
way to do it would be to have a system call that returns the tracing status.

My suggestion is to go for a system call, but only call it :

- when the thread starts
- when receiving a SIG_UPDTRACING (multithread ?)

Note : save the thread ID (process ID) in the logging function and the update
handler, and compare it against the current ID to check whether we are a forked
child thread. Start a brand new buffer list in that case, as sketched below.
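
A sketch of that fork check (the getpid() comparison is the idea from the note
above; alloc_new_buffer_list() is a hypothetical helper) :

#include <sys/types.h>
#include <unistd.h>

static pid_t buffers_owner;    /* pid that allocated the current buffer list */

extern void alloc_new_buffer_list(void);    /* hypothetical */

static void trace_event(void)
{
    /*
     * A forked child inherits the parent's buffer list : if the saved
     * pid does not match, start a brand new buffer list instead of
     * writing into the parent's mappings.
     */
    if (getpid() != buffers_owner) {
        alloc_new_buffer_list();
        buffers_owner = getpid();
    }
    /* ... reserve space and write the event ... */
}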


Two possibilities :

- one system call per piece of information to get / one system call to get all
  the information.
- one signal per piece of information to get / one signal for "update" tracing
  info.

I would tend to adopt :

- One signal for "general tracing update".
  One signal handler would clearly be enough; more would be unnecessary
  overhead/pollution.
- One system call for all updates.
  We will need multiple parameters though; a system call gives us up to 6
  parameters.

syscall get_tracing_info

parameter 1 : trace buffer map address (id)

parameter 2 : active ? (int)

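A hedged sketch of the user-space wrapper for such a call (the syscall number
is a placeholder; the full signature, with the filter parameter added later in
this document, appears in the API section at the end) :

#include <sys/syscall.h>
#include <unistd.h>

#define __NR_get_tracing_info 400    /* placeholder : not a real number */

/* parameter 1 : trace buffer map address (id)
 * parameter 2 : out parameter, tracing active or not */
static inline int get_tracing_info(void *buffer, int *active)
{
    return syscall(__NR_get_tracing_info, buffer, active);
}
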

Concurrency

We must have per-thread buffers. Then, no memory can be written by two threads
at once. It removes the need for locks (ok, atomic reservation was already
doing that) and removes false sharing.
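
With per-thread buffers, the fast path only ever touches thread-local data;
e.g., with GCC's __thread storage class (the variable name is illustrative) :

struct trace_buf;    /* defined elsewhere */

/* One buffer list per thread : no shared writes, hence no locks
 * and no false sharing between CPUs. */
static __thread struct trace_buf *thread_buffers;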


Multiple traces

By having the number of active traces, we can allocate as many buffers as we
need. Allocation is done in the kernel with relay_open. User space mapping is
done when receiving the signal/starting the process and getting the number of
active traces.

It means that we must make sure to only update the data structures used by
tracing functions once the buffers are created.

We could have a syscall "get_next_buffer" that would basically mmap the next
unmapped buffer, or return NULL if all buffers are mapped.
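
Mapping every buffer would then be a simple loop; a sketch, assuming
hypothetical get_next_buffer() and register_buffer() helpers :

#include <stddef.h>

extern void *get_next_buffer(void);       /* hypothetical syscall wrapper */
extern void register_buffer(void *buf);   /* hypothetical bookkeeping */

/* Called at thread start or from the update-tracing signal handler,
 * before the tracing data structures are published to the fast path. */
static void map_all_buffers(void)
{
    void *buf;

    while ((buf = get_next_buffer()) != NULL)
        register_buffer(buf);
}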

If we remove a trace, the kernel should stop the tracing, and then get the last
buffer for this trace. What is important is to make sure no writers are still
trying to write in a memory region that gets deallocated.

For that, we will keep an atomic variable "tracing_level", which tells how many
times we are nested in tracing code (program code/signal handlers) for a
specific trace.

We could do that trace removal in two operations :

- Send an update tracing signal to the process.
  - The signal handler gets the new tracing status, which tells that tracing is
    disabled for the specific trace. It writes this status in the tracing
    control structure of the process.
  - If tracing_level is 0, it's fine : there are no potential writers in the
    removed trace. It's up to us to buffer switch the removed trace and, after
    control returns to us, set_tracing_info this page to NULL and delete this
    memory area.
  - Else (tracing_level > 0), flag the removed trace for later switch/delete.

  It then returns control to the process.

- If the tracing_level was > 0, there were one or more writers potentially
  accessing this memory area. When control comes back to the writer, at the
  end of the write in a trace, if the trace is marked for switch/delete and the
  tracing_level is 0 (after the decrement of the writer itself), then the
  writer must buffer switch and then delete the memory area, as in the sketch
  below.
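
The writer side of this scheme could look like the following sketch (C11
atomics for brevity; the flag and helper names are invented) :

#include <stdatomic.h>

static _Atomic int tracing_level;       /* nesting : code + signal handlers */
static _Atomic int marked_for_delete;   /* set by the update handler */

extern void buffer_switch_and_delete(void);    /* hypothetical */

static void trace_write(void)
{
    atomic_fetch_add(&tracing_level, 1);

    /* ... reserve space and write the event ... */

    /*
     * Last writer out : if the trace was flagged for removal while we
     * were writing, perform the deferred buffer switch/delete now.
     * atomic_fetch_sub() returns the value before the decrement.
     */
    if (atomic_fetch_sub(&tracing_level, 1) == 1 &&
        atomic_load(&marked_for_delete))
        buffer_switch_and_delete();
}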


Filter

The update tracing info signal will make the thread get the new filter
information. Getting this information will also happen upon process creation.

parameter 3 for get_tracing_info : an integer containing the 32-bit mask.
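
Filtering then costs a single AND per event site. A sketch, with invented bit
assignments :

#include <stdint.h>

static uint32_t filter_mask;    /* refreshed by update_tracing_info() */

#define EV_SYSCALL (1U << 0)    /* example bit assignments */
#define EV_NETWORK (1U << 1)

static inline int event_enabled(uint32_t event_bit)
{
    return (filter_mask & event_bit) != 0;
}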


Buffer switch

There could be a tracing_buffer_switch system call, which would take the page
start address as a parameter. The job of the kernel is to steal this page,
possibly replacing it with a zeroed page (we don't care about the content of
the page after the syscall).
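
On the user side the call would be issued when reservation finds the buffer
full; a sketch (placeholder syscall number, wrapper name taken from the API
section below) :

#include <sys/syscall.h>
#include <unistd.h>

#define __NR_tracing_buffer_switch 401    /* placeholder : not a real number */

/* After the call the kernel owns the old content; the writer may find
 * a zeroed page in place and simply restarts reserving at offset 0. */
static inline int tracing_buffer_switch(void *buffer)
{
    return syscall(__NR_tracing_buffer_switch, buffer);
}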

Process dying

The kernel should be aware of the current pages used for tracing in each
thread. If a thread dies unexpectedly, we want the kernel to get the last bits
of information before the thread crashes.

Memory protection

If a process corrupts its own mmapped buffers, the rest of the trace will still
be readable, since each process has its own memory space.

Two possibilities :

Either we create one channel per process, or we have per-CPU tracefiles for all
the processes, with the specification that data is written in a monotonically
increasing time order and that no process shares a 4k page with another
process.

The problem with having only one tracefile per CPU is that we cannot safely
steal a process's buffer upon a schedule change, because the process may
currently be writing to it.

That leaves one tracefile per thread as the only solution.

Another argument in favor of this solution is the possibility of having mixed
32- and 64-bit processes on the same machine. Dealing with types will be
easier.


Corrupted trace

A corrupted tracefile will only affect one thread. The rest of the trace will
still be readable.


Facilities

Upon process creation, or when receiving the trace info update signal, when a
new trace appears, the thread should write the facility information into it. It
must then have a list of registered facilities, all managed at the thread
level.

We must decide whether we allow a facility channel for each thread. The
advantage is that we have a readable channel in flight recorder mode, while the
disadvantage is that it duplicates the number of channels, which may become
quite high. To follow the general design of a high throughput channel and a low
throughput channel for vital information, I suggest having a separate channel
for facilities, per trace, per process.



API :

syscall 1 :

int update_tracing_info(void *buffer, int *active, int *filter);


syscall 2 :

int tracing_buffer_switch(void *buffer);
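
For illustration, a sketch of how the update handler could use syscall 1,
caching the results where the fast logging path reads them (variable names and
the failure policy are assumptions) :

static int tracing_active;    /* read by the fast logging path */
static int event_filter;

int update_tracing_info(void *buffer, int *active, int *filter);

static void refresh_tracing_state(void *buffer)
{
    /* One syscall fetches all the tracing information at once. */
    if (update_tracing_info(buffer, &tracing_active, &event_filter) < 0)
        tracing_active = 0;    /* assumed failure policy : stop logging */
}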


Signal :

UPD_TRACING
Default : SIG_IGN
(delivered to the thread, like a hardware fault or an expiring timer; see
p. 413 of Advanced Programming in the UNIX Environment)

The handler will update tracing info for its own thread only : it removes
unnecessary concurrency.
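
Installing the handler could look like this sketch (SIGUSR1 stands in for
UPD_TRACING, which would need a real signal number; refresh_tracing_state() is
the sketch from the API section above) :

#include <signal.h>
#include <string.h>

#define UPD_TRACING SIGUSR1    /* placeholder signal */

static void upd_tracing_handler(int sig)
{
    (void)sig;
    /* Update tracing info for this thread only, e.g. :
     * refresh_tracing_state(my_buffer); */
}

static void install_upd_tracing(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = upd_tracing_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(UPD_TRACING, &sa, NULL);
}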