X-Git-Url: https://git.lttng.org/?a=blobdiff_plain;f=ltt%2Fbranches%2Fpoly%2Fdoc%2Fdeveloper%2Flttng-userspace-tracing.txt;h=d61953f58d4d2e0521d593317b2b66e0a499e818;hb=fb3d60478f102a4c2bb0746effa6e2ab049747ba;hp=85a31b3da9da30389be67010c34a7b5eb7026c4d;hpb=7a7472507b5cfdde1e205678c0c66c23eb8bb2ad;p=lttv.git

diff --git a/ltt/branches/poly/doc/developer/lttng-userspace-tracing.txt b/ltt/branches/poly/doc/developer/lttng-userspace-tracing.txt
index 85a31b3d..d61953f5 100644
--- a/ltt/branches/poly/doc/developer/lttng-userspace-tracing.txt
+++ b/ltt/branches/poly/doc/developer/lttng-userspace-tracing.txt
@@ -48,8 +48,13 @@ status.

My suggestion is to go for a system call, but only call it :

-- when the process starts
-- when receiving a SIG_UPDTRACING
+- when the thread starts
+- when receiving a SIGRTMIN+3 (multithread ?)
+
+Note : save the thread ID (process ID) in the logging function and the update
+handler. Compare it to the current thread ID to check whether we are a forked
+child thread. Start a brand new buffer list in that case.
+

Two possibilities :

@@ -66,7 +71,9 @@ I would tend to adopt :

syscall get_tracing_info

-first parameter : active traces mask (32 bits : 32 traces).
+parameter 1 : trace buffer map address. (id)
+
+parameter 2 : active ? (int)


Concurrency

@@ -79,15 +86,15 @@ that) and removes false sharing.

Multiple traces

By having the number of active traces, we can allocate as much buffers as we
-need. The only thing is that the buffers will only be allocated when receiving
-the signal/starting the process and getting the number of traces actives.
+need. Allocation is done in the kernel with relay_open. User space mapping is
+done when receiving the signal/starting the process and getting the number of
+active traces.

It means that we must make sure to only update the data structures used by
tracing functions once the buffers are created.

-When adding a new buffer, we should call the set_tracing_info syscall and give
-the new buffers array to the kernel. 
It's an array of 32 pointers to user pages.
-They will be used by the kernel to get the last pages when the thread dies.
+We could have a syscall "get_next_buffer" that would basically mmap the next
+unmapped buffer, or return NULL if all buffers are mapped.

If we remove a trace, the kernel should stop the tracing, and then get the last
buffer for this trace. What is important is to make sure no writers are still

@@ -115,8 +122,7 @@ We could do that trace removal in two operations :

   accessing this memory area. When the control comes back to the writer, at the
   end of the write in a trace, if the trace is marked for switch/delete and the
   tracing_level is 0 (after the decrement of the writer itself), then the
-  writer must buffer switch, set_tracing_info to NULL and then delete the
-  memory area.
+  writer must buffer switch, and then delete the memory area.


Filter

@@ -124,9 +130,7 @@

The update tracing info signal will make the thread get the new filter
information. Getting this information will also happen upon process creation.

-parameter 2 for the get tracing info : array of 32 ints (32 bits).
-Each integer is the filter mask for a trace. As there are up to 32 active
-traces, we have 32 integers for filter.
+parameter 3 for the get tracing info : an integer containing the 32-bit mask.


Buffer switch

@@ -142,15 +146,10 @@

The kernel should be aware of the current pages used for tracing in each
thread. If a thread dies unexpectedly, we want the kernel to get the last bits
of information before the thread crashes.

-syscall set_tracing_info
-
-parameter 1 : array of 32 user space pointers to current pages or NULL.
-
-
Memory protection

-We want each process to be usable to make a trace unreadable, and each process
-to have its own memory space.
+If a process corrupts its own mmapped buffers, the rest of the trace will be
+readable, and each process has its own memory space.

Two possibilities :

@@ -189,6 +188,127 @@ trace, per process. 
+API :
+
+syscall 1 :
+
+in :
+buffer : NULL means get new traces
+         non NULL means to get the information for the specified buffer
+out :
+buffer : returns the address of the trace buffer
+active : is the trace active ?
+filter : 32 bits filter mask
+
+return : 0 on success, 1 on error.
+
+int ltt_update(void **buffer, int *active, int *filter);
+
+syscall 2 :
+
+in :
+buffer : Switch the specified buffer.
+return : 0 on success, 1 on error.
+
+int ltt_switch(void *buffer);
+
+
+Signal :
+
+SIGRTMIN+3
+(like hardware fault and expiring timer : delivered to the thread, see p. 413
+of Advanced Programming in the UNIX Environment.)
+
+Signal is sent on tracing create/destroy, start/stop and filter change.
+
+Each thread will update only itself : this removes unnecessary concurrency.
+
+
+
+Notes :
+
+It doesn't matter "when" the process receives the update signal after a trace
+start : it will receive it with priority, before executing anything else, when
+it is next scheduled in.
+
+
+
+Major enhancement :
+
+* Buffer pool *
+
+The problem with the design, up to now, is that if a heavily threaded
+application launches many short-lived threads, it will allocate memory for
+each traced thread, consuming time, and it will create an incredibly high
+number of files in the trace (one or more per thread).
+
+(thanks to Matthew Khouzam)
+The solution to this lies in the use of a buffer pool : we typically create a
+buffer pool of a specified size (say, 10 buffers by default, alterable by the
+user), each 8k in size (4k for normal trace, 4k for facility channel), for a
+total of 80kB of memory. It has to be tweaked to the maximum number of
+expected threads running at once, or it will have to grow dynamically (thus
+impacting the trace).
+
+A typical approach to dynamic growth is to double the number of allocated
+buffers each time a threshold near the limit is reached. 
+Each channel would be found as :
+
+trace_name/user/facilities_0
+trace_name/user/cpu_0
+trace_name/user/facilities_1
+trace_name/user/cpu_1
+...
+
+When a thread asks to be traced, it gets a buffer from the free buffer pool. If
+the number of available buffers falls under a threshold, the pool is marked for
+expansion and the thread gets its buffer quickly. The expansion will be executed
+a little bit later by a worker thread. If, however, the number of available
+buffers is 0, then an "emergency" reservation will be done, allocating only one
+buffer. The goal of this is to impact the thread fork time as little as
+possible.
+
+When a thread releases a buffer (the thread terminates), a buffer switch is
+performed, so the data can be flushed to disk and no other thread will mess
+with it or render the buffer unreadable.
+
+Upon trace creation, the pool is pre-allocated. Upon trace
+destruction, the threads are first informed of the trace destruction, any
+pending worker thread (for pool allocation) is cancelled and then the pool is
+released. Buffers used by threads at this moment but not mapped for reading
+will be simply destroyed (as their refcount will fall to 0). It means that
+between the "trace stop" and "trace destroy", there should be enough time to let
+the lttd daemon open the newly created channels or they will be lost.
+
+Upon buffer switch, the reader can read directly from the buffer. Note that when
+the reader finishes reading a buffer, if the associated writer thread has
+exited, it must fill the buffer with zeroes and put it back into the free pool.
+In the case where the trace is destroyed, it must just decrement its refcount
+(as it would do otherwise) and the buffer will be destroyed.
+
+This pool will reduce the number of trace files created to the order of the
+number of threads present in the system at a given time.
+
+A worst-case scenario is 32768 processes traced at the same time, for a total
+amount of 256MB of buffers. 
If a machine has so many threads, it probably has
+enough memory to handle this.
+
+In flight recorder mode, it would be interesting to use an LRU algorithm to
+choose which buffer from the pool we must take for a newly forked thread. A
+simple queue would do it.
+
+SMP : per cpu pools ? -> no, L1 and L2 caches are typically too small to be
+impacted by the fact that a reused buffer is on a different or the same CPU.