X-Git-Url: https://git.lttng.org/?a=blobdiff_plain;f=ltt%2Fbranches%2Fpoly%2Fdoc%2Fdeveloper%2Flttng-userspace-tracing.txt;h=d61953f58d4d2e0521d593317b2b66e0a499e818;hb=fb3d60478f102a4c2bb0746effa6e2ab049747ba;hp=85a31b3da9da30389be67010c34a7b5eb7026c4d;hpb=7a7472507b5cfdde1e205678c0c66c23eb8bb2ad;p=lttv.git

diff --git a/ltt/branches/poly/doc/developer/lttng-userspace-tracing.txt b/ltt/branches/poly/doc/developer/lttng-userspace-tracing.txt
index 85a31b3d..d61953f5 100644
--- a/ltt/branches/poly/doc/developer/lttng-userspace-tracing.txt
+++ b/ltt/branches/poly/doc/developer/lttng-userspace-tracing.txt
@@ -48,8 +48,13 @@ status.

My suggestion is to go for a system call, but only call it :

-- when the process starts
-- when receiving a SIG_UPDTRACING
+- when the thread starts
+- when receiving a SIGRTMIN+3 (multithread ?)
+
+Note : save the thread ID (process ID) in the logging function and the update
+handler. Compare it to the current thread ID to check whether we are a forked
+child thread. Start a brand new buffer list in that case.
+

Two possibilities :

@@ -66,7 +71,9 @@ I would tend to adopt :

syscall get_tracing_info

-first parameter : active traces mask (32 bits : 32 traces).
+parameter 1 : trace buffer map address. (id)
+
+parameter 2 : active ? (int)


Concurrency

@@ -79,15 +86,15 @@ that) and removes false sharing.

Multiple traces

By having the number of active traces, we can allocate as much buffers as we
-need. The only thing is that the buffers will only be allocated when receiving
-the signal/starting the process and getting the number of traces actives.
+need. Allocation is done in the kernel with relay_open. User space mapping is
+done when receiving the signal/starting the process and getting the number of
+active traces.

It means that we must make sure to only update the data structures used by
tracing functions once the buffers are created.

-When adding a new buffer, we should call the set_tracing_info syscall and give
-the new buffers array to the kernel. 
It's an array of 32 pointers to user pages.
-They will be used by the kernel to get the last pages when the thread dies.
+We could have a syscall "get_next_buffer" that would basically mmap the next
+unmapped buffer, or return NULL if all buffers are mapped.

If we remove a trace, the kernel should stop the tracing, and then get the last
buffer for this trace. What is important is to make sure no writers are still

@@ -115,8 +122,7 @@ We could do that trace removal in two operations :

   accessing this memory area. When the control comes back to the writer, at the
   end of the write in a trace, if the trace is marked for switch/delete and the
   tracing_level is 0 (after the decrement of the writer itself), then the
-  writer must buffer switch, set_tracing_info to NULL and then delete the
-  memory area.
+  writer must buffer switch, and then delete the memory area.


Filter

@@ -124,9 +130,7 @@

The update tracing info signal will make the thread get the new filter
information. Getting this information will also happen upon process creation.

-parameter 2 for the get tracing info : array of 32 ints (32 bits).
-Each integer is the filter mask for a trace. As there are up to 32 active
-traces, we have 32 integers for filter.
+parameter 3 for the get tracing info : an integer containing the 32-bit mask.


Buffer switch

@@ -142,15 +146,10 @@

The kernel should be aware of the current pages used for tracing in each
thread. If a thread dies unexpectedly, we want the kernel to get the last bits
of information before the thread crashes.

-syscall set_tracing_info
-
-parameter 1 : array of 32 user space pointers to current pages or NULL.
-
-
Memory protection

-We want each process to be usable to make a trace unreadable, and each process
-to have its own memory space.
+If a process corrupts its own mmapped buffers, the rest of the trace will be
+readable, and each process has its own memory space.

Two possibilities :

@@ -189,6 +188,127 @@ trace, per process. 
+API :
+
+syscall 1 :
+
+in :
+buffer : NULL means get new traces
+         non NULL means to get the information for the specified buffer
+out :
+buffer : returns the address of the trace buffer
+active : is the trace active ?
+filter : 32 bits filter mask
+
+return : 0 on success, 1 on error.
+
+int ltt_update(void **buffer, int *active, int *filter);
+
+syscall 2 :
+
+in :
+buffer : Switch the specified buffer.
+return : 0 on success, 1 on error.
+
+int ltt_switch(void *buffer);
+
+
+Signal :
+
+SIGRTMIN+3
+(like hardware fault and expiring timer : delivered to the thread, see p. 413
+of Advanced Programming in the UNIX Environment.)
+
+Signal is sent on tracing create/destroy, start/stop and filter change.
+
+Each thread will update only itself : this removes unnecessary concurrency.
+
+
+
+Notes :
+
+It doesn't matter "when" the process receives the update signal after a trace
+start : it will receive it with priority, before executing anything else, when
+it is next scheduled in.
+
+
+
+Major enhancement :
+
+* Buffer pool *
+
+The problem with the design, up to now, is that if a heavily threaded
+application launches many short-lived threads, it will allocate memory for
+each traced thread, consuming time, and it will create an incredibly high
+number of files in the trace (one or more per thread).
+
+(thanks to Matthew Khouzam)
+The solution to this lies in the use of a buffer pool : we typically create a
+buffer pool of a specified size (say, 10 buffers by default, alterable by the
+user), each 8k in size (4k for normal trace, 4k for facility channel), for a
+total of 80kB of memory. It has to be tweaked to the maximum number of
+expected threads running at once, or it will have to grow dynamically (thus
+impacting the trace).
+
+A typical approach to dynamic growth is to double the number of allocated
+buffers each time a threshold near the limit is reached. 
+Each channel would be found as :
+
+trace_name/user/facilities_0
+trace_name/user/cpu_0
+trace_name/user/facilities_1
+trace_name/user/cpu_1
+...
+
+When a thread asks to be traced, it gets a buffer from the free buffer pool. If
+the number of available buffers falls under a threshold, the pool is marked for
+expansion and the thread gets its buffer quickly. The expansion will be executed
+a little bit later by a worker thread. If, however, the number of available
+buffers is 0, then an "emergency" reservation will be done, allocating only one
+buffer. The goal of this is to impact the thread fork time as little as
+possible.
+
+When a thread releases a buffer (the thread terminates), a buffer switch is
+performed, so the data can be flushed to disk and no other thread will mess
+with it or render the buffer unreadable.
+
+Upon trace creation, the pool is pre-allocated. Upon trace
+destruction, the threads are first informed of the trace destruction, any
+pending worker thread (for pool allocation) is cancelled and then the pool is
+released. Buffers used by threads at this moment but not mapped for reading
+will be simply destroyed (as their refcount will fall to 0). It means that
+between the "trace stop" and "trace destroy", there should be enough time to let
+the lttd daemon open the newly created channels or they will be lost.
+
+Upon buffer switch, the reader can read directly from the buffer. Note that when
+the reader finishes reading a buffer, if the associated writer thread has
+exited, it must fill the buffer with zeroes and put it back into the free pool.
+In the case where the trace is destroyed, it must just decrement its refcount
+(as it would do otherwise) and the buffer will be destroyed.
+
+This pool will reduce the number of trace files created to the order of the
+number of threads present in the system at a given time.
+
+A worst-case scenario is 32768 processes traced at the same time, for a total
+amount of 256MB of buffers. 
If a machine has so many threads, it probably has
+enough memory to handle this.
+
+In flight recorder mode, it would be interesting to use an LRU algorithm to
+choose which buffer from the pool we must take for a newly forked thread. A
+simple queue would do it.
+
+SMP : per cpu pools ? -> no, L1 and L2 caches are typically too small to be
+impacted by the fact that a reused buffer is on a different or the same CPU.