LTTng Relay Daemon Architecture
Mathieu Desnoyers, August 2015

This document describes the object model and architecture of the relay
daemon, following the refactoring introduced by the commit "Fix: Relay
daemon ownership and reference counting".

We have the following object composition hierarchy:

relay connection (main.c, for sessiond/consumer)
  |
  \-> 0 or 1 session
        |
        \-> 0 or many ctf-trace
                  |
                  \-> 0 or many stream
                        |   |
                        |   \-> 0 or many index
                        |
                        \-------> 0 or 1 viewer stream

live connection (live.c, for client)
  |
  \-> 1 viewer session
        |
        \-> 0 or many session (actually a reference to session as created
              |                by the relay connection)
              |
              \-> ..... (ctf-trace, stream, index, viewer stream)

There are global tables declared in lttng-relayd.h for sessions
(sessions_ht, indexed by session id), streams (relay_streams_ht, indexed
by stream handle), and viewer streams (viewer_streams_ht, indexed by
stream handle). These tables allow fast lookup of those objects using
the IDs received through the communication protocols.

There is also one connection hash table per worker thread. One worker
thread receives data (main.c), and another interacts with viewer
clients (live.c). Those tables are indexed by socket file descriptor.

An RCU lookup+refcounting scheme has been introduced for all objects
(except the viewer session, which does not use it yet). This scheme
allows looking up objects, or traversing an RCU linked list or hash
table, in combination with a getter on the object. The getter validates
that there is still at least one reference to the object; otherwise the
lookup behaves as if the object did not exist.

The relay_connection (connection between the sessiond/consumer and the
relayd) is the outermost object of its hierarchy.

The live connection (connection between a live client and the relayd)
is the outermost object of its hierarchy.

There is also a "lock" mutex in each object. These mutexes synchronize
threads (currently the main.c relay thread and the live.c client thread)
when objects are shared. Locks can be nested from the outermost object
to the innermost object. In other words, the ctf-trace lock can nest
within the session lock.

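A minimal sketch of this nesting order with POSIX mutexes follows. The
trimmed-down `struct session` and `struct ctf_trace` types, their
`closed` and `nr_streams` fields, and `trace_add_stream()` are
hypothetical. Taking the locks in one fixed order, outermost first, is
what keeps two threads that both need the pair from deadlocking.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical, trimmed-down objects, each with its own "lock". */
struct session {
	pthread_mutex_t lock;
	bool closed;
};

struct ctf_trace {
	pthread_mutex_t lock;
	struct session *session;	/* container */
	int nr_streams;
};

/*
 * Add a stream to a trace only while its session is still open. Both
 * locks are needed, so they are taken outermost (session) first and
 * innermost (ctf-trace) second, then released in reverse order.
 */
bool trace_add_stream(struct ctf_trace *trace)
{
	bool added = false;

	assert(trace->session != NULL);
	pthread_mutex_lock(&trace->session->lock);	/* outer lock */
	if (!trace->session->closed) {
		pthread_mutex_lock(&trace->lock);	/* nested inner lock */
		trace->nr_streams++;
		pthread_mutex_unlock(&trace->lock);
		added = true;
	}
	pthread_mutex_unlock(&trace->session->lock);
	return added;
}
```
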
RCU linked lists are used to iterate using RCU, and are protected by
their own mutex for modifications. Each object reached during iteration
should be confirmed using the object "getter" to ensure its refcount is
not 0 (except in cases where the caller actually owns the objects and
can therefore assume their refcount is not 0).

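The sketch below illustrates that split: writers serialize on the list
mutex, while readers traverse without it and confirm each node with a
get-unless-zero getter. All types and functions are hypothetical, and
`sketch_read_lock()`/`sketch_read_unlock()` are empty placeholders for
liburcu's `rcu_read_lock()`/`rcu_read_unlock()`, so this code shows the
control flow only and is not safe under concurrent removal.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholders for liburcu's rcu_read_lock()/rcu_read_unlock(). They do
 * nothing, so this sketch is NOT safe against concurrent removal. */
static void sketch_read_lock(void) {}
static void sketch_read_unlock(void) {}

/* Hypothetical list element. */
struct stream {
	atomic_long ref;
	struct stream *next;
};

/* Hypothetical list: readers traverse it without the mutex, writers
 * serialize on 'lock'. */
struct stream_list {
	pthread_mutex_t lock;
	struct stream *head;
};

void stream_list_add(struct stream_list *l, struct stream *s)
{
	pthread_mutex_lock(&l->lock);	/* modifications take the list mutex */
	s->next = l->head;
	/* Real RCU code would publish the node with e.g. cds_list_add_rcu(). */
	l->head = s;
	pthread_mutex_unlock(&l->lock);
}

/* Get-unless-zero: fails if the stream's refcount already reached 0. */
static bool stream_get(struct stream *s)
{
	long old = atomic_load(&s->ref);

	do {
		if (old == 0)
			return false;
	} while (!atomic_compare_exchange_weak(&s->ref, &old, old + 1));
	return true;
}

/* Drop a reference; release at 0 is left out of this sketch. */
static void stream_put(struct stream *s)
{
	long old = atomic_fetch_sub(&s->ref, 1);

	assert(old > 0);
}

/* Count the streams that are still referenced. Every node seen during
 * the traversal is confirmed with the getter; a node whose refcount is
 * 0 is skipped as if it were not in the list. */
int count_live_streams(struct stream_list *l)
{
	int n = 0;

	sketch_read_lock();
	for (struct stream *s = l->head; s; s = s->next) {
		if (!stream_get(s))
			continue;
		n++;	/* the node is safe to use while the reference is held */
		stream_put(s);
	}
	sketch_read_unlock();
	return n;
}
```
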
RCU hash tables are used to iterate using RCU. Iteration should be
confirmed using the object "getter" to ensure its refcount is not 0
(except, again, if we have ownership and can assume the object refcount
is not 0).

Object creation sets the refcount to 1. Each getter increments the
refcount, and must be paired with a "put" to decrement it. A final put
on "self" (ownership) lets the refcount reach 0, which triggers release
and thus freeing through call_rcu.

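The lifecycle can be sketched as follows, assuming hypothetical
`obj_create()`, `obj_get_ref()`, and `obj_put()` functions.
`sketch_call_rcu()` is a placeholder: liburcu's real `call_rcu()` takes
a `struct rcu_head` embedded in the object and defers the callback until
after a grace period, whereas this stand-in calls it immediately.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical refcounted object. */
struct obj {
	atomic_long ref;
};

static int nr_released;	/* counts releases, for illustration only */

/* Placeholder for liburcu's call_rcu(): the real one defers the callback
 * until after an RCU grace period; this one runs it immediately. */
static void sketch_call_rcu(struct obj *o, void (*func)(struct obj *))
{
	func(o);
}

static void obj_free(struct obj *o)
{
	nr_released++;
	free(o);
}

/* Creation: the object starts with its "self" (ownership) reference. */
struct obj *obj_create(void)
{
	struct obj *o = malloc(sizeof(*o));

	if (o)
		atomic_init(&o->ref, 1);
	return o;
}

/* Extra reference for a caller that already holds one, so the count
 * cannot be 0; lookups would use a get-unless-zero getter instead. */
void obj_get_ref(struct obj *o)
{
	long old = atomic_fetch_add(&o->ref, 1);

	assert(old > 0);
}

/* Each get is paired with a put; the put that drops the count to 0
 * triggers release, and freeing goes through (sketched) call_rcu. */
void obj_put(struct obj *o)
{
	long old = atomic_fetch_sub(&o->ref, 1);

	assert(old > 0);
	if (old == 1)
		sketch_call_rcu(o, obj_free);
}
```
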
In the composition scheme, each composite holds a back reference to its
container, and therefore a reference (refcount) on it. This allows
following pointers from, e.g., viewer stream to stream to ctf-trace to
session without performing any validation, thanks to the transitive
refcounting of those back references.

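A minimal sketch of transitive back references, assuming a hypothetical
generic `struct node` that could stand for a session, ctf-trace, stream,
or viewer stream: creating a node takes a reference on its container,
and releasing a node drops that reference, which may in turn release
the container. Freeing is done directly here rather than through
call_rcu.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical generic object: session, ctf-trace, stream, or viewer
 * stream, each pointing back to its container. */
struct node {
	atomic_long ref;
	struct node *container;	/* back reference; NULL at the top */
};

static int nr_freed;	/* counts frees, for illustration only */

/* Creating a composite takes a reference on its container. */
struct node *node_create(struct node *container)
{
	struct node *n = malloc(sizeof(*n));

	if (!n)
		return NULL;
	atomic_init(&n->ref, 1);
	n->container = container;
	if (container)
		atomic_fetch_add(&container->ref, 1);
	return n;
}

/* Dropping the last reference on a node frees it and then puts its back
 * reference, which may release the container, and so on upward. */
void node_put(struct node *n)
{
	while (n) {
		struct node *container = n->container;
		long old = atomic_fetch_sub(&n->ref, 1);

		assert(old > 0);
		if (old != 1)
			return;	/* still referenced */
		free(n);	/* relayd frees through call_rcu instead */
		nr_freed++;
		n = container;	/* drop the back reference */
	}
}
```

While the caller holds its stream reference, `stream->container->container`
(the session) can be dereferenced without a lookup or getter, because
each level pins the next.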
In addition to those back references, a few key ownership references
are held. The connection in the relay worker thread (main.c) holds
ownership on the session, and on each stream it contains. The
connection in the live worker thread (live.c) holds ownership on each
viewer stream it creates. Everything else is kept alive by back
references from composite to container objects. When a connection is
closed, it puts all the ownership references it holds. This eventually
triggers destruction of the session, streams, and viewer streams
associated with the connection, once the references held through back
references have also been dropped and the refcounts reach 0.

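The ownership side can be sketched as a connection that records the
references it owns and drops them all on close. `struct relay_conn`,
`relay_conn_close()`, and the fixed-size stream array are hypothetical,
and the release that would follow a count reaching 0 is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_STREAMS 8

/* Hypothetical refcounted object (session or stream). */
struct obj {
	atomic_long ref;
};

/* Hypothetical relay connection and the ownership references it holds. */
struct relay_conn {
	struct obj *session;			/* owned */
	struct obj *streams[MAX_STREAMS];	/* each one owned */
	size_t nr_streams;
};

static void obj_put(struct obj *o)
{
	long old = atomic_fetch_sub(&o->ref, 1);

	assert(old > 0);
	/* old == 1 is where the object would be released. */
}

/* On close, put every ownership reference. Objects still referenced
 * through back references (e.g. a session pinned by its ctf-traces)
 * survive until those references are dropped too. */
void relay_conn_close(struct relay_conn *c)
{
	for (size_t i = 0; i < c->nr_streams; i++)
		obj_put(c->streams[i]);
	c->nr_streams = 0;
	if (c->session) {
		obj_put(c->session);
		c->session = NULL;
	}
}
```
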
RCU read-side locks are now held only during iteration on RCU lists and
hash tables, and within the internals of the get (lookup) and put
functions. Those functions then rely on refcounting to guarantee that
the object still exists after it is returned to the caller.

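The resulting API shape can be sketched as a lookup that keeps the
read-side critical section internal and hands back a counted reference.
The chained table below is a hypothetical stand-in for relay_streams_ht
(the real table is an RCU hash table), and
`sketch_read_lock()`/`sketch_read_unlock()` are empty placeholders for
liburcu's `rcu_read_lock()`/`rcu_read_unlock()`, so the code shows lock
scope only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholders for liburcu's rcu_read_lock()/rcu_read_unlock(); they do
 * nothing here, so the sketch shows lock scope only. */
static void sketch_read_lock(void) {}
static void sketch_read_unlock(void) {}

#define NR_BUCKETS 16

/* Hypothetical stream, keyed by its handle. */
struct stream {
	uint64_t handle;
	atomic_long ref;
	struct stream *next;	/* bucket chain */
};

/* Hypothetical stand-in for relay_streams_ht: a chained hash table. */
static struct stream *streams_ht[NR_BUCKETS];

void streams_ht_add(struct stream *s)
{
	size_t b = s->handle % NR_BUCKETS;

	s->next = streams_ht[b];
	streams_ht[b] = s;
}

/* Get-unless-zero getter. */
static bool stream_get(struct stream *s)
{
	long old = atomic_load(&s->ref);

	do {
		if (old == 0)
			return false;
	} while (!atomic_compare_exchange_weak(&s->ref, &old, old + 1));
	return true;
}

/*
 * The read-side critical section covers only the table walk and the
 * getter. Once the getter succeeds, the returned reference keeps the
 * stream alive, so the caller uses it without holding the read-side
 * lock and must release it with stream_put().
 */
struct stream *stream_get_by_handle(uint64_t handle)
{
	struct stream *found = NULL;

	sketch_read_lock();
	for (struct stream *s = streams_ht[handle % NR_BUCKETS]; s; s = s->next) {
		if (s->handle == handle) {
			if (stream_get(s))
				found = s;	/* refcount was > 0: reference taken */
			break;
		}
	}
	sketch_read_unlock();
	return found;
}

void stream_put(struct stream *s)
{
	long old = atomic_fetch_sub(&s->ref, 1);

	assert(old > 0);
	/* old == 1: this is where release (and call_rcu) would happen. */
}
```

A caller would use the stream returned by `stream_get_by_handle()` with
no read-side lock held and release it with `stream_put()` when done.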