Rename C++ header files to .hpp Rename all C++ header files (include/**/*-internal.h, src/**/*.h except argpar and msgpack, some headers in tests) to have the .hpp extension. Doing so highlights that we include some C++ header files in some test files still compiled as C. This is ok for now, as the files they include don't actually contain C++ code incompatible with C yet, but they could eventually. This is something we can fix later. Change-Id: I8bf326b6b2946a3e26704f3ef3ac5831bbe9bc26 Signed-off-by: Simon Marchi <simon.marchi@efficios.com> Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Move to kernel style SPDX license identifiers The SPDX identifier is a legally binding shorthand, which can be used instead of the full boilerplate text. See https://spdx.org/ids-how for details. Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Change-Id: I62e7038e191a061286abcef5550b58f5ee67149d Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Fix: relayd streams can be leaked on connection error There are cases where a connection error can cause streams to be leaked. For instance, the control connection could receive an index and close. Since a packet is in-flight, the stream corresponding to that index will not close. However, nothing guarantees that the data connection will be able to receive the packet's data. If the protocol is respected, this is not a problem. However, a buggy consumerd or network errors can cause the streams to remain in the "data in-flight" state and never close. To mitigate a case observed in the field where a consumerd would be forcibly closed (network interface brought down) and cause leaks on the relay daemon, the session is aborted whenever the control or data connection encounters an error. Aborting a session causes the streams to be closed regardless of the fact that data is in-flight. Currently, only the control connection holds ownership of the session object. This can cause the following scenario to leak streams: 1) Control connection receives an index - Stream is put in "in-flight data" mode 2) Control connection is closed/shutdown cleanly - try_stream_close refuses to close the stream as data is in-flight, but it puts the stream in "closed" mode. When the data is received, the stream will be closed as soon as possible. 3) Data connection closes cleanly or due to an error - The stream "closing" condition will never be re-evaluated. Since the data connection has no ownership of the session, it can never clean up the streams that are waiting for "in-flight" data to arrive before closing. This patch lazily associates the data connection with its session so that the session can be aborted whenever an error happens on either the data or control connection. Note that this leaves the relayd vulnerable to a case which will still leak. If the control connection receives an index and closes cleanly, the data connection could have never been established with the consumer daemon, resulting in a leak. Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com> Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
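The close-versus-abort behaviour described in the message above can be sketched as a small state machine. This is only an illustration under stated assumptions: struct stream_state, try_close(), on_data_received() and abort_close() are hypothetical names, not the relay daemon's actual types or functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-stream state; not the relay daemon's real structure. */
struct stream_state {
	bool data_in_flight;  /* index received, packet data not yet received */
	bool close_requested; /* close was asked while data was in flight */
	bool closed;
};

/* Close the stream unless data is still expected; otherwise remember the request. */
void try_close(struct stream_state *s)
{
	if (s->data_in_flight) {
		s->close_requested = true;
		return;
	}
	s->closed = true;
}

/* Packet data arrived: re-evaluate a pending close request. */
void on_data_received(struct stream_state *s)
{
	s->data_in_flight = false;
	if (s->close_requested) {
		try_close(s);
	}
}

/* Session abort: close regardless of in-flight data. */
void abort_close(struct stream_state *s)
{
	s->data_in_flight = false;
	s->closed = true;
}
```

In this sketch, a stream whose data never arrives stays open after try_close() and is only released by abort_close(), which is why the connection that detects the error needs a way to reach the session.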
Use non-blocking recvmsg() for data/ctrl connections of lttng-relayd The relay daemon's use of blocking network I/O can cause severe performance degradation when interacting with unresponsive peers. This patch changes the recvmsg() calls to use the MSG_DONTWAIT flag which makes the call non-blocking. The connection classes are modified to handle the partial reception of buffers. The sendmsg() calls are still blocking, but this is assumed to represent a fairly minimal risk of actually blocking given that the control protocol's replies consist of 4-byte status codes. A similar approach could be used to make the live connections non-blocking as that side may also suffer from the same resiliency problems. So far, no users have reported this problem so it is not prioritised. Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
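A minimal sketch of handling partial reception with recvmsg() and MSG_DONTWAIT follows. It is not the patch's code: struct rx_state, enum rx_status and rx_continue() are hypothetical names, and the sketch assumes the caller retries from its poll loop once the socket is readable again.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical reception state kept across calls; not lttng-relayd's type. */
struct rx_state {
	char *buf;       /* destination buffer */
	size_t expected; /* total number of bytes wanted */
	size_t received; /* bytes received so far */
};

enum rx_status { RX_COMPLETE, RX_AGAIN, RX_CLOSED, RX_ERROR };

/*
 * Make as much progress as possible on a pending reception without blocking.
 * RX_AGAIN means the socket has no more data for now; the state is kept so
 * the caller can call again when the socket becomes readable.
 */
enum rx_status rx_continue(int fd, struct rx_state *st)
{
	while (st->received < st->expected) {
		struct iovec iov;
		struct msghdr msg;
		ssize_t ret;

		iov.iov_base = st->buf + st->received;
		iov.iov_len = st->expected - st->received;
		memset(&msg, 0, sizeof(msg));
		msg.msg_iov = &iov;
		msg.msg_iovlen = 1;

		ret = recvmsg(fd, &msg, MSG_DONTWAIT);
		if (ret > 0) {
			st->received += (size_t) ret;
		} else if (ret == 0) {
			return RX_CLOSED; /* orderly shutdown by the peer */
		} else if (errno == EINTR) {
			continue; /* interrupted by a signal, retry */
		} else if (errno == EAGAIN || errno == EWOULDBLOCK) {
			return RX_AGAIN; /* partial reception */
		} else {
			return RX_ERROR;
		}
	}
	return RX_COMPLETE;
}
```

Keeping the byte count in the state rather than on the stack is what lets a single thread serve many connections without stalling on an unresponsive peer.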
relay: use urcu_ref_get_unless_zero This allows removing the reflock by performing this check and increment atomically. The minimum version of userspace-rcu is bumped to 0.9.0 as urcu_ref_get_unless_zero() was introduced as part of that release. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
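To illustrate what "check and increment atomically" means, here is a sketch of get-unless-zero semantics written with C11 atomics. It is not liburcu's implementation: struct ref, ref_get_unless_zero() and ref_put() are hypothetical stand-ins used only to show the idea.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference counter; not liburcu's struct urcu_ref. */
struct ref {
	atomic_long count;
};

/*
 * Take a reference only if the object is still alive (count > 0).
 * The check and the increment happen in one compare-and-swap, so no other
 * thread can drop the last reference between the two.
 */
bool ref_get_unless_zero(struct ref *r)
{
	long cur = atomic_load(&r->count);

	while (cur != 0) {
		/* On failure, cur is updated with the current value. */
		if (atomic_compare_exchange_weak(&r->count, &cur, cur + 1)) {
			return true;
		}
	}
	return false; /* object is being torn down; do not use it */
}

/* Drop a reference; returns true if it was the last one. */
bool ref_put(struct ref *r)
{
	return atomic_fetch_sub(&r->count, 1) == 1;
}
```

With a separate check followed by an increment, a lookup could revive an object whose count had already reached zero; folding both into one atomic step removes the need for an extra lock around them.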
Fix: Relay daemon ownership and reference counting The ownership and reference counting of the relay daemon is unclear and buggy in many ways. It is the cause of memory corruptions, double-frees, leaks, and segmentation faults observed in various conditions. Fix this situation by introducing a clear ownership and reference counting scheme for this daemon. See doc/relayd-architecture.txt for details. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change wfq usages for wfcq This removes the deprecation warnings when building lttng-tools. We can now build with -Werror, woohoo! This makes lttng-tools depend on userspace-rcu version 0.8.0 and above. The configure.ac and README files have been updated for this. Verified by running make check. Signed-off-by: David Goulet <dgoulet@efficios.com>
Fix: use after free of a relayd stream A race could occur between the destruction of a stream and the destruction of a control connection emptying its recv_list. A freed stream could still be in the list, causing a use after free during the connection destroy. That triggered undefined behavior ranging from infinite looping to segmentation faults. We've observed this issue in high-load stress tests. A relayd received all the streams but NOT the "streams sent" command, which empties the list. This can happen if a start tracing never occurred or failed on the application side; the close stream command is then sent to the relayd, freeing the stream before it is removed from that list. Signed-off-by: David Goulet <dgoulet@efficios.com>
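One way to avoid this class of bug is to unlink an object from every list it sits on before freeing it. The sketch below shows that pattern only; it is not the actual fix. The list helpers, struct stream, stream_create() and stream_destroy() are hypothetical, and the locking that the real daemon would need around the list is assumed to be held by the caller.

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal intrusive doubly linked list; all names here are hypothetical. */
struct list_node {
	struct list_node *prev, *next;
};

void list_init(struct list_node *head)
{
	head->prev = head->next = head;
}

int list_empty(const struct list_node *head)
{
	return head->next == head;
}

/* Insert node right after head. */
void list_add(struct list_node *node, struct list_node *head)
{
	node->next = head->next;
	node->prev = head;
	head->next->prev = node;
	head->next = node;
}

/* Unlink and re-initialise the node, so a second removal is harmless. */
void list_del_init(struct list_node *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
	list_init(node);
}

struct stream {
	int id;
	struct list_node recv_node; /* membership in a connection's recv_list */
};

struct stream *stream_create(int id, struct list_node *recv_list)
{
	struct stream *s = calloc(1, sizeof(*s));

	if (!s) {
		return NULL;
	}
	s->id = id;
	list_add(&s->recv_node, recv_list);
	return s;
}

/* Unlink first, then free: the list never holds a dangling node. */
void stream_destroy(struct stream *s)
{
	list_del_init(&s->recv_node);
	free(s);
}
```

Because the node is unlinked in the destroy path itself, a connection that later walks or empties its recv_list cannot reach memory that has already been freed.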