From e560c525031d7aba481e0b8655c48a14bade24c2 Mon Sep 17 00:00:00 2001
From: =?utf8?q?J=C3=A9r=C3=A9mie=20Galarneau?=
 <jeremie.galarneau@efficios.com>
Date: Tue, 12 Dec 2023 16:54:41 -0500
Subject: [PATCH] Fix: sessiond: freeze on channel creation on restart
MIME-Version: 1.0
Content-Type: text/plain; charset=utf8
Content-Transfer-Encoding: 8bit

Issue observed
--------------

When using lttng via a script, the session and consumer daemons appear
to completely lock up when we request that a channel be created. The
conditions for this lockup seem to be created by destroying a sessiond
and then creating a sessiond in quick sequence.

This can be reproduced, on some systems, by launching a session daemon
and running the following commands:
  $ sudo killall lttng-sessiond
  $ sudo lttng-sessiond --daemonize
  $ lttng create my_session --snapshot --output /tmp/demo-output
  $ lttng enable-channel --kernel my_channel

Note that 'killall' above is racy as it does not wait for the session
daemon to be killed. Hence, it is not unexpected for the incoming
session daemon to see the smoldering ashes of the "outgoing" session
daemon. However, it would be helpful if the second session daemon
instance warned the user of the existing session daemon instance.

From the logs captured from both instances of the lttng-sessiond (the
outgoing and incoming instances), there appears to be a time period
during which both session daemons are active at once.

This behaviour is unexpected as the session daemon guards itself (in
theory) from running multiple conflicting instances.

The guarding mechanism works in two steps (see the implementation of
`check_existing_daemon` @ src/bin/lttng-sessiond/main.cpp:926)

When a session daemon is launched, it attempts to connect to any active
session daemon's 'client' endpoint (a UNIX socket, the same used by
liblttng-ctl to communicate with the session daemon).

If the daemon receives a reply, it can assume that another session
daemon instance is already active and abort its launch. Conversely, when
no reply is received, it uses a "lock file" mechanism to check for other
running instances.

The lock file-based check creates a file (typically
/var/run/lttng/lttng-sessiond.lck in the case of a root session daemon)
and acquires an exclusive (write) POSIX lock on it [1]. The assumption
is that any other instance would own the lock and cause the operation to
fail.

On a reproducer system, we could notice that the client thread of the
outgoing sessiond daemon was torn down before the launch of the
initialization of the incoming session daemon. This caused the incoming
session daemon to not receive a reply to its connection attempt and
fall-back to the lock file-based mechanism.

Surprinsingly, it appears that the lock file checks succeeds even though
the outgoing session daemon still holds the lock file according to its
log.

See the original bug report for more information about the investigation
and how to reproduce the problem.

Cause
-----

The POSIX file locking API has a number of surprising behaviours[2] that
have seen it being superseded by platform-specific APIs. In our case,
the one that bit us is that any file lock held by a process is
automatically released when any of the file descriptors that reference
the file's description is released.

In practical terms, if a process forks and its child dies, it loses its
file lock since the child's file descriptors are closed on exit.

The LWN article linked below describes a similar scenario:

  It's common to have a library routine that opens a file, reads or
  writes to it, and then closes it again, without the calling
  application ever being aware that has occurred. If the application
  happens to be holding a lock on the file when that occurs, it can lose
  that lock without ever being aware of it.

The problem affects any use of the --background/--daemonize options
since, as part of the daemonization process (which occurs after the lock
file acquisition), the session daemon forks and its parent process
exits. This causes one of the descriptors pointing to the lock file to
be closed and the lock to be released.

After that point, any other instance of the session daemon process would
succeed in acquiring the lock file and assume it is the sole instance on
the system.

Solution
--------

The lock file code is modified to use the non-POSIX `flock`[3]
interface which is available on Linux and some BSDs[4]. `flock` provides
us with the guarantee we thought we had: that the file lock is only
released when _all_ file descriptors pointing to a given file
description are closed.

Drawbacks
---------

As a fallback, platforms that don't support `flock` will use the original
locking mechanism. Since this is a "hint" to warn users when erroneously
launch a second privileged session daemon, it seems acceptable for it
to not be completely reliable on secondary platforms.

References
----------

[1] https://man7.org/linux/man-pages/man2/fcntl.2.html (see F_SETLK)
[2] https://lwn.net/Articles/586904/
[3] https://linux.die.net/man/2/flock
[4] https://man.freebsd.org/cgi/man.cgi?query=flock&sektion=2

Fixes #1405

Reported-by: Erica Bugden <ebugden@efficios.com>
Signed-off-by: JÃ©rÃ©mie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ic505ff0671c321f808050831ef2b7152cdbf4b8a
---
 configure.ac            |  2 +-
 src/common/lockfile.cpp | 82 +++++++++++++++++++++++++++++++----------
 2 files changed, 64 insertions(+), 20 deletions(-)

diff --git a/configure.ac b/configure.ac
index b1a06ab19..3c3b2f39b 100644
--- a/configure.ac
+++ b/configure.ac
@@ -316,7 +316,7 @@ AC_CHECK_FUNCS([ \
 	strncasecmp strndup strnlen strpbrk strrchr strstr strtol strtoul \
 	strtoull dirfd gethostbyname2 getipnodebyname epoll_create1 \
 	sched_getcpu sysconf sync_file_range getrandom posix_fadvise \
-	arc4random
+	arc4random flock
 ])
 
 # Check for pthread_setname_np and pthread_getname_np
diff --git a/src/common/lockfile.cpp b/src/common/lockfile.cpp
index 5dce37d6f..3e972fbac 100644
--- a/src/common/lockfile.cpp
+++ b/src/common/lockfile.cpp
@@ -10,49 +10,93 @@
 #include <common/lockfile.hpp>
 #include <common/macros.hpp>
 
+#include <fcntl.h>
+
 #ifdef HAVE_FLOCK
 
+#include <sys/file.h>
+
+static int lock_file(const char *filepath, int fd)
+{
+	int ret;
+
+	/*
+	 * Attempt to lock the file. If this fails, there is
+	 * already a process using the same lock file running
+	 * and we should exit.
+	 */
+	ret = flock(fd, LOCK_EX | LOCK_NB);
+	if (ret == -1) {
+		/* EWOULDBLOCK are expected if the file is locked: don't spam the logs. */
+		if (errno != EWOULDBLOCK) {
+			PERROR("Failed to apply lock on lock file: file_path=`%s`", filepath);
+		}
+	}
+
+	return ret;
+}
+
 #else /* HAVE_FLOCK */
 
-#include <fcntl.h>
+static int lock_file(const char *filepath, int fd)
+{
+	int ret;
+	struct flock lock = {};
+
+	lock.l_whence = SEEK_SET;
+	lock.l_type = F_WRLCK;
+
+	/*
+	 * Attempt to lock the file. If this fails, there is
+	 * already a process using the same lock file running
+	 * and we should exit.
+	 */
+	ret = fcntl(fd, F_SETLK, &lock);
+	if (ret == -1) {
+		/* EAGAIN and EACCESS are expected if the file is locked: don't spam the logs. */
+		if (errno != EAGAIN && errno != EACCES) {
+			PERROR("Failed to set lock on lock file: file_path=`%s`", filepath);
+		}
+	}
+
+	return ret;
+}
+
+#endif /* HAVE_FLOCK */
 
 int utils_create_lock_file(const char *filepath)
 {
-	int ret;
-	int fd;
-	struct flock lock;
+	int ret, fd;
 
 	LTTNG_ASSERT(filepath);
 
-	memset(&lock, 0, sizeof(lock));
 	fd = open(filepath, O_CREAT | O_WRONLY, S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP);
 	if (fd < 0) {
-		PERROR("open lock file %s", filepath);
+		PERROR("Failed to open lock file `%s`", filepath);
 		fd = -1;
 		goto error;
 	}
 
 	/*
-	 * Attempt to lock the file. If this fails, there is
-	 * already a process using the same lock file running
-	 * and we should exit.
+	 * Attempt to lock the file. If this fails, there is already a process using the same lock
+	 * file running and we should exit.
+	 *
+	 * lock_file is chosen based on the build configuration, see implementations above.
 	 */
-	lock.l_whence = SEEK_SET;
-	lock.l_type = F_WRLCK;
-
-	ret = fcntl(fd, F_SETLK, &lock);
+	ret = lock_file(filepath, fd);
 	if (ret == -1) {
-		PERROR("fcntl lock file");
-		ERR("Could not get lock file %s, another instance is running.", filepath);
+		ERR("Could not get lock file `%s`, another instance is running.", filepath);
+
 		if (close(fd)) {
-			PERROR("close lock file");
+			PERROR("Failed to close lock file fd: fd=%d", fd);
 		}
+
 		fd = ret;
 		goto error;
 	}
 
+	DBG_FMT("Acquired lock file: file_path={}", filepath);
+
 error:
 	return fd;
-}
-
-#endif /* HAVE_FLOCK */
\ No newline at end of file
+}
\ No newline at end of file
-- 
2.34.1