From: Jonathan Rajotte Date: Wed, 9 Feb 2022 19:23:18 +0000 (-0500) Subject: Fix: consumerd: fd still open after `lttng snapshot record` returns X-Git-Url: https://git.lttng.org/?p=lttng-tools.git;a=commitdiff_plain;h=c26d615c0c49dd994a687c8e1afe670809100765 Fix: consumerd: fd still open after `lttng snapshot record` returns Observed issue ===== Using a snapshot output located on a pramfs mount: lttng snapshot record rm -rf /my_mount/my_trace_output `rm` fails with ENOTEMPTY on rmdir for /my_mount/my_trace_output. At that point, the lttng-consumerd daemon have an open fd on: /my_mount/my_trace_output/ust Note that a sleep between both command "fixes" the issue. Cause ===== The reclaim for the in-registry trace chunks can happen after the LTTng CLI returns since we use `call_rcu`. ``` static void lttng_trace_chunk_release(struct urcu_ref *ref) .... if (chunk->in_registry_element) { struct lttng_trace_chunk_registry_element *element; element = container_of(chunk, typeof(*element), chunk); if (element->registry) { rcu_read_lock(); cds_lfht_del(element->registry->ht, &element->trace_chunk_registry_ht_node); rcu_read_unlock(); -> call_rcu(&element->rcu_node, free_lttng_trace_chunk_registry_element); } else { ``` The delayed reclaim of the `lttng_trace_chunk_registry_element` can result in lttng-consumerd holding an open fd for the "chunk directory" of the chunk since the close() is only done during the "*fini" phase of the chunk (`lttng_trace_chunk_fini`). Solution ======== Considering that the rcu lookup+refcount access scheme is used for the trace chunk object and that at that point the refcount for the trace chunk object is effectively zero, we can move the `lttng_trace_chunk_fini` safely outside of the `free_lttng_trace_chunk_registry_element` call_rcu call. Known drawbacks ========= Even if this solves the current situation, it is important to note that the actual object holding the reference is itself refcounted and only close the fd on release. This means that we are still exposed to this problem if at some point the directory handle is shared and outlives the trace chunk for some reason in the future. Signed-off-by: Jonathan Rajotte Signed-off-by: Jérémie Galarneau Change-Id: I6da3948824bf8b092fc8248b1bb0263fdd5887be --- diff --git a/src/common/trace-chunk.cpp b/src/common/trace-chunk.cpp index ad7a983fb..9e804ce01 100644 --- a/src/common/trace-chunk.cpp +++ b/src/common/trace-chunk.cpp @@ -1857,7 +1857,6 @@ void free_lttng_trace_chunk_registry_element(struct rcu_head *node) struct lttng_trace_chunk_registry_element *element = container_of(node, typeof(*element), rcu_node); - lttng_trace_chunk_fini(&element->chunk); free(element); } @@ -1879,6 +1878,24 @@ void lttng_trace_chunk_release(struct urcu_ref *ref) if (chunk->in_registry_element) { struct lttng_trace_chunk_registry_element *element; + /* + * Release internal chunk attributes immediately and + * only use the deferred `call_rcu` work to reclaim the + * storage. + * + * This ensures that file handles are released as soon as + * possible which works around a problem we encounter with PRAM fs + * mounts (and possibly other non-POSIX compliant file systems): + * directories that contain files which are open can't be + * rmdir(). + * + * This means that the recording of a snapshot could be + * completed, but that it would be impossible for the user to + * delete it until the deferred clean-up released the file + * handles to its contents. + */ + lttng_trace_chunk_fini(chunk); + element = container_of(chunk, typeof(*element), chunk); if (element->registry) { rcu_read_lock();