Fix: consumerd: fd still open after `lttng snapshot record` returns
authorJonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Wed, 9 Feb 2022 19:23:18 +0000 (14:23 -0500)
committerJérémie Galarneau <jeremie.galarneau@efficios.com>
Thu, 3 Mar 2022 16:49:51 +0000 (11:49 -0500)
commit030a478aaea576cce1f772652310045e13b030d9
treefc23f7e2fc1cbb8625974958d4c71f854cecb97a
parent38128370a845ede158a9e8793f6836cb220240af
Fix: consumerd: fd still open after `lttng snapshot record` returns

Observed issue
=====

Using a snapshot output located on a pramfs mount:

  lttng snapshot record
  rm -rf /my_mount/my_trace_output

`rm` fails with ENOTEMPTY on rmdir for /my_mount/my_trace_output.

At that point, the lttng-consumerd daemon have an open fd on:
  /my_mount/my_trace_output/ust

Note that a sleep between both command "fixes" the issue.

Cause
=====

The reclaim for the in-registry trace chunks can happen after the LTTng
CLI returns since we use `call_rcu`.

```
static
void lttng_trace_chunk_release(struct urcu_ref *ref)

....

  if (chunk->in_registry_element) {
    struct lttng_trace_chunk_registry_element *element;

    element = container_of(chunk, typeof(*element), chunk);
    if (element->registry) {
      rcu_read_lock();
      cds_lfht_del(element->registry->ht, &element->trace_chunk_registry_ht_node);
      rcu_read_unlock();
->    call_rcu(&element->rcu_node, free_lttng_trace_chunk_registry_element);
   } else {

```

The delayed reclaim of the `lttng_trace_chunk_registry_element` can
result in lttng-consumerd holding an open fd for the "chunk directory"
of the chunk since the close() is only done during the "*fini" phase of
the chunk (`lttng_trace_chunk_fini`).

Solution
========

Considering that the rcu lookup+refcount access scheme is used for the
trace chunk object and that at that point the refcount for the trace
chunk object is effectively zero, we can move the
`lttng_trace_chunk_fini` safely outside of the
`free_lttng_trace_chunk_registry_element` call_rcu call.

Known drawbacks
=========

Even if this solves the current situation, it is important to note that
the actual object holding the reference is itself refcounted and only
close the fd on release. This means that we are still exposed to this
problem if at some point the directory handle is shared and outlives the
trace chunk for some reason in the future.

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I6da3948824bf8b092fc8248b1bb0263fdd5887be
src/common/trace-chunk.c
This page took 0.025757 seconds and 4 git commands to generate.