git.lttng.org Git - lttng-tools.git/commitdiff
Fix: Close per-process event notifier error accounting fds on registration
author Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mon, 4 Nov 2024 18:26:50 +0000 (13:26 -0500)
committer Jérémie Galarneau <jeremie.galarneau@efficios.com>
Mon, 25 Nov 2024 20:22:25 +0000 (20:22 +0000)
On application registration, the event notifier error accounting file
descriptors are duplicated to send the error accounting counter objects
to the application.

The duplicated file descriptors are left open until the application
unregisters.

There is one file descriptor per CPU, so on larger systems (228 CPUs
Intel or 192 CPUs AMD EPYC), this adds up to a lot of file descriptors
when the number of registered applications is large, which can result in
file descriptor exhaustion errors.
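
As a rough illustration of the scale involved, the sketch below multiplies
the per-CPU descriptor count by the number of registered applications. The
helper name and the figure of 1,000 applications are assumptions made for
the example only; the real total also depends on every other descriptor
the session daemon holds.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of lttng-tools): estimates how many error
// accounting file descriptors stay open when one descriptor per CPU is kept
// for every registered application.
std::uint64_t held_error_accounting_fds(std::uint64_t nr_cpus, std::uint64_t nr_apps)
{
	assert(nr_cpus > 0); // A running system has at least one CPU.
	return nr_cpus * nr_apps;
}
```

With 192 CPUs and an assumed 1,000 registered applications, this gives
192 * 1000 = 192,000 descriptors held for error accounting alone, a figure
to compare against the process's RLIMIT_NOFILE limit.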

Moreover, application unregistration is performed by delete_ust_app(),
which is invoked from a call_rcu() worker thread, and therefore only
after an RCU grace period has elapsed. This means that a steady stream
of short-lived applications could register, and thus allocate file
descriptors, faster than the descriptors of exited applications are
closed.

Fix this by closing those file descriptors immediately after the objects
are sent to the application, similarly to what is done for the ring
buffer streams.
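
The duplicate, send, then close pattern can be sketched with plain POSIX
descriptors. This is a simplified illustration, not the lttng-tools code:
send_per_cpu_fds() and the send callback are hypothetical names, and
dup()/close() stand in for the object duplication and
lttng_ust_ctl_release_object() calls touched by this change.

```cpp
#include <cassert>
#include <fcntl.h>
#include <functional>
#include <unistd.h>
#include <vector>

// Hypothetical stand-in for sending one duplicated descriptor to the
// application. Returns true on success.
using send_fn = std::function<bool(int fd)>;

// Duplicates each per-CPU descriptor, hands the duplicate to `send`, and
// closes the duplicate immediately afterwards, on both the success and the
// error path. Returns false on the first failure.
bool send_per_cpu_fds(const std::vector<int>& cpu_fds, const send_fn& send)
{
	assert(send != nullptr);

	for (const int fd : cpu_fds) {
		const int dup_fd = dup(fd);
		if (dup_fd < 0) {
			return false;
		}

		const bool sent = send(dup_fd);

		// Release right away rather than at unregistration time.
		close(dup_fd);

		if (!sent) {
			return false;
		}
	}

	return true;
}
```

Closing the duplicate on the error path as well mirrors the extra release
call placed before the goto in the diff; without it, a failed send would
leak that one descriptor.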

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ia1bbc3ff09a20f37d069ade7e267fb043ea1ac7f

src/bin/lttng-sessiond/event-notifier-error-accounting.cpp

index 512440c3bfa11fb435581ee85ced995d9055ec55..34f10f0a5f43939d0fc2b3692dc2fda2f67f7b8a 100644
@@ -690,9 +690,12 @@ event_notifier_error_accounting_register_app(struct ust_app *app)
                            (int) app->pid,
                            app->name);
                        status = EVENT_NOTIFIER_ERROR_ACCOUNTING_STATUS_ERR;
+                       lttng_ust_ctl_release_object(-1, new_counter_cpu);
                        goto error_send_cpu_counter_data;
                }
+               lttng_ust_ctl_release_object(-1, new_counter_cpu);
        }
+       lttng_ust_ctl_release_object(-1, new_counter);
 
        app->event_notifier_group.counter = new_counter;
        new_counter = nullptr;
@@ -712,8 +715,6 @@ error_duplicate_cpu_counter:
                         */
                        break;
                }
-
-               lttng_ust_ctl_release_object(-1, cpu_counters[i]);
                free(cpu_counters[i]);
        }
 