Fix: relayd: live: data is missing between viewer attach and retry
authorJonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Thu, 6 May 2021 15:14:52 +0000 (11:14 -0400)
committerJérémie Galarneau <jeremie.galarneau@efficios.com>
Fri, 7 May 2021 17:37:57 +0000 (13:37 -0400)
Observed issue
==============

Data produced between the time a live viewer attach and a viewer
subsequent "polling" is missing from the point of view of the viewer.

Using the following reproducer:

  # A lttng-sessiond is already running
  lttng-relayd -b -vvv > relayd.log 2>&1
  lttng create live --live
  lttng enable-event -u "hello_world:*"
  lttng start

  date
  # Set the "polling" interval at ~30 seconds
  babeltrace2 --retry-duration=30000000 -i lttng-live net://localhost/host/$HOSTNAME/live &
  pid=$!

  # Make sure we are in the retry phase of bt2
  date
  sleep 2

  # Produce events
  # Note here that hello perform a sleep of 1 seconds before generating
  # 10 events to make sure lttng-ust had time to register. A total of
  # 100 events is generated.
  date
  for i in `seq 1 10`; do
   ./hello
  done

  # Here we expect to see 100 events outputted at some point (~18
  # seconds (30 - 2 - 10) from that time).
  # No events are received by babeltrace2.
  date
  sleep 40
  date

  # We retry the same exercise a second time.
  echo "Moving to phase 2"
  date
  for i in `seq 1 10`; do
   ./hello
  done

  # Here data is received correctly.
  date
  sleep 40
  date

  kill $pid
  wait

  lttng destroy live
  pkill lttng-relayd

Cause
=====

At the moment the viewer attach is done, no streams exist. On the
following viewer_get_new_streams (~30 seconds later), streams now exist
since they were allocated lazily following the first application
registration, but LTTNG_VIEWER_SEEK_LAST is used. This essentially
"discards" any events in the stream that happened between the attach
time and the viewer_get_new_stream execution.

Note that the same problem would happen if a new UID (in per-uid mode)
would happen to begin tracing in between viewer retry.

Solution
========

Always use `LTTNG_VIEWER_SEEK_BEGINNING` during
`viewer_get_new_streams`.

Known drawbacks
=========

This does not fix the per-pid problem where an app run and die in between
the viewer retries.

References
==========

https://lists.lttng.org/pipermail/lttng-dev/2021-May/029953.html

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I92fbe6511fb93e31d20922b4eef5c9a221e07716

src/bin/lttng-relayd/live.c

index 5877467c0d5598f0cbdc7481f08c144a0b7580b1..e884cdc8a03cc18a1ab06a5570a730fbc9f9144a 100644 (file)
@@ -1142,10 +1142,19 @@ int viewer_get_new_streams(struct relay_connection *conn)
                goto send_reply;
        }
 
+       /*
+        * For any new stream, create it with LTTNG_VIEWER_SEEK_BEGINNING since
+        * that at this point the client is already attached to the session.Aany
+        * initial stream will have been created with the seek type at attach
+        * time (for now most readers use the LTTNG_VIEWER_SEEK_LAST on attach).
+        * Otherwise any event happening in a new stream between the attach and
+        * a call to viewer_get_new_streams will be "lost" (never received) from
+        * the viewer's point of view.
+        */
        pthread_mutex_lock(&session->lock);
        ret = make_viewer_streams(session,
                        conn->viewer_session,
-                       LTTNG_VIEWER_SEEK_LAST, &nb_total, &nb_unsent,
+                       LTTNG_VIEWER_SEEK_BEGINNING, &nb_total, &nb_unsent,
                        &nb_created, &closed);
        if (ret < 0) {
                goto error_unlock_session;
This page took 0.026796 seconds and 4 git commands to generate.