Bug #1406

Missing events/crash with babeltrace2 lttng-live plugin with session that uses per-pid buffers

Added by Kienan Stewart 5 months ago. Updated about 2 months ago.

Status: Resolved
Priority: Normal
Target version: -
Start date: 12/13/2023
Due date:
% Done: 100%
Estimated time:
Description

While investigating #1403 I noticed one pattern of test failures involving babeltrace2 crashes or hangs. The problem seemed intermittent, but I have been able to develop a test case that's reproducible for me. This affects babeltrace2 stable-2.0 and master while using lttng-tools master.

The test case (see attached script) performs the following steps:

  1. Start a ust application and leave it running
  2. Configure and then start an lttng live session
  3. Connect a live viewer (babeltrace)
  4. Run a second ust application
  5. Wait for the expected number of events
    1. In the failing case, no events are seen by babeltrace

Using per-uid buffers, the test typically completes normally. With per-pid buffers the test fails, hanging indefinitely if waiting for the specified number of events. While "hanging", babeltrace2 is polling the relayd:

clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=100000000}, NULL) = 0
sendto(3, "\0\0\0\0\0\0\0\10\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0\6", 24, MSG_NOSIGNAL, NULL, 0) = 24
recvfrom(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 64, 0, NULL, NULL) = 64
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=100000000}, NULL) = 0
sendto(3, "\0\0\0\0\0\0\0\10\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0\6", 24, MSG_NOSIGNAL, NULL, 0) = 24
recvfrom(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 64, 0, NULL, NULL) = 64

While the test as written stops the session before the first gen-ust-events app when the timeout is reached, if the test is invoked with LIVEVIEWER_TIMEOUT=NO we can try different operations:

  • killing the early gen-ust-events app causes babeltrace2 to exit unsuccessfully (I don't think this should happen)
  • killing the late gen-ust-events app leaves babeltrace2 running
  • stopping the session (as in the test) causes babeltrace2 to exit unsuccessfully (I think this should exit gracefully, but I could be wrong)

Files

bt.err.bz2 (59.2 KB) bt.err.bz2 Kienan Stewart, 12/13/2023 11:42 AM
relayd.log.bz2 (1.45 MB) relayd.log.bz2 Kienan Stewart, 12/13/2023 11:42 AM
sessiond.log (242 KB) sessiond.log Kienan Stewart, 12/13/2023 11:42 AM
test_delay_hang_bt2 (3.53 KB) test_delay_hang_bt2 Kienan Stewart, 12/13/2023 11:42 AM
relayd.log (284 KB) relayd.log relayd.log for comment 2 Kienan Stewart, 12/14/2023 03:26 PM

Related issues 1 (0 open, 1 closed)

Related to LTTng-tools - Bug #1403: Investigate why events are no longer recorded by the live view when `DELAYUS` in `tests/regression/tools/clear/test_ust` is reduced too low (Resolved, Kienan Stewart, 12/05/2023)

Actions #1

Updated by Kienan Stewart 5 months ago

  • Related to Bug #1403: Investigate why events are no longer recorded by the live view when `DELAYUS` in `tests/regression/tools/clear/test_ust` is reduced too low added
Actions #2

Updated by Kienan Stewart 5 months ago · Edited

The per-pid buffer portion of the test also fails with lttng-tools stable-2.13.

I believe I understand the mechanism of failure, and I think changes to both lttng-tools (relayd) and babeltrace can address the issue.

The first application is running and registered with the sessiond and relayd, and its events have all been emitted. This application continues to run and remains registered, waiting for the before-exit file to be touched. The live viewer then connects and registers the first set of streams from the early app. It begins to regularly poll the relayd with VIEWER_GET_NEXT_INDEX commands[1].

When the second UST application runs and registers, the sessiond notifies the relayd (RELAYD_ADD_STREAM). Once all streams are added, RELAYD_STREAMS_SENT is sent by the sessiond to the relayd. RELAYD_STREAMS_SENT will cause session->new_stream to be set for the relevant session.

The new streams are registered with the relayd:

DBG1 - 14:29:10.865071340 [2589952/2589964]: Done receiving control command header: fd = 19, cmd = RELAYD_STREAMS_SENT, cmd_version = 0, payload size = 0 bytes (in relay_process_control_receive_header() at main.cpp:3508)
DBG1 - 14:29:10.865143952 [2589952/2589964]: Done receiving control command payload: fd = 19, payload size = 0 bytes (in relay_process_control_receive_payload() at main.cpp:3423)
DBG3 - 14:29:10.865201980 [2589952/2589964]: Processing "RELAYD_STREAMS_SENT" command for socket 19 (in relay_process_control_command() at main.cpp:3299)

This last variable is checked in live.cpp::check_new_streams[2]. If set, the resulting index flags should have the bit for LTTNG_VIEWER_FLAG_NEW_STREAM set to indicate that the viewer should run LTTNG_VIEWER_GET_NEW_STREAMS to update its streams. However, the next LTTNG_VIEWER_GET_NEW_STREAMS command doesn't happen.

New streams are only fetched once:

$ grep GET_NEW_STREAMS /tmp/relayd.log
DBG1 - 14:29:13.153878107 [2589952/2589967]: Processing GET_NEW_STREAMS viewer command from connection 31 (in process_control() at live.cpp:2522)

As there are no new events causing changes to the indices of streams (data + metadata streams from the early UST application), the GET_NEXT_INDEX command is returning early[3] with the flags set by check_index_status[4], with no verification at all of the session->new_stream status.

The next GET_NEXT_INDEX commands after the new streams are registered with the relayd:

DBG1 - 14:29:10.896364295 [2589952/2589967]: Processing GET_NEXT_INDEX viewer command from connection 31 (in process_control() at live.cpp:2522)
DBG1 - 14:29:10.896469038 [2589952/2589967]: Transition to latest chunk check (0 -> 0): Same chunk, no need to rotate (in viewer_get_next_index() at live.cpp:1785)
DBG1 - 14:29:10.896504900 [2589952/2589967]: Check index status: index_received_seqcount 1 index_sent_seqcount 1 for stream 1 (in check_index_status() at live.cpp:1499)
DBG1 - 14:29:10.896537507 [2589952/2589967]: Check index status: inactive with beacon, for stream 1, returning status=INDEX_INACTIVE (in check_index_status() at live.cpp:1542)

As a result, the live viewer won't be able to start the process to register new streams until there are changes on the first set of registered streams. In this test case, there are no further events from the early UST application.

However, it is not sufficient to ensure that LTTNG_VIEWER_FLAG_NEW_STREAM is sent when the relayd replies with LTTNG_VIEWER_INDEX_INACTIVE. When the command's return status is LTTNG_VIEWER_INDEX_INACTIVE, the index flag LTTNG_VIEWER_FLAG_NEW_STREAM is never verified[5]. The new stream and new metadata flags are only checked on the babeltrace2 side when the command's return code is LTTNG_VIEWER_INDEX_OK[6].
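The failure mode on both ends can be sketched as follows. This is a minimal, hypothetical simplification (the names `get_next_index_buggy`, `viewer_fetches_new_streams_buggy`, and the parameters are illustrative, not the actual lttng-relayd or babeltrace2 code): the relayd's inactive path returns before the new-stream check runs, and the viewer only inspects flags on the OK path.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative stand-ins for the live protocol's flag and statuses.
constexpr uint32_t LTTNG_VIEWER_FLAG_NEW_STREAM = 1 << 1;
enum viewer_index_status : uint32_t { INDEX_OK = 1, INDEX_INACTIVE = 3 };

struct index_reply { uint32_t status; uint32_t flags; };

// Relayd side (simplified): the inactive path returns early, before
// anything resembling check_new_streams gets a chance to set the flag.
index_reply get_next_index_buggy(bool stream_inactive, bool session_has_new_streams) {
    index_reply reply{0, 0};
    if (stream_inactive) {
        reply.status = INDEX_INACTIVE;
        return reply; // early return: flags never populated
    }
    reply.status = INDEX_OK;
    if (session_has_new_streams)
        reply.flags |= LTTNG_VIEWER_FLAG_NEW_STREAM;
    return reply;
}

// Viewer side (simplified): flags are only consulted on the INDEX_OK
// path, so even a relayd that set them on INDEX_INACTIVE would be ignored.
bool viewer_fetches_new_streams_buggy(const index_reply &reply) {
    return reply.status == INDEX_OK
        && (reply.flags & LTTNG_VIEWER_FLAG_NEW_STREAM);
}
```

With per-pid buffers and an idle early app, every reply is INDEX_INACTIVE, so under this shape the new-stream hint is lost on both ends.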

To resolve this I would propose two changes:

  1. In the relayd, ensure that LTTNG_VIEWER_FLAG_NEW_STREAM is checked, set, and returned in the index flags on all non-error code paths.
  2. In the babeltrace2 live viewer, check the same flags on all non-error code paths so the subsequent commands can be issued to update the streams.

I believe similar issues also apply to LTTNG_VIEWER_FLAG_NEW_METADATA.
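The two proposed changes could look roughly like this. Again a hedged sketch, not the actual patch: the function and parameter names are assumptions, and the real code paths involve check_new_streams, check_index_status, and the viewer's reply handling.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative stand-ins for the live protocol's flag and statuses.
constexpr uint32_t LTTNG_VIEWER_FLAG_NEW_STREAM = 1 << 1;
enum viewer_index_status : uint32_t { INDEX_OK = 1, INDEX_INACTIVE = 3 };

struct index_reply { uint32_t status; uint32_t flags; };

// Change 1 (relayd, simplified): sample the new-stream state before
// computing the index status, so the flag survives the inactive path.
index_reply get_next_index_fixed(bool stream_inactive, bool session_has_new_streams) {
    index_reply reply{0, 0};
    if (session_has_new_streams)
        reply.flags |= LTTNG_VIEWER_FLAG_NEW_STREAM;
    reply.status = stream_inactive ? INDEX_INACTIVE : INDEX_OK;
    return reply;
}

// Change 2 (viewer, simplified): honor the flag on every non-error
// status, not just INDEX_OK, so GET_NEW_STREAMS can be issued.
bool viewer_should_fetch_new_streams(const index_reply &reply) {
    return (reply.status == INDEX_OK || reply.status == INDEX_INACTIVE)
        && (reply.flags & LTTNG_VIEWER_FLAG_NEW_STREAM);
}
```

The same pattern would apply to LTTNG_VIEWER_FLAG_NEW_METADATA.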

[1]: https://github.com/lttng/lttng-tools/blob/7d46777ba0eaba7916ae822a7133f39740cdf9e5/doc/live-reading-protocol.txt#L120
[2]: https://github.com/lttng/lttng-tools/blob/7d46777ba0eaba7916ae822a7133f39740cdf9e5/src/bin/lttng-relayd/live.cpp#L1902
[3]: https://github.com/lttng/lttng-tools/blob/7d46777ba0eaba7916ae822a7133f39740cdf9e5/src/bin/lttng-relayd/live.cpp#L1822
[4]: https://github.com/lttng/lttng-tools/blob/7d46777ba0eaba7916ae822a7133f39740cdf9e5/src/bin/lttng-relayd/live.cpp#L1562
[5]: https://github.com/efficios/babeltrace/blob/fca1d0f55bf24741f07d2a86d70ad7274d3ec851/src/plugins/ctf/lttng-live/viewer-connection.cpp#L1463
[6]: https://github.com/efficios/babeltrace/blob/fca1d0f55bf24741f07d2a86d70ad7274d3ec851/src/plugins/ctf/lttng-live/viewer-connection.cpp#L1448

Actions #3

Updated by Kienan Stewart 5 months ago

Actions #4

Updated by Kienan Stewart 5 months ago

  • Status changed from New to In Progress
  • Assignee set to Kienan Stewart

For a potential patch, check LTTNG_VIEWER_FLAG_NEW_STREAM (and, logically, _NEW_METADATA) for the INACTIVE and HUP cases: e.g. call check_new_streams before check_index_status, and have check_index_status not fully override the index flags (e.g. use |=).

Actions #5

Updated by Mathieu Desnoyers 5 months ago

Kienan Stewart wrote in #note-4:

For a potential patch, check LTTNG_VIEWER_FLAG_NEW_STREAM (and, logically, _NEW_METADATA) for the INACTIVE and HUP cases: e.g. call check_new_streams before check_index_status, and have check_index_status not fully override the index flags (e.g. use |=).

Sorry, I misled you in our IRC discussion: moving check_new_streams before check_index_status is fine, but we don't need to change anything in check_index_status because it only modifies "index->status" and not "index->flags".

Actions #6

Updated by Jérémie Galarneau 5 months ago

First, thanks for the write-up, it's awesome!

As a result, the live viewer won't be able to start the process to register new streams until there are changes on the first set of registered streams. In this test case, there are no further events from the early UST application.

This part surprises me. I would expect the live beacon (effectively a new index entry sent from the consumer to the relay daemon when the live timer fires) to cause "new activity" to be perceived on the streams of the original application.

Have you noticed something that prevents the live beacon from doing its job?

Actions #7

Updated by Jérémie Galarneau 5 months ago · Edited

Just talked with Mathieu and I hadn't realized the live beacons resulted in the INACTIVE status being returned. Your explanation makes a lot of sense.

Actions #8

Updated by Jérémie Galarneau 5 months ago

I took a shot at implementing a fix against both LTTng-tools and Babeltrace2 master branches:

https://review.lttng.org/c/lttng-tools/+/11588
https://review.lttng.org/c/babeltrace/+/11587

They are not tested yet, I'm setting up a reproducer to confirm on my end.

Let me know if you get a chance to test it on your end!

Actions #9

Updated by Kienan Stewart 5 months ago · Edited

I also have proposed fixes from last week here: https://review.lttng.org/q/topic:%22bugs1406-inactive_indices_miss_new_streams%22 - I just needed to finish the write-up + per-uid tests

Actions #10

Updated by Jérémie Galarneau 5 months ago

Okay great! Sorry, I didn't think of looking on gerrit.
They look very close functionality-wise so I don't mind abandoning my changes. I'll comment on yours.

Actions #11

Updated by Kienan Stewart 5 months ago

My bad for not leaving a comment here on Friday to say where I was at. I could also abandon my changes and bring in the tests to your patch on live. Whichever you prefer.

What was your motivation for adding the SAMPLE_AND_CLEAR and SAMPLE modes to the check_new_streams function?

Actions #12

Updated by Jérémie Galarneau 5 months ago

Kienan Stewart wrote in #note-11:

My bad for not leaving a comment here on Friday to say where I was at. I could also abandon my changes and bring in the tests to your patch on live. Whichever you prefer.

I'm good going with yours, you did the heavy lifting of finding the cause 🤘

What was your motivation for adding the SAMPLE_AND_CLEAR and SAMPLE modes to the check_new_streams function?

It's an attempt to maintain compatibility with unfixed viewers as much as possible.

When check_new_streams is invoked, it searches for a session with new streams. If one is found, it clears the flag by using a cmpxchg and exits with 1. In that sense, the check is "destructive".

Since unfixed viewers don't check the flags for new streams when the index reply uses the [...]_INACTIVE status, we would clear the session's flag (on the relay daemon end) and they would simply ignore it. Then, the viewer would never be informed that this session has new streams even if the existing streams start to see activity again (causing the index queries to return "active" indexes).

The two sampling modes are introduced to have destructive and non-destructive versions of the check. When an "active" index reply is delivered, we use the destructive mode as we always did.

When the index reply is "inactive", we use the non-destructive version so that:
  • We give the "hint" that new streams are available to "fixed" viewers so that they fetch the new streams and resume consumption even though the current streams are "inactive",
  • We allow unfixed viewers a chance to see the [...]_NEW_STREAMS flag whenever the index query happens to succeed at delivering a proper "active" index.

Does that make sense?
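The two sampling modes described above can be sketched like this. The enum and struct names are assumptions for illustration; the real check_new_streams operates on relay sessions shared locklessly between the receiving and live threads, hence the atomic cmpxchg for the destructive variant.

```cpp
#include <atomic>
#include <cassert>

// Assumed names: SAMPLE is a non-destructive read used for "inactive"
// index replies; SAMPLE_AND_CLEAR atomically consumes the flag with a
// cmpxchg, as the destructive check does today for "active" replies.
enum class sample_mode { sample, sample_and_clear };

struct relay_session { std::atomic<int> new_streams{0}; };

bool check_new_streams(relay_session &s, sample_mode mode) {
    if (mode == sample_mode::sample) {
        // Hint only: leave the flag set so an unfixed viewer can still
        // see it once an "active" index is eventually delivered.
        return s.new_streams.load(std::memory_order_acquire) != 0;
    }
    // Destructive: clear the flag if it was set, report whether it was.
    int expected = 1;
    return s.new_streams.compare_exchange_strong(expected, 0);
}
```

Sampling non-destructively on the inactive path means the relayd may repeat the hint, but it never discards it before a viewer has acted on it.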

Actions #13

Updated by Kienan Stewart 5 months ago

Jérémie Galarneau wrote in #note-12:

It's an attempt to maintain compatibility with unfixed viewers as much as possible.

When check_new_streams is invoked, it searches for a session with new streams. If one is found, it clears the flag by using a cmpxchg and exits with 1. In that sense, the check is "destructive".

Since unfixed viewers don't check the flags for new streams when the index reply uses the [...]_INACTIVE status, we would clear the session's flag (on the relay daemon end) and they would simply ignore it. Then, the viewer would never be informed that this session has new streams even if the existing streams start to see activity again (causing the index queries to return "active" indexes).

The two sampling modes are introduced to have destructive and non-destructive versions of the check. When an "active" index reply is delivered, we use the destructive mode as we always did.

When the index reply is "inactive", we use the non-destructive version so that:
  • We give the "hint" that new streams are available to "fixed" viewers so that they fetch the new streams and resume consumption even though the current streams are "inactive",
  • We allow unfixed viewers a chance to see the [...]_NEW_STREAMS flag whenever the index query happens to succeed at delivering a proper "active" index.

Does that make sense?

I hadn't realized that check_new_streams was a destructive cmpxchg. Would it make sense to replace the cmpxchg with the non-destructive sampling version only, and then in viewer_get_new_streams have a corresponding cmpxchg or similar to reset the value back to 0?

Based on my new understanding, the current implementation would only send a packet with the viewer_flags LTTNG_VIEWER_FLAG_NEW_STREAM once and assume that the request will be (at some point) handled. If the expectation of the protocol is that new streams have to be fetched with the LTTNG_VIEWER_GET_NEW_STREAMS command, wouldn't it be easier to assume that the request isn't actually handled until that command happens? While it would mean that a client could potentially receive multiple packets with LTTNG_VIEWER_FLAG_NEW_STREAM in a row if they don't immediately process the update to the session state, I think it would muddy the expectations around check_new_streams less and make it clear to clients that the session state update is still pending. I don't think this change would impact how babeltrace2 operates.

From https://github.com/lttng/lttng-tools/blob/master/doc/live-reading-protocol.txt#L121 there doesn't seem to be any indication that the flag LTTNG_VIEWER_FLAG_NEW_STREAM would only be set once.

Actions #14

Updated by Jérémie Galarneau 5 months ago · Edited

Kienan Stewart wrote in #note-13:

I hadn't realized that check_new_streams was a destructive cmpxchg. Would it make sense to replace the cmpxchg with the non-destructive sampling version only, and then in viewer_get_new_streams have a corresponding cmpxchg or similar to reset the value back to 0?

Yes, it's one of the alternatives I had noted on Friday. I had not pursued it because of the locking context in which all of this happens.

The trace receiving end (an event loop that handles incoming trace data from various consumer daemons) and the live protocol event loop (which interacts with viewers) are in concurrent threads that share some data locklessly; the relay_session (and the objects it owns) in this case.

That's why an atomic cmpxchg is used to clear the new_stream flag.

To implement the clear as part of the get_new_streams command, we have to be mindful of the inherent "race" between:
  • The publication of new streams (from the receiving end)
  • The clearing of the flag (from the viewer end).
In essence, the "dangerous" situation we want to avoid is:
  1. The receiving end sets the new_stream flag while processing relay_streams_sent
  2. The live end starts to send new streams in response to a get_new_streams command
  3. New streams are published on the receiving end, setting the new_stream flag
  4. The new_stream flag is cleared by the viewer end

In such a scenario, some streams would never be announced to the viewer.
Just generally, we must guarantee that new_stream stays set whenever some streams may not yet have been seen by the viewer.

Looking at the current implementation, we set the new_stream flag in two places:

To be honest, I'm not quite sure of the reasoning behind the second assignment. I need to look at it more carefully.

Regardless, in both cases we set the flag after making the streams visible (publishing) to the trace's stream_list.

On the live end, we need to ensure we clear the new_streams flag before iterating on the stream list.

I need to think about the appropriate synchronization (memory barriers) to put in place considering those already implied by the uses of the liburcu API 🤔.
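The publication ordering discussed above can be sketched single-threaded (names and structure are assumptions; the real code is a lockless multi-thread design built on liburcu, so this only illustrates the ordering argument, not the synchronization):

```cpp
#include <atomic>
#include <cassert>
#include <vector>

struct session {
    std::atomic<int> new_streams{0};
    std::vector<int> streams; // published stream ids
};

// Receiving end: make the stream visible first, then raise the flag,
// matching "we set the flag after making the streams visible".
void publish_stream(session &s, int id) {
    s.streams.push_back(id);
    s.new_streams.store(1, std::memory_order_release);
}

// Viewer end: clear the flag BEFORE snapshotting the stream list. A
// publish racing with this call re-raises the flag afterwards, so its
// stream is announced on a later query instead of being silently lost.
std::vector<int> viewer_get_new_streams(session &s) {
    s.new_streams.store(0, std::memory_order_seq_cst);
    return s.streams; // snapshot of everything visible so far
}
```

Clearing after the iteration would open the dangerous window described above: a stream published between the snapshot and the clear would have its flag wiped without ever being announced.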

Actions #16

Updated by Jérémie Galarneau about 2 months ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100
