Bug #1406

closed

Missing events/crash with babeltrace2 lttng-live plugin with session that uses per-pid buffers

Added by Kienan Stewart 5 months ago. Updated 2 months ago.

Status: Resolved
Priority: Normal
Target version: -
Start date: 12/13/2023
Due date:
% Done: 100%
Estimated time:

Description

While investigating #1403 I noticed a pattern of test failures involving babeltrace2 crashes or hangs. The problem seemed intermittent, but I have been able to develop a test case that reproduces it reliably for me. This affects babeltrace2 stable-2.0 and master when used with lttng-tools master.

The test case (see attached script) performs the following steps:

  1. Start a ust application and leave it running
  2. Configure and then start an lttng live session
  3. Connect a live viewer (babeltrace)
  4. Run a second ust application
  5. Wait for the expected number of events
    1. In the failing case, no events are seen by babeltrace
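
For readers without the attachment, steps 2 and 3 can be sketched roughly as below. This is not the attached test_delay_hang_bt2 script: the session name, channel name, `--all` event rule, and the helper `live_session_commands` are hypothetical, and the function only builds the command lines without running anything. The `buffers` argument switches between the per-pid and per-uid buffering schemes.

```python
import socket


def live_session_commands(session="repro", channel="chan", buffers="pid", hostname=None):
    """Build argv lists for a live session and a live viewer (nothing is executed).

    All names here are illustrative assumptions, not taken from the attached script.
    """
    if buffers not in ("pid", "uid"):
        raise ValueError("buffers must be 'pid' or 'uid'")
    hostname = hostname or socket.gethostname()
    return [
        # Step 2: configure and start an lttng live session.
        ["lttng", "create", session, "--live"],
        ["lttng", "enable-channel", "--userspace", f"--buffers-{buffers}", channel],
        ["lttng", "enable-event", "--userspace", f"--channel={channel}", "--all"],
        ["lttng", "start", session],
        # Step 3: connect a live viewer through the relay daemon.
        ["babeltrace2", "--input-format=lttng-live",
         f"net://localhost/host/{hostname}/{session}"],
    ]


if __name__ == "__main__":
    for argv in live_session_commands(buffers="pid"):
        print(" ".join(argv))
```

Steps 1 and 4 (the two ust applications) are left out because their invocation depends on the gen-ust-events test utility.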

Using per-uid buffers, the test typically completes normally. With per-pid buffers the test fails, hanging indefinitely if waiting for the specified number of events. While "hanging", babeltrace2 is polling the relayd:

clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=100000000}, NULL) = 0
sendto(3, "\0\0\0\0\0\0\0\10\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0\6", 24, MSG_NOSIGNAL, NULL, 0) = 24
recvfrom(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 64, 0, NULL, NULL) = 64
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=100000000}, NULL) = 0
sendto(3, "\0\0\0\0\0\0\0\10\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0\6", 24, MSG_NOSIGNAL, NULL, 0) = 24
recvfrom(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 64, 0, NULL, NULL) = 64
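
The repeated 24-byte request can be decoded to see what babeltrace2 keeps asking for. The sketch below assumes the lttng-live command header is a big-endian 64-bit data size followed by a 32-bit command and a 32-bit command version, and that command 4 is LTTNG_VIEWER_GET_NEXT_INDEX with a 64-bit stream ID as payload. Both points are my reading of the viewer ABI, not something stated in this report. The helper `decode_request` is hypothetical.

```python
import struct

# Request bytes from the sendto() call in the trace above. strace prints
# octal escapes, so "\10" in the trace is the value 8.
REQUEST = bytes.fromhex(
    "0000000000000008"  # first 8 bytes
    "00000004"          # next 4 bytes
    "00000000"          # next 4 bytes
    "0000000000000006"  # last 8 bytes
)

# ASSUMPTION: header is data_size (u64), cmd (u32), cmd_version (u32), big-endian.
HEADER = struct.Struct(">QII")

# ASSUMPTION: command number 4 means "get next index" in the live protocol.
ASSUMED_COMMAND_NAMES = {4: "LTTNG_VIEWER_GET_NEXT_INDEX"}


def decode_request(buf):
    """Split a viewer request into header fields and, if present, a stream ID."""
    data_size, cmd, cmd_version = HEADER.unpack_from(buf, 0)
    payload = buf[HEADER.size:HEADER.size + data_size]
    fields = {
        "data_size": data_size,
        "cmd": cmd,
        "cmd_version": cmd_version,
        "cmd_name": ASSUMED_COMMAND_NAMES.get(cmd, "unknown"),
    }
    if len(payload) == 8:
        # ASSUMPTION: an 8-byte payload is a big-endian stream ID.
        (fields["stream_id"],) = struct.unpack(">Q", payload)
    return fields


if __name__ == "__main__":
    print(decode_request(REQUEST))
```

If this reading is right, babeltrace2 is requesting the next index for stream 6 every 100 ms. strace printed only the first 32 bytes of each 64-byte reply, so the trace alone does not show how the relayd answered.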

While the test as written stops the session before the first gen-ust-events app when the timeout is reached, invoking the test with LIVEVIEWER_TIMEOUT=NO lets us try different operations:

  • killing the early gen-ust-events app causes babeltrace2 to exit unsuccessfully (I don't think this should happen)
  • killing the late gen-ust-events app leaves babeltrace2 running
  • stopping the session (as in the test) causes babeltrace2 to exit unsuccessfully (I think this should exit gracefully, but I could be wrong)

Files

bt.err.bz2 (59.2 KB) Kienan Stewart, 12/13/2023 11:42 AM
relayd.log.bz2 (1.45 MB) Kienan Stewart, 12/13/2023 11:42 AM
sessiond.log (242 KB) Kienan Stewart, 12/13/2023 11:42 AM
test_delay_hang_bt2 (3.53 KB) Kienan Stewart, 12/13/2023 11:42 AM
relayd.log (284 KB) relayd.log for comment 2, Kienan Stewart, 12/14/2023 03:26 PM

Related issues 1 (0 open, 1 closed)

Related to LTTng-tools - Bug #1403: Investigate why events are no longer recorded by the live view when `DELAYUS` in `tests/regression/tools/clear/test_ust` is reduced too low (Resolved, Kienan Stewart, 12/05/2023)
