Bug #1421

open

lttng session daemon hanging

Added by Mikael Beckius 7 days ago. Updated 1 day ago.

Status: New
Priority: Normal
Assignee: -
Target version: -
Start date: 03/05/2025
Due date:
% Done: 0%
Estimated time:

Description

This problem was discovered while doing live streaming in a network with high latency and it appears that resources aren't released properly when a session is destroyed. It should be possible to reproduce using the following steps:

/usr/lib64/lttng-tools/ptest/tests/utils/testapp/gen-ust-events/gen-ust-events -i 1000000 -w 1000000 &
groupadd tracing
useradd -d /home/lttnguser -m -s /bin/sh -g tracing lttnguser
su lttnguser
/usr/lib64/lttng-tools/ptest/tests/utils/testapp/gen-ust-events/gen-ust-events -i 1000000 -w 1000000 &

lttng create micke --live=2000000
lttng enable-channel -s micke -u --num-subbuf=4 --subbuf-size=1048576 ch1
lttng start micke
tc qdisc add dev lo root netem delay 300ms
--> Monitor signal blocked by live signal
lttng destroy micke
Starting the session daemon in verbose mode will make it clear that the session is still alive. Otherwise, it will be noticed when trying to create a new session and the lttng commands appear to hang.

Removing the delay unblocks the monitor signal. To work around this, the attached patch was created, in which the priorities of live and monitor are switched.

#######################################

While investigating this issue, another, somewhat similar issue was also encountered. This is currently not a problem, but I thought it might be worth mentioning anyway. It involves two live sessions and should be possible to reproduce using the following steps:
lttng create micke --live=2000000
lttng enable-channel -s micke -u --num-subbuf=4 --subbuf-size=1048576 ch1
lttng start micke
lttng create micke2 --live=2000000
lttng enable-channel -s micke2 -u --num-subbuf=4 --subbuf-size=1048576 ch1
lttng start micke2

tc qdisc add dev lo root netem delay 300ms
lttng destroy micke2
--> live timer destruction blocked, pending, second live

Removing the delay will unblock the destruction.
Removing both sessions will also unblock the destruction.

It appears that in this case timer destruction can't proceed while there are outstanding signals connected to the timer in question.

#######################################

In both cases a 150ms delay is usually sufficient.
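
For anyone trying to reproduce this, the delay can be toggled with something like the sketch below. It assumes the loopback device lo and root privileges, as in the steps above. The function name netem_delay and the DRY_RUN variable are made up for illustration and are not part of lttng-tools or tc.

```shell
#!/bin/sh
# Hypothetical helper: add or remove a netem delay on a network device.
# With DRY_RUN=1 the tc command is printed instead of executed
# (actually running it requires root).
netem_delay() {
    action=$1            # "add" or "del"
    dev=${2:-lo}         # device, defaults to loopback
    delay=${3:-150ms}    # 150ms is reported above as usually sufficient
    case "$action" in
        add) set -- tc qdisc add dev "$dev" root netem delay "$delay" ;;
        del) set -- tc qdisc del dev "$dev" root ;;
        *)   echo "usage: netem_delay add|del [dev] [delay]" >&2; return 2 ;;
    esac
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Example:
#   netem_delay add lo 300ms   # before destroying the session
#   netem_delay del lo         # remove the delay again
```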

Problems were discovered on version 2.13.13.


Files

Actions #1

Updated by Kienan Stewart 6 days ago

  • Project changed from LTTng to LTTng-tools
Actions #2

Updated by Kienan Stewart 1 day ago

Hi Micke,

thanks for the bug report. In the first case you demonstrate: when you say hang, do you mean an indefinite hang? I haven't been able to reproduce that with 2.13.13, stable-2.13, or master.

thanks,
kienan
