Bug #1421
openlttng session daemon hanging
Description
This problem was discovered while live streaming over a high-latency network. It appears that resources aren't released properly when a session is destroyed. It should be possible to reproduce it with the following steps:
/usr/lib64/lttng-tools/ptest/tests/utils/testapp/gen-ust-events/gen-ust-events -i 1000000 -w 1000000 &
groupadd tracing
useradd -d /home/lttnguser -m -s /bin/sh -g tracing lttnguser
su lttnguser
/usr/lib64/lttng-tools/ptest/tests/utils/testapp/gen-ust-events/gen-ust-events -i 1000000 -w 1000000 &
lttng create micke --live=2000000
lttng enable-channel -s micke -u --num-subbuf=4 --subbuf-size=1048576 ch1
--> Monitor signal blocked by live signal
lttng start micke
tc qdisc add dev lo root netem delay 300ms
lttng destroy micke
Starting the session daemon in verbose mode makes it clear that the session is still alive. Otherwise, the problem shows up when trying to create a new session: the lttng commands appear to hang.
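One way to tell a hang from a slow command is to run it under a time limit. The sketch below assumes timeout(1) from GNU coreutils is available; check_hang is a hypothetical helper name and not part of lttng-tools.

```shell
# Hypothetical helper: report whether a command finishes within a time limit.
# Relies on timeout(1) from GNU coreutils, which exits with status 124
# when the limit expires.
check_hang() {
    limit="$1"
    shift
    timeout "$limit" "$@" >/dev/null 2>&1
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "hang"
    else
        echo "returned $status"
    fi
}
```

For example, `check_hang 30 lttng create probe` (probe being an arbitrary session name) would print "hang" if the command is still blocked after 30 seconds.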
Removing the delay unblocks the monitor signal. As a workaround, the attached patch swaps the priorities of the live and monitor signals.
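Adding and removing the delay can be wrapped in a small helper, which makes it easier to toggle while watching the session daemon. This is only a sketch: netem_delay and the TC override variable are hypothetical names, and the real tc commands need root privileges.

```shell
# Hypothetical helper around the tc commands used in this report.
# TC can be set to "echo" to print the command instead of running it.
netem_delay() {
    action="$1"          # "add" or "del"
    dev="${2:-lo}"       # network device, loopback by default
    delay="${3:-300ms}"  # only used by "add"
    case "$action" in
        add) "${TC:-tc}" qdisc add dev "$dev" root netem delay "$delay" ;;
        del) "${TC:-tc}" qdisc del dev "$dev" root ;;
        *)   echo "usage: netem_delay add|del [dev] [delay]" >&2; return 2 ;;
    esac
}
```

Running `netem_delay del` after the destroy command has blocked should let it complete, according to the observation above.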
#######################################
While investigating this issue, I encountered another, somewhat similar issue. It is currently not a problem, but I thought it might be worth mentioning anyway. It involves two live sessions and should be possible to reproduce with the following steps:
lttng create micke --live=2000000
lttng enable-channel -s micke -u --num-subbuf=4 --subbuf-size=1048576 ch1
lttng start micke
lttng create micke2 --live=2000000
lttng enable-channel -s micke2 -u --num-subbuf=4 --subbuf-size=1048576 ch1
lttng start micke2
tc qdisc add dev lo root netem delay 300ms
lttng destroy micke2
--> live timer destruction blocked, pending, second live
Removing the delay unblocks the destruction
Removing both sessions also unblocks the destruction
It appears that, in this case, timer destruction can't proceed while signals associated with that timer are still outstanding.
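The two-session steps can be collected into one function so that the delay is easy to vary. This is a sketch of the commands listed above, not a tested reproducer: two_session_repro and the LTTNG and TC override variables are hypothetical names, and it assumes a session daemon is already running and that tc is run with root privileges.

```shell
# Sketch of the two-session scenario from this report.
# LTTNG and TC can be overridden (for example with "echo") to preview
# the commands without running them.
two_session_repro() {
    delay="${1:-300ms}"
    lttng_cmd="${LTTNG:-lttng}"
    tc_cmd="${TC:-tc}"
    for name in micke micke2; do
        "$lttng_cmd" create "$name" --live=2000000 &&
        "$lttng_cmd" enable-channel -s "$name" -u --num-subbuf=4 --subbuf-size=1048576 ch1 &&
        "$lttng_cmd" start "$name" || return 1
    done
    "$tc_cmd" qdisc add dev lo root netem delay "$delay" || return 1
    # According to the report, this call blocks while the delay is in place.
    "$lttng_cmd" destroy micke2
}
```

Calling `LTTNG=echo TC=echo two_session_repro 150ms` prints the command sequence without touching the system.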
#######################################
In both cases, a 150ms delay is usually sufficient to trigger the problem.
Problems were discovered on version 2.13.13.
Files
Updated by Kienan Stewart 1 day ago
Hi Micke,
thanks for the bug report. In the first case you demonstrate: when you say hang, do you mean an indefinite hang? I haven't been able to reproduce that with 2.13.13, stable-2.13, or master.
thanks,
kienan