Bug #549
closed
lttng2.2.0rc2: Session can not complete deactivating process due to return value 1 of lttng_data_pending()
Start date:
05/30/2013
Due date:
% Done:
0%
Estimated time:
Description
Commit used:
============
babeltrace : 9eaf254 Version 1.0.3
tools : 094d169 (HEAD, origin/master, origin/HEAD) Fix: dereference after NULL check
ust : 996aead (HEAD, origin/master, origin/HEAD) Add parameter -f to rm in Makefile clean target
userspace : 264716f (HEAD, origin/stable-0.7, stable-0.7) Fix: Use a filled signal mask to disable all signals
Problem Description:
====================
* During stability testing, deactivating a session sometimes hangs because consumerD
keeps reporting lttng_data_pending() == 1 even though the session is already
inactive (as seen via "lttng list").
When our code deactivates a session, it uses the following sequence of API calls:
lttng_stop_tracing_no_wait()
lttng_data_pending()
If the return value is > 0, repeat every 100 ms until the return value == 0 (i.e. no more data)
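For reference, the sequence above can be sketched roughly as follows. This is not our production code: the helper name stop_and_drain(), the session_call typedef, the max_polls bound and the fake_* stand-ins are hypothetical and exist only for illustration. The two liblttng-ctl calls are passed in as function pointers so that the sketch compiles without the library. Our real loop has no upper bound on the number of polls, which is why it spins indefinitely when lttng_data_pending() keeps returning 1.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for nanosleep() */
#endif
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Shape shared by lttng_stop_tracing_no_wait() and lttng_data_pending():
 * both take a session name and return an int (negative on error). */
typedef int (*session_call)(const char *session_name);

/*
 * Stop the session without waiting, then poll for pending data.
 * Returns 0 once no data is pending, 1 if data is still pending after
 * max_polls polls (hypothetical bound, not in the reported code), or a
 * negative value propagated from either call.
 */
static int stop_and_drain(const char *session_name,
                          session_call stop_no_wait,
                          session_call data_pending,
                          long poll_interval_ms,
                          unsigned int max_polls)
{
    assert(stop_no_wait != NULL && data_pending != NULL);

    int ret = stop_no_wait(session_name);
    if (ret < 0)
        return ret;

    for (unsigned int i = 0; i < max_polls; i++) {
        ret = data_pending(session_name);
        if (ret <= 0)
            return ret;               /* 0: drained, < 0: error */

        struct timespec delay = {
            .tv_sec = poll_interval_ms / 1000,
            .tv_nsec = (poll_interval_ms % 1000) * 1000000L,
        };
        /* Resume the sleep if a signal interrupts it. */
        while (nanosleep(&delay, &delay) == -1 && errno == EINTR)
            ;
    }
    return 1;                         /* still pending: give up and report */
}

/* Hypothetical stand-ins so the sketch can be exercised without liblttng-ctl. */
static int fake_pending_remaining;

static int fake_stop(const char *session_name)
{
    (void)session_name;
    return 0;
}

static int fake_pending(const char *session_name)
{
    (void)session_name;
    return fake_pending_remaining-- > 0 ? 1 : 0;
}
```

With liblttng-ctl linked in, the call would look like stop_and_drain(name, lttng_stop_tracing_no_wait, lttng_data_pending, 100, max_polls). Bounding the loop only lets the caller detect and report the hang; it does not address why data stays pending.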
From our log, the session has about 6 MB of data. The deactivation has taken more than
5 hours and lttng_data_pending() still returns 1.
"lttng list" shows that the session is already inactive.
We periodically checked the size of the session directory; no further data has been
written to it for a long time.
"kill -SIGABRT" was used to kill consumerD. The corresponding gdb printout is
attached to this report. Unfortunately, we did not manage to use "kill -SIGABRT"
on sessionD, as it was automatically killed by our health check process once it
detected that consumerD was no longer healthy.
This is the first time we have observed this behaviour in lttng2.2.0rc2.
We have seen it a couple of times in lttng2.1. Note that in lttng2.1, the
deactivation process used to last more than 2 days (it never completed).
Is the problem reproducible?
============================
* Maybe
How to reproduce (if reproducible):
===================================
* Our stability test consists of 4 users. Each user has a different set of trace commands
(such as create session, activate session, stop session, etc.). Each user then executes
its set of commands over multiple iterations.
All sessions are created using streaming and per-UID buffers, tracing userspace only.
After the overnight run, we started seeing that one session on one node could not complete
the deactivation process.
Any other information:
======================
-