Bug #549
closed
lttng2.2.0rc2: Session can not complete deactivating process due to return value 1 of lttng_data_pending()
Start date:
05/30/2013
Due date:
% Done:
0%
Estimated time:
Description
Commit used:
============
babeltrace : 9eaf254 Version 1.0.3
tools : 094d169 (HEAD, origin/master, origin/HEAD) Fix: dereference after NULL check
ust : 996aead (HEAD, origin/master, origin/HEAD) Add parameter -f to rm in Makefile clean target
userspace : 264716f (HEAD, origin/stable-0.7, stable-0.7) Fix: Use a filled signal mask to disable all signals

Problem Description:
====================
* During our stability test, the deactivation process sometimes hangs because consumerD keeps reporting lttng_data_pending() with value 1 even though the session is already inactive (as seen via "lttng list").

When our code deactivates a session, the following sequence of API calls is used:
  lttng_stop_tracing_no_wait()
  lttng_data_pending()
  If return value > 0, repeat every 100ms until return value == 0 (i.e. no more data)

From our log, the session has about 6MB of data. The deactivation has taken more than 5 hours and lttng_data_pending() still returns 1.
"lttng list" shows that the session is already inactive.
We periodically checked the size of the session directory: no further data has been written into it for a long time.

"kill -SIGABRT" was used to kill consumerD. The corresponding gdb printout is attached to this report.
Unfortunately, we did not manage to use "kill -SIGABRT" on sessionD, as it was automatically killed by our health check process once that process detected that consumerD was no longer healthy.

This is the first time we have observed this behaviour in lttng2.2.0rc2. We have seen it a couple of times in lttng2.1. Note that in lttng2.1, we used to see the deactivation process last more than 2 days (it never completed).

Is problem reproducible ?
=========================
* Maybe

How to reproduce (if reproducible):
===================================
* Our stability test consists of 4 users. Each user has a different set of trace commands (such as create session, activate session, stop session, etc.). Each user then executes its set of commands over multiple iterations.
All sessions are created using streaming and per-UID buffers, tracing userspace only. After the overnight run, we started seeing that one session on one node could not complete the deactivation process.

Any other information:
======================
-
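
For illustration only, the stop-then-poll sequence from the problem description could be sketched as below, with an upper bound on the wait so that a stuck "data pending" report cannot block deactivation forever. This is a sketch under stated assumptions, not our actual test code and not part of LTTng: wait_for_drain(), pending_fn, the drain_result values and the two fake_* stand-ins are hypothetical names. In a real client the callback would presumably wrap lttng_data_pending() from liblttng-ctl, called after lttng_stop_tracing_no_wait().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: returns > 0 while data is still pending,
 * 0 once everything is flushed, < 0 on error. A real client would
 * presumably wrap lttng_data_pending(session_name) here (assumption). */
typedef int (*pending_fn)(const char *session_name);

/* Hypothetical result codes for this sketch. */
enum drain_result { DRAIN_OK = 0, DRAIN_TIMEOUT = 1, DRAIN_ERROR = -1 };

/* Poll is_pending() every interval_ms until it reports 0, fails, or
 * roughly timeout_ms of sleeping has accumulated. */
static enum drain_result wait_for_drain(const char *session_name,
                                        pending_fn is_pending,
                                        long interval_ms,
                                        long timeout_ms)
{
    long waited_ms = 0;

    assert(interval_ms > 0);

    for (;;) {
        int ret = is_pending(session_name);

        if (ret == 0)
            return DRAIN_OK;        /* no more data */
        if (ret < 0)
            return DRAIN_ERROR;     /* query itself failed */
        if (waited_ms >= timeout_ms)
            return DRAIN_TIMEOUT;   /* still pending: give up and report */

        struct timespec ts;
        ts.tv_sec = interval_ms / 1000;
        ts.tv_nsec = (interval_ms % 1000) * 1000000L;
        nanosleep(&ts, NULL);       /* an early wakeup is tolerated here */
        waited_ms += interval_ms;
    }
}

/* Stand-in that mimics the reported hang: always "pending". */
static int fake_pending_stuck(const char *session_name)
{
    (void)session_name;
    return 1;
}

/* Stand-in that reports "pending" fake_remaining times, then 0. */
static int fake_remaining;
static int fake_pending_countdown(const char *session_name)
{
    (void)session_name;
    if (fake_remaining > 0) {
        fake_remaining--;
        return 1;
    }
    return 0;
}
```

With the 100ms interval from the report, a call such as wait_for_drain(name, cb, 100, 60000) would give up after about one minute instead of looping for hours. The timeout value is arbitrary, and what to do on DRAIN_TIMEOUT (log, collect a consumerD backtrace, destroy the session) is left to the caller.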
Files