Bug #549
Closed
lttng 2.2.0rc2: Session cannot complete deactivation because lttng_data_pending() keeps returning 1
Description
Commit used:
============
babeltrace : 9eaf254 Version 1.0.3
tools      : 094d169 (HEAD, origin/master, origin/HEAD) Fix: dereference after NULL check
ust        : 996aead (HEAD, origin/master, origin/HEAD) Add parameter -f to rm in Makefile clean target
userspace  : 264716f (HEAD, origin/stable-0.7, stable-0.7) Fix: Use a filled signal mask to disable all signals

Problem Description:
====================
* During a stability test, the deactivation process sometimes hangs: consumerD keeps reporting pending data (lttng_data_pending() returns 1) even though the session is already inactive (as seen via "lttng list").

When our code deactivates a session, it calls the following API sequence:
  lttng_stop_tracing_no_wait()
  lttng_data_pending()
  If the return value is > 0, repeat every 100 ms until it is 0 (i.e., no more data).

From our log, the session has about 6 MB of data. The deactivation has taken more than 5 hours and lttng_data_pending() still returns 1. "lttng list" shows that the session is already inactive. Periodic checks of the session directory size show that no further data has been written to it for a long time.

"kill -SIGABRT" was used to kill consumerD. The corresponding gdb printout is attached to this report. Unfortunately, we did not manage to run "kill -SIGABRT" on sessionD, as it was automatically killed by our health check process once it detected that consumerD was no longer healthy.

This is the first time we have observed this behaviour in lttng 2.2.0rc2. We saw it a couple of times in lttng 2.1, where the deactivation process sometimes lasted more than 2 days (it never completed).

Is problem reproducible?
========================
* Maybe

How to reproduce (if reproducible):
===================================
* Our stability test consists of 4 users. Each user has a different set of trace commands (such as create session, activate session, stop session, etc.) and executes its set of commands through multiple iterations. All sessions are created with streaming and per-UID buffers, tracing userspace only. After the overnight run, we started seeing one session on one node that could not complete the deactivation process.

Any other information:
======================
-
Updated by David Goulet over 11 years ago
- Status changed from New to Confirmed
- Target version set to 2.2
Updated by Tan le tran over 11 years ago
After loading lttng-tools 2.2.3 and lttng-ust 2.2.1, we have not been able to reproduce this bug so far. However, since the scenario needed to reproduce the fault is still unknown, this does not necessarily mean that it has been fixed.
Therefore, before this bug report is closed, it would help to describe here how data should be collected if the fault occurs again, so that we do not miss the chance to capture the information needed to find the root cause.
Many Thanks.
Updated by David Goulet over 11 years ago
- Status changed from Confirmed to Resolved
Killing with SIGABRT is fine as long as ulimit -c is set to unlimited, so that a core dump is produced.
Otherwise, attach gdb to the process and print the backtraces of all available threads:
gdb> info threads
gdb> thread 1
gdb> bt full
[repeat for every thread listed by "info threads"; "thread apply all bt full" does this in one command]