Bug #549
closed
lttng2.2.0rc2: Session can not complete deactivating process due to return value 1 of lttng_data_pending()
Description
Commit used:
============
babeltrace : 9eaf254 Version 1.0.3
tools : 094d169 (HEAD, origin/master, origin/HEAD) Fix: dereference after NULL check
ust : 996aead (HEAD, origin/master, origin/HEAD) Add parameter -f to rm in Makefile clean target
userspace : 264716f (HEAD, origin/stable-0.7, stable-0.7) Fix: Use a filled signal mask to disable all signals
Problem Description:
====================
* During stability tests, the deactivation process sometimes hangs because consumerD
keeps reporting pending data (lttng_data_pending() returns 1) even though the session is
already inactive (as seen via "lttng list").
When our code deactivates a session, the following sequence of API calls is used:
lttng_stop_tracing_no_wait()
lttng_data_pending()
If the return value is > 0, repeat every 100 ms until the return value is 0 (i.e., no more data pending)
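The polling sequence above can be sketched with an upper bound on the number of polls, so that a stuck session is reported instead of being polled for hours. This is only a sketch, not the code used in our test: wait_data_flushed(), the wait_result values and the fake_pending_* stubs are hypothetical names introduced for illustration. The callback has the same shape as lttng_data_pending() (> 0 data pending, 0 nothing pending, < 0 error), so the real function could be passed in after calling lttng_stop_tracing_no_wait().

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Callback with the same shape as lttng_data_pending():
 * > 0 data still pending, 0 no data pending, < 0 error. */
typedef int (*pending_fn)(const char *session_name);

enum wait_result { WAIT_DONE = 0, WAIT_TIMEOUT = 1, WAIT_ERROR = -1 };

/* Poll `pending` every interval_ms until it returns 0, reports an error,
 * or max_polls polls have been made without the data being flushed. */
static enum wait_result wait_data_flushed(pending_fn pending,
                                          const char *session_name,
                                          long interval_ms,
                                          unsigned int max_polls)
{
    struct timespec delay;
    unsigned int i;

    assert(pending != NULL);
    delay.tv_sec = interval_ms / 1000;
    delay.tv_nsec = (interval_ms % 1000) * 1000000L;

    for (i = 0; i < max_polls; i++) {
        int ret = pending(session_name);

        if (ret == 0)
            return WAIT_DONE;      /* no more data pending */
        if (ret < 0)
            return WAIT_ERROR;     /* the call itself failed */
        nanosleep(&delay, NULL);   /* data pending: wait, then retry */
    }
    return WAIT_TIMEOUT;           /* still pending: collect diagnostics */
}

/* Hypothetical stand-ins for lttng_data_pending(), for demonstration only. */
static int fake_polls_left = 3;

/* Reports pending data three times, then reports none. */
static int fake_pending_drains(const char *session_name)
{
    (void)session_name;
    if (fake_polls_left > 0) {
        fake_polls_left--;
        return 1;
    }
    return 0;
}

/* Mimics the behaviour in this report: data is pending forever. */
static int fake_pending_stuck(const char *session_name)
{
    (void)session_name;
    return 1;
}
```

With a 100 ms interval, max_polls = 600 would give up after roughly one minute. On WAIT_TIMEOUT the caller can gather debugging information (for example the consumerD backtraces) before any health check restarts the daemons.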
From our log, the session has about 6 MB of data. The deactivation has taken more than
5 hours and lttng_data_pending() still returns 1.
"lttng list" shows that the session is already inactive.
We periodically checked the size of the session directory; no further data has been written
to it for a long time.
"kill -SIGABRT" was used to kill consumerD. The corresponding gdb printout is
attached to this report. Unfortunately, we did not manage to use "kill -SIGABRT"
on sessionD, as it was automatically killed by our health check process once that
process detected that consumerD was no longer healthy.
This is the first time we have observed this behaviour in lttng2.2.0rc2.
We have seen it a couple of times in lttng2.1. Note that in lttng2.1, we used to
see the deactivation process last more than 2 days (it never completed).
Is problem reproducible ?
=========================
* Maybe
How to reproduce (if reproducible):
===================================
* Our stability test consists of 4 users. Each user has a different set of trace commands
(such as create session, activate session, stop session, etc.). Each user then executes
its set of commands over multiple iterations.
All sessions are created using streaming and per-UID buffers, tracing userspace only.
After the overnight run, we started seeing that one session on one node could not complete
the deactivation process.
Any other information:
======================
-
Files
DG Updated by David Goulet over 12 years ago
- Status changed from New to Confirmed
- Target version set to 2.2
TL Updated by Tan le tran over 12 years ago
After loading lttng-tools 2.2.3 and lttng-ust 2.2.1, this bug has so far not been reproduced. However, since the scenario needed to reproduce this fault is still unknown, this does not necessarily mean that it has been fixed.
Therefore, before closing this bug report, it would be great if some information can be stated here regarding how data should be collected once the fault occurs again, so that we don't lose our chance to capture the info to find the root cause.
Many Thanks.
DG Updated by David Goulet over 12 years ago
- Status changed from Confirmed to Resolved
Killing with SIGABRT is good as long as "ulimit -c" is set to unlimited so that a core dump is produced.
Otherwise, attach with gdb and print the full backtraces of all available threads.
gdb> info threads
gdb> thread 1
gdb> bt full
[repeat process for all threads]
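If the daemons are started from a C launcher rather than from a shell where "ulimit -c unlimited" was run, the core-file limit can be raised from code before they are spawned, since child processes inherit it. A minimal sketch, assuming such a launcher exists; raise_core_limit() is a hypothetical helper name. It only raises the soft limit to the hard limit, so it matches "ulimit -c unlimited" only when the hard limit is itself unlimited.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Raise the soft core-file size limit of the calling process to its hard
 * limit. Processes started afterwards inherit the new limit.
 * Returns 0 on success, -1 on failure. */
static int raise_core_limit(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0) {
        perror("getrlimit");
        return -1;
    }
    lim.rlim_cur = lim.rlim_max;   /* soft limit := hard limit */
    if (setrlimit(RLIMIT_CORE, &lim) != 0) {
        perror("setrlimit");
        return -1;
    }
    return 0;
}
```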