Bug #424
closed
lttng-consumerd crash after destroying a tracing session
Added by Mathieu Bain almost 12 years ago. Updated over 11 years ago.
0%
Description
Run 3 tracing sessions and stop them at the same time (using the API call lttng_stop_tracing_no_wait).
While the sessions are stopping, issue the destroy command on two of them using the API call lttng_destroy_session.
lttng-consumerd then crashes (about 1 second after the last command was issued, according to the logs).
version of lttng-tools used is 2.1.0
version of lttng-ust used is 2.1.0
version of userspace-rcu used is 0.7.5
The logs show the commands issued when the problem happened.
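The sequence described above can be sketched in C as follows. This is only an illustration of the call order, not the reporter's actual test program. The helper name stop_then_destroy, the session_op typedef and the dry-run stand-in are hypothetical; only the two liblttng-ctl calls named in the report are real. The number of sessions to destroy is left as a parameter.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Callback type matching the liblttng-ctl prototypes
 *   int lttng_stop_tracing_no_wait(const char *session_name);
 *   int lttng_destroy_session(const char *session_name);
 * Taking callbacks (an assumption of this sketch) lets the sequence be
 * exercised without a running session daemon.
 */
typedef int (*session_op)(const char *session_name);

/*
 * Stop all n sessions without waiting, then destroy the first n_destroy
 * of them while the stops may still be in progress.
 * Returns 0 on success or the first negative code returned by a callback.
 */
int stop_then_destroy(const char *names[], size_t n, size_t n_destroy,
                      session_op stop_no_wait, session_op destroy)
{
    size_t i;
    int ret;

    for (i = 0; i < n; i++) {
        ret = stop_no_wait(names[i]);
        if (ret < 0) {
            fprintf(stderr, "stop of %s failed: %d\n", names[i], ret);
            return ret;
        }
    }

    for (i = 0; i < n_destroy && i < n; i++) {
        ret = destroy(names[i]);
        if (ret < 0) {
            fprintf(stderr, "destroy of %s failed: %d\n", names[i], ret);
            return ret;
        }
    }

    return 0;
}

/* Hypothetical dry-run stand-in: counts calls instead of touching LTTng. */
int dry_run_calls;

int dry_run_op(const char *session_name)
{
    printf("would act on session %s\n", session_name);
    dry_run_calls++;
    return 0;
}
```

Against a real setup, one would presumably include <lttng/lttng.h>, link with -llttng-ctl, and pass lttng_stop_tracing_no_wait and lttng_destroy_session as the two callbacks. Since the crash was only seen a few times, a single run of this sequence is not expected to trigger it reliably.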
Files
log_2013_01_17_13_03.txt (167 KB) | logs captured when consumerd crashed | Mathieu Bain, 01/18/2013 10:20 AM
gdb_logs.log (3.93 KB) | Mathieu Bain, 01/22/2013 03:58 PM
Updated by David Goulet almost 12 years ago
I strongly recommend that you test again with 2.1.1, because I can
think of at least two critical consumerd fixes that went in.
Thanks!
Issue #424 has been reported by Mathieu Bain.
----------------------------------------
Bug #424: lttng-consumerd crash after destroying a tracing session
https://bugs.lttng.org/issues/424
Author: Mathieu Bain
Status: New
Priority: Normal
Assignee:
Category:
Target version:
Updated by Mathieu Bain almost 12 years ago
Small correction: the bug was seen on lttng 2.1.0, but with both versions of lttng-tools (i.e. 2.1.0 and 2.1.1).
Updated by David Goulet almost 12 years ago
TBH, these logs are from your system. I can't really debug the problem without sessiond and consumerd logs :S...
Updated by Mathieu Bain almost 12 years ago
I understand. The problem with this bug is that I have seen it only 3 times and it is hard to reproduce. If I catch it again, I will try to provide you with more sessiond and consumerd logs, but right now I don't have better logs... :S I will look a bit more into the core file to see if I can get more info for you.
Updated by David Goulet almost 12 years ago
Right, it's a "crash" meaning the consumer quits and does not segfault or abort.
Thanks
Updated by Mathieu Bain almost 12 years ago
By looking into the core, it seems that consumerD was aborted ("Program terminated with signal 6, Aborted")
Updated by David Goulet almost 12 years ago
This means you have a coredump!?
If so, please add the full backtrace and relevant information that can help us.
Thanks!
Updated by Mathieu Bain almost 12 years ago
- File gdb_logs.log added
I finally was able to get the backtrace from the core file.
So it is attached to this answer.
I hope it will help you.
Updated by David Goulet almost 12 years ago
Hmmm, it's a bit difficult with optimization on, since I can't see the values of certain variables.
Now, this is almost impossible: it seems you've hit the assert in consumer_add_metadata_stream() for a NULL stream, but at the one and only call site of that function, the stream is checked for NULL JUST before the call. Furthermore, it can't be the hash table that is NULL, since it's only destroyed if the thread quits...
Looking at the pollfd value, it is not right: "2085600352" is way too big for a file descriptor. We can also see that the stream is NULL, but how consumer_add_metadata_stream() got called is a mystery, since the only way to reach it is when pollfd is the consumer metadata pipe...
If anyone has an idea or sees something I don't please come forward! :)
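As a side note on why "2085600352" looks wrong for a file descriptor: a valid descriptor has to be below the per-process open-file limit. A minimal sketch of that sanity check is shown below, assuming a POSIX system. The helper names fd_is_plausible and current_fd_limit are hypothetical and are not part of lttng-tools.

```c
#include <sys/resource.h>

/*
 * Returns 1 when fd could be a valid descriptor under the given
 * open-file limit, 0 otherwise. Descriptors are non-negative and
 * strictly below the limit.
 */
int fd_is_plausible(long long fd, unsigned long long max_open_files)
{
    return fd >= 0 && (unsigned long long)fd < max_open_files;
}

/*
 * Query the soft RLIMIT_NOFILE of the current process.
 * Returns 0 if the limit cannot be read.
 */
unsigned long long current_fd_limit(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_NOFILE, &lim) != 0)
        return 0;
    return (unsigned long long)lim.rlim_cur;
}
```

With a limit such as 1024, fd_is_plausible(2085600352LL, 1024) returns 0. A value like this usually points to uninitialized or corrupted memory, not to a real descriptor, although the backtrace alone cannot confirm that here.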
Updated by Mathieu Desnoyers almost 12 years ago
Please try to reproduce the crash with -O0 so we can make some sense of
the backtrace.
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
Updated by Mathieu Bain almost 12 years ago
I will give it a try and hope that the bug happens again. I will send the data if so.
Thanks
Updated by David Goulet almost 12 years ago
- Status changed from New to Feedback
- Target version set to 2.1 stable
Updated by Tan le tran over 11 years ago
Hi David,
After loading the following from the lttng git repo:
lttng-tools:
  ed22248 (HEAD, temp_bug429_bug433_patches) Apply patches for bug429 and 433
  b325dc7 (origin/stable-2.1) Fix: put session list lock around the app registration
lttng-ust:
  164931d (HEAD, origin/stable-2.1) Fix: refcount issue in lttng-ust-abi.c
userspace-rcu:
  da9bed2 (HEAD, tag: v0.7.6) Version 0.7.6
The above problem is no longer seen after many runs on our side.
It looks like the other fixes might have had a positive side effect on this one.
Thank you for your support.
/Tan
Updated by David Goulet over 11 years ago
- Status changed from Feedback to Resolved