Bug #424

closed

lttng-consumerd crash after destroying a tracing session

Added by Mathieu Bain almost 12 years ago. Updated over 11 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Target version:
Start date: 01/18/2013
Due date:
% Done: 0%
Estimated time:

Description

Running 3 tracing sessions and stopping them at the same time (using the API call lttng_stop_tracing_no_wait).
While the sessions are stopping, issuing the command destroy to of them by using the API call lttng_destroy_session.

Then lttng-consumerd crashes (about 1 second after the last command was issued, according to the logs).

version of lttng-tools used is 2.1.0
version of lttng-ust used is 2.1.0
version of userspace-rcu used is 0.7.5

The logs show the commands issued when the problem happened.
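
For anyone trying to reproduce this, the sequence described above can be sketched roughly as follows. This is only an illustration, not the reporter's actual test program: stop_then_destroy(), session_call and the dry_run_call() stub are hypothetical names, and the number of sessions to destroy is left as a parameter. The two API calls are passed in as function pointers so the sketch compiles without liblttng-ctl; with lttng-tools installed, the assumption is that lttng_stop_tracing_no_wait and lttng_destroy_session from <lttng/lttng.h> can be passed directly, since both take a session name and return an int.

```c
#include <stddef.h>

/* One pointer type for both calls: session name in, int return code out. */
typedef int (*session_call)(const char *session_name);

/* Hypothetical dry-run stub: counts calls instead of talking to sessiond. */
static int dry_run_calls;

static int dry_run_call(const char *session_name)
{
    (void) session_name;
    dry_run_calls++;
    return 0;
}

/*
 * Hypothetical driver: request a non-blocking stop on every session, then
 * destroy the first n_destroy of them while the stops may still be in
 * progress. Returns 0 on success or the first negative return code seen.
 */
static int stop_then_destroy(const char *const *names, size_t n_sessions,
                             size_t n_destroy,
                             session_call stop_no_wait,
                             session_call destroy)
{
    size_t i;
    int ret;

    for (i = 0; i < n_sessions; i++) {
        ret = stop_no_wait(names[i]);
        if (ret < 0)
            return ret;
    }
    for (i = 0; i < n_destroy && i < n_sessions; i++) {
        ret = destroy(names[i]);
        if (ret < 0)
            return ret;
    }
    return 0;
}
```

The sessions are assumed to have been created and started beforehand. Since the crash was only seen a few times, running this in a loop would probably be needed to hit it.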


Files

log_2013_01_17_13_03.txt (167 KB): logs captured when consumerd crashed. Mathieu Bain, 01/18/2013 10:20 AM
gdb_logs.log (3.93 KB). Mathieu Bain, 01/22/2013 03:58 PM
Actions #1

Updated by David Goulet almost 12 years ago

I would strongly recommend that you test again with 2.1.1, because I can
think of at least two critical consumerd fixes that went in.

Thanks!

Issue #424 has been reported by Mathieu Bain.

----------------------------------------
Bug #424: lttng-consumerd crash after destroying a tracing session
https://bugs.lttng.org/issues/424

Author: Mathieu Bain
Status: New
Priority: Normal
Assignee:
Category:
Target version:

Actions #2

Updated by Mathieu Bain almost 12 years ago

Small correction: the bug was seen on lttng 2.1.0, but with both versions of lttng-tools (i.e. 2.1.0 and 2.1.1).

Actions #3

Updated by David Goulet almost 12 years ago

TBH, these logs are from your system. I can't really debug the problem without sessiond and consumerd logs :S...

Actions #4

Updated by Mathieu Bain almost 12 years ago

I understand. The problem with this bug is that I have seen it only 3 times and it is hard to reproduce. If I catch it again, I will try to provide more sessiond and consumerd logs. Right now I don't have better logs... :S I will look a bit more into the core file to see if I can get more info for you.

Actions #5

Updated by David Goulet almost 12 years ago

Right, so it's a "crash" in the sense that the consumer quits, but it does not segfault or abort.

Thanks

Actions #6

Updated by Mathieu Bain almost 12 years ago

By looking into the core, it seems that consumerD was aborted ("Program terminated with signal 6, Aborted")

Actions #7

Updated by David Goulet almost 12 years ago

This means you have a coredump!?

If so, please add the full backtrace and relevant information that can help us.

Thanks!

Actions #8

Updated by Mathieu Bain almost 12 years ago

I finally was able to get the backtrace from the core file.
So it is attached to this answer.

I hope it will help you.

Actions #9

Updated by David Goulet almost 12 years ago

Hmmm, it's a bit difficult with optimization on, since I can't see the values of certain variables.

Now, this looks almost impossible: it seems you've hit the assert in consumer_add_metadata_stream() on a NULL stream, but at the one and only call site of that function, the stream is checked for NULL JUST before. Furthermore, it can't be the hash table that is NULL since it's only destroyed if the thread quits......

Looking at the pollfd value, it is not right: "2085600352" is way too big for a file descriptor. We can see that the stream is also NULL, but how consumer_add_metadata_stream() got called is a mystery, since the only way to reach it is when the pollfd is the consumer metadata pipe...

If anyone has an idea or sees something I don't please come forward! :)
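
To make the two anomalies above concrete, here is a minimal sketch of the kind of sanity check being reasoned about: a NULL stream and a descriptor value far outside the valid range. This is not the lttng-tools code. struct stream_stub, fd_is_plausible() and check_before_add() are hypothetical names, and the 1024 fallback is an assumption for systems where the open-file limit is indeterminate.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for the consumer's stream object. */
struct stream_stub {
    int wait_fd;
};

/* A descriptor is plausible if it lies in [0, open-file limit). */
static int fd_is_plausible(long fd)
{
    long limit = sysconf(_SC_OPEN_MAX);

    if (limit < 0)
        limit = 1024; /* assumed fallback when the limit is indeterminate */
    return fd >= 0 && fd < limit;
}

/*
 * Returns 0 if both the stream pointer and the descriptor look sane,
 * -EINVAL for a NULL stream, -EBADF for an out-of-range descriptor.
 */
static int check_before_add(const struct stream_stub *stream, long pollfd)
{
    if (stream == NULL)
        return -EINVAL;
    if (!fd_is_plausible(pollfd))
        return -EBADF;
    return 0;
}
```

Returning an error code instead of asserting would let the caller log the bad values, which might help when a core dump from an optimized build hides them.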

Actions #10

Updated by Mathieu Desnoyers almost 12 years ago

Please try to reproduce the crash with -O0 so we can make some sense of
the backtrace.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

Actions #11

Updated by Mathieu Bain almost 12 years ago

I will give it a try and hope that the bug happens again. I will send the data if so.

Thanks

Actions #12

Updated by David Goulet almost 12 years ago

  • Status changed from New to Feedback
  • Target version set to 2.1 stable
Actions #13

Updated by Tan le tran almost 12 years ago

Hi David,

After loading the following from the lttng git repo:
lttng-tools ed22248 (HEAD, temp_bug429_bug433_patches) Apply patches for bug429 and 433
b325dc7 (origin/stable-2.1) Fix: put session list lock around the app registration
lttng-ust 164931d (HEAD, origin/stable-2.1) Fix: refcount issue in lttng-ust-abi.c
userspace-rcu da9bed2 (HEAD, tag: v0.7.6) Version 0.7.6

The above problem is no longer seen after many runs from our side.
It looks like the other fixes might have had a positive side effect on this one.

Thank You for your support.
/Tan

Actions #14

Updated by David Goulet over 11 years ago

  • Status changed from Feedback to Resolved