Bug #530

closed

lttng-tools2.2.0rc2: ConsumerD segfault in ustctl_flush_buffer (stream=0x63c4e0, producer_active=0) at ustctl.c:1425

Added by Tan le tran over 11 years ago. Updated over 11 years ago.

Status: Resolved
Priority: High
Assignee:
Target version:
Start date: 05/13/2013
Due date:
% Done: 100%
Estimated time:
Description

Commit used:
============
userspace: 56e676d (HEAD, origin/stable-0.7) Document: rculfhash destroy and resize side-effect in 0.7
ust      : 83e4321 (HEAD, origin/master, origin/HEAD) Fix: incorrect support for multi-context
tools    : 1274479 (HEAD, origin/master, origin/HEAD) Fix: check channel subbuf size against page size
babeltrace: 5bfcad9 (HEAD, origin/master, origin/HEAD) Fix: handling of empty streams

Problem Description:
====================
 * While our stability test is running, consumerD segfaults intermittently with the following
   gdb info:
      (gdb) bt
      #0  0x00007f79958c72fc in ustctl_flush_buffer (stream=0x63c4e0, producer_active=0) at ustctl.c:1425
      #1  0x000000000041c57b in lttng_ustconsumer_on_stream_hangup (stream=0x647590) at ust-consumer.c:1196
      #2  0x000000000040a5e9 in consumer_thread_data_poll (data=0x633d10) at consumer.c:2500
      #3  0x00007f799549f7b6 in start_thread () from /lib64/libpthread.so.0
      #4  0x00007f79951fac5d in clone () from /lib64/libc.so.6
      #5  0x0000000000000000 in ?? ()

  * This segfault happens very often (about every 10 minutes).

Is problem reproducible ?
=========================
  * yes 

How to reproduce (if reproducible):
===================================
  * We have 4 users, each with a different set of trace actions to perform (e.g. create session, enable-channel,
    enable-event, start session, stop session, etc.). Once the set of actions is done, the user sleeps
    for a few seconds and then repeats the same set of actions.

    While running the above, we encounter this consumerD segfault issue.
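
The scenario above could be approximated with a small driver like the following. This is only a sketch, not the actual stability test: the session and channel names, the cycle count, the pause length, and the use of threads in place of separate users are all assumptions, and every simulated user runs the same action set here, whereas in the real test each user has a different one. The `lttng` subcommands are the ones named in the description, plus a `destroy` at the end of each cycle so that session names can be reused.

```python
import subprocess
import threading
import time


def build_cycle(session):
    """Return the lttng command lines for one create/trace/destroy cycle."""
    channel = "chan0"  # hypothetical channel name, not taken from the report
    return [
        ["lttng", "create", session],
        ["lttng", "enable-channel", "-u", "-s", session, channel],
        ["lttng", "enable-event", "-u", "-a", "-s", session, "-c", channel],
        ["lttng", "start", session],
        ["lttng", "stop", session],
        ["lttng", "destroy", session],
    ]


def run_user(user_id, cycles, pause_s, runner=subprocess.call):
    """Repeat the action set, sleeping between repetitions."""
    for i in range(cycles):
        session = "stress_u%d_%d" % (user_id, i)  # hypothetical naming scheme
        for argv in build_cycle(session):
            runner(argv)  # runner is injectable so the loop can be dry-run
        time.sleep(pause_s)


def run_all(users=4, cycles=100, pause_s=3.0, runner=subprocess.call):
    """Start one thread per simulated user and wait for all of them."""
    threads = [
        threading.Thread(target=run_user, args=(u, cycles, pause_s, runner))
        for u in range(users)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Calling `run_all()` on a machine with a running session daemon would issue the commands for real; passing `runner=print` instead gives a dry run that only lists them.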

Any other information:
======================
- Included in this bug report are the gdb printout and the "sessiond -vvv --verbose-consumer" printout.
  The segfault occurred at about 14:10.


Files

gdb_printout.log (12.7 KB): gdb printout. Tan le tran, 05/13/2013 02:44 PM
sessiond_verbose.tar (57.9 KB): "sessiond -vvv --verbose-consumer" output. Tan le tran, 05/13/2013 02:44 PM
bug530.patch (2.34 KB). David Goulet, 05/15/2013 01:17 PM
May17_Pacth1_applied_gdb_printout.log (29.7 KB): gdb printout after applying the patch (from update #2), May 17. Tan le tran, 05/17/2013 09:06 AM
Actions #1

Updated by David Goulet over 11 years ago

  • Status changed from New to Confirmed
  • Assignee set to David Goulet
  • Target version set to 2.2
Actions #2

Updated by David Goulet over 11 years ago

This patch should fix the issue. I'll wait for your ACK before merging it. There is a clear race that the patch fixes.

Actions #3

Updated by Tan le tran over 11 years ago

Hi David,

The old segfault is no longer there, but a new one is observed.
It now occurs on multiple nodes, about every 3-7 minutes.
We are still running the same test suite described at the
top of this bug report.

New commits used:
=================
lttng-tools: c5854b1 (HEAD, origin/master, origin/HEAD) Fix: use memset instead of poll ...
+ Apply bug530 patch (from update #2)

lttng-ust : 352fce3 (HEAD, origin/master, origin/HEAD) Remove 0.x TODO
rcu : 56e676d (HEAD, origin/stable-0.7) Document: rculfhash destroy and resize...
babeltrace : 5bfcad9 (HEAD, origin/master, origin/HEAD) Fix: handling of empty streams

New gdb_printout is attached.
I have quickly checked the other coredumps in gdb and they all have very similar backtraces.

Please let us know if further info is needed.

Actions #4

Updated by David Goulet over 11 years ago

Thanks Tan! I'll be merging this fix and I've opened a new bug with this new issue (#536).

This bug will be closed once the commit is done.

Actions #5

Updated by David Goulet over 11 years ago

  • Status changed from Confirmed to Resolved
  • % Done changed from 0 to 100