Bug #900

closed

Double free or corruption crash on relay daemon

Added by anusha mahamkali over 8 years ago. Updated over 8 years ago.

Status: Resolved
Priority: Normal
Target version:
Start date: 08/03/2015
Due date:
% Done: 0%
Estimated time:

Description

A crash was observed in the relay daemon when 24 streaming sessions with the same name were created from 24 different targets towards a relay daemon running on a remote host.
The LTTng-tools version used on the targets is 2.6.

The following traces are observed in the logs:
PERROR - 09:24:33.402270 [31294/31310]: Relay index to close fd 32575: Bad file descriptor (in deferred_free_relay_index() at index.c:41)
*** Error in `./lttng-relayd': double free or corruption (out): 0x00007f3fdc04d930 ***

Are there any known issues similar to the above in the relay daemon of LTTng-tools 2.6?
If so, kindly let us know.


Files

lttng-relayd-crash.txt (195 KB), added by anusha mahamkali, 08/04/2015 03:44 AM
Actions #1

Updated by Julien Desfossez over 8 years ago

  • Project changed from LTTngTop to LTTng-tools
Actions #2

Updated by Jérémie Galarneau over 8 years ago

How long does it take for the corruption to occur?

Actions #3

Updated by anusha mahamkali over 8 years ago

The corruption is observed immediately after creating 24 (most often 21) sessions from 24 different trace-generating systems
towards a single relayd on a remote host.

We reproduced the problem in verbose mode and collected the logs. Please find them attached to the case.

The following traces are of particular interest:
PERROR - 09:24:33.380611 [31294/31297]: pipe: Too many open files (in run_as_clone() at runas.c:204)
PERROR - 09:24:33.380624 [31294/31297]: Index trace directory creation error: Too many open files (in index_create_file() at index.c:56)
.
.
.
PERROR - 09:24:33.393093 [31294/31297]: close stream: Bad file descriptor (in stream_close() at stream.c:91)
.
.
.
PERROR - 09:24:33.402270 [31294/31310]: Relay index to close fd 32575: Bad file descriptor (in deferred_free_relay_index() at index.c:41)
*** Error in `./lttng-relayd': double free or corruption (out): 0x00007f3fdc04d930 ***
Actions #4

Updated by Jérémie Galarneau over 8 years ago

  • Status changed from New to Feedback
  • Priority changed from Critical to Normal

This does look like the number of file descriptors has been exhausted. Can you list the open file descriptors just before the process crashes?

$ ls -l /proc/your_relayd_pid/fd

As a workaround, you could try increasing the number of file descriptors available per process.
Can you provide the output of the following command?

$ ulimit -n

You can try increasing this value to, say, 4096 and see if the problem persists.
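
The two checks above can be combined into one small helper. This is only a sketch, assuming a Linux host with /proc mounted; the function name fd_usage is hypothetical and not part of LTTng-tools. It reads the limit from /proc/<pid>/limits rather than from ulimit, because ulimit reports the limit of the current shell, which may differ from the one the running relayd inherited.

```shell
#!/bin/sh
# Hypothetical helper (not part of LTTng-tools): print how many file
# descriptors a process currently holds next to its soft limit.
# Assumes Linux with /proc mounted and permission to read the target's
# /proc/<pid>/fd directory (same user or root).
fd_usage() {
    pid="$1"
    if [ ! -d "/proc/$pid/fd" ]; then
        echo "fd_usage: no such process or no /proc entry: $pid" >&2
        return 1
    fi
    # Each entry in /proc/<pid>/fd is one open descriptor.
    count=$(ls "/proc/$pid/fd" | wc -l | tr -d ' ')
    # In /proc/<pid>/limits, the "Max open files" row holds the soft
    # limit in field 4 and the hard limit in field 5.
    soft=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits")
    echo "pid=$pid open_fds=$count soft_limit=$soft"
}
```

Calling `fd_usage your_relayd_pid` repeatedly while the sessions are being created should show whether open_fds approaches soft_limit shortly before the crash.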

Actions #5

Updated by anusha mahamkali over 8 years ago

We increased the number of file descriptors per process with the ulimit command
and it worked for us. The problem is no longer seen.
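
For reference, a minimal sketch of how such a workaround can be applied to a single command. The wrapper name run_with_nofile is hypothetical; the sketch assumes a shell whose ulimit builtin supports -S and -n (bash and dash do). The limit set by ulimit applies only to the shell that runs it and to its children, so the daemon has to be started from the same shell, here a subshell.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command with a given soft limit on open
# file descriptors, leaving the calling shell's own limit untouched.
run_with_nofile() {
    limit="$1"
    shift
    (
        # Change only the soft limit. This fails if the requested value
        # is above the hard limit, in which case the command is not run.
        ulimit -S -n "$limit" || exit 1
        exec "$@"
    )
}
```

Usage, with the value suggested earlier in this thread: `run_with_nofile 4096 ./lttng-relayd`. If the hard limit is lower than 4096, it has to be raised first by an administrator.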

Actions #6

Updated by Jérémie Galarneau over 8 years ago

  • Status changed from Feedback to Confirmed

Good to hear. I'll keep the bug open to make sure this type of error is handled gracefully.

Actions #7

Updated by Jérémie Galarneau over 8 years ago

  • Status changed from Confirmed to In Progress
  • Assignee set to Mathieu Desnoyers
  • Target version changed from 2.6 to 2.7
Actions #8

Updated by Jérémie Galarneau over 8 years ago

  • Target version changed from 2.7 to 2.6
Actions #9

Updated by Jérémie Galarneau over 8 years ago

  • Status changed from In Progress to Resolved