Bug #900
closed
Double free or corruption crash on relay daemon
Added by Anusha Mahamkali over 9 years ago.
Updated about 9 years ago.
Description
A crash has been observed in the relay daemon when 24 streaming sessions with the same name were created from 24 different targets towards a relay daemon running on a remote host.
The LTTng-tools version being used on the targets is 2.6.
The following traces were observed in the logs:
PERROR - 09:24:33.402270 [31294/31310]: Relay index to close fd 32575: Bad file descriptor (in deferred_free_relay_index() at index.c:41)
*** Error in `./lttng-relayd': double free or corruption (out): 0x00007f3fdc04d930 ***
Are there any known issues similar to the above in the relay daemon of LTTng-tools version 2.6?
If so, kindly let us know.
- Project changed from LTTngTop to LTTng-tools
How long does it take for the corruption to occur?
The corruption is observed immediately after creating 24 sessions (the crash usually occurs around the 21st) from 24 different trace-generating systems towards a single relayd on a remote host.
We reproduced the problem in verbose mode and collected the logs. Please find them attached to the case.
The traces below are of particular interest:
PERROR - 09:24:33.380611 [31294/31297]: pipe: Too many open files (in run_as_clone() at runas.c:204)
PERROR - 09:24:33.380624 [31294/31297]: Index trace directory creation error: Too many open files (in index_create_file() at index.c:56)
...
PERROR - 09:24:33.393093 [31294/31297]: close stream: Bad file descriptor (in stream_close() at stream.c:91)
...
PERROR - 09:24:33.402270 [31294/31310]: Relay index to close fd 32575: Bad file descriptor (in deferred_free_relay_index() at index.c:41)
*** Error in `./lttng-relayd': double free or corruption (out): 0x00007f3fdc04d930 ***
- Status changed from New to Feedback
- Priority changed from Critical to Normal
This does look like the number of file descriptors has been exhausted. Can you list the open file descriptors just before the process crashes?
$ ls -l /proc/your_relayd_pid/fd
As a workaround, you could try increasing the number of file descriptors available per process.
Can you also provide the output of the following?
$ ulimit -n
You can try increasing this value to, say, 4096 and see if the problem persists.
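For reference, a daemon can also raise its own soft RLIMIT_NOFILE at startup instead of relying on the shell's ulimit. The following is only a minimal sketch using the standard getrlimit()/setrlimit() interface; it is not code from lttng-relayd.

#include <stdio.h>
#include <sys/resource.h>

/* Raise the soft open-file limit towards 'wanted', capped at the hard limit.
 * Illustrative sketch only; not taken from the lttng-tools source. */
static int raise_fd_limit(rlim_t wanted)
{
        struct rlimit lim;

        if (getrlimit(RLIMIT_NOFILE, &lim) < 0) {
                perror("getrlimit");
                return -1;
        }

        if (lim.rlim_cur >= wanted) {
                return 0;       /* Already high enough. */
        }

        /* Without privileges, the soft limit cannot exceed the hard limit. */
        lim.rlim_cur = (wanted < lim.rlim_max) ? wanted : lim.rlim_max;

        if (setrlimit(RLIMIT_NOFILE, &lim) < 0) {
                perror("setrlimit");
                return -1;
        }

        return 0;
}

int main(void)
{
        if (raise_fd_limit(4096) < 0) {
                fprintf(stderr, "Could not raise the open file limit\n");
        }
        return 0;
}

Note that this only moves the ceiling; with enough concurrent sessions the limit can still be hit, so the error paths themselves also need to cope with fd exhaustion.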
We tried increasing the number of file descriptors per process with the ulimit command
and it worked for us. The problem is not seen anymore.
- Status changed from Feedback to Confirmed
Good to hear. I'll keep the bug open to make sure this type of error is handled gracefully.
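To sketch what "handled gracefully" means here: cleanup paths should tolerate a partially-created index (e.g. when the fd allocation failed with EMFILE) without closing an invalid fd or freeing the same object twice. A minimal, illustrative sketch with hypothetical names (relay_index, relay_index_destroy), not taken from the lttng-tools source:

#include <stdlib.h>
#include <unistd.h>

/* Hypothetical structure standing in for a relay index entry. */
struct relay_index {
        int fd;
        void *data;
};

/* Tear down an index entry exactly once, even if it was only
 * partially initialized or if several error paths reach this point. */
static void relay_index_destroy(struct relay_index **indexp)
{
        struct relay_index *index;

        if (!indexp || !*indexp) {
                return; /* Already destroyed or never created: nothing to do. */
        }
        index = *indexp;

        /* A partially-initialized index may never have received a valid fd. */
        if (index->fd >= 0) {
                (void) close(index->fd);
        }

        free(index->data);
        free(index);
        *indexp = NULL; /* Prevent a later double free by the caller. */
}

int main(void)
{
        struct relay_index *index = calloc(1, sizeof(*index));

        if (!index) {
                return 1;
        }
        index->fd = -1; /* Simulate a failed index file creation (EMFILE). */

        relay_index_destroy(&index);
        relay_index_destroy(&index); /* Second call is a harmless no-op. */
        return 0;
}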
- Status changed from Confirmed to In Progress
- Assignee set to Mathieu Desnoyers
- Target version changed from 2.6 to 2.7
- Target version changed from 2.7 to 2.6
- Status changed from In Progress to Resolved