Project

General

Profile

Actions

Bug #857

closed

babeltrace will exit (often with a core dump) if an index file is too short

Added by craig harmer over 9 years ago. Updated about 4 years ago.

Status:
Invalid
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
Start date:
11/10/2014
Due date:
% Done:

0%

Estimated time:

Description

If a ".idx" file is truncated for some reason then babeltrace will fail to open the trace session and, in most cases core dump. Output looks like:

charmer@ch3[hds-src] babeltrace /home/charmer/lttng-traces/auto-20141110-184157

read index file header: Success
Segmentation fault (core dumped)

Files

babeltrace-ctf.diff (732 Bytes) babeltrace-ctf.diff Patch for the bug. craig harmer, 11/10/2014 09:52 PM
fread-check-error.patch (2.54 KB) fread-check-error.patch Patch for all cases in ctf.c where fread() is called without complete error checking. craig harmer, 11/13/2014 08:56 PM
Actions #1

Updated by craig harmer over 9 years ago

Here's a way to reproduce the problem:

charmer@ch3[hds-src] lttng create
Session auto-20141110-184157 created.
Traces will be written in /home/charmer/lttng-traces/auto-20141110-184157

charmer@ch3[hds-src] lttng enable-event -k -a --syscall
All kernel system calls are enabled in channel channel0

charmer@ch3[hds-src] lttng start
Tracing started for session auto-20141110-184157

charmer@ch3[hds-src] lttng stop
Waiting for data availability.

Tracing stopped for session auto-20141110-184157
charmer@ch3[hds-src] lttng destroy
Session auto-20141110-184157 destroyed

charmer@ch3[hds-src] babeltrace /home/charmer/lttng-traces/auto-20141110-184157 | head -5
[18:42:57.026808475] (+?.?????????) ch3 exit_syscall: { cpu_id = 3 }, { ret = 1 }
[18:42:57.026828133] (+0.000019658) ch3 exit_syscall: { cpu_id = 1 }, { ret = 4 }
[18:42:57.026838040] (+0.000009907) ch3 sys_poll: { cpu_id = 1 }, { ufds = 0x7F55199A8B50, nfds = 2, timeout_msecs = -1 }
[18:42:57.046060759] (+0.019222719) ch3 exit_syscall: { cpu_id = 0 }, { ret = 0 }
[18:42:57.046076893] (+0.000016134) ch3 sys_times: { cpu_id = 0 }, { tbuf = 0x7FFFB340A990 }

charmer@ch3[hds-src] find /home/charmer/lttng-traces/auto-20141110-184157 -name '*.idx'
/home/charmer/lttng-traces/auto-20141110-184157/kernel/index/channel0_5.idx
/home/charmer/lttng-traces/auto-20141110-184157/kernel/index/channel0_0.idx
/home/charmer/lttng-traces/auto-20141110-184157/kernel/index/channel0_3.idx
/home/charmer/lttng-traces/auto-20141110-184157/kernel/index/channel0_2.idx
/home/charmer/lttng-traces/auto-20141110-184157/kernel/index/channel0_7.idx
/home/charmer/lttng-traces/auto-20141110-184157/kernel/index/channel0_1.idx
/home/charmer/lttng-traces/auto-20141110-184157/kernel/index/channel0_6.idx
/home/charmer/lttng-traces/auto-20141110-184157/kernel/index/channel0_4.idx

charmer@ch3[hds-src] > /home/charmer/lttng-traces/auto-20141110-184157/kernel/index/channel0_3.idx

charmer@ch3[hds-src] babeltrace /home/charmer/lttng-traces/auto-20141110-184157 | head -5
read index file header: Success

charmer@ch3[hds-src] > /home/charmer/lttng-traces/auto-20141110-184157/kernel/index/channel0_7.idx

charmer@ch3[hds-src] babeltrace /home/charmer/lttng-traces/auto-20141110-184157 
read index file header: Success
Segmentation fault (core dumped)

charmer@ch3[hds-src] ls -l /var/cores
total 1066308
-rw------- 1 charmer charmer    3162112 Nov 10 18:43 babeltrace.25521.1415673838
-rw------- 1 charmer charmer    3162112 Nov 10 18:44 babeltrace.25526.1415673882
Actions #2

Updated by craig harmer over 9 years ago

With this patch applied and the package rebuilt, the output looks like this:

charmer@ch3[built-pkgdir] babeltrace /home/charmer/lttng-traces/auto-20141110-184157  | head -5
[error] An index file under /home/charmer/lttng-traces/auto-20141110-184157/kernel: Index file is too short
[error] An index file under /home/charmer/lttng-traces/auto-20141110-184157/kernel: Index file is too short
[18:42:57.026828133] (+?.?????????) ch3 exit_syscall: { cpu_id = 1 }, { ret = 4 }
[18:42:57.026838040] (+0.000009907) ch3 sys_poll: { cpu_id = 1 }, { ufds = 0x7F55199A8B50, nfds = 2, timeout_msecs = -1 }
[18:42:57.046060759] (+0.019222719) ch3 exit_syscall: { cpu_id = 0 }, { ret = 0 }
[18:42:57.046076893] (+0.000016134) ch3 sys_times: { cpu_id = 0 }, { tbuf = 0x7FFFB340A990 }
[18:42:57.046079449] (+0.000002556) ch3 exit_syscall: { cpu_id = 0 }, { ret = 1832798311 }
<pre>

Actions #3

Updated by craig harmer over 9 years ago

If you are using the python bindings to open the trace files than a core dump also occurs (and is fairly inexplicable).

Actions #4

Updated by craig harmer over 9 years ago

I realized that there are probably other places in the code where fread() is called without complete error checking. I also realized that, according to the man page, fread() does not set errno so the use of strerror() (and previously perror()) is incorrect.

This patch checks the return value from fread() every where it is called and prints a message if an error occurred, or returns success if we hit EOF (and it is reasonable to hit EOF). Specific changes:

packet_metadata() -- print an error message and return -1 if unable to read the packet header (previously it silently returned success even if the header was not read).

ctf_trace_metadata_packet_read() -- print a message if an error occurred reading the header and return EINVAL. if we hit EOF return success without a message (previously returned EINVAL without a message for both end-of-file and errors().

ctf_trace_metadata_packet_read() -- in the "big loop", print an error message and return EINVAL if we got an error reading the file or if we hit unexpected EOF. previously no message was printed. previously, if we hit EOF the loop would have continued forever. i also had to rearrange the code a bit to handle the case where "toread" was 0 when the loop was entered.

import_stream_packet_index() -- fix the originally reported problem. Messages differentiate between an i/o error and a file that's too short, but no longer expects errno to be set.

import_stream_packet_index() -- add a check to see if an i/o error caused the loop to terminate; print a message and return an error if so.

Actions #5

Updated by craig harmer over 9 years ago

I should probably have mentioned: index files, or other files, can be too short if you run out of disk space while writing them or if the system reboots while tracing is active without properly shutting down. The second is what happened to me (and is fairly common for the project i'm working on).

Actions #6

Updated by Jonathan Rajotte Julien about 4 years ago

  • Status changed from New to Invalid

State of babeltrace moved a lot since.

Closing this ticket as invalid. Reopen it if it stills apply to Babeltrace 2.

Actions

Also available in: Atom PDF