Bug #857
closedbabeltrace will exit (often with a core dump) if an index file is too short
Added by craig harmer about 10 years ago. Updated almost 5 years ago.
0%
Description
If a ".idx" file is truncated for some reason then babeltrace will fail to open the trace session and, in most cases core dump. Output looks like:
charmer@ch3[hds-src] babeltrace /home/charmer/lttng-traces/auto-20141110-184157 read index file header: Success Segmentation fault (core dumped)
Files
babeltrace-ctf.diff (732 Bytes) babeltrace-ctf.diff | Patch for the bug. | craig harmer, 11/10/2014 09:52 PM | |
fread-check-error.patch (2.54 KB) fread-check-error.patch | Patch for all cases in ctf.c where fread() is called without complete error checking. | craig harmer, 11/13/2014 08:56 PM |
Updated by craig harmer about 10 years ago
Here's a way to reproduce the problem:
charmer@ch3[hds-src] lttng create Session auto-20141110-184157 created. Traces will be written in /home/charmer/lttng-traces/auto-20141110-184157 charmer@ch3[hds-src] lttng enable-event -k -a --syscall All kernel system calls are enabled in channel channel0 charmer@ch3[hds-src] lttng start Tracing started for session auto-20141110-184157 charmer@ch3[hds-src] lttng stop Waiting for data availability. Tracing stopped for session auto-20141110-184157 charmer@ch3[hds-src] lttng destroy Session auto-20141110-184157 destroyed charmer@ch3[hds-src] babeltrace /home/charmer/lttng-traces/auto-20141110-184157 | head -5 [18:42:57.026808475] (+?.?????????) ch3 exit_syscall: { cpu_id = 3 }, { ret = 1 } [18:42:57.026828133] (+0.000019658) ch3 exit_syscall: { cpu_id = 1 }, { ret = 4 } [18:42:57.026838040] (+0.000009907) ch3 sys_poll: { cpu_id = 1 }, { ufds = 0x7F55199A8B50, nfds = 2, timeout_msecs = -1 } [18:42:57.046060759] (+0.019222719) ch3 exit_syscall: { cpu_id = 0 }, { ret = 0 } [18:42:57.046076893] (+0.000016134) ch3 sys_times: { cpu_id = 0 }, { tbuf = 0x7FFFB340A990 } charmer@ch3[hds-src] find /home/charmer/lttng-traces/auto-20141110-184157 -name '*.idx' /home/charmer/lttng-traces/auto-20141110-184157/kernel/index/channel0_5.idx /home/charmer/lttng-traces/auto-20141110-184157/kernel/index/channel0_0.idx /home/charmer/lttng-traces/auto-20141110-184157/kernel/index/channel0_3.idx /home/charmer/lttng-traces/auto-20141110-184157/kernel/index/channel0_2.idx /home/charmer/lttng-traces/auto-20141110-184157/kernel/index/channel0_7.idx /home/charmer/lttng-traces/auto-20141110-184157/kernel/index/channel0_1.idx /home/charmer/lttng-traces/auto-20141110-184157/kernel/index/channel0_6.idx /home/charmer/lttng-traces/auto-20141110-184157/kernel/index/channel0_4.idx charmer@ch3[hds-src] > /home/charmer/lttng-traces/auto-20141110-184157/kernel/index/channel0_3.idx charmer@ch3[hds-src] babeltrace /home/charmer/lttng-traces/auto-20141110-184157 | head -5 read index file header: Success charmer@ch3[hds-src] > /home/charmer/lttng-traces/auto-20141110-184157/kernel/index/channel0_7.idx charmer@ch3[hds-src] babeltrace /home/charmer/lttng-traces/auto-20141110-184157 read index file header: Success Segmentation fault (core dumped) charmer@ch3[hds-src] ls -l /var/cores total 1066308 -rw------- 1 charmer charmer 3162112 Nov 10 18:43 babeltrace.25521.1415673838 -rw------- 1 charmer charmer 3162112 Nov 10 18:44 babeltrace.25526.1415673882
Updated by craig harmer about 10 years ago
- File babeltrace-ctf.diff babeltrace-ctf.diff added
With this patch applied and the package rebuilt, the output looks like this:
charmer@ch3[built-pkgdir] babeltrace /home/charmer/lttng-traces/auto-20141110-184157 | head -5 [error] An index file under /home/charmer/lttng-traces/auto-20141110-184157/kernel: Index file is too short [error] An index file under /home/charmer/lttng-traces/auto-20141110-184157/kernel: Index file is too short [18:42:57.026828133] (+?.?????????) ch3 exit_syscall: { cpu_id = 1 }, { ret = 4 } [18:42:57.026838040] (+0.000009907) ch3 sys_poll: { cpu_id = 1 }, { ufds = 0x7F55199A8B50, nfds = 2, timeout_msecs = -1 } [18:42:57.046060759] (+0.019222719) ch3 exit_syscall: { cpu_id = 0 }, { ret = 0 } [18:42:57.046076893] (+0.000016134) ch3 sys_times: { cpu_id = 0 }, { tbuf = 0x7FFFB340A990 } [18:42:57.046079449] (+0.000002556) ch3 exit_syscall: { cpu_id = 0 }, { ret = 1832798311 } <pre>
Updated by craig harmer about 10 years ago
If you are using the python bindings to open the trace files than a core dump also occurs (and is fairly inexplicable).
Updated by craig harmer about 10 years ago
- File fread-check-error.patch fread-check-error.patch added
I realized that there are probably other places in the code where fread() is called without complete error checking. I also realized that, according to the man page, fread() does not set errno so the use of strerror() (and previously perror()) is incorrect.
This patch checks the return value from fread() every where it is called and prints a message if an error occurred, or returns success if we hit EOF (and it is reasonable to hit EOF). Specific changes:
packet_metadata() -- print an error message and return -1 if unable to read the packet header (previously it silently returned success even if the header was not read).
ctf_trace_metadata_packet_read() -- print a message if an error occurred reading the header and return EINVAL. if we hit EOF return success without a message (previously returned EINVAL without a message for both end-of-file and errors().
ctf_trace_metadata_packet_read() -- in the "big loop", print an error message and return EINVAL if we got an error reading the file or if we hit unexpected EOF. previously no message was printed. previously, if we hit EOF the loop would have continued forever. i also had to rearrange the code a bit to handle the case where "toread" was 0 when the loop was entered.
import_stream_packet_index() -- fix the originally reported problem. Messages differentiate between an i/o error and a file that's too short, but no longer expects errno to be set.
import_stream_packet_index() -- add a check to see if an i/o error caused the loop to terminate; print a message and return an error if so.
Updated by craig harmer about 10 years ago
I should probably have mentioned: index files, or other files, can be too short if you run out of disk space while writing them or if the system reboots while tracing is active without properly shutting down. The second is what happened to me (and is fairly common for the project i'm working on).
Updated by Jonathan Rajotte Julien almost 5 years ago
- Status changed from New to Invalid
State of babeltrace moved a lot since.
Closing this ticket as invalid. Reopen it if it stills apply to Babeltrace 2.