Project

General

Profile

Actions

Bug #790

closed

babeltrace crash while reading 3 traces

Added by Francis Giraldeau almost 10 years ago. Updated about 4 years ago.

Status:
Invalid
Priority:
Normal
Category:
-
Target version:
Start date:
04/28/2014
Due date:
% Done:

0%

Estimated time:

Description

Reading a trace directory containing three traces makes babeltrace crash. Reading individual traces does not trigger the problem. The problem is reproducible with current master branch (up to commit 62f11f7472d3c). The trace is is added to the bug report. The GDB crash report is following.

[francis@berta babeltrace]$ libtool --mode=execute gdb ./converter/babeltrace 
GNU gdb (Ubuntu 7.7-0ubuntu3) 7.7
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying" 
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /home/francis/gitsrc/babeltrace/converter/.libs/lt-babeltrace...done.
(gdb) run /home/francis/lttng-traces/django-index > /dev/null
Starting program: /home/francis/gitsrc/babeltrace/converter/.libs/lt-babeltrace /home/francis/lttng-traces/django-index > /dev/null
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[error] Only CTF traces with a single clock description are supported by this babeltrace version.
[error] No clocks can be correlated and multiple traces are added to the collection. If you are certain those traces can be correlated, try using "--clock-force-correlate".
[warning] [Context] cannot open trace "/home/francis/lttng-traces/django-index/django-db/peer-django-db.phd.local-20140428-172732/kernel" from /home/francis/lttng-traces/django-index for reading.
[warning] errors occurred when opening trace "/home/francis/lttng-traces/django-index" for reading, continuing anyway.

*** Error in `/home/francis/gitsrc/babeltrace/converter/.libs/lt-babeltrace': free(): invalid pointer: 0x0000000000932760 ***

Program received signal SIGABRT, Aborted.
0x00007ffff64baf79 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56    ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007ffff64baf79 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007ffff64be388 in __GI_abort () at abort.c:89
#2  0x00007ffff64f81d4 in __libc_message (do_abort=do_abort@entry=1, fmt=fmt@entry=0x7ffff6606a10 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/posix/libc_fatal.c:175
#3  0x00007ffff65044ae in malloc_printerr (ptr=<optimized out>, str=0x7ffff6602b03 "free(): invalid pointer", action=1) at malloc.c:4996
#4  _int_free (av=<optimized out>, p=<optimized out>, have_lock=0) at malloc.c:3840
#5  0x00007ffff6a86e3b in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#6  0x00007ffff79b613c in ctf_destroy_metadata (trace=trace@entry=0x947040) at ctf-visitor-generate-io-struct.c:3131
#7  0x00007ffff799e70d in ctf_close_trace (tdp=0x947040) at ctf.c:2517
#8  0x00007ffff7bd13e7 in remove_trace_handle (handle=0xfa0e50) at context.c:235
#9  0x00007ffff6a9fc67 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#10 0x00007ffff6aa09a1 in g_hash_table_remove_all () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#11 0x00007ffff6aa0a0e in g_hash_table_destroy () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#12 0x00007ffff7bd17d1 in bt_context_destroy (ctx=0x60bab0) at context.c:201
#13 bt_context_put (ctx=0x60bab0) at context.c:222
#14 0x0000000000402946 in main (argc=<optimized out>, argv=<optimized out>) at babeltrace.c:792

Files

Actions #1

Updated by Jérémie Galarneau almost 10 years ago

  • Status changed from New to Confirmed
  • Assignee set to Jérémie Galarneau
  • Target version set to Babeltrace 1.2

Reproduced on my end.

Actions #2

Updated by Mathieu Desnoyers almost 10 years ago

I can reproduce after removing all files but the metadata from the trace:

.
|-- django-client
|   |-- peer-django-client.phd.local-20140428-172732
|       |-- kernel
|           |-- metadata
|-- django-db
|   |-- peer-django-db.phd.local-20140428-172732
|       |-- kernel
|           |-- metadata
|-- django-httpd
    |-- peer-django-httpd.phd.local-20140428-172723
        |-- kernel
            |-- metadata
Actions #3

Updated by Mathieu Desnoyers almost 10 years ago

If I keep only metadata, and only 2 traces:

.
|-- django-db
|   `-- peer-django-db.phd.local-20140428-172732
|       `-- kernel
|           `-- metadata
`-- django-httpd
    `-- peer-django-httpd.phd.local-20140428-172723
        `-- kernel
            `-- metadata

we get:

[error] Only CTF traces with a single clock description are supported by this babeltrace version.

Actions #4

Updated by Mathieu Desnoyers almost 10 years ago

The issue appears to be in trace-collection.c clock_add(). This comment is inaccurate:

/*
 * For now we only support CTF that has one
 * single clock uuid or name (absolute ref) per
 * trace.
*/

We're not checking at the "trace" level, but rather at the "trace collection" level.

Also, we have an error here that is not checked anywhere.

We should also test this with the force clock correlate cmd line option.

Actions #5

Updated by craig harmer over 9 years ago

i hit this same bug while adding 3 trace files to a collection via the python bindings, although the 3 files were from 2 hosts instead of 3.

strangely, feeding the same 3 trace files to babeltrace did not cause it to crash. instead it generated more error messages.

looking at the code, it appears that the problem could be avoided if there were a way to set "--clock-force-correlate" with the python bindings (that is, set the global variable opt_clock_force_correlate with the python bindings), but there's not.

here's my reproduction and the stack traceback, courtesy of gdb:

charmer@dev-tracing[2014-11-06-fio-10g-prepared] trace-view 172.20.137.69/var/log/sop/traced/lttng-traces/escale-session-20141106-234448 172.20.137.69/var/log/sop/traced/lttng-traces/escale-session-20141107-004449 172.20.137.70/var/log/sop/traced/lttng-traces/escale-session-20141106-194514 

[error] No clocks can be correlated and multiple traces are added to the collection. If you are certain those traces can be correlated, try using "--clock-force-correlate".
error adding trace directory: 172.20.137.70/var/log/sop/traced/lttng-traces/escale-session-20141106-194514/ust/uid/0/64-bit
*** glibc detected *** python2.7: free(): invalid pointer: 0x000000000e2419a0 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x76d76)[0x7f5bbd178d76]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x6c)[0x7f5bbd17db1c]
/lib/x86_64-linux-gnu/libglib-2.0.so.0(+0x20b7b)[0x7f5bbbed5b7b]
/usr/lib/x86_64-linux-gnu/libbabeltrace-ctf.so.1(+0x23d89)[0x7f5bbc9efd89]
/usr/lib/x86_64-linux-gnu/libbabeltrace-ctf.so.1(+0xc421)[0x7f5bbc9d8421]
/usr/lib/x86_64-linux-gnu/libbabeltrace.so.1(+0x48c7)[0x7f5bbbcac8c7]
/lib/x86_64-linux-gnu/libglib-2.0.so.0(+0x39c23)[0x7f5bbbeeec23]
/lib/x86_64-linux-gnu/libglib-2.0.so.0(g_hash_table_remove_all+0x21)[0x7f5bbbeefaa1]
/lib/x86_64-linux-gnu/libglib-2.0.so.0(g_hash_table_destroy+0xe)[0x7f5bbbeefb0e]
/usr/lib/x86_64-linux-gnu/libbabeltrace.so.1(bt_context_put+0x21)[0x7f5bbbcacce1]
/usr/lib/python2.7/dist-packages/_babeltrace.so(+0xf665)[0x7f5bbcc17665]
python2.7(PyEval_EvalFrameEx+0x2ce)[0x4aa4de]
python2.7(PyEval_EvalCodeEx+0x198)[0x4b1ef8]
...

charmer@dev-tracing[2014-11-06-fio-10g-prepared] file core
core: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from 'python2.7 /home/charmer/bin/trace-view 172.20.137.69/var/log/sop/traced/lttng-t'

(gdb) bt
#0  0x00007f5bbd134475 in *__GI_raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007f5bbd1376f0 in *__GI_abort () at abort.c:92
#2  0x00007f5bbd16f52b in __libc_message (do_abort=<optimized out>, fmt=<optimized out>) at ../sysdeps/unix/sysv/linux/libc_fatal.c:189
#3  0x00007f5bbd178d76 in malloc_printerr (action=3, str=0x7f5bbd24f1fc "free(): invalid pointer", ptr=<optimized out>) at malloc.c:6312
#4  0x00007f5bbd17db1c in *__GI___libc_free (mem=<optimized out>) at malloc.c:3738
#5  0x00007f5bbbed5b7b in ptr_array_free (farray=0xe228ec0, flags=3) at /tmp/buildd/glib2.0-2.33.12+really2.32.4/./glib/garray.c:1064
#6  0x00007f5bbbed5bad in g_ptr_array_free (farray=<optimized out>, free_segment=free_segment@entry=1) at /tmp/buildd/glib2.0-2.33.12+really2.32.4/./glib/garray.c:1050
#7  0x00007f5bbc9efd89 in ctf_destroy_metadata (trace=trace@entry=0xe2483c0) at ctf-visitor-generate-io-struct.c:3131
#8  0x00007f5bbc9d8421 in ctf_close_trace (tdp=0xe2483c0) at ctf.c:2522
#9  ctf_close_trace (tdp=0xe2483c0) at ctf.c:2497
#10 0x00007f5bbbcac8c7 in remove_trace_handle (handle=0xf8730b0) at context.c:235
#11 0x00007f5bbbeeec23 in g_hash_table_remove_all_nodes (hash_table=0xed7920, notify=<optimized out>) at /tmp/buildd/glib2.0-2.33.12+really2.32.4/./glib/ghash.c:536
#12 0x00007f5bbbeefaa1 in g_hash_table_remove_all (hash_table=hash_table@entry=0xed7920) at /tmp/buildd/glib2.0-2.33.12+really2.32.4/./glib/ghash.c:1345
#13 0x00007f5bbbeefb0e in g_hash_table_destroy (hash_table=0xed7920) at /tmp/buildd/glib2.0-2.33.12+really2.32.4/./glib/ghash.c:1049
#14 0x00007f5bbbcacce1 in bt_context_destroy (ctx=0xe3e790) at context.c:201
#15 bt_context_put (ctx=0xe3e790) at context.c:222
#16 0x00007f5bbcc17665 in _wrap__bt_context_put (self=<optimized out>, args=<optimized out>) at babeltrace_wrap.c:4061
#17 0x00000000004aa4de in PyEval_EvalFrameEx ()
#18 0x00000000004b1ef8 in PyEval_EvalCodeEx ()
#19 0x00000000004b2a6c in ?? ()
#20 0x00000000004c4894 in ?? ()
#21 0x0000000000471ace in PyEval_CallObjectWithKeywords ()
...

Actions #6

Updated by Li RongQing almost 9 years ago

any progress?

what about the below patch?

diff --git a/lib/context.c b/lib/context.c
index 45aab34..c9033bb 100644
--- a/lib/context.c
+++ b/lib/context.c
@ -159,10 +159,8 @ int bt_context_add_trace(struct bt_context *ctx, const char *path,
return handle->id;

error:
- closeret = fmt->close_trace(td);
- if (closeret) {
- fprintf(stderr, "Error in close_trace callback\n");
- }
+ if (handle)
+ bt_context_remove_trace(ctx, handle->id);
end:
return ret;
}
Actions #7

Updated by Francis Giraldeau almost 9 years ago

I confirm that the bug does still occur in babeltrace master commit 5ee55d051654, and that the patch of Li RongQing prevent the invalid pointer free error. It may not be the right fix, but at least prevent the crash. The unit tests are passing with this patch. The updated patch is only to avoid compilation warning about unused variable.

Actions #8

Updated by Jonathan Rajotte Julien about 4 years ago

  • Status changed from Confirmed to Invalid

State of babeltrace moved a lot since.

Closing this ticket as invalid. Reopen it if it stills apply to Babeltrace 2.

Actions

Also available in: Atom PDF