Bug #734: LTTng-UST produces unreadable traces when string input from application is concurrently changed
Status: Closed
% Done: 100%
Description
Commit:
babeltrace 87353f23d5107c8cc09719d46cc8ccee5da989f7 (v1.1.1)
Problem:
Command used when the crash happened; the directory argument points to a folder containing a snapshot.
babeltrace --clock-date -n none /trace/telog
The crash does not occur regularly and happens rarely. The following is printed by the shell at the moment of the crash:
babeltrace: ../../include/babeltrace/ctf/types.h:224: ctf_pos_get_event: Assertion `pos->offset <= pos->content_size' failed.
Backtrace:
(gdb) bt full
#0  0x0fbadaa0 in raise () from /lib/libc.so.6
No symbol table info available.
#1  0x0fbb2930 in abort () from /lib/libc.so.6
No symbol table info available.
#2  0x0fba599c in ?? () from /lib/libc.so.6
No symbol table info available.
#3  0x0fba5a54 in __assert_fail () from /lib/libc.so.6
No symbol table info available.
#4  0x0ff8b274 in ?? () from /usr/lib/libbabeltrace-ctf.so.0
No symbol table info available.
#5  0x0ffd4edc in ?? () from /usr/lib/libbabeltrace.so.0
No symbol table info available.
#6  0x0ffd6110 in bt_iter_next () from /usr/lib/libbabeltrace.so.0
No symbol table info available.
#7  0x10001ff0 in convert_trace (ctx=0x10015f48, td_write=0x100a8b24) at babeltrace.c:633
        iter = 0x100a7550
        ret = <optimized out>
        sout = 0x100a8b10
        begin_pos = {type = BT_SEEK_BEGIN, u = {seek_time = 1141231514160201729, restore = 0xfd67804}}
        ctf_event = <optimized out>
#8  main (argc=<optimized out>, argv=<optimized out>) at babeltrace.c:760
        ret = <optimized out>
        partial_error = <optimized out>
        open_success = <optimized out>
        fmt_write = 0xff72f4c
        td_write = 0x100a8b24
        ctx = 0x10015f48
        i = <optimized out>
(gdb) q
Updated by Jérémie Galarneau over 10 years ago
- Status changed from New to Feedback
Could you provide a trace reproducing the crash?
Updated by Jérémie Galarneau over 10 years ago
- Target version set to Babeltrace 1.2
Updated by Mathieu Desnoyers over 10 years ago
It will also be important to have the version information of lttng-ust/modules/tools that produced the trace (specific commits).
Thanks,
Mathieu
Updated by Stanislav Vovk over 10 years ago
- File snapshot.tgz added
- Commit ids:
ust: 535d0408caee93577c2b4d8ad3cd672fa97eac1d
tools: 5d985544a7ad54afa0514ea7f522a89d8d844860
- LTTng configuration:
root@du1:~# lttng list rcs
Tracing session rcs: [active snapshot]
    Trace path:

=== Domain: UST global ===

Buffer type: per UID

Channels:
-------------
- channel0: [enabled]

    Attributes:
      overwrite mode: 1
      subbufers size: 131072
      number of subbufers: 4
      switch timer interval: 0
      read timer interval: 0
      output: mmap()

    Events:
      * (loglevel: TRACE_INFO (6)) (type: tracepoint) [enabled]
      com_ericsson_triobjif:* (type: tracepoint) [enabled]
      com_ericsson_trithread:* (type: tracepoint) [enabled]
- Snapshot attached.
Updated by Mathieu Desnoyers over 10 years ago
Snapshot file deleted, as requested by Stanislav.
Updated by Mathieu Desnoyers over 10 years ago
Looking at the snapshot output, it appears that some strings are in really bad
shape here, some containing non-printable characters.
I suspect that there are some issues with the instrumentation generating this
trace, where the content of some strings is corrupted. The case that can
trigger this babeltrace assertion is probably when a string (or a memory location
that is believed to hold a string, but actually doesn't) is passed as input for
a "string" field and is changed concurrently. LTTng-UST reads the string once
to compute how much space it must reserve within the trace. Then it reads the
string input a second time to do the actual copy. If a '\0' appears earlier in
the string, the size of the string read will be smaller than the size reserved
within the trace, and therefore Babeltrace will try to read the next field
where the final end-of-string bytes are sitting.
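The following is a minimal, self-contained simulation of that double-read pattern, written as an illustration only; it is not LTTng-UST code, and the concurrent modification is sequenced inline so the race always "happens":

#include <stdio.h>
#include <string.h>

int main(void)
{
        char input[16] = "abcsdfas";
        char slot[16];

        /* First read of the input: size the ring-buffer reservation. */
        size_t reserved = strlen(input) + 1;            /* 9 bytes */

        /* Another thread shortens the string between the two reads;
         * done inline here to make the race deterministic. */
        strcpy(input, "abc");

        /* Second read of the input: the actual copy writes fewer
         * bytes than were reserved. */
        memset(slot, 0, sizeof(slot));
        memcpy(slot, input, strlen(input) + 1);         /* 4 bytes */

        /* A CTF string reader stops at the first '\0', consuming 4 of
         * the 9 reserved bytes; every later field is then parsed at
         * the wrong offset, which eventually trips the assertion
         * shown above. */
        size_t consumed = strlen(slot) + 1;
        printf("reserved=%zu consumed=%zu skew=%zu\n",
               reserved, consumed, reserved - consumed);
        return 0;
}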
There are two things we should do here. First, I should probably use something
other than a '\0' character to pad the strings when the copy is smaller than the
reserved space (this is a small bug in LTTng-UST); a sketch of such a padding
scheme follows below.
Second, the instrumentation of the application generating this trace should be reviewed
and fixed, since it seems to either pass a pointer that is mistakenly believed to be
a string pointer, or pass a string that is concurrently updated by another thread.
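For the first point, here is a hedged sketch of such a padding scheme (the attached patch is authoritative; strcpy_pad is a hypothetical name): copy at most the reserved size, fill any racy shortfall with '#' rather than '\0', and place the terminator only at the reserved boundary, so a CTF reader always consumes exactly the reserved number of bytes:

#include <stddef.h>

/* Sketch only; assumes reserved >= 1. */
static void strcpy_pad(char *dest, const char *src, size_t reserved)
{
        size_t i;

        /* Copy until the source ends or the reserved space is full. */
        for (i = 0; i < reserved - 1 && src[i] != '\0'; i++)
                dest[i] = src[i];

        /* Pad a racy shortfall with '#' instead of '\0'... */
        for (; i < reserved - 1; i++)
                dest[i] = '#';

        /* ...so the only terminator sits at the reserved boundary,
         * matching strings like "abc#####" in the fixed output below. */
        dest[reserved - 1] = '\0';
}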
Thanks,
Mathieu
Updated by Mathieu Desnoyers over 10 years ago
- Status changed from Feedback to In Progress
Updated by Mathieu Desnoyers over 10 years ago
- Project changed from Babeltrace to LTTng-UST
Updated by Mathieu Desnoyers over 10 years ago
- File fix-ust-strcpy-with-racy-input.patch fix-ust-strcpy-with-racy-input.patch added
- Status changed from In Progress to Feedback
- Assignee set to Mathieu Desnoyers
Can you try generating a trace from the same application with the following patch applied to UST?
There will likely still be garbage in your strings, but at least Babeltrace should not assert anymore.
Thanks,
Mathieu
Updated by Mathieu Desnoyers over 10 years ago
- Target version changed from Babeltrace 1.2 to 2.4
Updated by Mathieu Desnoyers over 10 years ago
Updated the patch so that we don't have to change the ABI between probe providers and lttng-ust.
Updated by Mathieu Desnoyers over 10 years ago
- Subject changed from Babeltrace crash when reading a snapshot to LTTng-UST produces unreadable traces when string input from application is concurrently changed
Updated by Stanislav Vovk over 10 years ago
This crash has been seen by a couple of our users (of the platform). I have not tried to reproduce this fault locally. Testing your patch would require rebuilding UST and deploying that version, which is quite hard, nearly impossible, in our current product delivery structure. I do have some ideas on how to reproduce it and will try those, but I cannot guarantee that they'll work.
If you can test this patch it would be great.
Thanks,
Stanislav
Updated by Mathieu Desnoyers over 10 years ago
Patch cleanup.
Updated by Mathieu Desnoyers over 10 years ago
- File race-tp-string.c race-tp-string.c added
- File ust_test.h ust_test.h added
- Status changed from Feedback to In Progress
Test program to reproduce the issue. Compile with:
gcc -O2 -I./ -ldl -llttng-ust -o race-tp-string race-tp-string.c
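For reference, a minimal sketch of what such a test program might look like; the attached race-tp-string.c and ust_test.h are authoritative, and this reconstruction only infers the provider, event and field names from the output below (it would also need -lpthread on top of the command above):

/* ust_test.h (sketch) */
#undef TRACEPOINT_PROVIDER
#define TRACEPOINT_PROVIDER ust_test

#undef TRACEPOINT_INCLUDE
#define TRACEPOINT_INCLUDE "./ust_test.h"

#if !defined(UST_TEST_H) || defined(TRACEPOINT_HEADER_MULTI_READ)
#define UST_TEST_H

#include <lttng/tracepoint.h>

TRACEPOINT_EVENT(ust_test, mytp,
        TP_ARGS(const char *, str),
        TP_FIELDS(
                ctf_string(stringfield, str)
        )
)

#endif /* UST_TEST_H */

#include <lttng/tracepoint-event.h>

/* race-tp-string.c (sketch) */
#define TRACEPOINT_CREATE_PROBES
#define TRACEPOINT_DEFINE
#include "ust_test.h"
#include <pthread.h>
#include <string.h>

static char buf[32] = "abcsdfas";

/* Keep changing the string length so the '\0' moves around while the
 * tracepoint is reading it. */
static void *mutator(void *arg)
{
        for (;;) {
                strcpy(buf, "abc");
                strcpy(buf, "abcsdfah");
        }
        return NULL;
}

int main(void)
{
        pthread_t t;

        pthread_create(&t, NULL, mutator, NULL);
        for (;;)
                tracepoint(ust_test, mytp, buf);        /* races with mutator */
        return 0;
}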
The program generates a trace that triggers this babeltrace output:
[...]
[17:28:23.714647393] (+?.?????????) thinkos ust_test:mytp: { cpu_id = 1 }, { stringfield = "abcs???abc" }
[error] Event id 2 is outside range.
[error] Reading event failed.
Error printing trace.
Updated by Mathieu Desnoyers over 10 years ago
Confirming that it works fine with the fix applied. The babeltrace output of the racy test program looks like:
[...]
[17:26:59.352743579] (+0.000000961) thinkos ust_test:mytp: { cpu_id = 0 }, { stringfield = "abc#####" }
[17:26:59.352744452] (+0.000000873) thinkos ust_test:mytp: { cpu_id = 0 }, { stringfield = "abc" }
[17:26:59.352745232] (+0.000000780) thinkos ust_test:mytp: { cpu_id = 0 }, { stringfield = "abcsdfas" }
[17:26:59.352746291] (+0.000001059) thinkos ust_test:mytp: { cpu_id = 0 }, { stringfield = "abc###############" }
[17:26:59.352747164] (+0.000000873) thinkos ust_test:mytp: { cpu_id = 0 }, { stringfield = "abcsdfah" }
[17:26:59.352748125] (+0.000000961) thinkos ust_test:mytp: { cpu_id = 0 }, { stringfield = "abcsdfah" }
[17:26:59.352749079] (+0.000000954) thinkos ust_test:mytp: { cpu_id = 0 }, { stringfield = "abc#####" }
[17:26:59.352749969] (+0.000000890) thinkos ust_test:mytp: { cpu_id = 0 }, { stringfield = "abcsdfas##########" }
[17:26:59.352750894] (+0.000000925) thinkos ust_test:mytp: { cpu_id = 0 }, { stringfield = "abc#####" }
[17:26:59.352751745] (+0.000000851) thinkos ust_test:mytp: { cpu_id = 0 }, { stringfield = "abc#####" }
Updated by Mathieu Desnoyers over 10 years ago
- Status changed from In Progress to Resolved
- % Done changed from 0 to 100
Applied in changeset a44c74d96a15230cec7c09c770bf7fc745f0df15.