Project

General

Profile

Bug #734

LTTng-UST produces unreadable traces when string input from application is concurrently changed

Added by Stanislav Vovk about 6 years ago. Updated almost 6 years ago.

Status:
Resolved
Priority:
High
Target version:
Start date:
02/18/2014
Due date:
% Done:

100%

Estimated time:

Description

Commit:
babeltrace 87353f23d5107c8cc09719d46cc8ccee5da989f7 (v1.1.1)

Problem:
Command used when crash happend, directory points to a folder containing a snapshot.

babeltrace --clock-date -n none /trace/telog

crash occurrence is not regular, happens rarely. Following is printed by shell at moment of the crash.
babeltrace: ../../include/babeltrace/ctf/types.h:224: ctf_pos_get_event: Assertion `pos->offset <= pos->content_size' failed.

Backtrace:

(gdb) bt full
#0  0x0fbadaa0 in raise () from /lib/libc.so.6
No symbol table info available.
#1  0x0fbb2930 in abort () from /lib/libc.so.6
No symbol table info available.
#2  0x0fba599c in ?? () from /lib/libc.so.6
No symbol table info available.
#3  0x0fba5a54 in __assert_fail () from /lib/libc.so.6
No symbol table info available.
#4  0x0ff8b274 in ?? () from /usr/lib/libbabeltrace-ctf.so.0
No symbol table info available.
#5  0x0ffd4edc in ?? () from /usr/lib/libbabeltrace.so.0
No symbol table info available.
#6  0x0ffd6110 in bt_iter_next () from /usr/lib/libbabeltrace.so.0
No symbol table info available.
#7  0x10001ff0 in convert_trace (ctx=0x10015f48, td_write=0x100a8b24) at babeltrace.c:633
        iter = 0x100a7550
        ret = <optimized out>
        sout = 0x100a8b10
        begin_pos = {type = BT_SEEK_BEGIN, u = {seek_time = 1141231514160201729, restore = 0xfd67804}}
        ctf_event = <optimized out>
#8  main (argc=<optimized out>, argv=<optimized out>) at babeltrace.c:760
        ret = <optimized out>
        partial_error = <optimized out>
        open_success = <optimized out>
        fmt_write = 0xff72f4c
        td_write = 0x100a8b24
        ctx = 0x10015f48
        i = <optimized out>
(gdb) q  


Files

fix-ust-strcpy-with-racy-input.patch (7.12 KB) fix-ust-strcpy-with-racy-input.patch Mathieu Desnoyers, 04/14/2014 11:19 AM
fix-ust-strcpy-with-racy-input-2.patch (7.56 KB) fix-ust-strcpy-with-racy-input-2.patch Mathieu Desnoyers, 04/14/2014 03:17 PM
fix-ust-strcpy-with-racy-input-3.patch (7.56 KB) fix-ust-strcpy-with-racy-input-3.patch Mathieu Desnoyers, 04/15/2014 01:01 PM
race-tp-string.c (1.67 KB) race-tp-string.c Mathieu Desnoyers, 04/16/2014 05:30 PM
ust_test.h (1.64 KB) ust_test.h Mathieu Desnoyers, 04/16/2014 05:30 PM
#1

Updated by Jérémie Galarneau about 6 years ago

  • Status changed from New to Feedback

Could you provide a trace reproducing the crash?

#2

Updated by Jérémie Galarneau about 6 years ago

  • Target version set to Babeltrace 1.2
#3

Updated by Mathieu Desnoyers about 6 years ago

It will also be important to have the version information of lttng-ust/modules/tools that produced the trace (specific commits).

Thanks,

Mathieu

#4

Updated by Stanislav Vovk almost 6 years ago

  • File snapshot.tgz added
  • Commit id
    ust: 535d0408caee93577c2b4d8ad3cd672fa97eac1d
    tools: 5d985544a7ad54afa0514ea7f522a89d8d844860
  • Lttng configuration:
    root@du1:~# lttng list rcs
    Tracing session rcs: [active snapshot]
        Trace path: 
    
    === Domain: UST global ===
    
    Buffer type: per UID
    
    Channels:
    -------------
    - channel0: [enabled]
    
        Attributes:
          overwrite mode: 1
          subbufers size: 131072
          number of subbufers: 4
          switch timer interval: 0
          read timer interval: 0
          output: mmap()
    
        Events:
          * (loglevel: TRACE_INFO (6)) (type: tracepoint) [enabled]
          com_ericsson_triobjif:* (type: tracepoint) [enabled]
          com_ericsson_trithread:* (type: tracepoint) [enabled]
    
  • Snapshot attached.
#5

Updated by Mathieu Desnoyers almost 6 years ago

  • File deleted (snapshot.tgz)
#6

Updated by Mathieu Desnoyers almost 6 years ago

snapshot file deleted, as requested by Stanislav.

#7

Updated by Mathieu Desnoyers almost 6 years ago

Looking at the snapshot output, it appears that some strings are in really bad
shape here, some containing non-printable characters.

I suspect that there are some issues with the instrumentation generating this
trace, where the content of some strings is corrupted. The case that can
trigger this babeltrace assertion is probably when a string (or a memory location
that is believed to hold a string, but actually doesn't) is passed as parameter as
input for a "string" field, and and it is changed concurrently. LTTng-UST reads
the string once, to compute how much space it must reserve within the trace. Then,
it reads the string input a second time, to do the actual copy. If a "\0' appears
earlier in the string, then the size of the string read will be smaller than the
size reserved within the trace, and therefore Babeltrace will try to read the next
field where the final end-of-string bytes are sitting.

There are two things we should do here. First, I should probably use something else
than a '\0' character to pad the strings when the copy is smaller than the reserved
space (this is a small bug in LTTng-UST).

Second, the instrumentation of the application generating this trace should be reviewed
and fixed, since it seems to pass a pointer that is either mistakenly believed to be
a string pointer, or passing a string that is concurrently updated by another thread.

Thanks,

Mathieu

#8

Updated by Mathieu Desnoyers almost 6 years ago

  • Status changed from Feedback to In Progress
#9

Updated by Mathieu Desnoyers almost 6 years ago

  • Project changed from Babeltrace to LTTng-UST
#10

Updated by Mathieu Desnoyers almost 6 years ago

Can you try generating a trace from the same application with UST having the following patch applied ?

There will likely still be garbage in your strings, but at least Babeltrace should not assert anymore.

Thanks,

Mathieu

#11

Updated by Mathieu Desnoyers almost 6 years ago

  • Target version changed from Babeltrace 1.2 to 2.4
#12

Updated by Mathieu Desnoyers almost 6 years ago

updated patch so we don't have to change the ABI between probe providers and lttng-ust.

#13

Updated by Mathieu Desnoyers almost 6 years ago

  • Subject changed from Babeltrace crash when reading a snapshot to LTTng-UST produces unreadable traces when string input from application is concurrently changed
#14

Updated by Stanislav Vovk almost 6 years ago

This crash has been seen by couple of our users (of the platform). I have not tried to reproduce this fault locally. To test your patch would require to rebuild ust and deploy that version. In our current product delivery structure this is quite hard, nearly impossible. I do have some ideas how to reproduce it and will try those but can not guarantee that they'll work.
If you can test this patch it would be great.

Thanks,
Stanislav

#16

Updated by Mathieu Desnoyers almost 6 years ago

Test program to reprod issue. Compile with:

gcc -O2 -I./ -ldl -llttng-ust -o race-tp-string race-tp-string.c

Generates a trace that triggers this babeltrace output:

[...]
[17:28:23.714647393] (+?.?????????) thinkos ust_test:mytp: { cpu_id = 1 }, { stringfield = "abcs???abc" }
[error] Event id 2 is outside range.
[error] Reading event failed.
Error printing trace.

#17

Updated by Mathieu Desnoyers almost 6 years ago

Confirming that it works fine with fix applied. The babeltrace output of the racy test program looks like:

[...]
[17:26:59.352743579] (+0.000000961) thinkos ust_test:mytp: { cpu_id = 0 }, { stringfield = "abc#####" }
[17:26:59.352744452] (+0.000000873) thinkos ust_test:mytp: { cpu_id = 0 }, { stringfield = "abc" }
[17:26:59.352745232] (+0.000000780) thinkos ust_test:mytp: { cpu_id = 0 }, { stringfield = "abcsdfas" }
[17:26:59.352746291] (+0.000001059) thinkos ust_test:mytp: { cpu_id = 0 }, { stringfield = "abc###############" }
[17:26:59.352747164] (+0.000000873) thinkos ust_test:mytp: { cpu_id = 0 }, { stringfield = "abcsdfah" }
[17:26:59.352748125] (+0.000000961) thinkos ust_test:mytp: { cpu_id = 0 }, { stringfield = "abcsdfah" }
[17:26:59.352749079] (+0.000000954) thinkos ust_test:mytp: { cpu_id = 0 }, { stringfield = "abc#####" }
[17:26:59.352749969] (+0.000000890) thinkos ust_test:mytp: { cpu_id = 0 }, { stringfield = "abcsdfas##########" }
[17:26:59.352750894] (+0.000000925) thinkos ust_test:mytp: { cpu_id = 0 }, { stringfield = "abc#####" }
[17:26:59.352751745] (+0.000000851) thinkos ust_test:mytp: { cpu_id = 0 }, { stringfield = "abc#####" }

#18

Updated by Mathieu Desnoyers almost 6 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100

Also available in: Atom PDF