Project

General

Profile

Actions

Bug #471

closed

lttng poorly handles some trace-provider errors

Added by Daniel U. Thibault about 11 years ago. Updated about 11 years ago.

Status:
Resolved
Priority:
Normal
Target version:
-
Start date:
03/11/2013
Due date:
% Done:

100%

Estimated time:

Description

First I modify two files of the lttng-ust/doc/examples/easy-ust project:

sample.c:

int main(int argc, char **argv)
{
    int i = 0;
    short s[3] = { -1, -1, -1 };

    for (i = 0; i < 2000; i++) {
        tracepoint(sample_component, message, "Hello World", s, 3);
        usleep(10);
    }
    return 0;

sample_component_provider.h:

    TP_ARGS(char *, text, short *, shortvalues, short, length),
    TP_FIELDS(
        ctf_string(message, text)
        ctf_sequence(short, shortseq, shortvalues, short, length)
    )

As far as I can tell the above two changes should correctly produce a sequence of shorts in the UST event payload. It certainly makes correctly. There's probably an error or two lurking in there (I should point out I've previously managed to get sample to generate traces with arrays of longs, so the error is specific to my (mis)use of ctf_sequence), because when I get the lttng session going, this happens:

$ lttng create mylocalsession
Session mylocalsession created.
Traces will be written in /home/username/lttng-traces/mylocalsession-20130311-140014
$ lttng enable-event -u --all
All UST events are enabled in channel channel0
$ lttng start
Tracing started for session mylocalsession

At this point I run ./sample. The application does not complain or segfault or whatever. But the user lttng-consumerd and lttng-sessiond both quit unexpectedly. So when ask for lttng list, I get:

$ lttng -vvv list
Spawning a session daemon
DEBUG1 [8583/8583]: SIGUSR1 caught (in sighandler() at lttng.c:196)
DEBUG2 [8583/8583]: Session name: (null) (in cmd_list() at commands/list.c:718)
DEBUG1 [8583/8583]: LSM cmd type : 13 (in send_session_msg() at lttng-ctl.c:261)
DEBUG1 [8583/8583]: Session count 0 (in list_sessions() at commands/list.c:586)
Currently no available tracing session

The lttng-traces folder contains what looks like a proper trace, but it is corrupt, as babeltrace is unable to open it ('the metadata is empty').

Now, besides what I did wrong in setting up my trace provider (which isn't an LTTng bug, obviously), the problem I'm reporting here is that the failure was completely silent: no errors of any kind were reported by lttng. The daemons just quit.

An error-reporting mechanism is needed. It could be as simple as the "current session" .lttngrc trick: have the session daemon write a "crash report" to .lttngerror or some such (the consumer daemon would simply forward its crash report to the session daemon). The next time the lttng session daemon is started, it would spot the error report file and warn the user about it (e.g. "An error log exists: consult %s" where %s is the .lttngerror fully qualified path).

Actions #1

Updated by Daniel U. Thibault about 11 years ago

The problem lies with the _length_type of the ctf_sequence call. size_t is preferred, unsigned short, unsigned long, and unsigned long long seem to work correctly (hard to be sure without going to extreme lengths), but signed types may or may not work (as in my (signed) short example); when they do work, babeltrace rejects the signed sequence length field. Oddly, a _length_type of char is considered "signed" by babeltrace, even though char is unsigned according to the C language definition.

Beacuse the ctf_ prototypes are macros, at this point it may not be worthwhile to "idiot-proof" the tracepoint provider construction. Proper documentation of the ctf_ prototypes (currently in the lttng-ust man pages) should be enough.

Actions #2

Updated by David Goulet about 11 years ago

  • Assignee set to Mathieu Desnoyers
Actions #3

Updated by Mathieu Desnoyers about 11 years ago

The statement

"Oddly, a _length_type of char is considered "signed" by babeltrace, even though char is unsigned according to the C language definition." is incorrect.

According to ISO/IEC 9899:1999 (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf), 6.2.5 Types:

"15 The three types char, signed char, and unsigned char are collectively called the character types. The implementation shall define char to have the same range, representation, and behavior as either signed char or unsigned char.35)"

Therefore, signedness of "char" is implementation-defined (by the compiler).

Actions #4

Updated by Mathieu Desnoyers about 11 years ago

I tried your use-case here:

- The sessiond and consumerd daemons did not fail.

Possible causes on your side: Did you perhaps run "lttng -vvv list" from a different user, or a user not part of the tracing group. Please try again and provide a more detailed log of the commands and currently running programs (proof that sessiond and consumerd were actually killed by e.g. using "ps aux").

- Resulting trace:

With the current babeltrace master HEAD at commit 765f2d8c1c070828aea8ffb242021546fbb922ab, running "lttng view" after "lttng stop" gets me:

"Trace directory: /home/compudj/lttng-traces/auto-20130409-202939

[error] Sequence length field should be unsigned.
[error] Open file stream error.
[warning] [Context] Cannot open_trace of format ctf at path /home/compudj/lttng-traces/auto-20130409-202939/ust/pid/lt-hello-9170-20130409-202946.
[warning] [Context] cannot open trace "/home/compudj/lttng-traces/auto-20130409-202939/ust/pid/lt-hello-9170-20130409-202946" from /home/compudj/lttng-traces/auto-20130409-202939 for reading.
[error] Cannot open any trace for reading.

[error] opening trace "/home/compudj/lttng-traces/auto-20130409-202939" for reading.

[error] none of the specified trace paths could be opened."

Please note the "[error] Sequence length field should be unsigned.", which clearly states the issue.

Actions #5

Updated by Mathieu Desnoyers about 11 years ago

  • Project changed from LTTng-tools to LTTng-UST
Actions #6

Updated by Mathieu Desnoyers about 11 years ago

  • Status changed from New to Resolved
  • % Done changed from 0 to 100
Actions #7

Updated by Mathieu Desnoyers about 11 years ago

If you still experience sessiond crash, please open a new bug for that specific problem.

Actions #8

Updated by Daniel U. Thibault about 11 years ago

With babeltrace 1.1.0 (c99b191), lttng-modules 2.1.1+ (090db00), lttng-tools 2.1.1+ (9f9ee9c), lttng-ust 2.1.2+ (e2ad3b4), and userspace-rcu 0.7.6+ (d107390), I get no crash, and babeltrace rejects the trace as in your case.

So I went back to a March-06 virtual machine, and tried with babeltrace 1.0.3+ (9a3bb76), lttng-modules 2.1.1+ (74b9312), lttng-tools 2.1.1+ (e58727c), lttng-ust 2.1.1+ (009769c), and userspace-rcu 0.7.6+ (108a92e). The kernel is 3.2.0-36-virtual (Ubuntu 12.04). The machine was freshly rebooted to make sure lttng was not in a pathological state. And it crashed again, in the same way.

I won't open another bug report, as lttng has since moved beyond whatever was causing this crash. I'll document it here just for historical value.

Trying again, with an lttng-sessiond running verbose from a shell in order to get its error messages. Here's the log as ./sample is run:

DEBUG1 [2633/2637]: UST registration received with pid:2654 ppid:1925 uid:1000 gid:1000 sock:19 name:samplestatic (version 4.0) (in thread_registration_apps() at main.c:1602)
DEBUG1 [2633/2637]: Futex n to 1 wake done (in futex_nto1_wake() at futex.c:104)
DEBUG1 [2633/2637]: Accepting application registration (in thread_registration_apps() at main.c:1506)
DEBUG1 [2633/2636]: Futex n to 1 wait done (in futex_nto1_wait() at futex.c:90)
DEBUG1 [2633/2636]: Futex n to 1 prepare done (in futex_nto1_prepare() at futex.c:75)
DEBUG1 [2633/2636]: Dispatching UST registration pid:2654 ppid:1925 uid:1000 gid:1000 sock:19 name:samplestatic (version 4.0) (in thread_dispatch_ust_registration() at main.c:1318)
DEBUG3 [2633/2636]: UST app creating application for socket 19 (in ust_app_create() at ust-app.c:1895)
DEBUG3 [2633/2636]: Created hashtable size 4 at 0x7f89f0000a00 of type 1 (in lttng_ht_new() at hashtable.c:111)
DEBUG3 [2633/2636]: Created hashtable size 4 at 0x7f89f0000dc0 of type 1 (in lttng_ht_new() at hashtable.c:111)
DEBUG1 [2633/2636]: Woken up but nothing in the UST command queue (in thread_dispatch_ust_registration() at main.c:1306)
DEBUG1 [2633/2637]: UST registration received with pid:2654 ppid:1925 uid:1000 gid:1000 sock:23 name:samplestatic (version 4.0) (in thread_registration_apps() at main.c:1602)
DEBUG1 [2633/2637]: Futex n to 1 wake done (in futex_nto1_wake() at futex.c:104)
DEBUG1 [2633/2637]: Accepting application registration (in thread_registration_apps() at main.c:1506)
DEBUG1 [2633/2636]: Futex n to 1 wait done (in futex_nto1_wait() at futex.c:90)
DEBUG1 [2633/2636]: Futex n to 1 prepare done (in futex_nto1_prepare() at futex.c:75)
DEBUG1 [2633/2636]: Dispatching UST registration pid:2654 ppid:1925 uid:1000 gid:1000 sock:23 name:samplestatic (version 4.0) (in thread_dispatch_ust_registration() at main.c:1318)
DEBUG3 [2633/2636]: UST app notify socket 23 is set (in thread_dispatch_ust_registration() at main.c:1364)
DEBUG1 [2633/2636]: App registered with pid:2654 ppid:1925 uid:1000 gid:1000 sock:19 name:samplestatic notify_sock:23 (version 4.0) (in ust_app_add() at ust-app.c:1983)
DEBUG2 [2633/2636]: UST app global update for app sock 19 for session id 0 (in ust_app_global_update() at ust-app.c:3132)
DEBUG2 [2633/2636]: UST app pid: 2654 session id 0 not found, creating it (in create_ust_app_session() at ust-app.c:1461)
DEBUG3 [2633/2636]: Created hashtable size 4 at 0x7f89f0002280 of type 0 (in lttng_ht_new() at hashtable.c:111)
DEBUG3 [2633/2636]: Append to metadata: "typealias integer { size = 8; align = 8; signed = false; } := uint8_t;
typealias integer { size = 16; align = 8; signed = false; } := uint16_t;
typealias integer { size = 32; align = 8; signed = false; } := uint32_t;
typealias integer { size = 64; align = 8; signed = false; } := uint64_t;
typealias integer { size = 64; align = 8; signed = false; } := unsigned long;
typealias integer { size = 5; align = 1; signed = false; } := uint5_t;
typealias integer { size = 27; align = 1; signed = false; } := uint27_t;

trace {
    major = 1;
    minor = 8;
    uuid = "f0cb8eee-06ff-4eda-82eb-368d8df8c23f";
    byte_order = le;
    packet.header := struct {
        uint32_t magic;
        uint8_t  uuid[16];
        uint32_t stream_id;
    };
};

" (in lttng_metadata_printf() at ust-metadata.c:144)
DEBUG3 [2633/2636]: Append to metadata: "env {
    hostname = "edge-vb-u12";
    domain = "ust";
    tracer_name = "lttng-ust";
    tracer_major = 2;
    tracer_minor = 1;
    tracer_patchlevel = 0;
" (in lttng_metadata_printf() at ust-metadata.c:144)
DEBUG3 [2633/2636]: Append to metadata: "    vpid = 2654;
    procname = "samplestatic";
" (in lttng_metadata_printf() at ust-metadata.c:144)
DEBUG3 [2633/2636]: Append to metadata: "};

" (in lttng_metadata_printf() at ust-metadata.c:144)
DEBUG3 [2633/2636]: Append to metadata: "clock {
    name = monotonic;
" (in lttng_metadata_printf() at ust-metadata.c:144)
DEBUG3 [2633/2636]: Append to metadata: "    uuid = "894e7332-fd69-4c2a-802a-efd5d9e7efa3";
" (in lttng_metadata_printf() at ust-metadata.c:144)
DEBUG3 [2633/2636]: Append to metadata: "    description = "Monotonic Clock";
    freq = 1000000000; /* Frequency, in Hz */
    /* clock value offset from Epoch is: offset * (1/freq) */
    offset = 1365615988917721350;
};

" (in lttng_metadata_printf() at ust-metadata.c:144)
DEBUG3 [2633/2636]: Append to metadata: "typealias integer {
    size = 27; align = 1; signed = false;
    map = clock.monotonic.value;
} := uint27_clock_monotonic_t;

typealias integer {
    size = 32; align = 8; signed = false;
    map = clock.monotonic.value;
} := uint32_clock_monotonic_t;

typealias integer {
    size = 64; align = 8; signed = false;
    map = clock.monotonic.value;
} := uint64_clock_monotonic_t;

" (in lttng_metadata_printf() at ust-metadata.c:144)
DEBUG3 [2633/2636]: Append to metadata: "struct packet_context {
    uint64_clock_monotonic_t timestamp_begin;
    uint64_clock_monotonic_t timestamp_end;
    uint64_t content_size;
    uint64_t packet_size;
    unsigned long events_discarded;
    uint32_t cpu_id;
};

" (in lttng_metadata_printf() at ust-metadata.c:144)
DEBUG3 [2633/2636]: Append to metadata: "struct event_header_compact {
    enum : uint5_t { compact = 0 ... 30, extended = 31 } id;
    variant <id> {
        struct {
            uint27_clock_monotonic_t timestamp;
        } compact;
        struct {
            uint32_t id;
            uint64_clock_monotonic_t timestamp;
        } extended;
    } v;
} align(8);

struct event_header_large {
    enum : uint16_t { compact = 0 ... 65534, extended = 65535 } id;
    variant <id> {
        struct {
            uint32_clock_monotonic_t timestamp;
        } compact;
        struct {
            uint32_t id;
            uint64_clock_monotonic_t timestamp;
        } extended;
    } v;
} align(8);

" (in lttng_metadata_printf() at ust-metadata.c:144)
DEBUG2 [2633/2636]: Shadow copy of session handle -1 (in shadow_copy_session() at ust-app.c:1354)
DEBUG2 [2633/2636]: Channel channel0 not found on shadow session copy, creating it (in shadow_copy_session() at ust-app.c:1383)
DEBUG3 [2633/2636]: Created hashtable size 4 at 0x7f89f0004480 of type 1 (in lttng_ht_new() at hashtable.c:111)
DEBUG3 [2633/2636]: Created hashtable size 4 at 0x7f89f0004720 of type 0 (in lttng_ht_new() at hashtable.c:111)
DEBUG3 [2633/2636]: Created hashtable size 4 at 0x7f89f00049c0 of type 0 (in lttng_ht_new() at hashtable.c:111)
DEBUG3 [2633/2636]: UST app channel channel0 allocated (in alloc_ust_app_channel() at ust-app.c:678)
DEBUG2 [2633/2636]: UST app shadow copy of channel channel0 started (in shadow_copy_channel() at ust-app.c:1286)
DEBUG2 [2633/2636]: UST event * not found on shadow copy channel (in shadow_copy_channel() at ust-app.c:1321)
DEBUG3 [2633/2636]: UST app event * allocated (in alloc_ust_app_event() at ust-app.c:734)
DEBUG3 [2633/2636]: UST app shadow copy of channel channel0 done (in shadow_copy_channel() at ust-app.c:1331)
DEBUG3 [2633/2639]: UST thread notify added sock 23 to pollset (in ust_thread_manage_notify() at ust-thread.c:118)
DEBUG3 [2633/2639]: [ust-thread] Manage notify polling on 3 fds (in ust_thread_manage_notify() at ust-thread.c:54)
DEBUG2 [2633/2636]: UST app session created successfully with handle 2 (in create_ust_app_session() at ust-app.c:1500)
DEBUG2 [2633/2636]: Asking UST consumer for channel (in ask_channel_creation() at ust-consumer.c:112)
DEBUG3 [2633/2636]: mkdir() recursive /home/daniel/lttng-traces/bug471-20130410-140559//ust/samplestatic-2654-20130410-140619/ with mode 504 for uid 1000 and gid 1000 (in run_as_mkdir_recursive() at runas.c:306)
DEBUG1 [2633/2636]: Using run_as_clone (in run_as() at runas.c:289)
DEBUG2 [2633/2636]: UST ask channel 1 successfully done with 1 stream(s) (in ask_channel_creation() at ust-consumer.c:156)
DEBUG2 [2633/2636]: UST app stream 1 received succesfully (in ust_consumer_get_channel() at ust-consumer.c:266)
DEBUG3 [2633/2636]: UST app consumer has no more stream available (in ust_consumer_get_channel() at ust-consumer.c:249)
DEBUG2 [2633/2636]: UST app send channel to app sock 19 pid 2654 (name: channel0, key: 1) (in ust_consumer_send_channel_to_ust() at ust-consumer.c:370)
DEBUG2 [2633/2636]: UST consumer send stream to app 19 (in ust_consumer_send_stream_to_ust() at ust-consumer.c:335)
DEBUG2 [2633/2636]: UST app event * created successfully for pid:2654 (in create_ust_event() at ust-app.c:1212)
DEBUG1 [2633/2636]: Starting tracing for ust app pid 2654 (in ust_app_start_trace() at ust-app.c:2811)
DEBUG3 [2633/2636]: mkdir() recursive /home/daniel/lttng-traces/bug471-20130410-140559 with mode 504 for uid 1000 and gid 1000 (in run_as_mkdir_recursive() at runas.c:306)
DEBUG1 [2633/2636]: Using run_as_clone (in run_as() at runas.c:289)
DEBUG3 [2633/2636]: Created hashtable size 4 at 0x7f89f00058d0 of type 1 (in lttng_ht_new() at hashtable.c:111)
DEBUG3 [2633/2636]: Created hashtable size 4 at 0x7f89f0005c90 of type 0 (in lttng_ht_new() at hashtable.c:111)
DEBUG3 [2633/2636]: Created hashtable size 4 at 0x7f89f0006050 of type 0 (in lttng_ht_new() at hashtable.c:111)
DEBUG3 [2633/2636]: UST app channel metadata allocated (in alloc_ust_app_channel() at ust-app.c:678)
DEBUG2 [2633/2636]: Asking UST consumer for channel (in ask_channel_creation() at ust-consumer.c:112)
DEBUG3 [2633/2636]: mkdir() recursive /home/daniel/lttng-traces/bug471-20130410-140559//ust/samplestatic-2654-20130410-140619/ with mode 504 for uid 1000 and gid 1000 (in run_as_mkdir_recursive() at runas.c:306)
DEBUG1 [2633/2636]: Using run_as_clone (in run_as() at runas.c:289)
DEBUG2 [2633/2636]: UST ask channel 2 successfully done with 1 stream(s) (in ask_channel_creation() at ust-consumer.c:156)
DEBUG2 [2633/2636]: UST consumer setup metadata channel key 2 (in ust_consumer_setup_metadata() at ust-consumer.c:497)
DEBUG2 [2633/2636]: UST metadata created for app pid 2654 (in create_ust_app_metadata() at ust-app.c:1838)
DEBUG3 [2633/2639]: UST app receiving notify from sock 23 (in ust_app_recv_notify() at ust-app.c:3696)
DEBUG2 [2633/2639]: UST app ustctl register channel received (in ust_app_recv_notify() at ust-app.c:3744)
DEBUG3 [2633/2639]: Append to metadata: "stream {
    id = 0;
    event.header := struct event_header_compact;
    packet.context := struct packet_context;
" (in lttng_metadata_printf() at ust-metadata.c:144)
DEBUG3 [2633/2639]: Append to metadata: "};

" (in lttng_metadata_printf() at ust-metadata.c:144)
DEBUG3 [2633/2639]: UST app replying to register channel with id 0, type: 1, ret: 0 (in reply_ust_register_channel() at ust-app.c:3597)
DEBUG3 [2633/2639]: [ust-thread] Manage notify polling on 3 fds (in ust_thread_manage_notify() at ust-thread.c:54)
DEBUG2 [2633/2636]: UST trace started for app pid 2654 (in ust_app_global_update() at ust-app.c:3203)
DEBUG3 [2633/2639]: UST app receiving notify from sock 23 (in ust_app_recv_notify() at ust-app.c:3696)
DEBUG2 [2633/2639]: UST app ustctl register event received (in ust_app_recv_notify() at ust-app.c:3716)
DEBUG3 [2633/2639]: UST registry creating event with event: sample_component:message, sig: char *, text, short *, shortvalues, short, length, id: 0, chan_objd: 3, sess_objd: 2 (in ust_registry_create_event() at ust-registry.c:208)
DEBUG3 [2633/2639]: Append to metadata: "event {
    name = "sample_component:message";
    id = 0;
    stream_id = 0;
" (in lttng_metadata_printf() at ust-metadata.c:144)
DEBUG3 [2633/2639]: Append to metadata: "    loglevel = 4;
" (in lttng_metadata_printf() at ust-metadata.c:144)
DEBUG3 [2633/2639]: Append to metadata: "    fields := struct {
" (in lttng_metadata_printf() at ust-metadata.c:144)
DEBUG3 [2633/2639]: Append to metadata: "        string _message;
" (in lttng_metadata_printf() at ust-metadata.c:144)
DEBUG3 [2633/2639]: Append to metadata: "        integer { size = 16; align = 8; signed = 1; encoding = none; base = 10; } __shortseq_length;
" (in lttng_metadata_printf() at ust-metadata.c:144)
DEBUG3 [2633/2639]: Append to metadata: "        integer { size = 16; align = 8; signed = 1; encoding = none; base = 10; } _shortseq[ __shortseq_length ];
" (in lttng_metadata_printf() at ust-metadata.c:144)
DEBUG3 [2633/2639]: Append to metadata: "    };
};

" (in lttng_metadata_printf() at ust-metadata.c:144)
DEBUG3 [2633/2639]: UST registry event sample_component:message has been added successfully (in add_event_ust_registry() at ust-app.c:3677)
Segmentation fault (core dumped)

Actions

Also available in: Atom PDF