Project

General

Profile

Actions

Bug #561

open

Under certain conditions, a user-space trace may overwrite itself

Added by Daniel U. Thibault almost 11 years ago. Updated over 8 years ago.

Status:
Confirmed
Priority:
Low
Assignee:
-
Target version:
Start date:
06/13/2013
Due date:
% Done:

0%

Estimated time:

Description

Suppose we do this:

$ sudo -H lttng create asession
$ sudo -H lttng enable-event -a -u
$ sudo -H lttng start

And suppose we have an application that has been instrumented with some user-space tracepoint provider. Suppose the application's main loop is something like this (borrowed from easy-ust):

int main(int argc, char **argv)
{
    int i = 0;
    char themessage[20]; //Can hold up to "Hello World 9999999\0" 
    void *libtp_handle;

    libtp_handle = dlopen("./libtp.so", RTLD_LAZY);

    fprintf(stderr, "sample starting\n");
    for (i = 0; i < 10000; i++) {
        if ((i == 3333) && (libtp_handle)) dlclose(libtp_handle);
        if (i == 6666) libtp_handle = dlopen("./libtp.so", RTLD_LAZY);
        sprintf(themessage, "Hello World %u", i);
        tracepoint(sample_component, event, themessage);
        usleep(1);
    }
    fprintf(stderr, "sample done\n");

    if (libtp_handle) return dlclose(libtp_handle);
    return 0;
}

The trace produced will capture two separate processes: the first one for the app's first 3333 loops, the second for the app's last 3333 loops. This is because the app will register itself as a user-space event source, then withdraw its registration only to later re-register.

As it happens, most of the time the two third-runs will be a second apart, resulting in the trace holding two pid subdirectories: say sample-17541-20130613-135148 and sample-17541-20130613-135149. But now and again both processes will be within the same one-second window, and the trace will thus contain only one pid subdirectory, say sample-17541-20130613-135151 ---the problem is that the app's first 3333 loops were written to disk and then the last 3333 loops were written to the same file.

This only gets worse if the dlopen/dlclose calls are more tightly packed in time.

The bug boils down to this: once a tracing session detects a new process client, lttng should detect path collisions and correct for them. One solution would be to have a trace's path be:

tracepath/ust/pid/process_name-VPID-yyyymmdd-hhmmss[-n]/

In my example, the first 3333 loops would go to tracepath/ust/pid/sample-17541-20130613-135151 and the last 3333 loops to tracepath/ust/pid/sample-17541-20130613-135151-1

I suppose a similar problem can happen with per-uid traces.

Actions

Also available in: Atom PDF