Project

General

Profile

Actions

Bug #1299

open

babeltrace2 re-uses prior event string rather than expected empty string

Added by Mathieu Desnoyers 9 months ago. Updated 9 months ago.

Status:
Confirmed
Priority:
Normal
Assignee:
-
Category:
-
Start date:
03/04/2021
Due date:
% Done:

90%

Estimated time:

Description

Trying to reproduce the page fault scenario for an openat() event. Note that the behavior of generating a page fault may be toolchain-specific (linker-specific):

Test program "do-pagefault-open":

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

const char *str = "/tmp/blah";
const char *str2 = "/tmp/blah2";

int main() {
open(str, O_RDWR);
perror("first");
open(str2, O_RDWR);
perror("second");
return 0;
}

then trace with:

lttng-trace ./do-pagefault-open

Relevant events with babeltrace2 (grepping for 'openat'):

[13:47:12.296015570] (+0.000015421) thinkos syscall_entry_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { dfd = -100, filename = "/etc/ld.so.cache", flags = ( "O_CLOEXEC" : container = 524288 ), mode = ( <unknown> : container = 0 ) }
[13:47:12.296029004] (+0.000013434) thinkos syscall_exit_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { ret = 3 }
[13:47:12.296116608] (+0.000058899) thinkos syscall_entry_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { dfd = -100, filename = "/lib/x86_64-linux-gnu/libc.so.6", flags = ( "O_CLOEXEC" : container = 524288 ), mode = ( <unknown> : container = 0 ) }
[13:47:12.296126244] (+0.000009636) thinkos syscall_exit_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { ret = 3 }
[13:47:12.296706313] (+0.000096888) thinkos syscall_entry_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { dfd = -100, filename = "/lib/x86_64-linux-gnu/libc.so.6", flags = ( "O_RDWR" : container = 2 ), mode = ( <unknown> : container = 0 ) }
[13:47:12.296717797] (+0.000011484) thinkos syscall_exit_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { ret = -2 }
[13:47:12.296954228] (+0.000005481) thinkos syscall_entry_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { dfd = -100, filename = "/tmp/blah2", flags = ( "O_RDWR" : container = 2 ), mode = ( <unknown> : container = 0 ) }
[13:47:12.296960451] (+0.000006223) thinkos syscall_exit_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { ret = -2 }

Same trace with babeltrace 1:

[warning] Unknown value 0 in enum.
[warning] Unknown value 0 in enum.
[warning] Unknown value 3 in enum.
[warning] Unknown value 5 in enum.
[warning] Unknown value 129 in enum.
[warning] Unknown value 129 in enum.
[warning] Unknown value 3 in enum.
[warning] Unknown value 129 in enum.
[warning] Unknown value 3 in enum.
[warning] Unknown value 3 in enum.
[13:47:12.296015570] (+0.000015421) thinkos syscall_entry_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { dfd = -100, filename = "/etc/ld.so.cache", flags = ( "O_CLOEXEC" : container = 524288 ), mode = ( <unknown> : container = 0 ) }
[13:47:12.296029004] (+0.000013434) thinkos syscall_exit_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { ret = 3 }
[13:47:12.296116608] (+0.000058899) thinkos syscall_entry_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { dfd = -100, filename = "/lib/x86_64-linux-gnu/libc.so.6", flags = ( "O_CLOEXEC" : container = 524288 ), mode = ( <unknown> : container = 0 ) }
[13:47:12.296126244] (+0.000009636) thinkos syscall_exit_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { ret = 3 }
[warning] Unknown value 0 in enum.
[warning] Unknown value 0 in enum.
[13:47:12.296706313] (+0.000096888) thinkos syscall_entry_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { dfd = -100, filename = "", flags = ( "O_RDWR" : container = 2 ), mode = ( <unknown> : container = 0 ) }
[13:47:12.296717797] (+0.000011484) thinkos syscall_exit_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { ret = -2 }
[13:47:12.296954228] (+0.000005481) thinkos syscall_entry_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { dfd = -100, filename = "/tmp/blah2", flags = ( "O_RDWR" : container = 2 ), mode = ( <unknown> : container = 0 ) }

Notice how filename = "/lib/x86_64-linux-gnu/libc.so.6" appears twice in the bt2 output, where we would expect the empty string (as provided by bt1).

Tested with babeltrace2 commit c429f86d0.


Files

Actions #2

Updated by Simon Marchi 9 months ago

This is because bt_string_field_clear does not actually clear the string field's data, it just sets the length to 0:

https://github.com/efficios/babeltrace/blob/776a2a252c9875caa1e8b4f41cb8cc12c79611c3/src/lib/trace-ir/field.c#L852-L859

And the getter just returns a pointer to the data:

https://github.com/efficios/babeltrace/blob/776a2a252c9875caa1e8b4f41cb8cc12c79611c3/src/lib/trace-ir/field.c#L830-L838

So if you used a string field for something before and clear it, the getter will still return the previous string (instead of an empty string, as we would expect).

The context where this goes wrong is here:

https://github.com/efficios/babeltrace/blob/776a2a252c9875caa1e8b4f41cb8cc12c79611c3/src/plugins/ctf/common/msg-iter/msg-iter.c#L2132-L2202

For a string field that is empty, the string begin callback is called (in which we call bt_string_field_clear) but not the string callback (in which we append characters). The fact that bt_string_field_clear is buggy is normally hidden because calling bt_field_string_append_with_length in the string callback fixes things up. But since the string callback is not called for the empty string field, the buggy behavior shows up.

The obvious fix would be to do

string_field->buf->data[0] = '\0';

in bt_field_string_clear.

Actions #3

Updated by Philippe Proulx 9 months ago

  • Status changed from New to Confirmed
  • % Done changed from 0 to 90

Thank you for the reinvestigation Simon.

You commented several times on a fix for this bug submitted on 9 December 2020: https://review.lttng.org/c/babeltrace/+/4528.

More context: https://github.com/efficios/babeltrace/pull/116.

What remains to be done is to add a test, which I started to write, but got busy with something else and never finished (it's not super simple with libbabeltrace2 to obtain a string field to test this explicitly).

Actions #4

Updated by Jérémie Galarneau 9 months ago

Philippe Proulx wrote in #note-3:

(it's not super simple with libbabeltrace2 to obtain a string field to test this explicitly).

In what sense?

Actions #5

Updated by Philippe Proulx 9 months ago

Jérémie Galarneau wrote in #note-4:

Philippe Proulx wrote in #note-3:

(it's not super simple with libbabeltrace2 to obtain a string field to test this explicitly).

In what sense?

You need an event/packet to get a field.

You need an event/packet message to get an event/packet.

You need an event class or stream to create an event/packet message.

You need a stream class and a trace to create a stream.

You also need a self message iterator to create any message.

You need a trace class to create an event class, stream class, and a trace.

You need a self component to create a trace class.

You need a component class to create a component/message iterator within a graph which you also need.

So basically you need almost all the library objects to create and test that tiny string field.

I started writing some framework for easy field testing, but as I wrote earlier, I'm not done yet.

It's not really something we can test with the Python bindings unfortunately because there's no direct mapping with bt_string_field_clear(). 😢

Actions #6

Updated by Jérémie Galarneau 9 months ago

Thanks for the explanation.

Actions #7

Updated by Mathieu Desnoyers 9 months ago

Philippe Proulx wrote in #note-5:

Jérémie Galarneau wrote in #note-4:

Philippe Proulx wrote in #note-3:

(it's not super simple with libbabeltrace2 to obtain a string field to test this explicitly).

In what sense?

You need an event/packet to get a field.

You need an event/packet message to get an event/packet.

You need an event class or stream to create an event/packet message.

You need a stream class and a trace to create a stream.

You also need a self message iterator to create any message.

You need a trace class to create an event class, stream class, and a trace.

You need a self component to create a trace class.

You need a component class to create a component/message iterator within a graph which you also need.

So basically you need almost all the library objects to create and test that tiny string field.

I started writing some framework for easy field testing, but as I wrote earlier, I'm not done yet.

It's not really something we can test with the Python bindings unfortunately because there's no direct mapping with bt_string_field_clear(). 😢

I understand that unit-testing this requires a lot of boilerplate code, but doing a test at a higher-level (using a targeted trace) seems like a rather quick way to test for known-problematic trace patterns.

Practically speaking, this could be achieved today by pulling my test trace (or a subset of it) into the project tests, and testing for the problematic pattern against what should be expected.

This is similar to what the ctf-testsuite aims to achieve, but AFAIR it only checks for trace reader crash.

Thanks,

Mathieu

Actions

Also available in: Atom PDF