Bug #1299
openbabeltrace2 re-uses prior event string rather than expected empty string
90%
Description
Trying to reproduce the page fault scenario for an openat() event. Note that the behavior of generating a page fault may be toolchain-specific (linker-specific):
Test program "do-pagefault-open":
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
const char *str = "/tmp/blah";
const char *str2 = "/tmp/blah2";
int main()
{
open(str, O_RDWR);
perror("first");
open(str2, O_RDWR);
perror("second");
return 0;
}
then trace with:
lttng-trace ./do-pagefault-open
Relevant events with babeltrace2 (grepping for 'openat'):
[13:47:12.296015570] (+0.000015421) thinkos syscall_entry_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { dfd = -100, filename = "/etc/ld.so.cache", flags = ( "O_CLOEXEC" : container = 524288 ), mode = ( <unknown> : container = 0 ) }
[13:47:12.296029004] (+0.000013434) thinkos syscall_exit_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { ret = 3 }
[13:47:12.296116608] (+0.000058899) thinkos syscall_entry_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { dfd = -100, filename = "/lib/x86_64-linux-gnu/libc.so.6", flags = ( "O_CLOEXEC" : container = 524288 ), mode = ( <unknown> : container = 0 ) }
[13:47:12.296126244] (+0.000009636) thinkos syscall_exit_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { ret = 3 }
[13:47:12.296706313] (+0.000096888) thinkos syscall_entry_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { dfd = -100, filename = "/lib/x86_64-linux-gnu/libc.so.6", flags = ( "O_RDWR" : container = 2 ), mode = ( <unknown> : container = 0 ) }
[13:47:12.296717797] (+0.000011484) thinkos syscall_exit_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { ret = -2 }
[13:47:12.296954228] (+0.000005481) thinkos syscall_entry_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { dfd = -100, filename = "/tmp/blah2", flags = ( "O_RDWR" : container = 2 ), mode = ( <unknown> : container = 0 ) }
[13:47:12.296960451] (+0.000006223) thinkos syscall_exit_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { ret = -2 }
Same trace with babeltrace 1:
[warning] Unknown value 0 in enum.
[warning] Unknown value 0 in enum.
[warning] Unknown value 3 in enum.
[warning] Unknown value 5 in enum.
[warning] Unknown value 129 in enum.
[warning] Unknown value 129 in enum.
[warning] Unknown value 3 in enum.
[warning] Unknown value 129 in enum.
[warning] Unknown value 3 in enum.
[warning] Unknown value 3 in enum.
[13:47:12.296015570] (+0.000015421) thinkos syscall_entry_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { dfd = -100, filename = "/etc/ld.so.cache", flags = ( "O_CLOEXEC" : container = 524288 ), mode = ( <unknown> : container = 0 ) }
[13:47:12.296029004] (+0.000013434) thinkos syscall_exit_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { ret = 3 }
[13:47:12.296116608] (+0.000058899) thinkos syscall_entry_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { dfd = -100, filename = "/lib/x86_64-linux-gnu/libc.so.6", flags = ( "O_CLOEXEC" : container = 524288 ), mode = ( <unknown> : container = 0 ) }
[13:47:12.296126244] (+0.000009636) thinkos syscall_exit_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { ret = 3 }
[warning] Unknown value 0 in enum.
[warning] Unknown value 0 in enum.
[13:47:12.296706313] (+0.000096888) thinkos syscall_entry_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { dfd = -100, filename = "", flags = ( "O_RDWR" : container = 2 ), mode = ( <unknown> : container = 0 ) }
[13:47:12.296717797] (+0.000011484) thinkos syscall_exit_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { ret = -2 }
[13:47:12.296954228] (+0.000005481) thinkos syscall_entry_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { dfd = -100, filename = "/tmp/blah2", flags = ( "O_RDWR" : container = 2 ), mode = ( <unknown> : container = 0 ) }
Notice how filename = "/lib/x86_64-linux-gnu/libc.so.6" appears twice in the bt2 output, where we would expect the empty string (as provided by bt1).
Tested with babeltrace2 commit c429f86d0.
Files
Updated by Mathieu Desnoyers almost 4 years ago
Updated by Simon Marchi almost 4 years ago
This is because bt_string_field_clear
does not actually clear the string field's data, it just sets the length to 0:
And the getter just returns a pointer to the data:
So if you used a string field for something before and clear it, the getter will still return the previous string (instead of an empty string, as we would expect).
The context where this goes wrong is here:
For a string field that is empty, the string begin callback is called (in which we call bt_string_field_clear
) but not the string callback (in which we append characters). The fact that bt_string_field_clear
is buggy is normally hidden because calling bt_field_string_append_with_length
in the string callback fixes things up. But since the string callback is not called for the empty string field, the buggy behavior shows up.
The obvious fix would be to do
string_field->buf->data[0] = '\0';
in bt_field_string_clear
.
Updated by Philippe Proulx almost 4 years ago
- Status changed from New to Confirmed
- % Done changed from 0 to 90
Thank you for the reinvestigation Simon.
You commented several times on a fix for this bug submitted on 9 December 2020: https://review.lttng.org/c/babeltrace/+/4528.
More context: https://github.com/efficios/babeltrace/pull/116.
What remains to be done is to add a test, which I started to write, but got busy with something else and never finished (it's not super simple with libbabeltrace2 to obtain a string field to test this explicitly).
Updated by Jérémie Galarneau almost 4 years ago
Philippe Proulx wrote in #note-3:
(it's not super simple with libbabeltrace2 to obtain a string field to test this explicitly).
In what sense?
Updated by Philippe Proulx almost 4 years ago
Jérémie Galarneau wrote in #note-4:
Philippe Proulx wrote in #note-3:
(it's not super simple with libbabeltrace2 to obtain a string field to test this explicitly).
In what sense?
You need an event/packet to get a field.
You need an event/packet message to get an event/packet.
You need an event class or stream to create an event/packet message.
You need a stream class and a trace to create a stream.
You also need a self message iterator to create any message.
You need a trace class to create an event class, stream class, and a trace.
You need a self component to create a trace class.
You need a component class to create a component/message iterator within a graph which you also need.
So basically you need almost all the library objects to create and test that tiny string field.
I started writing some framework for easy field testing, but as I wrote earlier, I'm not done yet.
It's not really something we can test with the Python bindings unfortunately because there's no direct mapping with bt_string_field_clear()
. 😢
Updated by Mathieu Desnoyers almost 4 years ago
Philippe Proulx wrote in #note-5:
Jérémie Galarneau wrote in #note-4:
Philippe Proulx wrote in #note-3:
(it's not super simple with libbabeltrace2 to obtain a string field to test this explicitly).
In what sense?
You need an event/packet to get a field.
You need an event/packet message to get an event/packet.
You need an event class or stream to create an event/packet message.
You need a stream class and a trace to create a stream.
You also need a self message iterator to create any message.
You need a trace class to create an event class, stream class, and a trace.
You need a self component to create a trace class.
You need a component class to create a component/message iterator within a graph which you also need.
So basically you need almost all the library objects to create and test that tiny string field.
I started writing some framework for easy field testing, but as I wrote earlier, I'm not done yet.
It's not really something we can test with the Python bindings unfortunately because there's no direct mapping with
bt_string_field_clear()
. 😢
I understand that unit-testing this requires a lot of boilerplate code, but doing a test at a higher-level (using a targeted trace) seems like a rather quick way to test for known-problematic trace patterns.
Practically speaking, this could be achieved today by pulling my test trace (or a subset of it) into the project tests, and testing for the problematic pattern against what should be expected.
This is similar to what the ctf-testsuite aims to achieve, but AFAIR it only checks for trace reader crash.
Thanks,
Mathieu