Bug #1168
openCTF for a user-plugged clock assumes that "offset" is unsigned, but a negative value can happen and will result in unparsable metadata getting stored
0%
Description
Hi,
I need to switch LTTng timestamp source to CLOCK_REALTIME from CLOCK_MONOTONIC in order to correlate with hardware-generated timestamps in a NIC. My initial attempt was easy enough to get going as a clock plugin:
static uint64_t
my_clock_time ()
{
struct timespec t;
clock_gettime (CLOCK_REALTIME, & t);
return (t.tv_sec * 1000000000UL + t.tv_nsec);
}
static uint64_t
my_clock_freq ()
{
return 1000000000UL;
}
(the rest follows the "clock-override" example). However, this results in sporadic errors opening the trace later via babeltrace:
[error] ctf_clock_declaration_visit: unexpected unary expression for clock offset [error] ctf_visitor_construct_metadata: clock declaration error [error] Error in CTF metadata constructor -22 [warning] Unable to open trace metadata for path "/.../ust/pid/...-115227-20180601-062437". [warning] [Context] Cannot open_trace of format ctf at path /...-115227-20180601-062437.
It appears that this happens when LTTng calculates the clock's "epoch offset" (which in theory should be 0 above) and it comes out as a small negative number. To reproduce this every time, try shifting the my_clock_value()
result:
static uint64_t
my_clock_time ()
{
struct timespec t;
clock_gettime (CLOCK_REALTIME, & t);
return (t.tv_sec * 1000000000UL + t.tv_nsec + 1000);
}
Peeking inside the persisted "metadata" you will see something like this:
>hexdump -C /...-115227-20180601-062437/metadata | less ... 00000440 4c 54 49 4d 45 22 3b 0a 09 66 72 65 71 20 3d 20 |LTIME";..freq = | 00000450 31 30 30 30 30 30 30 30 30 30 3b 20 2f 2a 20 46 |1000000000; /* F| 00000460 72 65 71 75 65 6e 63 79 2c 20 69 6e 20 48 7a 20 |requency, in Hz | 00000470 2a 2f 0a 09 2f 2a 20 63 6c 6f 63 6b 20 76 61 6c |*/../* clock val| 00000480 75 65 20 6f 66 66 73 65 74 20 66 72 6f 6d 20 45 |ue offset from E| 00000490 70 6f 63 68 20 69 73 3a 20 6f 66 66 73 65 74 20 |poch is: offset | 000004a0 2a 20 28 31 2f 66 72 65 71 29 20 2a 2f 0a 09 6f |* (1/freq) */..o| 000004b0 66 66 73 65 74 20 3d 20 2d 31 30 30 31 3b 0a 7d |ffset = -1001;.}| <- negative value stored as "signed = false" int? 000004c0 3b 0a 0a 74 79 70 65 61 6c 69 61 73 20 69 6e 74 |;..typealias int| 000004d0 65 67 65 72 20 7b 0a 09 73 69 7a 65 20 3d 20 32 |eger {..size = 2| 000004e0 37 3b 20 61 6c 69 67 6e 20 3d 20 31 3b 20 73 69 |7; align = 1; si| 000004f0 67 6e 65 64 20 3d 20 66 61 6c 73 65 3b 0a 09 6d |gned = false;..m|
My env details (both lttng and babeltrace were built from stable sources):
>lttng --version lttng (LTTng Trace Control) 2.10.4 - KeKriek >lttng-sessiond --version 2.10.4 >babeltrace --version BabelTrace Trace Viewer and Converter 1.5.5 >uname -a Linux ... 3.10.0-693.11.6.el7.x86_64 #1 SMP Thu Jan 4 01:06:37 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux >cat /etc/system-release CentOS Linux release 7.4.1708 (Core) >nproc 16
Updated by Mathieu Desnoyers over 6 years ago
- Project changed from LTTng-UST to Babeltrace
Moving this issue to the babeltrace project.
Updated by Mathieu Desnoyers over 6 years ago
Babeltrace 2.0-pre4 contains commits that handle negative clock offsets properly:
7a5f95fa Python bindings: accomodate API changes introduced for negative time
61cf588b Handle negative time and offset from Epoch
However, they don't appear to be cherry-picked into babeltrace stable-1.5. I remember the reason being that it required an API change.
This would explain why your babeltrace 1.5.5 has this issue. Can you try with Babeltrace 2.0-pre4 and report whether it fixes the issue on your side ?
Thanks,
Mathieu
Updated by Vlad Roubtsov over 6 years ago
Right. I've built babeltrace-2.0.0-pre4 (BTW, this required upgrading libpopt to v1.16, otherwise 'configure' is fine but 'make' breaks), but now my code that used to read traces is broken: #include <babeltrace/ctf/iterator.h> is gone and I am not sure how to proceed.
Trying to navigate 2.0 docs in the repo but this is going slowly... Is there a simple example of opening a trace and iterating sequentially through the events therein, ideally with dropped events/packets callbacks or some other analogue of v1.x bt_ctf_get_lost_events_count()? Or a unit test that does some basic CTF trace reading?
Updated by Vlad Roubtsov over 6 years ago
These v2.0 hurdles (new API, no docs, no examples) prevented us from confirming that this issue is gone in v2.0. I asked for help both via lttng-dev and IRC (and got some from Jeremie Galarneau). A developer on my team made some progress hacking plugins/text/pretty/pretty.c but we had to abandon the effort: the benefit of seeing this issue fixed is not worth the effort of migrating to 2.0 without proper documentation. So, we won't be upgrading to 2.0.0-preX in near term regardless.
FWIW, Jeremie also thought that the issue should not be present in v2.0.
Updated by Jonathan Rajotte Julien almost 5 years ago
- Status changed from New to Feedback
Hi,
Babletrace was officially released [1] with proper documentation2.
Hope this helps.
[1] https://babeltrace.org/docs/release-notes/babeltrace-2.0.0-release-notes.html