Project

General

Profile

Bug #1168

CTF for a user-plugged clock assumes that "offset" is unsigned, but a negative value can happen and will result in unparsable metadata getting stored

Added by Vlad Roubtsov about 1 year ago. Updated 12 months ago.

Status:
New
Priority:
Normal
Assignee:
-
Target version:
-
Start date:
05/31/2018
Due date:
% Done:

0%

Estimated time:

Description

Hi,

I need to switch LTTng timestamp source to CLOCK_REALTIME from CLOCK_MONOTONIC in order to correlate with hardware-generated timestamps in a NIC. My initial attempt was easy enough to get going as a clock plugin:

static uint64_t
my_clock_time ()
{
    struct timespec t;
    clock_gettime (CLOCK_REALTIME, & t);

    return (t.tv_sec * 1000000000UL + t.tv_nsec);
}

static uint64_t
my_clock_freq ()
{
    return 1000000000UL;
}

(the rest follows the "clock-override" example). However, this results in sporadic errors opening the trace later via babeltrace:

[error] ctf_clock_declaration_visit: unexpected unary expression for clock offset
[error] ctf_visitor_construct_metadata: clock declaration error
[error] Error in CTF metadata constructor -22
[warning] Unable to open trace metadata for path "/.../ust/pid/...-115227-20180601-062437".
[warning] [Context] Cannot open_trace of format ctf at path /...-115227-20180601-062437.

It appears that this happens when LTTng calculates the clock's "epoch offset" (which in theory should be 0 above) and it comes out as a small negative number. To reproduce this every time, try shifting the my_clock_value() result:

static uint64_t
my_clock_time ()
{
    struct timespec t;
    clock_gettime (CLOCK_REALTIME, & t);

    return (t.tv_sec * 1000000000UL + t.tv_nsec + 1000);
}

Peeking inside the persisted "metadata" you will see something like this:

>hexdump -C /...-115227-20180601-062437/metadata | less
...
00000440  4c 54 49 4d 45 22 3b 0a  09 66 72 65 71 20 3d 20  |LTIME";..freq = |
00000450  31 30 30 30 30 30 30 30  30 30 3b 20 2f 2a 20 46  |1000000000; /* F|
00000460  72 65 71 75 65 6e 63 79  2c 20 69 6e 20 48 7a 20  |requency, in Hz |
00000470  2a 2f 0a 09 2f 2a 20 63  6c 6f 63 6b 20 76 61 6c  |*/../* clock val|
00000480  75 65 20 6f 66 66 73 65  74 20 66 72 6f 6d 20 45  |ue offset from E|
00000490  70 6f 63 68 20 69 73 3a  20 6f 66 66 73 65 74 20  |poch is: offset |
000004a0  2a 20 28 31 2f 66 72 65  71 29 20 2a 2f 0a 09 6f  |* (1/freq) */..o|
000004b0  66 66 73 65 74 20 3d 20  2d 31 30 30 31 3b 0a 7d  |ffset = -1001;.}| <- negative value stored as "signed = false" int?
000004c0  3b 0a 0a 74 79 70 65 61  6c 69 61 73 20 69 6e 74  |;..typealias int|
000004d0  65 67 65 72 20 7b 0a 09  73 69 7a 65 20 3d 20 32  |eger {..size = 2|
000004e0  37 3b 20 61 6c 69 67 6e  20 3d 20 31 3b 20 73 69  |7; align = 1; si|
000004f0  67 6e 65 64 20 3d 20 66  61 6c 73 65 3b 0a 09 6d  |gned = false;..m|

My env details (both lttng and babeltrace were built from stable sources):

>lttng --version
lttng (LTTng Trace Control) 2.10.4 - KeKriek

>lttng-sessiond --version
2.10.4

>babeltrace --version
BabelTrace Trace Viewer and Converter 1.5.5

>uname -a
Linux ... 3.10.0-693.11.6.el7.x86_64 #1 SMP Thu Jan 4 01:06:37 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
>cat /etc/system-release
CentOS Linux release 7.4.1708 (Core)

>nproc
16

History

#1

Updated by Mathieu Desnoyers about 1 year ago

  • Project changed from LTTng-UST to Babeltrace

Moving this issue to the babeltrace project.

#2

Updated by Mathieu Desnoyers about 1 year ago

Babeltrace 2.0-pre4 contains commits that handle negative clock offsets properly:

7a5f95fa Python bindings: accomodate API changes introduced for negative time
61cf588b Handle negative time and offset from Epoch

However, they don't appear to be cherry-picked into babeltrace stable-1.5. I remember the reason being that it required an API change.

This would explain why your babeltrace 1.5.5 has this issue. Can you try with Babeltrace 2.0-pre4 and report whether it fixes the issue on your side ?

Thanks,

Mathieu

#3

Updated by Vlad Roubtsov about 1 year ago

Right. I've built babeltrace-2.0.0-pre4 (BTW, this required upgrading libpopt to v1.16, otherwise 'configure' is fine but 'make' breaks), but now my code that used to read traces is broken: #include <babeltrace/ctf/iterator.h> is gone and I am not sure how to proceed.

Trying to navigate 2.0 docs in the repo but this is going slowly... Is there a simple example of opening a trace and iterating sequentially through the events therein, ideally with dropped events/packets callbacks or some other analogue of v1.x bt_ctf_get_lost_events_count()? Or a unit test that does some basic CTF trace reading?

#4

Updated by Vlad Roubtsov 12 months ago

These v2.0 hurdles (new API, no docs, no examples) prevented us from confirming that this issue is gone in v2.0. I asked for help both via lttng-dev and IRC (and got some from Jeremie Galarneau). A developer on my team made some progress hacking plugins/text/pretty/pretty.c but we had to abandon the effort: the benefit of seeing this issue fixed is not worth the effort of migrating to 2.0 without proper documentation. So, we won't be upgrading to 2.0.0-preX in near term regardless.

FWIW, Jeremie also thought that the issue should not be present in v2.0.

Also available in: Atom PDF