Babeltrace2 fails to process trace with begin timestamp of second packet smaller than begin timestamp of first packet
I have a CTF encoded trace with two packets, each with two events. Packets are ordered in the file as pktA, pktB.
The timestamps in the packet header are such that pktB.begin < pktA.begin < pktA.end < pktB.end. Given this data in a single input file, babeltrace1 will display it, babeltrace2 refuses, and TraceCompass refuses. However if the packets are separated into separate files, both babeltrace2 and TraceCompass process it as expected.
Updated by Jérémie Galarneau almost 3 years ago
In the CTF model, a clock's value is updated as fields mapped to that clock's value attribute are read. The resulting value of the clock is expected to be monotonically increasing. When an N-bit update would produce a smaller N-bit value, an N-bit overflow is assumed to have occurred. The exception is the timestamp_end field which updates a clock's value at the completion of the decoding of a packet.
Here, a compliant reader will assume that an overflow occurred between pktA.end and pktB.begin or, for that matter, any two updates which causes the N-bit value of the clock to be smaller than the previous N-bit value.
From the specification:
For a N-bit integer type referring to a clock, if the integer overflows compared to the N low order bits of the clock prior value found in the same stream, then it is assumed that one, and only one, overflow occurred.
Of course, this is legal, but only you know if this is the intended effect. Your traces all reference the clock value through 8-bit integers, but I am assuming this is only for the sake of the example.
The reason why splitting the traces in separate physical streams works is that the two physical streams will be heap-sorted to form a single logical stream. That logical stream will present the packets in the expected (pktB, pktA) order to the reader.
If the timestamps of the output of babeltrace 1.x are not monotonically increasing, it points to a bug in babeltrace 1.x.
As for Babeltrace 2, the assertion is not the way this should be handled.
The specification is mute on this, but historically the "timestamp_begin" and "timestamp_end" have been assumed to contain the complete clock value and not to be an N-bit update to the value. In that case, that would make the stream invalid if any event of the second packet occurs before the first packet's final event.
The specification should explicitly mention whether or not timestamp_begin/timestamp_end override or update the clock's value and if a clock's value must be monotonically increasing.
Updated by Geneviève Bastien almost 3 years ago
We have a related bug for another trace, generated from a custom tracer.
With babeltrace 1, the trace is read fine, event timestamps are monotonic. With babeltrace2 it aborts with message
04-22 16:58:59.461 97893 97893 E PLUGIN/FLT.UTILS.MUXER email@example.com:1079 [muxer] Youngest upstream message iterator wrapper's timestamp is less than muxer's message iterator's last returned timestamp: muxer-msg-iter-addr=0x55b2214faf20, ts=1575594527642470600, last-returned-ts=1575594527642472275
Some investigation shows that the reason for this is that end time of a packet > start time of next packet in a stream (pktB.begin < pktA.end)
However, from the spec, it would seem that this case should be supported:
Time-stamp at the beginning and timestamp at the end of the event packet. Both timestamps are written in the packet header, but sampled respectively while (or before) writing the first event and while (or after) writing the last event in the packet. The inclusive range between these timestamps should include all event timestamps assigned to events contained within the packet. The timestamp at the beginning of an event packet is guaranteed to be below or equal the timestamp at the end of that event packet. The timestamp at the end of an event packet is guaranteed to be below or equal the timestamps at the end of any following packet within the same stream. See Clocks for more detail.
I made bt2 output some debug about the message types it gets when the error occurs and here is the output:
4-22 21:41:02.616 203722 203722 F LIB/MSG-ITER firstname.lastname@example.org:580 Got a message with timestamp 1575594328184404010 of type 4
04-22 21:41:02.616 203722 203722 F LIB/MSG-ITER email@example.com:580 Got a message with timestamp 1575594471692837471 of type 4
04-22 21:41:02.616 203722 203722 F LIB/MSG-ITER firstname.lastname@example.org:580 Got a message with timestamp 1575594527642472275 of type 16 <-- notice the packet ends much later than the last event in it
04-22 21:41:02.616 203722 203722 F LIB/MSG-ITER email@example.com:580 Got a message with timestamp 1575594527642470600 of type 8 <--
04-22 21:41:02.616 203722 203722 F LIB/MSG-ITER firstname.lastname@example.org:802 Babeltrace 2 library postcondition not satisfied; error is:
04-22 21:41:02.616 203722 203722 F LIB/MSG-ITER email@example.com:802 Clock snapshots are not monotonic
type 16 is packet end and type 8 is packet begin. The non-monotonicity does not involve events, only packet times.
Updated by Philippe Proulx almost 3 years ago
- Status changed from New to Invalid
Mathieu, Jérémie and I had a chat about all this.
Geneviève: can you create a new issue on this tracker because, although it is related somehow, it's a different bug. Once you create it, I will add details about a possible solution because we discussed this.
This issue here (Jim White's) does not describe a bug. If there's any bug, it's in the CTF specification document which does not mention that (although it was the initial intention according to Mathieu):
timestamp_beginfield of a packet context always has an absolute value: it resets the stream's clock to this value without applying any compression wrapping strategy, whatever the field's class's size.
- For a given stream, given two contiguous packets A and B, the
timestamp_begincontext field's value of A must be less than or equal to B's.
Therefore, we plan to clarify this in CTF and publish CTF 1.8.3, a revision.
In other words, the tracer which produced the broken trace is not CTF-compliant unfortunately. We don't plan to make Babeltrace 2 fix this invalid trace when decoding it. There might exist other options to postprocess the trace and fix it if you're interested.
Babeltrace 1 works because it basically doesn't care or doesn't expose packet context fields.
Closing this issue as invalid.