Project

General

Profile

Actions

Bug #589

closed

Lost events time ranges may be offset

Added by Matthew Khouzam almost 11 years ago. Updated about 10 years ago.

Status:
Resolved
Priority:
Normal
Category:
-
Target version:
Start date:
07/09/2013
Due date:
% Done:

0%

Estimated time:

Description

I am writing a new lost events scheme in eclipse and noticed that babeltrace will not display lost event ranges in the places where I think they should be.

The attached trace has lost events that should be

[warning] Tracer discarded 213 events between [17:39:14.716328157] and [17:39:.14716328600] in trace UUID 884017b522d74fbcac37cf0078daf8, at path: ".", within stream id 0, at relative path: "channel0_1". You should consider recording a new trace with larger buffers or with fewer events enabled.

but instead we have
[warning] Tracer discarded 213 events between [17:39:14.715967946] and [17:39:14.715968233] in trace UUID 884017b522d74fbcac37cf0078daf8, at path: ".", within stream id 0, at relative path: "channel0_1". You should consider recording a new trace with larger buffers or with fewer events enabled.

On a dev note, this looks like an easy off-by-one problem but I didn't look too closely at the code.


Files

hello-ascending.tar.bz2 (156 KB) hello-ascending.tar.bz2 Matthew Khouzam, 07/09/2013 09:28 AM
0001-Fix-Previous-packet-timestamp-should-not-be-overwrit.patch (1.83 KB) 0001-Fix-Previous-packet-timestamp-should-not-be-overwrit.patch Jérémie Galarneau, 09/06/2013 01:05 PM
test-trace.tar.gz (5.85 KB) test-trace.tar.gz Jérémie Galarneau, 09/06/2013 01:05 PM
patch-babeltrace-events-discarded.patch (14.9 KB) patch-babeltrace-events-discarded.patch Mathieu Desnoyers, 11/06/2013 09:41 AM
fix-discarded-inaccuracy.patch (11.3 KB) fix-discarded-inaccuracy.patch Mathieu Desnoyers, 02/25/2014 05:38 PM

Related issues 2 (0 open2 closed)

Related to Babeltrace - Bug #724: babeltrace of flight recorder trace undergoes time warp?ResolvedMathieu Desnoyers01/23/2014

Actions
Related to Babeltrace - Bug #625: events dropped count misses events since 2.3Resolved09/03/2013

Actions
Actions #1

Updated by Anonymous over 10 years ago

  • Subject changed from Lost events may or may not be displayed at the right times. (please confirm) to Lost events time ranges may be offset

I can confirm some discrepancy between Babeltrace and the Java TMF implementation. It seems the time ranges reported by Babeltrace are "offset" with regards to the lost events:

Using the trace at http://www.dorsal.polymtl.ca/~alexmont/data/hello-lost.tar.bz2 , the first few lost event ranges reported by TMF are (seconds are all "1376592664", I removed them for brevity):

[828,848,222 to 828,848,540], 859 lost events
[829,402,521 to 829,403,076], 488 lost events
[829,823,874 to 829,824,514], 884 lost events

and the first lost events reported by Babeltrace:

[829,402,521 to 829,403,076], 859 lost events
[829,823,874 to 829,824,514], 488 lost events
[830,394,108 to 830,394,755], 884 lost events

As you can see, the time ranges reported by Babeltrace are "offset" by one lost event range. I'm tempted to believe that the results reported by TMF are the correct ones, or it would have completely invented that first time range...

Actions #2

Updated by Jérémie Galarneau over 10 years ago

  • Status changed from New to Feedback

The original bug report indicates that Babeltrace reports the lost events too early and the update says they are too late. Do both cases occur or is one of the reports wrong?

Actions #3

Updated by Anonymous over 10 years ago

Ok I tested printing the first few lost events in the trace in the original report, here is what I got (TMF reports the start time + duration by default, but it's easy to see which is what):

TMF:

17:39:14.715 614 536 + 358, 213 lost events
17:39:14.715 967 946 + 287, 228 lost events
17:39:14.716 328 157 + 443, 232 lost events

Babeltrace:

17:39:14.715 967 946 - 17:39:14.715 968 233, 213 lost events
17:39:14.716 328 157 - 17:39:14.716 328 600, 228 lost events
17:39:14.716 698 005 - 17:39:14.716 698 299, 232 lost events

So I think Matthew just mixed up entries #1 and #3 in the first list, and the events in Babeltrace are really too late. It is indeed hard to follow! :)

Updated by Jérémie Galarneau over 10 years ago

Here is a tentative fix.

To make it easier to diagnose the problem, I have created a trace which contains three packets, each announcing a single discarded event.

Packet 1: 1 event with clock cycle 0
Packet 2: 1000 events at clock cycles (1 to 1000)
Packet 3: 1 event with clock cycle 1001

Here is babeltrace's output with my proposed patch

$ ./converter/babeltrace --clock-cycles /tmp/ctfwriter_9ogyvU/ | grep "Tracer discarded" 
[warning] Tracer discarded 1 events between [00000000000000000000] and [00000000000000000000] in trace UUID 50fe6b07cf54c1e94bb58e8137b37b, at path: "/tmp/ctfwriter_9ogyvU", within stream id 0, at relative path: "test_stream_0". You should consider recording a new trace with larger buffers or with fewer events enabled.
[warning] Tracer discarded 1 events between [00000000000000000001] and [00000000000000001000] in trace UUID 50fe6b07cf54c1e94bb58e8137b37b, at path: "/tmp/ctfwriter_9ogyvU", within stream id 0, at relative path: "test_stream_0". You should consider recording a new trace with larger buffers or with fewer events enabled.
[warning] Tracer discarded 1 events at end of stream between [00000000000000001001] and [00000000000000001001] in trace UUID 50fe6b07cf54c1e94bb58e8137b37b, at path: "/tmp/ctfwriter_9ogyvU", within stream id 0, at relative path: "test_stream_0". You should consider recording a new trace with larger buffers or with fewer events enabled.

Can you add TMF's output so we can compare?

Actions #5

Updated by Anonymous over 10 years ago

With that trace, TMF only actually sees one lost event at 19:00:00.001 234 567 with duration = 1. It's possible we don't support all types of lost events (per stream, per packet, at end of stream, etc.?) I've only tested it with lost events from LTTng. ;)

Actions #6

Updated by Jérémie Galarneau over 10 years ago

I think Matthew is more familiar with TMF's CTF reader, perhaps he could comment on this issue?

Let me know if the trace is off-spec since; it was produced using the CTF Writer API, which is under review.

Actions #7

Updated by Jérémie Galarneau over 10 years ago

  • Status changed from In Progress to Feedback
Actions #8

Updated by Mathieu Desnoyers over 10 years ago

With the patch I prepared (added to this bug), we get, on Jeremie's test case:

[warning] Tracer discarded 1 events between [00000000000000000000] and [00000000000000000000] in trace UUID 50fe6b07cf54c1e94bb58e8137b37b, at path: ".", within stream id 0, at relative path: "test_stream_0". You should consider recording a new trace with larger buffers or with fewer events enabled.
[warning] Tracer discarded 1 events between [00000000000000000000] and [00000000000000001000] in trace UUID 50fe6b07cf54c1e94bb58e8137b37b, at path: ".", within stream id 0, at relative path: "test_stream_0". You should consider recording a new trace with larger buffers or with fewer events enabled.
[warning] Tracer discarded 1 events between [00000000000000001000] and [00000000000000001001] in trace UUID 50fe6b07cf54c1e94bb58e8137b37b, at path: ".", within stream id 0, at relative path: "test_stream_0". You should consider recording a new trace with larger buffers or with fewer events enabled.

Important comment added in the patch:

/* * Discarded events can be either: * - discarded after end of previous buffer due to buffer full: * happened within range: [ prev_timestamp_end, timestamp_begin ] * - discarded within current buffer due to either event too large or * nested wrap-around: * happened within range: [ timestamp_begin, timestamp_end ] * * Given we have discarded counters of those two types merged into the * events_discarded counter, we need to use the union of those ranges: * [ prev_timestamp_end, timestamp_end ]
*/

Please try it out and let me know what you think.

Thanks,

Mathieu

Actions #9

Updated by Anonymous over 10 years ago

I tried Babeltrace with the patch, but the results are a bit weird...

With "hello-ascending" (which I tested previously in comment #3) I get:

17:39:14.715 614 894 - 19:00:00.000 000 000, 213 lost events
17:39:14.715 968 233 - 19:00:00.000 000 000, 228 lost events
17:39:14.716 328 600 - 19:00:00.000 000 000, 232 lost events

With "hello-lost" (from comment #1):

14:51:04.828 848 540 - 19:00:00.000 000 000, 859 lost events
14:51:04.829 403 076 - 19:00:00.000 000 000, 488 lost events
14:51:04.829 824 514 - 19:00:00.000 000 000, 884 lost events

It seems the start times Babeltrace reports are rather end times of the intervals compared to the previous results, and the end times are 0 (using --clock-seconds show 0.0000000)

Actions #10

Updated by Mathieu Desnoyers about 10 years ago

  • Target version set to Babeltrace 1.2
Actions #11

Updated by Mathieu Desnoyers about 10 years ago

  • Assignee set to Mathieu Desnoyers
Actions #12

Updated by Mathieu Desnoyers about 10 years ago

Can you guys have a look at this patch and comment ?

The result with Jeremie's test case is:

[warning] Tracer discarded 1 events between [00000000000000000000] and [00000000000000000000] in trace UUID 50fe6b07cf54c1e94bb58e8137b37b, at path: ".", within stream id 0, at relative path: "test_stream_0". You should consider recording a new trace with larger buffers or with fewer events enabled.
[warning] Tracer discarded 1 events between [00000000000000000000] and [00000000000000001000] in trace UUID 50fe6b07cf54c1e94bb58e8137b37b, at path: ".", within stream id 0, at relative path: "test_stream_0". You should consider recording a new trace with larger buffers or with fewer events enabled.
[warning] Tracer discarded 1 events between [00000000000000001000] and [00000000000000001001] in trace UUID 50fe6b07cf54c1e94bb58e8137b37b, at path: ".", within stream id 0, at relative path: "test_stream_0". You should consider recording a new trace with larger buffers or with fewer events enabled.

The range is from the end of the previous buffer to the end of the current buffer (see comment in code).

Actions #13

Updated by Mathieu Desnoyers about 10 years ago

  • Status changed from Feedback to Resolved

Fixed by:

commit 2654fe9bab8f0eaeb17264ef7abadfd14e245b23
Author: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Date:   Tue Feb 25 17:36:11 2014 -0500

    Fix: events discarded timing inaccuracy

    Introduce babeltrace_ctf_console_output too, to ensure we only print
    discarded events when a text output is used.

    Fixes #589

    Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Actions

Also available in: Atom PDF