Bug #589: Lost events time ranges may be offset - Babeltrace - LTTng bugs repository

Actions

Copy link

Bug #589

closed

Lost events time ranges may be offset

Added by Matthew Khouzam almost 12 years ago. Updated about 11 years ago.

Status:

Resolved

Priority:

Normal

Assignee:

Mathieu Desnoyers

Category:

Target version:

Babeltrace 1.2

Start date:

07/09/2013

Due date:

% Done:

Estimated time:

Description

I am writing a new lost events scheme in eclipse and noticed that babeltrace will not display lost event ranges in the places where I think they should be.

The attached trace has lost events that should be

[warning] Tracer discarded 213 events between [17:39:14.716328157] and [17:39:.14716328600] in trace UUID 884017b522d74fbcac37cf0078daf8, at path: ".", within stream id 0, at relative path: "channel0_1". You should consider recording a new trace with larger buffers or with fewer events enabled.

but instead we have
[warning] Tracer discarded 213 events between [17:39:14.715967946] and [17:39:14.715968233] in trace UUID 884017b522d74fbcac37cf0078daf8, at path: ".", within stream id 0, at relative path: "channel0_1". You should consider recording a new trace with larger buffers or with fewer events enabled.

On a dev note, this looks like an easy off-by-one problem but I didn't look too closely at the code.

Files

Download all files

hello-ascending.tar.bz2 (156 KB) hello-ascending.tar.bz2		Matthew Khouzam, 07/09/2013 09:28 AM
0001-Fix-Previous-packet-timestamp-should-not-be-overwrit.patch (1.83 KB) 0001-Fix-Previous-packet-timestamp-should-not-be-overwrit.patch		Jérémie Galarneau, 09/06/2013 01:05 PM
test-trace.tar.gz (5.85 KB) test-trace.tar.gz		Jérémie Galarneau, 09/06/2013 01:05 PM
patch-babeltrace-events-discarded.patch (14.9 KB) patch-babeltrace-events-discarded.patch		Mathieu Desnoyers, 11/06/2013 09:41 AM
fix-discarded-inaccuracy.patch (11.3 KB) fix-discarded-inaccuracy.patch		Mathieu Desnoyers, 02/25/2014 05:38 PM

Related issues 2 (0 open — 2 closed)

Related to Babeltrace - Bug #724: babeltrace of flight recorder trace undergoes time warp?

Resolved

Mathieu Desnoyers

01/23/2014

Actions

Related to Babeltrace - Bug #625: events dropped count misses events since 2.3

Resolved

09/03/2013

Actions

Issue # Delay: days Cancel

History
Notes
Property changes
Associated revisions

Actions

Copy link

Updated by Anonymous over 11 years ago

Subject changed from Lost events may or may not be displayed at the right times. (please confirm) to Lost events time ranges may be offset

I can confirm some discrepancy between Babeltrace and the Java TMF implementation. It seems the time ranges reported by Babeltrace are "offset" with regards to the lost events:

Using the trace at http://www.dorsal.polymtl.ca/~alexmont/data/hello-lost.tar.bz2 , the first few lost event ranges reported by TMF are (seconds are all "1376592664", I removed them for brevity):

[828,848,222 to 828,848,540], 859 lost events
[829,402,521 to 829,403,076], 488 lost events
[829,823,874 to 829,824,514], 884 lost events

and the first lost events reported by Babeltrace:

[829,402,521 to 829,403,076], 859 lost events
[829,823,874 to 829,824,514], 488 lost events
[830,394,108 to 830,394,755], 884 lost events

As you can see, the time ranges reported by Babeltrace are "offset" by one lost event range. I'm tempted to believe that the results reported by TMF are the correct ones, or it would have completely invented that first time range...

Actions

Copy link

Updated by Jérémie Galarneau over 11 years ago

Status changed from New to Feedback

The original bug report indicates that Babeltrace reports the lost events too early and the update says they are too late. Do both cases occur or is one of the reports wrong?

Actions

Copy link

Updated by Anonymous over 11 years ago

Ok I tested printing the first few lost events in the trace in the original report, here is what I got (TMF reports the start time + duration by default, but it's easy to see which is what):

TMF:

17:39:14.715 614 536 + 358, 213 lost events
17:39:14.715 967 946 + 287, 228 lost events
17:39:14.716 328 157 + 443, 232 lost events

Babeltrace:

17:39:14.715 967 946 - 17:39:14.715 968 233, 213 lost events
17:39:14.716 328 157 - 17:39:14.716 328 600, 228 lost events
17:39:14.716 698 005 - 17:39:14.716 698 299, 232 lost events

So I think Matthew just mixed up entries #1 and #3 in the first list, and the events in Babeltrace are really too late. It is indeed hard to follow! :)

Actions

Copy link Download all files

Updated by Jérémie Galarneau over 11 years ago

File 0001-Fix-Previous-packet-timestamp-should-not-be-overwrit.patch 0001-Fix-Previous-packet-timestamp-should-not-be-overwrit.patch added
File test-trace.tar.gz test-trace.tar.gz added
Status changed from Feedback to In Progress

Here is a tentative fix.

To make it easier to diagnose the problem, I have created a trace which contains three packets, each announcing a single discarded event.

Packet 1: 1 event with clock cycle 0
Packet 2: 1000 events at clock cycles (1 to 1000)
Packet 3: 1 event with clock cycle 1001

Here is babeltrace's output with my proposed patch

$ ./converter/babeltrace --clock-cycles /tmp/ctfwriter_9ogyvU/ | grep "Tracer discarded" 
[warning] Tracer discarded 1 events between [00000000000000000000] and [00000000000000000000] in trace UUID 50fe6b07cf54c1e94bb58e8137b37b, at path: "/tmp/ctfwriter_9ogyvU", within stream id 0, at relative path: "test_stream_0". You should consider recording a new trace with larger buffers or with fewer events enabled.
[warning] Tracer discarded 1 events between [00000000000000000001] and [00000000000000001000] in trace UUID 50fe6b07cf54c1e94bb58e8137b37b, at path: "/tmp/ctfwriter_9ogyvU", within stream id 0, at relative path: "test_stream_0". You should consider recording a new trace with larger buffers or with fewer events enabled.
[warning] Tracer discarded 1 events at end of stream between [00000000000000001001] and [00000000000000001001] in trace UUID 50fe6b07cf54c1e94bb58e8137b37b, at path: "/tmp/ctfwriter_9ogyvU", within stream id 0, at relative path: "test_stream_0". You should consider recording a new trace with larger buffers or with fewer events enabled.

Can you add TMF's output so we can compare?

Actions

Copy link

Updated by Anonymous over 11 years ago

With that trace, TMF only actually sees one lost event at 19:00:00.001 234 567 with duration = 1. It's possible we don't support all types of lost events (per stream, per packet, at end of stream, etc.?) I've only tested it with lost events from LTTng. ;)

Actions

Copy link

Updated by Jérémie Galarneau over 11 years ago

I think Matthew is more familiar with TMF's CTF reader, perhaps he could comment on this issue?

Let me know if the trace is off-spec since; it was produced using the CTF Writer API, which is under review.

Actions

Copy link

Updated by Jérémie Galarneau over 11 years ago

Status changed from In Progress to Feedback

Actions

Copy link

Updated by Mathieu Desnoyers over 11 years ago

File patch-babeltrace-events-discarded.patch patch-babeltrace-events-discarded.patch added

With the patch I prepared (added to this bug), we get, on Jeremie's test case:

[warning] Tracer discarded 1 events between [00000000000000000000] and [00000000000000000000] in trace UUID 50fe6b07cf54c1e94bb58e8137b37b, at path: ".", within stream id 0, at relative path: "test_stream_0". You should consider recording a new trace with larger buffers or with fewer events enabled.
[warning] Tracer discarded 1 events between [00000000000000000000] and [00000000000000001000] in trace UUID 50fe6b07cf54c1e94bb58e8137b37b, at path: ".", within stream id 0, at relative path: "test_stream_0". You should consider recording a new trace with larger buffers or with fewer events enabled.
[warning] Tracer discarded 1 events between [00000000000000001000] and [00000000000000001001] in trace UUID 50fe6b07cf54c1e94bb58e8137b37b, at path: ".", within stream id 0, at relative path: "test_stream_0". You should consider recording a new trace with larger buffers or with fewer events enabled.

Important comment added in the patch:

/* * Discarded events can be either: * - discarded after end of previous buffer due to buffer full: * happened within range: [ prev_timestamp_end, timestamp_begin ] * - discarded within current buffer due to either event too large or * nested wrap-around: * happened within range: [ timestamp_begin, timestamp_end ] * * Given we have discarded counters of those two types merged into the * events_discarded counter, we need to use the union of those ranges: * [ prev_timestamp_end, timestamp_end ]
*/

Please try it out and let me know what you think.

Thanks,

Mathieu

Actions

Copy link

Updated by Anonymous over 11 years ago

I tried Babeltrace with the patch, but the results are a bit weird...

With "hello-ascending" (which I tested previously in comment #3) I get:

17:39:14.715 614 894 - 19:00:00.000 000 000, 213 lost events
17:39:14.715 968 233 - 19:00:00.000 000 000, 228 lost events
17:39:14.716 328 600 - 19:00:00.000 000 000, 232 lost events

With "hello-lost" (from comment #1):

14:51:04.828 848 540 - 19:00:00.000 000 000, 859 lost events
14:51:04.829 403 076 - 19:00:00.000 000 000, 488 lost events
14:51:04.829 824 514 - 19:00:00.000 000 000, 884 lost events

It seems the start times Babeltrace reports are rather end times of the intervals compared to the previous results, and the end times are 0 (using --clock-seconds show 0.0000000)

Actions

Copy link

#10

Updated by Mathieu Desnoyers about 11 years ago

Target version set to Babeltrace 1.2

Actions

Copy link

#11

Updated by Mathieu Desnoyers about 11 years ago

Assignee set to Mathieu Desnoyers

Actions

Copy link

#12

Updated by Mathieu Desnoyers about 11 years ago

File fix-discarded-inaccuracy.patch fix-discarded-inaccuracy.patch added

Can you guys have a look at this patch and comment ?

The result with Jeremie's test case is:

The range is from the end of the previous buffer to the end of the current buffer (see comment in code).

Actions

Copy link

#13

Updated by Mathieu Desnoyers about 11 years ago

Status changed from Feedback to Resolved

Fixed by:

commit 2654fe9bab8f0eaeb17264ef7abadfd14e245b23
Author: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Date:   Tue Feb 25 17:36:11 2014 -0500

    Fix: events discarded timing inaccuracy

    Introduce babeltrace_ctf_console_output too, to ensure we only print
    discarded events when a text output is used.

    Fixes #589

    Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

Actions

Copy link

Also available in: Atom PDF

Project

General

Profile

LTTng » Babeltrace

Bug #589

Lost events time ranges may be offset

Updated by Anonymous over 11 years ago

Updated by Jérémie Galarneau over 11 years ago

Updated by Anonymous over 11 years ago

Updated by Jérémie Galarneau over 11 years ago

Updated by Anonymous over 11 years ago

Updated by Jérémie Galarneau over 11 years ago

Updated by Jérémie Galarneau over 11 years ago

Updated by Mathieu Desnoyers over 11 years ago

Updated by Anonymous over 11 years ago

Updated by Mathieu Desnoyers about 11 years ago

Updated by Mathieu Desnoyers about 11 years ago

Updated by Mathieu Desnoyers about 11 years ago

Updated by Mathieu Desnoyers about 11 years ago