Bug #589
closedLost events time ranges may be offset
Added by Matthew Khouzam over 11 years ago. Updated over 10 years ago.
0%
Description
I am writing a new lost events scheme in eclipse and noticed that babeltrace will not display lost event ranges in the places where I think they should be.
The attached trace has lost events that should be
[warning] Tracer discarded 213 events between [17:39:14.716328157] and [17:39:.14716328600] in trace UUID 884017b522d74fbcac37cf0078daf8, at path: ".", within stream id 0, at relative path: "channel0_1". You should consider recording a new trace with larger buffers or with fewer events enabled.
but instead we have
[warning] Tracer discarded 213 events between [17:39:14.715967946] and [17:39:14.715968233] in trace UUID 884017b522d74fbcac37cf0078daf8, at path: ".", within stream id 0, at relative path: "channel0_1". You should consider recording a new trace with larger buffers or with fewer events enabled.
On a dev note, this looks like an easy off-by-one problem but I didn't look too closely at the code.
Files
hello-ascending.tar.bz2 (156 KB) hello-ascending.tar.bz2 | Matthew Khouzam, 07/09/2013 09:28 AM | ||
0001-Fix-Previous-packet-timestamp-should-not-be-overwrit.patch (1.83 KB) 0001-Fix-Previous-packet-timestamp-should-not-be-overwrit.patch | Jérémie Galarneau, 09/06/2013 01:05 PM | ||
test-trace.tar.gz (5.85 KB) test-trace.tar.gz | Jérémie Galarneau, 09/06/2013 01:05 PM | ||
patch-babeltrace-events-discarded.patch (14.9 KB) patch-babeltrace-events-discarded.patch | Mathieu Desnoyers, 11/06/2013 09:41 AM | ||
fix-discarded-inaccuracy.patch (11.3 KB) fix-discarded-inaccuracy.patch | Mathieu Desnoyers, 02/25/2014 05:38 PM |
Updated by Anonymous about 11 years ago
- Subject changed from Lost events may or may not be displayed at the right times. (please confirm) to Lost events time ranges may be offset
I can confirm some discrepancy between Babeltrace and the Java TMF implementation. It seems the time ranges reported by Babeltrace are "offset" with regards to the lost events:
Using the trace at http://www.dorsal.polymtl.ca/~alexmont/data/hello-lost.tar.bz2 , the first few lost event ranges reported by TMF are (seconds are all "1376592664", I removed them for brevity):
[828,848,222 to 828,848,540], 859 lost events [829,402,521 to 829,403,076], 488 lost events [829,823,874 to 829,824,514], 884 lost events
and the first lost events reported by Babeltrace:
[829,402,521 to 829,403,076], 859 lost events [829,823,874 to 829,824,514], 488 lost events [830,394,108 to 830,394,755], 884 lost events
As you can see, the time ranges reported by Babeltrace are "offset" by one lost event range. I'm tempted to believe that the results reported by TMF are the correct ones, or it would have completely invented that first time range...
Updated by Jérémie Galarneau about 11 years ago
- Status changed from New to Feedback
The original bug report indicates that Babeltrace reports the lost events too early and the update says they are too late. Do both cases occur or is one of the reports wrong?
Updated by Anonymous about 11 years ago
Ok I tested printing the first few lost events in the trace in the original report, here is what I got (TMF reports the start time + duration by default, but it's easy to see which is what):
TMF:
17:39:14.715 614 536 + 358, 213 lost events 17:39:14.715 967 946 + 287, 228 lost events 17:39:14.716 328 157 + 443, 232 lost events
Babeltrace:
17:39:14.715 967 946 - 17:39:14.715 968 233, 213 lost events 17:39:14.716 328 157 - 17:39:14.716 328 600, 228 lost events 17:39:14.716 698 005 - 17:39:14.716 698 299, 232 lost events
So I think Matthew just mixed up entries #1 and #3 in the first list, and the events in Babeltrace are really too late. It is indeed hard to follow! :)
Updated by Jérémie Galarneau about 11 years ago
- File 0001-Fix-Previous-packet-timestamp-should-not-be-overwrit.patch 0001-Fix-Previous-packet-timestamp-should-not-be-overwrit.patch added
- File test-trace.tar.gz test-trace.tar.gz added
- Status changed from Feedback to In Progress
Here is a tentative fix.
To make it easier to diagnose the problem, I have created a trace which contains three packets, each announcing a single discarded event.
Packet 1: 1 event with clock cycle 0
Packet 2: 1000 events at clock cycles (1 to 1000)
Packet 3: 1 event with clock cycle 1001
Here is babeltrace's output with my proposed patch
$ ./converter/babeltrace --clock-cycles /tmp/ctfwriter_9ogyvU/ | grep "Tracer discarded" [warning] Tracer discarded 1 events between [00000000000000000000] and [00000000000000000000] in trace UUID 50fe6b07cf54c1e94bb58e8137b37b, at path: "/tmp/ctfwriter_9ogyvU", within stream id 0, at relative path: "test_stream_0". You should consider recording a new trace with larger buffers or with fewer events enabled. [warning] Tracer discarded 1 events between [00000000000000000001] and [00000000000000001000] in trace UUID 50fe6b07cf54c1e94bb58e8137b37b, at path: "/tmp/ctfwriter_9ogyvU", within stream id 0, at relative path: "test_stream_0". You should consider recording a new trace with larger buffers or with fewer events enabled. [warning] Tracer discarded 1 events at end of stream between [00000000000000001001] and [00000000000000001001] in trace UUID 50fe6b07cf54c1e94bb58e8137b37b, at path: "/tmp/ctfwriter_9ogyvU", within stream id 0, at relative path: "test_stream_0". You should consider recording a new trace with larger buffers or with fewer events enabled.
Can you add TMF's output so we can compare?
Updated by Anonymous about 11 years ago
With that trace, TMF only actually sees one lost event at 19:00:00.001 234 567 with duration = 1. It's possible we don't support all types of lost events (per stream, per packet, at end of stream, etc.?) I've only tested it with lost events from LTTng. ;)
Updated by Jérémie Galarneau about 11 years ago
I think Matthew is more familiar with TMF's CTF reader, perhaps he could comment on this issue?
Let me know if the trace is off-spec since; it was produced using the CTF Writer API, which is under review.
Updated by Jérémie Galarneau about 11 years ago
- Status changed from In Progress to Feedback
Updated by Mathieu Desnoyers about 11 years ago
With the patch I prepared (added to this bug), we get, on Jeremie's test case:
[warning] Tracer discarded 1 events between [00000000000000000000] and [00000000000000000000] in trace UUID 50fe6b07cf54c1e94bb58e8137b37b, at path: ".", within stream id 0, at relative path: "test_stream_0". You should consider recording a new trace with larger buffers or with fewer events enabled.
[warning] Tracer discarded 1 events between [00000000000000000000] and [00000000000000001000] in trace UUID 50fe6b07cf54c1e94bb58e8137b37b, at path: ".", within stream id 0, at relative path: "test_stream_0". You should consider recording a new trace with larger buffers or with fewer events enabled.
[warning] Tracer discarded 1 events between [00000000000000001000] and [00000000000000001001] in trace UUID 50fe6b07cf54c1e94bb58e8137b37b, at path: ".", within stream id 0, at relative path: "test_stream_0". You should consider recording a new trace with larger buffers or with fewer events enabled.
Important comment added in the patch:
/*
* Discarded events can be either:
* - discarded after end of previous buffer due to buffer full:
* happened within range: [ prev_timestamp_end, timestamp_begin ]
* - discarded within current buffer due to either event too large or
* nested wrap-around:
* happened within range: [ timestamp_begin, timestamp_end ]
*
* Given we have discarded counters of those two types merged into the
* events_discarded counter, we need to use the union of those ranges:
* [ prev_timestamp_end, timestamp_end ]
*/
Please try it out and let me know what you think.
Thanks,
Mathieu
Updated by Anonymous about 11 years ago
I tried Babeltrace with the patch, but the results are a bit weird...
With "hello-ascending" (which I tested previously in comment #3) I get:
17:39:14.715 614 894 - 19:00:00.000 000 000, 213 lost events 17:39:14.715 968 233 - 19:00:00.000 000 000, 228 lost events 17:39:14.716 328 600 - 19:00:00.000 000 000, 232 lost events
With "hello-lost" (from comment #1):
14:51:04.828 848 540 - 19:00:00.000 000 000, 859 lost events 14:51:04.829 403 076 - 19:00:00.000 000 000, 488 lost events 14:51:04.829 824 514 - 19:00:00.000 000 000, 884 lost events
It seems the start times Babeltrace reports are rather end times of the intervals compared to the previous results, and the end times are 0 (using --clock-seconds show 0.0000000)
Updated by Mathieu Desnoyers over 10 years ago
- Target version set to Babeltrace 1.2
Updated by Mathieu Desnoyers over 10 years ago
- Assignee set to Mathieu Desnoyers
Updated by Mathieu Desnoyers over 10 years ago
Can you guys have a look at this patch and comment ?
The result with Jeremie's test case is:
[warning] Tracer discarded 1 events between [00000000000000000000] and [00000000000000000000] in trace UUID 50fe6b07cf54c1e94bb58e8137b37b, at path: ".", within stream id 0, at relative path: "test_stream_0". You should consider recording a new trace with larger buffers or with fewer events enabled.
[warning] Tracer discarded 1 events between [00000000000000000000] and [00000000000000001000] in trace UUID 50fe6b07cf54c1e94bb58e8137b37b, at path: ".", within stream id 0, at relative path: "test_stream_0". You should consider recording a new trace with larger buffers or with fewer events enabled.
[warning] Tracer discarded 1 events between [00000000000000001000] and [00000000000000001001] in trace UUID 50fe6b07cf54c1e94bb58e8137b37b, at path: ".", within stream id 0, at relative path: "test_stream_0". You should consider recording a new trace with larger buffers or with fewer events enabled.
The range is from the end of the previous buffer to the end of the current buffer (see comment in code).
Updated by Mathieu Desnoyers over 10 years ago
- Status changed from Feedback to Resolved
Fixed by:
commit 2654fe9bab8f0eaeb17264ef7abadfd14e245b23 Author: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Date: Tue Feb 25 17:36:11 2014 -0500 Fix: events discarded timing inaccuracy Introduce babeltrace_ctf_console_output too, to ensure we only print discarded events when a text output is used. Fixes #589 Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>