Project

General

Profile

Bug #725

babeltrace of flight recorder trace complains about spurious event losses

Added by Daniel U. Thibault over 6 years ago. Updated about 4 years ago.

Status:
Resolved
Priority:
Normal
Target version:
Start date:
01/23/2014
Due date:
% Done:

0%

Estimated time:

Description

Using babeltrace 1.1.1-31-513413e.

Assume a flight recorder trace is generated like this:

$ lttng create limkern4
Session limkern4 created.
Traces will be written in /home/daniel/lttng-traces/limkern4-20140123-091307
$ lttng enable-channel -k channel0 --tracefile-size 262144 --tracefile-count 4
Kernel channel channel0 enabled for session limkern4
$ lttng enable-event -k --all -c channel0
All kernel events are enabled in channel channel0
$ lttng start
Tracing started for session limkern4
(wait for a while)
$ lttng stop
Waiting for data availability.
Tracing stopped for session limkern4

Now when this is babeltraced:

$ sudo babeltrace /home/daniel/lttng-traces/limkern4-20140123-091307/ -n all -f all -v -w /home/daniel/lttng-traces/limkern4_nafav
[verbose] Verbose mode active.
[verbose] Converting from directory(ies):
[verbose]     /home/daniel/lttng-traces/limkern4-20140123-091307/
[verbose] Converting from format: ctf <default>
[verbose] Converting to target: /home/daniel/lttng-traces/limkern4_nafav
[verbose] Converting to format: text <default>
[verbose] CTF visitor: parent links creation... [verbose] done.
[verbose] CTF visitor: semantic check... [verbose] done.
[verbose] CTF visitor: metadata construction...
[verbose] env.hostname = "edge-vb-u12" 
[verbose] env.domain = "kernel" 
[verbose] env.sysname = "Linux" 
[verbose] env.release = "3.2.0-54-virtual" 
[verbose] env.version = "#82-Ubuntu SMP Tue Sep 10 20:31:18 UTC 2013" 
[verbose] env.tracer_name = "lttng-modules" 
[verbose] env.tracer_major = 2
[verbose] env.tracer_minor = 3
[verbose] env.tracer_patchlevel = 0
[verbose] done.
[warning] Tracer discarded 155182 events at end of stream between [09:13:51.255065834] and [09:13:51.255071071] in trace UUID a189442030db3049ab7ad7d0f36e7eb7, at path: "/home/daniel/lttng-traces/limkern4-20140123-091307/kernel", within stream id 0, at relative path: "channel0_0_2". You should consider recording a new trace with larger buffers or with fewer events enabled.
[warning] Tracer discarded 155182 events at end of stream between [09:13:52.757480052] and [09:13:52.757493410] in trace UUID a189442030db3049ab7ad7d0f36e7eb7, at path: "/home/daniel/lttng-traces/limkern4-20140123-091307/kernel", within stream id 0, at relative path: "channel0_0_3". You should consider recording a new trace with larger buffers or with fewer events enabled.
[warning] Tracer discarded 155182 events at end of stream between [09:13:52.836155212] and [09:13:52.836158254] in trace UUID a189442030db3049ab7ad7d0f36e7eb7, at path: "/home/daniel/lttng-traces/limkern4-20140123-091307/kernel", within stream id 0, at relative path: "channel0_0_0". You should consider recording a new trace with larger buffers or with fewer events enabled.
[warning] Tracer discarded 155182 events at end of stream between [09:13:52.862220826] and [09:13:52.862223671] in trace UUID a189442030db3049ab7ad7d0f36e7eb7, at path: "/home/daniel/lttng-traces/limkern4-20140123-091307/kernel", within stream id 0, at relative path: "channel0_0_1". You should consider recording a new trace with larger buffers or with fewer events enabled.
[verbose] finished converting. Output written to:
/home/daniel/lttng-traces/limkern4_nafav

Note how each of the four trace files generated a nearly identical warning. Inspection of the babeltrace output shows no gaps between the records of one file and the next (based on the stream.packet.context:timestamp_begin and _end values). A gap would show as a large delta value when a changing stream.packet.context:timestamp_begin value signals a change of trace file chunk. The largest delta seen is +0.024946343 and all of the larger delta@s are right after a @power_start event.

Could it be the stream.packet.context:events_discarded fills up when a buffer gets overwritten and stays "maxed out" somehow? If that is the case, this count of events lost before the flight recorder begins (if that is what the value represents) is kind of pointless. A user only cares about events lost within the trace's time span. Events lost before the trace begins should only be reported once, not with every trace file chunk.

#1

Updated by Mathieu Desnoyers about 6 years ago

We should document this limitation in lttng-tools in the section about tracefile rotation. This is not within the CTF specification per se.

#2

Updated by Mathieu Desnoyers about 6 years ago

  • Project changed from Babeltrace to LTTng-tools
  • Target version changed from Babeltrace 1.1 to 2.5
#3

Updated by David Goulet about 6 years ago

  • Status changed from New to Confirmed
  • Target version deleted (2.5)

Some IRC logs that explains why this can't be in 2.5.

11:44 < jgalar> dgoulet: we'd have to change the CTF spec to accomodate this... :/
11:45 < Compudj> there is currently no way to tie various packets from different files together as if they were 
                 from the same stream
11:45 < Compudj> the proper way to do this would be to add a stream instance ID
11:45 < Compudj> so each stream within a stream class would have its own ID
11:46 < Compudj> and therefore we could map back packets from various files to their stream
#4

Updated by Jérémie Galarneau over 4 years ago

  • Status changed from Confirmed to In Progress
  • Assignee set to Julien Desfossez
  • Target version set to 2.8
#5

Updated by Jérémie Galarneau about 4 years ago

  • Status changed from In Progress to Resolved

This is fixed as of LTTng 2.8 since the tracer now produces a packet sequence count which allows viewers to indicate where/if a packet was dropped.

Babeltrace 1.4 will be the first version to indicate packet losses. The "lttng stop" command also includes such a notification.

Also available in: Atom PDF