LTTng bugs repository: Issueshttps://bugs.lttng.org/https://bugs.lttng.org/themes/lttng/favicon/a.ico?14249722912023-04-26T17:57:32ZLTTng bugs repository
Redmine Babeltrace - Bug #1374 (New): Babeltrace fails to read trace with no data stream type ID when the...https://bugs.lttng.org/issues/13742023-04-26T17:57:32ZErica Bugden
<p>Babeltrace is not able to read a trace that does not contain data stream IDs (empty packet header) when the metadata specifies an ID for the stream. Trace Compass is also unable to read this trace. If the lines mentioning the stream ID are removed from the metadata, then babeltrace is able to read the trace.</p>
<a name="Does-this-need-to-be-fixed"></a>
<h3 >Does this need to be fixed?<a href="#Does-this-need-to-be-fixed" class="wiki-anchor">¶</a></h3>
<p>This was first reported as a barectf issue <a class="external" href="https://github.com/efficios/barectf/issues/28">https://github.com/efficios/barectf/issues/28</a> , but after further consideration the issue appears to be in a grey area in the CTF 1.8 specification. Whether it is the CTF generator's (barectf) responsibility or the CTF reader's responsibility is up for debate.</p>
<p>The superfluous metadata information need not prevent the trace from being parsed. For example, yactfr <a class="external" href="https://github.com/eepp/yactfr">https://github.com/eepp/yactfr</a> is able to read the trace. However, considering that:</p>
<ul>
<li>At least two CTF readers (babeltrace and Trace Compass) cannot read the trace, and</li>
<li>It is easy to fix in barectf (Proposed fix: <a class="external" href="https://review.lttng.org/c/barectf/+/9868">https://review.lttng.org/c/barectf/+/9868</a> )</li>
</ul>
<p>this is not a high priority issue in babeltrace.</p>
<a name="Context"></a>
<h1 >Context<a href="#Context" class="wiki-anchor">¶</a></h1>
<p>- Background and reproducer: barectf github issue <a class="external" href="https://github.com/efficios/barectf/issues/28">https://github.com/efficios/barectf/issues/28</a> <br />- Background and specification discussion: barectf fix <a class="external" href="https://review.lttng.org/c/barectf/+/9868">https://review.lttng.org/c/barectf/+/9868</a></p> LTTng - Feature #1347 (New): Codename for 2.14https://bugs.lttng.org/issues/13472022-03-01T20:27:38ZMichael Jeansonmjeanson@efficios.com
<p>What should the codename of the LTTng 2.14 release be?</p>
<p>Must be a micro-brewed Quebec beer and start with a "O".</p> Babeltrace - Bug #1234 (Feedback): src.text.dmesg: some kernel ring buffer lines can be wrongly s...https://bugs.lttng.org/issues/12342020-02-17T21:54:00ZPhilippe Proulxeeppeliteloop@gmail.com
<p>The lines of the <code>dmesg</code> command start with a time. The lines are supposed to be in order of time, but some of them can be at the wrong place.</p>
<p><code>flt.utils.muxer</code> does not like this and complains that event messages are not sorted by their default clock snapshot value.</p>
<p>It is, in fact, a <code>src.text.dmesg</code> bug because a message iterator must emit messages in order of time.</p>
<p>If the input is a file, one solution would be to sort the lines first (if not too large), and then emit the messages in this order.</p>
<p>We could also, in all scenarios, skip the lines with a time that is before the last event message's time and warn accordingly.</p> Babeltrace - Bug #1223 (Feedback): Check for cmp and diff in configurehttps://bugs.lttng.org/issues/12232020-02-17T21:03:26ZSimon Marchisimon.marchi@polymtl.ca
<p>Seen somewhere in the configure output of babeltrace on a fresh centos 8 container:</p>
<pre>
checking for a working dd... ./configure: line 9826: cmp: command not found
checking if gcc supports -fno-rtti -fno-exceptions... ./configure: line 11790: diff: command not found
</pre>
<p>The configure keeps going, I think the consequence may simply be that the results of these checks are inaccurate.</p>
<p>For correctness, if these tools (cmd and diff) are used, there should be something that checks that they exist before using them.</p> LTTng-tools - Bug #1159 (Feedback): Missing documentation: before Linux 4.17, suspending the syst...https://bugs.lttng.org/issues/11592018-04-20T19:22:26ZPhilippe Proulxeeppeliteloop@gmail.com
<p>From LWN:</p>
<blockquote>
<p>The CLOCK_MONOTONIC and CLOCK_BOOTTIME clocks used to differ only in that the latter is fast-forwarded after a suspend-and-resume cycle. As of 4.17, CLOCK_MONOTONIC is also moved forward to reflect the time that the system spent suspended. As a result, the two timers are now identical and have been unified within the kernel. Among other things, that change eliminates a potentially surprising behavior wherein the offset between the monotonic and realtime clocks would change after a resume. Thomas Gleixner noted: "There might be side effects in applications, which rely on the (unfortunately) well documented behaviour of the MONOTONIC clock, but the downsides of the existing behaviour are probably worse.</p>
</blockquote>
<p>We should document that LTTng can produce incorrect traces when the target system is suspended while tracing is active if the system runs Linux 4.16 or less.</p>
<p>Mathieu Desnoyers suggested that we put this in <code>lttng-regenerate(1)</code>'s man page. Is this the appropriate location? Should the warning exist in all command man pages (as a common limitation at the end of the page)? <code>lttng-create(1)</code>'s man page could also be an appropriate location.</p> LTTng-tools - Bug #1124 (Feedback): lttng-sessiond saves empty domain stubshttps://bugs.lttng.org/issues/11242017-07-24T22:35:16ZJérémie Galarneaujeremie.galarneau@efficios.com
<p>The save command, when used to save a session containing a single log4j event and channel, produces empty stubs for all userspace domains.</p>
<pre>
<?xml version="1.0" encoding="UTF-8"?>
<sessions>
<session>
<name>test</name>
<domains>
<domain>
<type>UST</type>
<buffer_type>PER_UID</buffer_type>
<channels/>
<trackers/>
</domain>
<domain>
<type>JUL</type>
<buffer_type>PER_UID</buffer_type>
<channels/>
</domain>
<domain>
<type>LOG4J</type>
<buffer_type>PER_UID</buffer_type>
<channels>
<channel>
<name>lttng_log4j_channel</name>
<enabled>true</enabled>
<overwrite_mode>DISCARD</overwrite_mode>
<subbuffer_size>524288</subbuffer_size>
<subbuffer_count>4</subbuffer_count>
<switch_timer_interval>0</switch_timer_interval>
<read_timer_interval>0</read_timer_interval>
<output_type>MMAP</output_type>
<tracefile_size>0</tracefile_size>
<tracefile_count>0</tracefile_count>
<live_timer_interval>0</live_timer_interval>
<events>
<event>
<name>dfklasj</name>
<enabled>true</enabled>
<type>TRACEPOINT</type>
<loglevel_type>ALL</loglevel_type>
<filter>logger_name == &quot;dfklasj&quot;</filter>
</event>
</events>
<contexts/>
</channel>
</channels>
</domain>
<domain>
<type>PYTHON</type>
<buffer_type>PER_UID</buffer_type>
<channels/>
</domain>
</domains>
<started>false</started>
<output>
<consumer_output>
<enabled>true</enabled>
<destination>
<path>/home/jgalar/lttng-traces/test-20170724-142342</path>
</destination>
</consumer_output>
</output>
</session>
</sessions>
</pre>
<p>Reproduction steps:</p>
<pre>
$ lttng create test
Session test created.
Traces will be written in /home/jgalar/lttng-traces/test-20170724-142342
$ lttng enable-channel -u lttng_log4j_channel --monitor-timer 456789
UST channel lttng_log4j_channel enabled for session test
$ lttng enable-event -l dfklasj
LOG4J event dfklasj enabled
$ lttng list test
Tracing session test: [inactive]
Trace path: /home/jgalar/lttng-traces/test-20170724-142342
=== Domain: UST global ===
Buffer type: per UID
Channels:
-------------
- lttng_log4j_channel: [enabled]
Attributes:
overwrite mode: 0
subbuffers size: 524288
number of subbuffers: 4
switch timer interval: 0
read timer interval: 0
monitor timer interval: 456789
trace file count: 0
trace file size (bytes): 0
discarded events: 0
lost packets: 0
output: mmap()
Events:
None
=== Domain: LOG4j (Logging for Java) ===
Events (Logger name):
---------------------
- dfklasj [enabled] [filter: 'logger_name == "dfklasj"']
$ lttng save
All sessions have been saved successfully.
</pre> LTTng - Bug #1060 (Confirmed): Document the extra reading subbuffer always allocatedhttps://bugs.lttng.org/issues/10602016-08-30T14:04:00ZJonathan Rajotte Julienjonathan.rajotte-julien@efficios.com
<p>On channel creation a user might only consider the size of the passed argument instead we should either warn or document that there is always an extra sub buffer allocated for reading purpose.</p>
<p>See lib_ring_buffer_backend_allocate on lttng-modules.</p>
<p>This also apply to userspace.</p> LTTng - Feature #990 (Confirmed): Additional make check targetshttps://bugs.lttng.org/issues/9902016-01-19T20:23:31ZJonathan Rajotte Julienjonathan.rajotte-julien@efficios.com
<p>We might want to add more make check targets to ease testing both manually and via ci.</p>
<p>E.g.<br />make check_verbose<br />make check_archive<br />make check_root<br />etc.</p> LTTng-tools - Bug #822 (Confirmed): bash-completion sometimes completes too muchhttps://bugs.lttng.org/issues/8222014-07-28T18:04:17ZSimon Marchisimon.marchi@polymtl.ca
<p>I am not sure how to word it, but here is an example. I have a single session, named "auto-20140728-134322".</p>
<p>$ lttng des<tab> => $ lttng destroy</p>
<p>That's good, now let's press tab again:</p>
<p>$ lttng destroy <tab> => $ lttng destroy auto-20140728-134322</p>
<p>That's good, now let's press tab again:</p>
<p>$ lttng destroy auto-20140728-134322 <tab> => $ lttng destroy auto-20140728-134322 auto-20140728-134322</p>
<p>Oops, we can go like that for a long time. Every <tab> press adds a "auto-20140728-134322". It should detect that we already gave a positional argument. Since the command takes only one positional argument, subsequent <tab> presses should do nothing.</p> LTTng-tools - Bug #752 (In Progress): Output paths need better handling than truncationhttps://bugs.lttng.org/issues/7522014-03-07T21:33:50ZDaniel U. Thibaultdaniel.thibault@drdc-rddc.gc.ca
For instance, in the aftermath of <code>lttng-tools/src/bin/lttng-sessiond/cmd.c:record_ust_snapshot</code>, the <code>msg.u.snapshot_channel.pathname</code> is limited to PATH_MAX (typically 4096) but is built with <code>"%s/%s-%s-%" PRIu64 "%s"</code>, where the successive arguments are (discounting the closing nulls):
<ul>
<li><code>output->consumer->dst.trace_path</code> PATH_MAX - 1 (no trailing /)</li>
<li>/ 1</li>
<li><code>output->name</code> NAME_MAX - 1</li>
<li>- 1</li>
<li><code>output->datetime</code> 15</li>
<li>- 1</li>
<li><code>output->nb_snapshot</code> 20 digits (unsigned 64-bit integer)</li>
<li><code>session_path</code> PATH_MAX - 1 (including leading and trailing /)</li>
</ul>
<p>The worst-case <code>session_path</code> part is <code>'/ust/pid/<proc>-<vpid>-<datetime>/'</code> so it's actually limited to 12+15+5+15 = 47 characters (<code>/proc/PID/status.name</code> is truncated to 15 characters, and VPID is unsigned 16-bit for 5 characters) (closing null excluded). So one solution would be to limit the <code>consumer->dst.trace_path</code> to PATH_MAX - (NAME_MAX - 1 + 15 + 20 + 47 + 3) - 1 (for the null). However, if we want the path+filetitles of the channel files to fit in PATH_MAX, we need to chop another NAME_MAX off (and also limit channel names to NAME_MAX - (1 + 5 + 1 + 10 + 1) [underscore, 16-bit unsigned CPU ID, underscore, 32-bit unsigned chunk number, null] so they fit).</p>
<p>Truncation remains nevertheless possible, and would wreak havoc with the trace output tree. <code>Babeltrace</code> and the user count on proper folder and file tree structure to manage their traces. The code needs to detect instances of truncation and report them as errors.</p>
<p>As an aside, the snapshot output name should be limited to MAX_PATH - (1+10) because it gets suffixed with a hyphen and an unsigned 32-bit integer (the output set sequential ID).</p> LTTng-tools - Bug #720 (Confirmed): Disambiguating fully qualified event names, timestamp-sensiti...https://bugs.lttng.org/issues/7202014-01-16T14:11:33ZDaniel U. Thibaultdaniel.thibault@drdc-rddc.gc.ca
<p>In user-space, two applications may be using different tracepoint providers that register their events under the same provider name and event name (I guess this could also happen in kernel space if two modules are loaded with user-designed providers that similarly clashing events). Granted, not an usual occurrence but it could happen, for instance if a long-running trace sees an application undergoing an upgrade between executions.</p>
<p>The problem is that unless the buffering scheme is per-processID, the metadata will only capture the first event registration. The trace will be captured correctly, but <code>babeltrace</code> won't be able to correctly decode the second application's events. This can either lead to <code>babeltrace</code> quitting (typically with "<code>[error] Event id nn is outside range</code>") or generating incorrectly labelled event payloads, even spurious additional events.</p>
<p>This is not an easy bug to solve. At first glance it would seem the metadata should be enriched with timestamped event definitions, but by itself this won't be enough because the trace would still need to have some way to correctly disambiguate same-name events as they come in.</p>
<p>The bug should not be dismissed as "user error" because it could be used as part of an attack to disable system monitoring (consider what would happen if two same-name-but-different-payloads events were fed to a live trace analyzer).</p> LTTng-tools - Bug #653 (Confirmed): LTTng memory allocation failure goes unreportedhttps://bugs.lttng.org/issues/6532013-10-21T13:32:43ZDaniel U. Thibaultdaniel.thibault@drdc-rddc.gc.ca
<p>Consider the following (LTTng 2.3.0 running in a single-processor 1 GiB virtual machine):</p>
<pre>
$ lttng create test
Session test created.
Traces will be written in /home/daniel/lttng-traces/test-20131018-122002
$ lttng enable-channel ch -u --discard --num-subbuf 0x800
UST channel ch enabled for session test
$ lttng enable-event -c ch -u -a
All UST events are enabled in channel ch
$ lttng start
Tracing started for session test
$ lttng list test
Tracing session test: [active]
Trace path: /home/daniel/lttng-traces/test-20131018-122002
=== Domain: UST global ===
Buffer type: per UID
Channels:
-------------
- ch: [enabled]
Attributes:
overwrite mode: 0
subbufers size: 131072
number of subbufers: 2048
switch timer interval: 0
read timer interval: 0
output: mmap()
Events:
* (type: tracepoint) [enabled]
$ lttng destroy
Session test destroyed
$ lttng create test
Session test created.
Traces will be written in /home/daniel/lttng-traces/test-20131018-122241
$ lttng enable-channel ch -u --discard --num-subbuf 0x1000
UST channel ch enabled for session test
$ lttng enable-event -c ch -u -a
All UST events are enabled in channel ch
$ lttng start
Tracing started for session test
$ lttng list test
Tracing session test: [active]
Trace path: /home/daniel/lttng-traces/test-20131018-122241
=== Domain: UST global ===
Buffer type: per UID
Channels:
-------------
- ch: [enabled]
Attributes:
overwrite mode: 0
subbufers size: 131072
number of subbufers: 4096
switch timer interval: 0
read timer interval: 0
output: mmap()
Events:
* (type: tracepoint) [enabled]
$ lttng destroy
Session test destroyed
</pre>
<p>The first session generates a trace as expected. The second does not: the trace folder remains stubbornly empty.</p>
<p>This is puzzling from a memory management point of view (the first session grabs 256 MiB of buffers, the second should grab 512 MiB), since the system's capacity was apparently not reached.</p>
<p>The one error captured in the session log (attached) is:<br /><pre>
libringbuffer[14690/14695]: Error: zero_file: No space left on device (in _shm_object_table_alloc_shm() at shm.c:173)
</pre></p>
<p>The error needs to be properly captured and passed on to the lttng client.</p> LTTng-tools - Bug #592 (Confirmed): Potential trace process subdirectory name collision with PID ...https://bugs.lttng.org/issues/5922013-07-10T16:14:42ZDaniel U. Thibaultdaniel.thibault@drdc-rddc.gc.ca
<p>Per-PID trace subdirectories are named according to the “name-vpid-timestamp” scheme, where "name" is a process name (truncated to 15 characters by the system), "vpid" is a virtual process ID (process ID within a PID namespace), and "timestamp" has a one-second resolution and matches the time when the first events are recorded for that particular process.</p>
<p>It is possible, within a one-second window (same timestamp), to spawn two copies of a given process (same name) into two different PID namespaces (allowing the same VPIDs). There could thus be a collision in the trace output directory structure, since the “name-vpid-timestamp” process subdirectory names could match.</p>
<p>This bug is very similar to <a class="issue tracker-1 status-7 priority-3 priority-lowest" title="Bug: Under certain conditions, a user-space trace may overwrite itself (Confirmed)" href="https://bugs.lttng.org/issues/561">#561</a>.</p> LTTng-tools - Bug #561 (Confirmed): Under certain conditions, a user-space trace may overwrite it...https://bugs.lttng.org/issues/5612013-06-13T18:14:30ZDaniel U. Thibaultdaniel.thibault@drdc-rddc.gc.ca
<p>Suppose we do this:</p>
<pre>
$ sudo -H lttng create asession
$ sudo -H lttng enable-event -a -u
$ sudo -H lttng start
</pre>
<p>And suppose we have an application that has been instrumented with some user-space tracepoint provider. Suppose the application's main loop is something like this (borrowed from easy-ust):</p>
<pre>
int main(int argc, char **argv)
{
int i = 0;
char themessage[20]; //Can hold up to "Hello World 9999999\0"
void *libtp_handle;
libtp_handle = dlopen("./libtp.so", RTLD_LAZY);
fprintf(stderr, "sample starting\n");
for (i = 0; i < 10000; i++) {
if ((i == 3333) && (libtp_handle)) dlclose(libtp_handle);
if (i == 6666) libtp_handle = dlopen("./libtp.so", RTLD_LAZY);
sprintf(themessage, "Hello World %u", i);
tracepoint(sample_component, event, themessage);
usleep(1);
}
fprintf(stderr, "sample done\n");
if (libtp_handle) return dlclose(libtp_handle);
return 0;
}
</pre>
<p>The trace produced will capture two separate processes: the first one for the app's first 3333 loops, the second for the app's last 3333 loops. This is because the app will register itself as a user-space event source, then withdraw its registration only to later re-register.</p>
<p>As it happens, most of the time the two third-runs will be a second apart, resulting in the trace holding two pid subdirectories: say <code>sample-17541-20130613-135148</code> and <code>sample-17541-20130613-135149</code>. But now and again both processes will be within the same one-second window, and the trace will thus contain only one pid subdirectory, say <code>sample-17541-20130613-135151</code> ---the problem is that the app's first 3333 loops were written to disk and then the last 3333 loops were written to the same file.</p>
<p>This only gets worse if the dlopen/dlclose calls are more tightly packed in time.</p>
<p>The bug boils down to this: once a tracing session detects a new process client, lttng should detect path collisions and correct for them. One solution would be to have a trace's path be:</p>
<pre>
tracepath/ust/pid/process_name-VPID-yyyymmdd-hhmmss[-n]/
</pre>
<p>In my example, the first 3333 loops would go to <code>tracepath/ust/pid/sample-17541-20130613-135151</code> and the last 3333 loops to <code>tracepath/ust/pid/sample-17541-20130613-135151-1</code></p>
<p>I suppose a similar problem can happen with per-uid traces.</p> LTTng-tools - Bug #409 (Confirmed): Detection of pipe close with POLLHUP poll(3) eventhttps://bugs.lttng.org/issues/4092012-12-10T21:33:54ZChristian Babeuxchristian.babeux@efficios.com
<p>On Linux, the POLLHUP poll(3) event is used to signal that the other end<br />of a pipe has been disconnected. Due to poor wording in the Single UNIX<br />Specification, differents UNIX implementation signal the EOF with<br />conflicting poll events [1].</p>
<p>The current implementation of pipe close detection in lttng-tools uses<br />the POLLHUP event. This could lead to infinite looping in threads on<br />platforms such as OpenBSD/Cygwin.</p>
<p>Possible workaround in <a class="changeset" title="Cygwin: Fix handling of wait pipe hangup by properly detecting EOF On Linux, the POLLHUP poll(3)..." href="https://bugs.lttng.org/projects/lttng-tools/repository/lttng-tools/revisions/060a32b279132ceeeef14b96a611077195a2ca46">060a32b279132ceeeef14b96a611077195a2ca46</a>.</p>
<p>Creating this issue so we don't forget this limitation if we want eventually<br />want to support those platforms.</p>
<p>[1] - <a class="external" href="http://www.greenend.org.uk/rjk/tech/poll.html">http://www.greenend.org.uk/rjk/tech/poll.html</a></p>