LTTng bugs repository: Issueshttps://bugs.lttng.org/https://bugs.lttng.org/themes/lttng/favicon/a.ico?14249722912023-12-19T20:03:54ZLTTng bugs repository
Redmine LTTng-tools - Bug #1407 (New): Hang when stopping a live session with a low live valuehttps://bugs.lttng.org/issues/14072023-12-19T20:03:54ZKienan Stewart
<p>While investigating <a class="issue tracker-1 status-5 priority-4 priority-default closed" title="Bug: Investigate why events are no longer recorded by the live view when `DELAYUS` in `tests/regressio... (Resolved)" href="https://bugs.lttng.org/issues/1403">#1403</a>, I noticed some test runs would intermittently hang when the live delay value was set very low (eg. 1us).</p>
<p>For example, in <code>tests/regression/tools/clear/test_ust</code> the test <code>test_ust_streaming_live</code> would hang executing <code>lttng stop sessionX</code>. After investigating the issue I noted the following:</p>
<ul>
<li>the lttng client is hanging, pending a response from the sessiond</li>
<li>the consumer's consumer_thread_data_poll thread appear to be looping in <code>consumer_timer_signal_thread_qs</code> while trying to remove the monitor timer during channel deletion: <pre>
Thread 7 (Thread 0x7f999bfff680 (LWP 3060295) "lttng-consumerd"):
#0 sigpending (set=0x7f999bffdea0) at ../sysdeps/unix/sysv/linux/sigpending.c:27
#1 0x00005559299f07ec in consumer_timer_signal_thread_qs (signr=47) at consumer/consumer-timer.cpp:325
#2 consumer_channel_timer_stop (timer_id=timer_id@entry=0x7f998c0021c0, signal=47) at consumer/consumer-timer.cpp:422
#3 0x00005559299f122f in consumer_timer_monitor_stop (channel=channel@entry=0x7f998c000f40) at consumer/consumer-timer.cpp:531
#4 0x00005559299dabbf in consumer_del_channel (channel=channel@entry=0x7f998c000f40) at consumer/consumer.cpp:379
#5 0x00005559299ee83a in consumer_stream_destroy (stream=stream@entry=0x7f998c01a830, ht=<optimized out>) at consumer/consumer-stream.cpp:1070
#6 0x00005559299e3c61 in consumer_del_stream (ht=<optimized out>, stream=0x7f998c01a830) at consumer/consumer.cpp:547
#7 consumer_thread_data_poll (data=0x55592b498ba0) at consumer/consumer.cpp:2747
#8 0x00007f99aaaa63ec in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:444
#9 0x00007f99aab26a5c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
</pre></li>
<li>the consumerd's consumer_timer_thread was often handling the live timer: <pre>
Thread 4 (Thread 0x7f99a991b680 (LWP 3060292) "lttng-consumerd"):
#0 __GI___libc_write (nbytes=124, buf=0x7f99a9919e90, fd=2) at ../sysdeps/unix/sysv/linux/write.c:26
#1 __GI___libc_write (fd=2, buf=0x7f99a9919e90, nbytes=124) at ../sysdeps/unix/sysv/linux/write.c:24
#2 0x00007f99aaa9dbb5 in _IO_new_file_write (f=0x7f99aabf26a0 <_IO_2_1_stderr_>, data=0x7f99a9919e90, n=124) at ./libio/fileops.c:1180
#3 0x00007f99aaa9cf70 in new_do_write (fp=fp@entry=0x7f99aabf26a0 <_IO_2_1_stderr_>, data=data@entry=0x7f99a9919e9
0 "DBG1 - 10:31:33.730698608 [3060287/3060292]: Live timer for channel 17 (in live_timer() at consumer/consumer-timer.cpp:284)\n", to_do=to_do@entry=124) at ./libio/libioP.h:946
#4 0x00007f99aaa9e2a1 in _IO_new_file_xsputn (n=124, data=<optimized out>, f=0x7f99aabf26a0 <_IO_2_1_stderr_>) at ./libio/fileops.c:1254
#5 _IO_new_file_xsputn (f=0x7f99aabf26a0 <_IO_2_1_stderr_>, data=<optimized out>, n=124) at ./libio/fileops.c:1196
#6 0x00007f99aaa71409 in __printf_buffer_flush_to_file (buf=buf@entry=0x7f99a9919e60) at ../libio/libioP.h:946
#7 0x00007f99aaa714c0 in __printf_buffer_to_file_done (buf=buf@entry=0x7f99a9919e60) at ./stdio-common/printf_buffer_to_file.c:120
#8 0x00007f99aaa7ac0d in __vfprintf_internal (s=0x7f99aabf26a0 <_IO_2_1_stderr_>, format=0x555929a57cd8 "DBG1 - %s [%s]: Live timer for channel %lu (in %s() at consumer/consumer-timer.cpp:284)\n", ap=ap@entry=0x7f99a9919f60, mode_
flags=mode_flags@entry=0) at ./stdio-common/vfprintf-internal.c:1475
#9 0x00007f99aaa70296 in __fprintf (stream=<optimized out>, format=<optimized out>) at ./stdio-common/fprintf.c:32
#10 0x00005559299f2280 in live_timer (si=0x7f99a991a160, ctx=0x55592b498ba0) at consumer/consumer-timer.cpp:284
#11 consumer_timer_thread (data=0x55592b498ba0) at consumer/consumer-timer.cpp:789
#12 0x00007f99aaaa63ec in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:444
#13 0x00007f99aab26a5c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
</pre></li>
</ul>
<p><code>consumer_timer_signal_thread_qs</code> is busy loop until the the the signal in question is no longer pending. In <code>consumer-timer.cpp::consumer_timer_thread</code>, the signals are waited on by <code>sigwaitinfo</code>, and there are some ordering rules (see man 7 signal):</p>
<ul>
<li>for RT signals, by signal priority</li>
<li>for standard signals, an unspecified order</li>
<li>in Linux standard signals are delivered before RT signals</li>
</ul>
<p>The signal priorities are their signal number as defined [[<a class="external" href="https://github.com/lttng/lttng-tools/blob/dff46b736afb7945aaf635de661b920dd0d01d7f/src/common/consumer/consumer-timer.hpp#L17|here">https://github.com/lttng/lttng-tools/blob/dff46b736afb7945aaf635de661b920dd0d01d7f/src/common/consumer/consumer-timer.hpp#L17|here</a>]]. Lower values have a higher priority.</p>
<p>When the live session is configured with a low delay, eg <code>--live=1</code>, the live timer is firing relatively rapidly and can prevent the signal fired by the monitor timer from being handled. This in turn means that the session destruction may hang until (hopefully) the monitor signal is handled and no longer pending.</p>
<p>A few options were discussed:</p>
<ul>
<li>changing signal ordering (eg. reducing the priority of live signal): while it would address this particular case, it then means a session with a low value for the monitor timer could equally starve out the handling of the live timer's signal</li>
<li>using <code>sigpending</code> in combination with <code>sigwaitinfo</code>: while it's possible to get all pending signals as a set, it seems to not be possible to retrieve the details (<code>siginfo_t</code>) which are required some some data (eg. pointers) to properly handle the signal</li>
<li>signalfd: very linux specific</li>
<li>timer_fd: very linux specific</li>
<li>custom scheduler: eg., <a class="external" href="https://github.com/nsec/badge-conf-2023/blob/nsec2023/lib/scheduling/scheduler.hpp">https://github.com/nsec/badge-conf-2023/blob/nsec2023/lib/scheduling/scheduler.hpp</a></li>
<li>switch from using channel pointers in the <code>sigval_ptr</code> to passing IDs that can be used for lookups which may fail gracefully instead; that way the pending signals wouldn't have to all be cleared in <code>consumer_timer_signal_thread_qs</code></li>
</ul>
<p>It seems the most promising path is to implement a small deadline scheduler.</p> LTTng-modules - Feature #1395 (New): Add probe for io_uringhttps://bugs.lttng.org/issues/13952023-10-10T15:23:31ZKienan Stewart
<p>Requested on the lltng-dev list. C.f. <a class="external" href="https://lists.lttng.org/pipermail/lttng-dev/2023-October/030640.html">https://lists.lttng.org/pipermail/lttng-dev/2023-October/030640.html</a></p> Babeltrace - Bug #1374 (New): Babeltrace fails to read trace with no data stream type ID when the...https://bugs.lttng.org/issues/13742023-04-26T17:57:32ZErica Bugden
<p>Babeltrace is not able to read a trace that does not contain data stream IDs (empty packet header) when the metadata specifies an ID for the stream. Trace Compass is also unable to read this trace. If the lines mentioning the stream ID are removed from the metadata, then babeltrace is able to read the trace.</p>
<a name="Does-this-need-to-be-fixed"></a>
<h3 >Does this need to be fixed?<a href="#Does-this-need-to-be-fixed" class="wiki-anchor">¶</a></h3>
<p>This was first reported as a barectf issue <a class="external" href="https://github.com/efficios/barectf/issues/28">https://github.com/efficios/barectf/issues/28</a> , but after further consideration the issue appears to be in a grey area in the CTF 1.8 specification. Whether it is the CTF generator's (barectf) responsibility or the CTF reader's responsibility is up for debate.</p>
<p>The superfluous metadata information need not prevent the trace from being parsed. For example, yactfr <a class="external" href="https://github.com/eepp/yactfr">https://github.com/eepp/yactfr</a> is able to read the trace. However, considering that:</p>
<ul>
<li>At least two CTF readers (babeltrace and Trace Compass) cannot read the trace, and</li>
<li>It is easy to fix in barectf (Proposed fix: <a class="external" href="https://review.lttng.org/c/barectf/+/9868">https://review.lttng.org/c/barectf/+/9868</a> )</li>
</ul>
<p>this is not a high priority issue in babeltrace.</p>
<a name="Context"></a>
<h1 >Context<a href="#Context" class="wiki-anchor">¶</a></h1>
<p>- Background and reproducer: barectf github issue <a class="external" href="https://github.com/efficios/barectf/issues/28">https://github.com/efficios/barectf/issues/28</a> <br />- Background and specification discussion: barectf fix <a class="external" href="https://review.lttng.org/c/barectf/+/9868">https://review.lttng.org/c/barectf/+/9868</a></p> LTTng - Feature #1347 (New): Codename for 2.14https://bugs.lttng.org/issues/13472022-03-01T20:27:38ZMichael Jeansonmjeanson@efficios.com
<p>What should the codename of the LTTng 2.14 release be?</p>
<p>Must be a micro-brewed Quebec beer and start with a "O".</p> LTTng-tools - Bug #1342 (New): Rotation thread's handling of session consumed size notifications ...https://bugs.lttng.org/issues/13422021-12-09T21:40:35ZJérémie Galarneaujeremie.galarneau@efficios.com
<p>Related to this change:<br /><a class="external" href="https://review.lttng.org/c/lttng-tools/+/6886/">https://review.lttng.org/c/lttng-tools/+/6886/</a></p>
The race described in that change has another consequence: the notification could refer to a different session instance of the same name:
<ul>
<li>notification emitted for session <code>foo</code>,</li>
<li>session <code>foo</code> is destroyed,</li>
<li>a new session named <code>foo</code> is created,</li>
<li>a rotation is triggered for <code>foo</code>.</li>
</ul>
There are multiple solutions for this:
<ul>
<li>Provide a unique session id as part the of the evaluation of <code>SESSION_CONSUMED_SIZE</code>,</li>
<li>Register an internal trigger that has the <code>rotate session</code> action (since ids are sampled during their handling).</li>
</ul>
<p>I can't work on a fix right now, but here is a placeholder patch that will eventually contain the fix:<br /><a class="external" href="https://review.lttng.org/c/lttng-tools/+/6907">https://review.lttng.org/c/lttng-tools/+/6907</a></p>
<p>This is hypothetical and I noticed it during a code review; nobody reported this problem.</p> LTTng - Feature #1288 (New): Combine all per-cpu shm areas for a channel into a single filehttps://bugs.lttng.org/issues/12882020-10-13T15:39:03ZMathieu Desnoyersmathieu.desnoyers@efficios.com
<p>On the reason for having one shm file per cpu (per channel/per uid/per session): having one file per cpu is mainly done to facilitate NUMA-local memory allocation. That being said, I wonder if as a future improvement we could combine all per-cpu buffers into a single shm file, and issue numa_set_preferred piecewise when mmapping regions of the files. This would lessen the number of file descriptors needed by a significant amount. I suspect it would work, but we'd need to do some prototyping to validate this approach first.</p> LTTng-tools - Feature #1287 (New): Use abstract sockets for lttng-consumerd UST shared memory fileshttps://bugs.lttng.org/issues/12872020-10-13T15:35:32ZMathieu Desnoyersmathieu.desnoyers@efficios.com
<p>Abstract sockets (unix(7)) are not tied to the filesystem, and are available since Linux 2.2.</p>
<p>Those are Linux-specific.</p>
<p>Those are the same as regular unix domain but their first character of path is NULL. They have the benefit of not requiring unlinking of files left behind.</p>
<p>We could use those abstract sockets in lttng-consumerd on Linux.</p> LTTng-tools - Bug #1267 (New): odr-violation for config_str_yeshttps://bugs.lttng.org/issues/12672020-05-13T01:16:50ZJonathan Rajotte Julienjonathan.rajotte-julien@efficios.com
<p>Using "clang version 10.0.0-4ubuntu1".</p>
<p>Configure: "./configure CFLAGS="-g -O0 -fsanitize=address"</p>
<p>lttng-sessiond yield:<br /><pre>
=================================================================
==3248430==ERROR: AddressSanitizer: odr-violation (0x000000800e80):
[1] size=8 'config_str_yes' session-config.c:56:20
[2] size=8 'config_str_yes' session-config.c:56:20
These globals were registered at these points:
[1]:
#0 0x43447d in __asan_register_globals (/home/joraj/lttng/master/install/bin/lttng-sessiond+0x43447d)
#1 0x6e6a8b in asan.module_ctor (/home/joraj/lttng/master/install/bin/lttng-sessiond+0x6e6a8b)
[2]:
#0 0x43447d in __asan_register_globals (/home/joraj/lttng/master/install/bin/lttng-sessiond+0x43447d)
#1 0x7eff93c40a6b in asan.module_ctor (/home/joraj/lttng/master//install/lib/liblttng-ctl.so.0+0x144a6b)
==3248430==HINT: if you don't care about these errors you may set ASAN_OPTIONS=detect_odr_violation=0
SUMMARY: AddressSanitizer: odr-violation: global 'config_str_yes' at session-config.c:56:20
==3248430==ABORTING
</pre></p>
<p>The linker does not warn.</p> LTTng-modules - Feature #1265 (New): Turn lttng-probes.c probe_list into a hash tablehttps://bugs.lttng.org/issues/12652020-05-06T17:14:59ZMathieu Desnoyersmathieu.desnoyers@efficios.com
<p>Turns O(n^2) trace systems registration (cost for n systems) into O(n). (O(1) per system)</p> LTTng-modules - Feature #1264 (New): NOHZ support for lib ring bufferhttps://bugs.lttng.org/issues/12642020-05-06T17:14:02ZMathieu Desnoyersmathieu.desnoyers@efficios.com
<p>NOHZ infrastructure in the Linux kernel does not support notifiers chains, which does not let LTTng play nicely with low power consumption setups for flight recorder (overwrite mode) live traces. ne way to allow integration between NOHZ and LTTng would be to add support for such notifiers into NOHZ kernel infrastructure.</p> LTTng-modules - Feature #1263 (New): Re-integrate sample modules from libringbuffer into lttng-mo...https://bugs.lttng.org/issues/12632020-05-06T17:13:20ZMathieu Desnoyersmathieu.desnoyers@efficios.com
<p>Re-integrate sample modules from libringbuffer into lttng-modules. Those modules can be used as example of how to use libringbuffer in other contexts than LTTng, and are useful to perform benchmarks of the ringbuffer library. See: <a class="external" href="https://github.com/compudj/linux-libringbuffer/tree/tip-current-ringbuffer-0.240">https://github.com/compudj/linux-libringbuffer/tree/tip-current-ringbuffer-0.240</a></p> LTTng-tools - Bug #1262 (New): Metadata channel properties are not configurable (kernel domain)https://bugs.lttng.org/issues/12622020-05-05T22:45:05ZJérémie Galarneaujeremie.galarneau@efficios.com
<p>It is possible to explicitly create a kernel channel named <em>metadata</em> in the kernel domain.</p>
<pre>
$ lttng enable-channel -k metadata --num-subbuf 256
Kernel channel metadata enabled for session youssef_robert
</pre>
<p>This channel can be listed:<br /><pre>
- metadata: [enabled]
Attributes:
Event-loss mode: discard
Sub-buffer size: 1048576 bytes
Sub-buffer count: 256
Switch timer: inactive
Read timer: 200000 µs
Monitor timer: 1000000 µs
Blocking timeout: 0 µs
Trace file count: 1 per stream
Trace file size: unlimited
Output mode: splice
Statistics:
Discarded events: 0
Event rules:
None
</pre></p>
<p>It is surprisingly possible to add event rules to such a channel.</p>
<pre>
lttng enable-event -k -a -c metadata
All Kernel events are enabled in channel metadata
</pre>
<p>Predictably, since this channel is not identified as a <em>special</em> metadata channel, the configuration is not used during the <a href="https://github.com/lttng/lttng-tools/blob/ab5be9f/src/bin/lttng-sessiond/kernel-consumer.c#L203" class="external">creation of the metadata stream</a>.</p>
<p>Instead, a <a href="https://github.com/lttng/lttng-tools/blob/ab5be9fa2eb5ba9600a82cd18fd3cfcbac69169a/src/bin/lttng-sessiond/trace-kernel.h#L100" class="external">dedicated ltt_kernel_metadata</a> field is used. This field is initialized during the creation of the kernel session with a set of default attributes.</p>
<p>It is unclear to me if this ever worked since I couldn't find a trace of the code that should handle this.</p> LTTng-tools - Bug #1256 (New): Check babeltrace1 stderr output in testshttps://bugs.lttng.org/issues/12562020-04-08T19:23:18ZMathieu Desnoyersmathieu.desnoyers@efficios.com
<p>Currently, babeltrace1 stderr's output contains warnings which are not checked by the tests. Those do not fail the overall babeltrace execution, but end up being unexpected regressions when they appear between releases.</p>
<p>We could modify all babeltrace invocations to add a helper which saves the stderr output, and check that it is empty after babeltrace has completed execution. If non-empty, a test error would be reported.</p> Babeltrace - Bug #1234 (Feedback): src.text.dmesg: some kernel ring buffer lines can be wrongly s...https://bugs.lttng.org/issues/12342020-02-17T21:54:00ZPhilippe Proulxeeppeliteloop@gmail.com
<p>The lines of the <code>dmesg</code> command start with a time. The lines are supposed to be in order of time, but some of them can be at the wrong place.</p>
<p><code>flt.utils.muxer</code> does not like this and complains that event messages are not sorted by their default clock snapshot value.</p>
<p>It is, in fact, a <code>src.text.dmesg</code> bug because a message iterator must emit messages in order of time.</p>
<p>If the input is a file, one solution would be to sort the lines first (if not too large), and then emit the messages in this order.</p>
<p>We could also, in all scenarios, skip the lines with a time that is before the last event message's time and warn accordingly.</p> Babeltrace - Bug #1223 (Feedback): Check for cmp and diff in configurehttps://bugs.lttng.org/issues/12232020-02-17T21:03:26ZSimon Marchisimon.marchi@polymtl.ca
<p>Seen somewhere in the configure output of babeltrace on a fresh centos 8 container:</p>
<pre>
checking for a working dd... ./configure: line 9826: cmp: command not found
checking if gcc supports -fno-rtti -fno-exceptions... ./configure: line 11790: diff: command not found
</pre>
<p>The configure keeps going, I think the consequence may simply be that the results of these checks are inaccurate.</p>
<p>For correctness, if these tools (cmd and diff) are used, there should be something that checks that they exist before using them.</p>