LTTng bugs repository: Issueshttps://bugs.lttng.org/https://bugs.lttng.org/themes/lttng/favicon/a.ico?14249722912024-02-21T17:04:43ZLTTng bugs repository
Redmine Babeltrace - Feature #1410 (New): Add pretty-print sink options to override how to format floatin...https://bugs.lttng.org/issues/14102024-02-21T17:04:43ZMathieu Desnoyersmathieu.desnoyers@efficios.com
<p>Babeltrace 2, just like its predecessor babeltrace 1.5, uses "%g" to print floating point numbers.</p>
<p>It would be useful to let users optionally override the formatting for floating point numbers, perhaps with a pretty print sink option. They could then choose if exponent notation should be used or not, choose the precision, etc.</p> LTTng-tools - Bug #1383 (New): The "cpu_id" context (available for filters) is not discoverable b...https://bugs.lttng.org/issues/13832023-07-23T16:22:58ZMathieu Desnoyersmathieu.desnoyers@efficios.com
<p>I recently tried remembering how to filter by cpu_id when enabling an event, and found that there was not clear way to see that $ctx.cpu_id actually exists from the lttng man pages (other than a random example in the lttng-enable-event man page). AFAIU the man pages rely on lttng add-context --list to let the user discover the available contexts. This is likely because the cpu_id context is not available for the "add-context" command because it would be redundant with the implicit context already sampled with the buffers.</p>
<p>We may want to have some way to let users discovers those contexts which are filter-specific.</p>
<p>My use-case is to try to grab only traces for a few logical cpus (0-3) from my 384 logical cpus machine.</p> LTTng-tools - Bug #1379 (New): Document behavior of live mode with per-pid UST buffershttps://bugs.lttng.org/issues/13792023-06-01T18:48:58ZMathieu Desnoyersmathieu.desnoyers@efficios.com
<p>There is a design limitation of per-pid UST buffers when used in live mode which should be documented.</p>
<p>Basically, because the per-pid buffers are reclaimed soon after the application exits, it may cause the events traced by a short-lived application to never appear in the output of a live trace when per-pid buffers are used.</p>
<p>This is caused by the fact that the live client periodically checks for new buffers, but there is no inherent reference kept on the streams before they disappear.</p>
<p>This means that the information will be available in the disk output of the trace, but it will be missing from the live mode output.</p>
<p>This could eventually be mitigated by keeping references to the streams received by the relay daemon which are of interest to a live client until they are referenced by the live client.</p> LTTng-tools - Bug #1378 (New): Document behavior of snapshot mode with per-pid UST buffershttps://bugs.lttng.org/issues/13782023-06-01T18:01:12ZMathieu Desnoyersmathieu.desnoyers@efficios.com
<p>We should document a design limitation of the per-pid UST buffers which can lead to unexpected results when used with the "snapshot" feature.</p>
<p>Basically, because the per-pid buffers lifetime is bound by the application using them (and the consumer daemon extracting data from them), they are freed almost immediately after the traced application exits. However, in a scenario where we are interested in capturing a snapshot of the content of flight recorder buffers soon after application exits (e.g. caused by a crash), those buffers are not available anymore a few milliseconds after the application exits.</p>
<p>We should document this design limitation, and eventually think about improvements (feature request ?) to allow a configurable delay after application exit during which the consumer daemon could keep references on the buffers so they are available for snapshot.</p> Userspace RCU - Feature #1368 (New): Integrate RCU implementation from libside into liburcuhttps://bugs.lttng.org/issues/13682023-02-14T14:26:10ZMathieu Desnoyersmathieu.desnoyers@efficios.com
<p>libside features a multi-domain RCU implementation semantically similar to the Linux kernel's SRCU's implementation. It tracks grace periods with per-cpu counters rather than per-thread state.</p>
<p><a class="external" href="https://github.com/efficios/libside/blob/master/src/rcu.h">https://github.com/efficios/libside/blob/master/src/rcu.h</a><br /><a class="external" href="https://github.com/efficios/libside/blob/master/src/rcu.c">https://github.com/efficios/libside/blob/master/src/rcu.c</a></p>
<p>It is somewhat different from other urcu flavors because it supports multiple domains, whereas current liburcu flavors only have a single domain per process.</p>
<p>We would have to think thoroughly about the impacts of supporting multiple domains on other parts of liburcu such as the call_rcu worker thread. The worker thread would either have to be able to synchronize with multiple RCU domains, or we would have to require each worker thread to be associated with a single RCU domain.</p>
<p>Similar concerns arise with respect to data structures such as rculfhash which take a urcu flavor as parameter: with per-domain flavor, those would have to be associated with a specific RCU domain in addition to the flavor.</p> LTTng-tools - Feature #1341 (New): Introduce "--shm-path-ust" to clarify use vs documentationhttps://bugs.lttng.org/issues/13412021-12-07T20:53:52ZMathieu Desnoyersmathieu.desnoyers@efficios.com
<p>Considering that the "--shm-path" argument is documented as applying to all domains, but that currently the kernel domain does not support this argument, it leads us to a situation where users may budget their NVRAM for the space needed only for UST buffers, and eventually upgrade to a new LTTng version which would implement kernel support for shm-path. They would then face issues given the lack of available space.</p>
<p>I would recommend to add a new "--shm-path-ust" argument as an alias to "--shm-path", but document that it only applies to the UST domain. This way, users wishing to budget their NVRAM space only for UST tracing, but still perform kernel tracing in the same session, will be able to do so.</p> LTTng-tools - Bug #1340 (New): Document behavior of per-uid UST buffers with respect to asynchron...https://bugs.lttng.org/issues/13402021-12-07T20:22:28ZMathieu Desnoyersmathieu.desnoyers@efficios.com
<p>We should document a known design limitation of the per-uid UST buffers:</p>
<p>When using lttng-ust with per-uid buffers, there is a known design limitation<br />regarding process terminated by non-handled kill signals: if this occurs while<br />the buffer is being written to (between reserve and commit), it will cause the<br />rest of the per-cpu buffer to be unreadable. This applies to all events<br />recorded by all applications of the same user id for that particular CPU.<br />Recovering from this involves destroying and re-creating the tracing session.</p>
<p>If this kind of scenario is likely for you, you might want to consider using per-<br />pid buffers (lttng enable-channel -u --buffers-pid), which do not suffer from<br />this design limitation. The downside of per-pid buffers is that it allocates<br />more memory on your system for buffering, and adds extra overhead when<br />used with processes that have a short life-time.</p> Babeltrace - Feature #1300 (New): babeltrace2 lacks knowledge of bitwise enum produced by lttng-m...https://bugs.lttng.org/issues/13002021-03-23T17:41:47ZMathieu Desnoyersmathieu.desnoyers@efficios.com
<p>This is a follow up on <a class="external" href="https://review.lttng.org/c/babeltrace/+/3045">https://review.lttng.org/c/babeltrace/+/3045</a> now that end users are starting to contact us wondering why up-to-date lttng and babeltrace2 show incomplete information for bitwise enums.</p>
<p>I think we should implement a better stop-gap solution while awaiting CTF2. I recommend that we specialize the babeltrace2 CTF input plugin to detect traces produced by the lttng-modules kernel tracer, for specific known (hardcoded) events and fields which have a bitwise enum semantic, and use this hardcoded knowledge to print the correct bitwise enum to the end user rather than "<unknown>".</p>
<p>This should ensure the current producer/consumer tools work well together (show complete information about those bitwise enums) without requiring to modify tons of documentation about the specific flags needed for bt2 to handle lttng traces appropriately. And this would not lead to misleading scenarios where we print an unknown value as a bitwise enum by mistake.</p>
<p>Hardcoded producer detection/event/fields is of course a last resort solution awaiting proper self-description in the upcoming CTF2 spec.</p> Babeltrace - Bug #1299 (Confirmed): babeltrace2 re-uses prior event string rather than expected e...https://bugs.lttng.org/issues/12992021-03-04T18:49:50ZMathieu Desnoyersmathieu.desnoyers@efficios.com
<p>Trying to reproduce the page fault scenario for an openat() event. Note that the behavior of generating a page fault may be toolchain-specific (linker-specific):</p>
<p>Test program "do-pagefault-open":</p>
<p>#include <stdio.h><br />#include <sys/types.h><br />#include <sys/stat.h><br />#include <fcntl.h></p>
<p>const char *str = "/tmp/blah";<br />const char *str2 = "/tmp/blah2";</p>
<p>int main()
{<br /> open(str, O_RDWR);<br /> perror("first");<br /> open(str2, O_RDWR);<br /> perror("second");<br /> return 0;<br />}</p>
<p>then trace with:</p>
<p>lttng-trace ./do-pagefault-open</p>
<p>Relevant events with babeltrace2 (grepping for 'openat'):</p>
<p>[13:47:12.296015570] (+0.000015421) thinkos syscall_entry_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { dfd = -100, filename = "/etc/ld.so.cache", flags = ( "O_CLOEXEC" : container = 524288 ), mode = ( <unknown> : container = 0 ) }<br />[13:47:12.296029004] (+0.000013434) thinkos syscall_exit_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { ret = 3 }<br />[13:47:12.296116608] (+0.000058899) thinkos syscall_entry_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { dfd = -100, filename = "/lib/x86_64-linux-gnu/libc.so.6", flags = ( "O_CLOEXEC" : container = 524288 ), mode = ( <unknown> : container = 0 ) }<br />[13:47:12.296126244] (+0.000009636) thinkos syscall_exit_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { ret = 3 }<br />[13:47:12.296706313] (+0.000096888) thinkos syscall_entry_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { dfd = -100, filename = "/lib/x86_64-linux-gnu/libc.so.6", flags = ( "O_RDWR" : container = 2 ), mode = ( <unknown> : container = 0 ) }<br />[13:47:12.296717797] (+0.000011484) thinkos syscall_exit_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { ret = -2 }<br />[13:47:12.296954228] (+0.000005481) thinkos syscall_entry_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { dfd = -100, filename = "/tmp/blah2", flags = ( "O_RDWR" : container = 2 ), mode = ( <unknown> : container = 0 ) }<br />[13:47:12.296960451] (+0.000006223) thinkos syscall_exit_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { ret = -2 }</p>
<p>Same trace with babeltrace 1:</p>
<p>[warning] Unknown value 0 in enum.<br />[warning] Unknown value 0 in enum.<br />[warning] Unknown value 3 in enum.<br />[warning] Unknown value 5 in enum.<br />[warning] Unknown value 129 in enum.<br />[warning] Unknown value 129 in enum.<br />[warning] Unknown value 3 in enum.<br />[warning] Unknown value 129 in enum.<br />[warning] Unknown value 3 in enum.<br />[warning] Unknown value 3 in enum.<br />[13:47:12.296015570] (+0.000015421) thinkos syscall_entry_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { dfd = -100, filename = "/etc/ld.so.cache", flags = ( "O_CLOEXEC" : container = 524288 ), mode = ( <unknown> : container = 0 ) }<br />[13:47:12.296029004] (+0.000013434) thinkos syscall_exit_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { ret = 3 }<br />[13:47:12.296116608] (+0.000058899) thinkos syscall_entry_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { dfd = -100, filename = "/lib/x86_64-linux-gnu/libc.so.6", flags = ( "O_CLOEXEC" : container = 524288 ), mode = ( <unknown> : container = 0 ) }<br />[13:47:12.296126244] (+0.000009636) thinkos syscall_exit_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { ret = 3 }<br />[warning] Unknown value 0 in enum.<br />[warning] Unknown value 0 in enum.<br />[13:47:12.296706313] (+0.000096888) thinkos syscall_entry_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { dfd = -100, filename = "", flags = ( "O_RDWR" : container = 2 ), mode = ( <unknown> : container = 0 ) }<br />[13:47:12.296717797] (+0.000011484) thinkos syscall_exit_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { ret = -2 }<br />[13:47:12.296954228] (+0.000005481) thinkos syscall_entry_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { dfd = -100, filename = "/tmp/blah2", flags = ( "O_RDWR" : container = 2 ), mode = ( <unknown> : container = 0 ) }</p>
<p>Notice how filename = "/lib/x86_64-linux-gnu/libc.so.6" appears twice in the bt2 output, where we would expect the empty string (as provided by bt1).</p>
<p>Tested with babeltrace2 commit c429f86d0.</p> LTTng - Feature #1288 (New): Combine all per-cpu shm areas for a channel into a single filehttps://bugs.lttng.org/issues/12882020-10-13T15:39:03ZMathieu Desnoyersmathieu.desnoyers@efficios.com
<p>On the reason for having one shm file per cpu (per channel/per uid/per session): having one file per cpu is mainly done to facilitate NUMA-local memory allocation. That being said, I wonder if as a future improvement we could combine all per-cpu buffers into a single shm file, and issue numa_set_preferred piecewise when mmapping regions of the files. This would lessen the number of file descriptors needed by a significant amount. I suspect it would work, but we'd need to do some prototyping to validate this approach first.</p> LTTng-tools - Feature #1287 (New): Use abstract sockets for lttng-consumerd UST shared memory fileshttps://bugs.lttng.org/issues/12872020-10-13T15:35:32ZMathieu Desnoyersmathieu.desnoyers@efficios.com
<p>Abstract sockets (unix(7)) are not tied to the filesystem, and are available since Linux 2.2.</p>
<p>Those are Linux-specific.</p>
<p>Those are the same as regular unix domain but their first character of path is NULL. They have the benefit of not requiring unlinking of files left behind.</p>
<p>We could use those abstract sockets in lttng-consumerd on Linux.</p> LTTng-modules - Feature #1265 (New): Turn lttng-probes.c probe_list into a hash tablehttps://bugs.lttng.org/issues/12652020-05-06T17:14:59ZMathieu Desnoyersmathieu.desnoyers@efficios.com
<p>Turns O(n^2) trace systems registration (cost for n systems) into O(n). (O(1) per system)</p> LTTng-modules - Feature #1264 (New): NOHZ support for lib ring bufferhttps://bugs.lttng.org/issues/12642020-05-06T17:14:02ZMathieu Desnoyersmathieu.desnoyers@efficios.com
<p>NOHZ infrastructure in the Linux kernel does not support notifiers chains, which does not let LTTng play nicely with low power consumption setups for flight recorder (overwrite mode) live traces. ne way to allow integration between NOHZ and LTTng would be to add support for such notifiers into NOHZ kernel infrastructure.</p> LTTng-modules - Feature #1263 (New): Re-integrate sample modules from libringbuffer into lttng-mo...https://bugs.lttng.org/issues/12632020-05-06T17:13:20ZMathieu Desnoyersmathieu.desnoyers@efficios.com
<p>Re-integrate sample modules from libringbuffer into lttng-modules. Those modules can be used as example of how to use libringbuffer in other contexts than LTTng, and are useful to perform benchmarks of the ringbuffer library. See: <a class="external" href="https://github.com/compudj/linux-libringbuffer/tree/tip-current-ringbuffer-0.240">https://github.com/compudj/linux-libringbuffer/tree/tip-current-ringbuffer-0.240</a></p> LTTng-tools - Bug #1256 (New): Check babeltrace1 stderr output in testshttps://bugs.lttng.org/issues/12562020-04-08T19:23:18ZMathieu Desnoyersmathieu.desnoyers@efficios.com
<p>Currently, babeltrace1 stderr's output contains warnings which are not checked by the tests. Those do not fail the overall babeltrace execution, but end up being unexpected regressions when they appear between releases.</p>
<p>We could modify all babeltrace invocations to add a helper which saves the stderr output, and check that it is empty after babeltrace has completed execution. If non-empty, a test error would be reported.</p>