LTTng bugs repository: Issueshttps://bugs.lttng.org/https://bugs.lttng.org/themes/lttng/favicon/a.ico?14249722912023-12-19T20:03:54ZLTTng bugs repository
Redmine LTTng-tools - Bug #1407 (New): Hang when stopping a live session with a low live valuehttps://bugs.lttng.org/issues/14072023-12-19T20:03:54ZKienan Stewart
<p>While investigating <a class="issue tracker-1 status-5 priority-4 priority-default closed" title="Bug: Investigate why events are no longer recorded by the live view when `DELAYUS` in `tests/regressio... (Resolved)" href="https://bugs.lttng.org/issues/1403">#1403</a>, I noticed some test runs would intermittently hang when the live delay value was set very low (eg. 1us).</p>
<p>For example, in <code>tests/regression/tools/clear/test_ust</code> the test <code>test_ust_streaming_live</code> would hang executing <code>lttng stop sessionX</code>. After investigating the issue I noted the following:</p>
<ul>
<li>the lttng client is hanging, pending a response from the sessiond</li>
<li>the consumer's consumer_thread_data_poll thread appear to be looping in <code>consumer_timer_signal_thread_qs</code> while trying to remove the monitor timer during channel deletion: <pre>
Thread 7 (Thread 0x7f999bfff680 (LWP 3060295) "lttng-consumerd"):
#0 sigpending (set=0x7f999bffdea0) at ../sysdeps/unix/sysv/linux/sigpending.c:27
#1 0x00005559299f07ec in consumer_timer_signal_thread_qs (signr=47) at consumer/consumer-timer.cpp:325
#2 consumer_channel_timer_stop (timer_id=timer_id@entry=0x7f998c0021c0, signal=47) at consumer/consumer-timer.cpp:422
#3 0x00005559299f122f in consumer_timer_monitor_stop (channel=channel@entry=0x7f998c000f40) at consumer/consumer-timer.cpp:531
#4 0x00005559299dabbf in consumer_del_channel (channel=channel@entry=0x7f998c000f40) at consumer/consumer.cpp:379
#5 0x00005559299ee83a in consumer_stream_destroy (stream=stream@entry=0x7f998c01a830, ht=<optimized out>) at consumer/consumer-stream.cpp:1070
#6 0x00005559299e3c61 in consumer_del_stream (ht=<optimized out>, stream=0x7f998c01a830) at consumer/consumer.cpp:547
#7 consumer_thread_data_poll (data=0x55592b498ba0) at consumer/consumer.cpp:2747
#8 0x00007f99aaaa63ec in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:444
#9 0x00007f99aab26a5c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
</pre></li>
<li>the consumerd's consumer_timer_thread was often handling the live timer: <pre>
Thread 4 (Thread 0x7f99a991b680 (LWP 3060292) "lttng-consumerd"):
#0 __GI___libc_write (nbytes=124, buf=0x7f99a9919e90, fd=2) at ../sysdeps/unix/sysv/linux/write.c:26
#1 __GI___libc_write (fd=2, buf=0x7f99a9919e90, nbytes=124) at ../sysdeps/unix/sysv/linux/write.c:24
#2 0x00007f99aaa9dbb5 in _IO_new_file_write (f=0x7f99aabf26a0 <_IO_2_1_stderr_>, data=0x7f99a9919e90, n=124) at ./libio/fileops.c:1180
#3 0x00007f99aaa9cf70 in new_do_write (fp=fp@entry=0x7f99aabf26a0 <_IO_2_1_stderr_>, data=data@entry=0x7f99a9919e9
0 "DBG1 - 10:31:33.730698608 [3060287/3060292]: Live timer for channel 17 (in live_timer() at consumer/consumer-timer.cpp:284)\n", to_do=to_do@entry=124) at ./libio/libioP.h:946
#4 0x00007f99aaa9e2a1 in _IO_new_file_xsputn (n=124, data=<optimized out>, f=0x7f99aabf26a0 <_IO_2_1_stderr_>) at ./libio/fileops.c:1254
#5 _IO_new_file_xsputn (f=0x7f99aabf26a0 <_IO_2_1_stderr_>, data=<optimized out>, n=124) at ./libio/fileops.c:1196
#6 0x00007f99aaa71409 in __printf_buffer_flush_to_file (buf=buf@entry=0x7f99a9919e60) at ../libio/libioP.h:946
#7 0x00007f99aaa714c0 in __printf_buffer_to_file_done (buf=buf@entry=0x7f99a9919e60) at ./stdio-common/printf_buffer_to_file.c:120
#8 0x00007f99aaa7ac0d in __vfprintf_internal (s=0x7f99aabf26a0 <_IO_2_1_stderr_>, format=0x555929a57cd8 "DBG1 - %s [%s]: Live timer for channel %lu (in %s() at consumer/consumer-timer.cpp:284)\n", ap=ap@entry=0x7f99a9919f60, mode_
flags=mode_flags@entry=0) at ./stdio-common/vfprintf-internal.c:1475
#9 0x00007f99aaa70296 in __fprintf (stream=<optimized out>, format=<optimized out>) at ./stdio-common/fprintf.c:32
#10 0x00005559299f2280 in live_timer (si=0x7f99a991a160, ctx=0x55592b498ba0) at consumer/consumer-timer.cpp:284
#11 consumer_timer_thread (data=0x55592b498ba0) at consumer/consumer-timer.cpp:789
#12 0x00007f99aaaa63ec in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:444
#13 0x00007f99aab26a5c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
</pre></li>
</ul>
<p><code>consumer_timer_signal_thread_qs</code> is busy loop until the the the signal in question is no longer pending. In <code>consumer-timer.cpp::consumer_timer_thread</code>, the signals are waited on by <code>sigwaitinfo</code>, and there are some ordering rules (see man 7 signal):</p>
<ul>
<li>for RT signals, by signal priority</li>
<li>for standard signals, an unspecified order</li>
<li>in Linux standard signals are delivered before RT signals</li>
</ul>
<p>The signal priorities are their signal number as defined [[<a class="external" href="https://github.com/lttng/lttng-tools/blob/dff46b736afb7945aaf635de661b920dd0d01d7f/src/common/consumer/consumer-timer.hpp#L17|here">https://github.com/lttng/lttng-tools/blob/dff46b736afb7945aaf635de661b920dd0d01d7f/src/common/consumer/consumer-timer.hpp#L17|here</a>]]. Lower values have a higher priority.</p>
<p>When the live session is configured with a low delay, eg <code>--live=1</code>, the live timer is firing relatively rapidly and can prevent the signal fired by the monitor timer from being handled. This in turn means that the session destruction may hang until (hopefully) the monitor signal is handled and no longer pending.</p>
<p>A few options were discussed:</p>
<ul>
<li>changing signal ordering (eg. reducing the priority of live signal): while it would address this particular case, it then means a session with a low value for the monitor timer could equally starve out the handling of the live timer's signal</li>
<li>using <code>sigpending</code> in combination with <code>sigwaitinfo</code>: while it's possible to get all pending signals as a set, it seems to not be possible to retrieve the details (<code>siginfo_t</code>) which are required some some data (eg. pointers) to properly handle the signal</li>
<li>signalfd: very linux specific</li>
<li>timer_fd: very linux specific</li>
<li>custom scheduler: eg., <a class="external" href="https://github.com/nsec/badge-conf-2023/blob/nsec2023/lib/scheduling/scheduler.hpp">https://github.com/nsec/badge-conf-2023/blob/nsec2023/lib/scheduling/scheduler.hpp</a></li>
<li>switch from using channel pointers in the <code>sigval_ptr</code> to passing IDs that can be used for lookups which may fail gracefully instead; that way the pending signals wouldn't have to all be cleared in <code>consumer_timer_signal_thread_qs</code></li>
</ul>
<p>It seems the most promising path is to implement a small deadline scheduler.</p> LTTng-tools - Bug #1402 (New): Some files are missing REUSE licensing and copyright informationhttps://bugs.lttng.org/issues/14022023-11-14T15:03:06ZPhilippe Proulxeeppeliteloop@gmail.com
<p>Running <code>reuse lint</code> at the root of the source tree reveals many files are missing copyright and/or license information (SPDX):</p>
<blockquote>
<p>Unfortunately, your project is not compliant with version 3.0 of the REUSE Specification :-(</p>
</blockquote>
<p>This includes pretty much all the <code>Makefile.am</code> and <code>*.m4</code> files as well as <code>src/vendor/fmt/*</code>, <code>src/vendor/msgpack/*</code>, and <code>src/vendor/optional.hpp</code>.</p>
<p>Note that to keep vendor files unmodified, you may use the <code>.license</code> or <code>.reuse/dep5</code> <a href="https://reuse.software/spec/" class="external">strategies</a>.</p> LTTng-tools - Bug #1400 (New): Completion of error handlinghttps://bugs.lttng.org/issues/14002023-10-26T20:33:03ZMarkus Elfring
<p>I have looked at a few source files for your current software.<br />I have noticed that some checks for return codes are missing.</p>
Would you like to add more error handling for return values from functions like the following?
<ul>
<li>fprintf ⇒ usage<br /> <a class="external" href="https://github.com/lttng/lttng-tools/blob/d4333eb996dec315fb721074c2cf49adee04363b/src/bin/lttng-consumerd/lttng-consumerd.cpp#L159-L200">https://github.com/lttng/lttng-tools/blob/d4333eb996dec315fb721074c2cf49adee04363b/src/bin/lttng-consumerd/lttng-consumerd.cpp#L159-L200</a></li>
<li>pthread_mutex_lock ⇒ lttng_thread_create<br /> <a class="external" href="https://github.com/lttng/lttng-tools/blob/d4333eb996dec315fb721074c2cf49adee04363b/src/bin/lttng-sessiond/thread.cpp#L98">https://github.com/lttng/lttng-tools/blob/d4333eb996dec315fb721074c2cf49adee04363b/src/bin/lttng-sessiond/thread.cpp#L98</a></li>
</ul> LTTng-tools - Bug #1399 (New): reserved identifier violationhttps://bugs.lttng.org/issues/13992023-10-26T20:02:18ZMarkus Elfring
<p>I would like to point out that some identifiers do eventually not fit to the expected naming convention of the C++ language standard.<br /><a class="external" href="https://wiki.sei.cmu.edu/confluence/display/cplusplus/DCL51-CPP.+Do+not+declare+or+define+a+reserved+identifier">https://wiki.sei.cmu.edu/confluence/display/cplusplus/DCL51-CPP.+Do+not+declare+or+define+a+reserved+identifier</a></p>
<p>Would you like to adjust your selection for unique names?</p>
Examples:
<ul>
<li>_LTTNG_BUG_H<br /> <a class="external" href="https://github.com/lttng/lttng-tools/blob/d4333eb996dec315fb721074c2cf49adee04363b/src/common/bug.hpp#L8-L9">https://github.com/lttng/lttng-tools/blob/d4333eb996dec315fb721074c2cf49adee04363b/src/common/bug.hpp#L8-L9</a></li>
<li>_LTT_UTILS_H<br /> <a class="external" href="https://github.com/lttng/lttng-tools/blob/d4333eb996dec315fb721074c2cf49adee04363b/src/bin/lttng-sessiond/utils.hpp#L8-L9">https://github.com/lttng/lttng-tools/blob/d4333eb996dec315fb721074c2cf49adee04363b/src/bin/lttng-sessiond/utils.hpp#L8-L9</a></li>
</ul> LTTng-UST - Bug #1398 (New): reserved identifier violationhttps://bugs.lttng.org/issues/13982023-10-26T19:32:57ZMarkus Elfring
<p>I would like to point out that some identifiers do eventually not fit to the expected naming convention of the C language standard.<br /><a class="external" href="https://www.securecoding.cert.org/confluence/display/c/DCL37-C.+Do+not+declare+or+define+a+reserved+identifier">https://www.securecoding.cert.org/confluence/display/c/DCL37-C.+Do+not+declare+or+define+a+reserved+identifier</a></p>
<p>Would you like to adjust your selection for unique names?</p>
Examples:
<ul>
<li>_UST_COMMON_CLOCK_H<br /> <a class="external" href="https://github.com/lttng/lttng-ust/blob/8bc1125eb851b2c52d3263c2992e6806017e98e7/src/common/clock.h#L7-L8">https://github.com/lttng/lttng-ust/blob/8bc1125eb851b2c52d3263c2992e6806017e98e7/src/common/clock.h#L7-L8</a></li>
<li>__lttng_ust_siov<br /> <a class="external" href="https://github.com/lttng/lttng-ust/blob/8bc1125eb851b2c52d3263c2992e6806017e98e7/src/common/snprintf/fvwrite.h#L18">https://github.com/lttng/lttng-ust/blob/8bc1125eb851b2c52d3263c2992e6806017e98e7/src/common/snprintf/fvwrite.h#L18</a></li>
</ul> Userspace RCU - Bug #1396 (New): reserved identifier violationhttps://bugs.lttng.org/issues/13962023-10-26T15:17:27ZMarkus Elfring
<p>I would like to point out that a few identifiers do eventually not fit to the expected naming convention of the C language standard.<br /><a class="external" href="https://www.securecoding.cert.org/confluence/display/c/DCL37-C.+Do+not+declare+or+define+a+reserved+identifier">https://www.securecoding.cert.org/confluence/display/c/DCL37-C.+Do+not+declare+or+define+a+reserved+identifier</a></p>
<p>Would you like to adjust your selection for unique names?</p>
Examples:
<ul>
<li>_URCU_WAIT_H<br /> <a class="external" href="https://github.com/urcu/userspace-rcu/blob/008a71ef213f23c8d403e086563155121b286c0c/src/urcu-wait.h#L5">https://github.com/urcu/userspace-rcu/blob/008a71ef213f23c8d403e086563155121b286c0c/src/urcu-wait.h#L5</a></li>
<li>__cds_wfcq_init<br /> <a class="external" href="https://github.com/urcu/userspace-rcu/blob/008a71ef213f23c8d403e086563155121b286c0c/include/urcu/wfcqueue.h#L136">https://github.com/urcu/userspace-rcu/blob/008a71ef213f23c8d403e086563155121b286c0c/include/urcu/wfcqueue.h#L136</a></li>
</ul> LTTng-tools - Bug #1392 (Feedback): lttng view: Stream 0 is not declared in metadatahttps://bugs.lttng.org/issues/13922023-09-27T20:38:17ZRicardo Nabinger Sanchezrnsanchez@gmail.com
<p>While trying to capture UST events from a simple program, even though kernel events seem to be properly captured, UST events are not. Trace Compass cannot decode the UST data (but it can decode from the kernel channel).<br />The application is single-threaded, performs only disk and stdout I/O, nothing fancy. Compiled with <code>-finstrument-functions</code>, no optimizations, and with debugging symbols.</p>
<p>Session setup:<br /><pre>
# lttng create
# lttng enable-channel -u --subbuf-size=64M --num-subbuf=8 chan_ust
# lttng enable-channel -k --subbuf-size=64M --num-subbuf=2 chan_kernel
# lttng add-context -u -t vpid -t ip -t procname -t vtid -c chan_ust
# lttng enable-event -u -a -c chan_ust
# lttng enable-event -k -c chan_kernel --syscall --all
# lttng enable-event -k -c chan_kernel lttng_statedump_block_device,lttng_statedump_file_descriptor,lttng_statedump_process_state,mm_page_alloc,mm_page_free,net_dev_xmit,netif_receive_skb,sched_pi_setprio,sched_process_exec,sched_process_fork,sched_switch,sched_wakeup,sched_waking,softirq_entry,softirq_exit,softirq_raise
# lttng start
(possibly excessive preloads, but these are from my notes, collecting data on Varnish Cache)
# LTTNG_UST_BLOCKING_RETRY_TIMEOUT=-1 LD_PRELOAD=liblttng-ust-fd.so:liblttng-ust-fork.so:liblttng-ust-dl.so:liblttng-ust-cyg-profile.so:liblttng-ust-libc-wrapper.so:liblttng-ust-pthread-wrapper.so tests/test-playlist tests/playlist/bug_ffmpeg_1000-x20.m3u
# lttng stop
# lttng view > /dev/null
[error] Stream 0 is not declared in metadata.
[error] Stream 0 is not declared in metadata.
[error] Stream 0 is not declared in metadata.
[error] Stream 0 is not declared in metadata.
[error] Stream 0 is not declared in metadata.
[error] Stream 0 is not declared in metadata.
[error] Stream 0 is not declared in metadata.
[error] Stream 0 is not declared in metadata.
[error] Stream 0 is not declared in metadata.
[error] Stream 0 is not declared in metadata.
[error] Stream 0 is not declared in metadata.
[error] Stream 0 is not declared in metadata.
</pre></p>
Environment:
<ul>
<li>lttng-modules-2.13.10 (from tarball)</li>
<li>lttng-tools-2.13.11 (from tarball)</li>
<li>lttng-ust-2.13.6 (from tarball)</li>
<li>babeltrace (git; at <code>v1.5.11</code>)</li>
<li>Linux kernel 6.6.0-rc1</li>
<li>Slackware64-current, up-to-date as of 2023-09-26</li>
<li>Trace Compass 9.1.0.202309200833</li>
</ul>
<p>IRC snippet:<br /><pre>
Sep/27 16:13:54 rnsanchez I'm trying to find out what I'm doing wrong.. I keep getting "[error] Stream 0 is not declared in metadata." in my captures. I guess this is making tracecompass not being able to show UST events (simple function enter-exit instrumentation)
Sep/27 16:16:46 rnsanchez using my old notes, while checking https://lttng.org/docs/v2.13/#doc-liblttng-ust-cyg-profile and other sections
Sep/27 16:18:25 jgalar rnsanchez: shot in the dark, but are you making sure the session is stopped or destroyed before reading the trace?
Sep/27 16:18:28 rnsanchez if it helps knowing, events from kernel channel are fine and I can see them in tracecompass (the trace is huge)
Sep/27 16:18:52 rnsanchez jgalar: yes, with lttng stop, then I copy the data from root to my user
Sep/27 16:19:10 rnsanchez but I get that Stream 0 regardless.. it's from lttng view
Sep/27 16:20:11 jgalar it sounds like a metadata flushing issue, any chance you can share the trace? (or at least, the metadata files)
Sep/27 16:20:42 rnsanchez yes, sure. just a moment while I upload it
Sep/27 16:20:48 jgalar okay thanks
Sep/27 16:21:43 ebugden not sure if related, but it's triggering deja vu of this issue: https://github.com/efficios/barectf/commit/d024537859c1d869bfa1cedc8abe8e3f7a648faa
Sep/27 16:22:23 ebugden maybe a similar problem in a similar area in lttng view
Sep/27 16:25:28 rnsanchez jgalar: https://rnsanchez.wait4.org/auto-20230927-161138.tar.xz
Sep/27 16:27:33 rnsanchez jgalar: this last one has incomplete events as I was trying to investigate, but I can bundle another one with full events as I used to collect
Sep/27 16:31:51 rnsanchez ebugden: is there anything on my side I can do?
Sep/27 16:34:38 ebugden rnsanchez: i don't feel i'm familiar enough with this situation to say; i was tossing out the association as possible food for thought for jgalar
Sep/27 16:35:12 rnsanchez oh ok :-)
Sep/27 16:43:03 rnsanchez jgalar: just in case, here is with full events: https://rnsanchez.wait4.org/auto-20230927-164014.tar.xz
Sep/27 16:45:53 jgalar ebugden: yup, the errors are indeed similar
Sep/27 16:50:59 jgalar hmm, the user space trace's metadata doesn't declare any stream class, that's... unexpected...
Sep/27 16:51:14 jgalar which versions of lttng-ust and lttng-tools are you running?
Sep/27 16:55:03 rnsanchez ust 2.13.6 (tarball) and tools 2.13.11 (tarball too)
Sep/27 16:55:07 rnsanchez compiled today
Sep/27 16:55:13 jgalar okay
Sep/27 17:17:57 ebugden rnsanchez (cc: jgalar): would you open a bug report on https://bugs.lttng.org/ ?
Sep/27 17:18:16 rnsanchez ebugden: sure
Sep/27 17:19:21 ebugden thanks! (we have a few hypotheses, but we'd need more information about exactly how the trace is generated)
Sep/27 17:19:58 jgalar rnsanchez: it's weird, the user space trace's stream files just have the packet headers... if you can reproduce the problem while running lttng-sessiond with the `-vvv --verbose-consumer` options to get the logs, it would be helpful
Sep/27 17:20:47 rnsanchez I can try. I suppose I have to finish the sessiond already running and manually launch another?
Sep/27 17:21:06 jgalar yep
</pre></p> LTTng - Bug #1385 (Feedback): [lttng-live] error: recevie a message in the pasthttps://bugs.lttng.org/issues/13852023-08-16T18:53:52ZBin Yuan
<p>My program works well most of time. But in rare case, it reports the following hightling error.<br /><img src="https://bugs.lttng.org/attachments/download/572/clipboard-202308170250-vfley.jpg" alt="" loading="lazy" /></p>
<p>What will lead to those branch? any suggestion to debug?</p> Babeltrace - Bug #1384 (New): Assertion `bt_value_get_type(map) == BT_VALUE_TYPE_MAP` failshttps://bugs.lttng.org/issues/13842023-08-08T18:07:42ZErica Bugden
<p>Hello! I triggered an internal assertion while trying to, presumably incorrectly, query the lttng-live component class.</p>
<a name="Software"></a>
<h1 >Software<a href="#Software" class="wiki-anchor">¶</a></h1>
<ul>
<li>lttng tools: <code>lttng (LTTng Trace Control) 2.14.0-pre - O-Beer - v2.12.0-rc1-1528-g5983bb6d3</code></li>
<li>babeltrace2: <code>Babeltrace 2.1.0-rc1 "Codename TBD" [v1.2.0-3704-gdbea6be2]</code> (configured with plugins, python bindings support)</li>
</ul>
<a name="Procedure"></a>
<h1 >Procedure<a href="#Procedure" class="wiki-anchor">¶</a></h1>
<ul>
<li>Start root session daemon: <code>$sudo lttng-sessiond --daemonize</code></li>
<li>Create live session: <code>$lttng create my-session --live</code></li>
<li>Enable events: <code>$lttng enable-event --kernel sched-switch,sched_process_fork</code></li>
<li>Start tracing <code>$lttng start</code></li>
<li>Run python script: <code>python3 lttng-live.py</code> (see below)
<ul>
<li>Assertion</li>
</ul></li>
</ul>
<a name="Python-script"></a>
<h1 >Python script<a href="#Python-script" class="wiki-anchor">¶</a></h1>
<p>Note: The script is noisy as some code/comments are irrelevant/incorrect/inconsistent. (It's originally based on the query example in the Python bindings documentation.)</p>
<pre>
import bt2
import sys
# Get the `source.ctf.fs` component class from the `ctf` plugin.
# `source.ctf.lttng-live` component class instead
# (list plugins/components with $babeltrace2 list-plugins)
comp_cls = bt2.find_plugin('ctf').source_component_classes['lttng-live']
# The `babeltrace.support-info` query operation expects a `type`
# parameter (set to `directory` here) and an `input` parameter (the
# actual path or string to check, in this case the first command-line
# argument).
#
# See `babeltrace2-query-babeltrace.support-info(7)`.
params = {
'url': 'net://localhost/host/luna/my-session'
}
'''
params = {
'type': 'string',
'input': sys.argv[1],
}
'''
# Create a query executor.
#
# This is the environment in which query operations happens. The
# queried component class has access to this executor, for example to
# retrieve the query operation's logging level.
query_exec = bt2.QueryExecutor(comp_cls, 'sessions',
'inputs=net://localhost/host/luna/my-session')
'''
query_exec = bt2.QueryExecutor(comp_cls, 'babeltrace.support-info',
params)
'''
# Query the component class through the query executor.
#
# This method returns the result.
result = query_exec.query()
# Print the result.
print(result)
# Try to iterate on the trace.
'''
for msg in bt2.TraceCollectionMessageIterator('net://localhost/host/luna/my-session'):
if type(msg) is bt2._EventMessageConst:
print(msg.event.name)
'''
</pre>
<p>If I remember correctly, <strong>the script line change that triggered the assertion was expressing the query parameters directly as a string</strong> (rather than a dictionary). As in this:</p>
<pre>
query_exec = bt2.QueryExecutor(comp_cls, 'sessions',
'inputs=net://localhost/host/luna/my-session')
</pre>
<p>instead of this:</p>
<pre>
params = {
'url': 'net://localhost/host/luna/my-session'
}
query_exec = bt2.QueryExecutor(comp_cls, 'sessions',
params)
</pre>
<a name="Output"></a>
<h1 >Output<a href="#Output" class="wiki-anchor">¶</a></h1>
<pre>
erica@luna:~$ python3 lttng-live.py
(╯°□°)╯︵ ┻━┻ param-validation.c:196: validate_map_value(): Assertion `bt_value_get_type(map) == BT_VALUE_TYPE_MAP` failed.
Aborted (core dumped)
</pre> LTTng-tools - Bug #1383 (New): The "cpu_id" context (available for filters) is not discoverable b...https://bugs.lttng.org/issues/13832023-07-23T16:22:58ZMathieu Desnoyersmathieu.desnoyers@efficios.com
<p>I recently tried remembering how to filter by cpu_id when enabling an event, and found that there was not clear way to see that $ctx.cpu_id actually exists from the lttng man pages (other than a random example in the lttng-enable-event man page). AFAIU the man pages rely on lttng add-context --list to let the user discover the available contexts. This is likely because the cpu_id context is not available for the "add-context" command because it would be redundant with the implicit context already sampled with the buffers.</p>
<p>We may want to have some way to let users discovers those contexts which are filter-specific.</p>
<p>My use-case is to try to grab only traces for a few logical cpus (0-3) from my 384 logical cpus machine.</p> LTTng-tools - Bug #1381 (New): lttng list-triggers crashing with 2.13.9https://bugs.lttng.org/issues/13812023-06-16T19:23:50ZHeman Muresan
<p>This was discussed offline, but I am opening a ticket here for tracking purposes. With the latest 2.13.9 I'm seeing the following assert:<br /><pre>
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1 0x00007fb11ad09723 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
#2 0x00007fb11acbe876 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3 0x00007fb11aca97e3 in __GI_abort () at abort.c:79
#4 0x00007fb11aca9719 in __assert_fail_base (fmt=<optimized out>, assertion=<optimized out>, file=<optimized out>, line=<optimized out>, function=<optimized out>) at assert.c:92
#5 0x00007fb11acb78e6 in __GI___assert_fail (assertion=assertion@entry=0x475fd2 "element_index < array->size", file=file@entry=0x4832e8 "../../../lttng-tools-2.13.9/src/common/dynamic-array.h",
line=line@entry=55, function=function@entry=0x498120 <__PRETTY_FUNCTION__.1> "lttng_dynamic_array_get_element") at assert.c:101
#6 0x0000000000469311 in lttng_dynamic_array_get_element (array=0x4b7f10, element_index=0) at ../../../lttng-tools-2.13.9/src/common/dynamic-array.h:55
#7 lttng_dynamic_array_get_element (element_index=0, array=0x4b7f10) at ../../../lttng-tools-2.13.9/src/common/dynamic-array.h:52
#8 lttng_action_path_copy (src=src@entry=0x4b7f10, dst=dst@entry=0x4b8250) at ../../../lttng-tools-2.13.9/src/common/actions/path.c:109
#9 0x000000000043595c in lttng_error_query_action_create (action_path=0x4b7f10, trigger=0x4b8240) at ../../../lttng-tools-2.13.9/src/common/error-query.c:230
#10 lttng_error_query_action_create (trigger=trigger@entry=0x4b5ea0, action_path=action_path@entry=0x4b7f10) at ../../../lttng-tools-2.13.9/src/common/error-query.c:206
#11 0x000000000042c62a in print_action_errors (action=<optimized out>, action_path_length=0, action_path_indexes=0x0, trigger=0x4b5ea0) at ../../../../lttng-tools-2.13.9/src/bin/lttng/commands/list_triggers.c:766
#12 print_one_action (trigger=0x4b5ea0, action=<optimized out>, action_path_indexes=0x0, action_path_length=0) at ../../../../lttng-tools-2.13.9/src/bin/lttng/commands/list_triggers.c:966
#13 0x000000000042d105 in print_one_trigger (trigger=0x4b5ea0) at ../../../../lttng-tools-2.13.9/src/bin/lttng/commands/list_triggers.c:1124
#14 print_sorted_triggers (triggers=<optimized out>) at ../../../../lttng-tools-2.13.9/src/bin/lttng/commands/list_triggers.c:1202
#15 cmd_list_triggers (argc=<optimized out>, argv=<optimized out>) at ../../../../lttng-tools-2.13.9/src/bin/lttng/commands/list_triggers.c:1421
#16 0x0000000000410be0 in handle_command (argv=0x7ffc348fc9c0, argc=1) at ../../../../lttng-tools-2.13.9/src/bin/lttng/lttng.c:237
#17 parse_args (argv=0x7ffc348fc9b8, argc=<optimized out>) at ../../../../lttng-tools-2.13.9/src/bin/lttng/lttng.c:426
#18 main (argc=<optimized out>, argv=0x7ffc348fc9b8) at ../../../../lttng-tools-2.13.9/src/bin/lttng/lttng.c:475
</pre></p>
<p>There's nothing particular fancy about the trigger being printed, it's just a buffer usage trigger. Here is it being displayed on an older version of lttng-tools:<br /><pre>
- name: foo_bar_bufferUsageTrigger
owner uid: 0
condition: buffer usage high
session name: foo_bar
channel name: foo_bar-ust
domain: ust
threshold (ratio): 0.70
errors: none
action:notify
errors: none
errors: none
</pre></p>
<p>This trigger is being created with the following pseudocode:<br /><pre>
pCondition = lttng_condition_buffer_usage_high_create();
lttng_condition_buffer_usage_set_session_name(...);
lttng_condition_buffer_usage_set_channel_name(...);
lttng_condition_buffer_usage_set_threshold_ratio(...);
lttng_condition_buffer_usage_set_domain_type();
pAction = lttng_action_notify_create();
pTrigger = lttng_trigger_create(pCondition, pAction);
lttng_register_trigger_with_name(pTrigger, pTriggerName);
</pre></p>
<p>I think this was introduced by this commit:<br /><a class="external" href="https://github.com/lttng/lttng-tools/commit/1ad48d9678f8f123dc7d44dbbfadc9957b7e970e#diff-e9fa6863d3580e2d95137ddaab0280d5391e830657a46f47cde39e0cfabc8fb7R109">https://github.com/lttng/lttng-tools/commit/1ad48d9678f8f123dc7d44dbbfadc9957b7e970e#diff-e9fa6863d3580e2d95137ddaab0280d5391e830657a46f47cde39e0cfabc8fb7R109</a><br />In lttng_action_path_copy the code as is doesn't handle the case of there being no destination action path. A way to fix this is to make this code handle this scenario, attached is a patch that does this. But, perhaps we shouldn't even be hitting this scenario where there is no action path and this is just a bandaid.</p>
<p>Thanks,<br />Heman</p> LTTng-tools - Bug #1379 (New): Document behavior of live mode with per-pid UST buffershttps://bugs.lttng.org/issues/13792023-06-01T18:48:58ZMathieu Desnoyersmathieu.desnoyers@efficios.com
<p>There is a design limitation of per-pid UST buffers when used in live mode which should be documented.</p>
<p>Basically, because the per-pid buffers are reclaimed soon after the application exits, it may cause the events traced by a short-lived application to never appear in the output of a live trace when per-pid buffers are used.</p>
<p>This is caused by the fact that the live client periodically checks for new buffers, but there is no inherent reference kept on the streams before they disappear.</p>
<p>This means that the information will be available in the disk output of the trace, but it will be missing from the live mode output.</p>
<p>This could eventually be mitigated by keeping references to the streams received by the relay daemon which are of interest to a live client until they are referenced by the live client.</p> LTTng-tools - Bug #1378 (New): Document behavior of snapshot mode with per-pid UST buffershttps://bugs.lttng.org/issues/13782023-06-01T18:01:12ZMathieu Desnoyersmathieu.desnoyers@efficios.com
<p>We should document a design limitation of the per-pid UST buffers which can lead to unexpected results when used with the "snapshot" feature.</p>
<p>Basically, because the per-pid buffers lifetime is bound by the application using them (and the consumer daemon extracting data from them), they are freed almost immediately after the traced application exits. However, in a scenario where we are interested in capturing a snapshot of the content of flight recorder buffers soon after application exits (e.g. caused by a crash), those buffers are not available anymore a few milliseconds after the application exits.</p>
<p>We should document this design limitation, and eventually think about improvements (feature request ?) to allow a configurable delay after application exit during which the consumer daemon could keep references on the buffers so they are available for snapshot.</p> LTTng-tools - Bug #1372 (In Progress): Consumer crashes during rotation (write to bad file descri...https://bugs.lttng.org/issues/13722023-04-25T11:44:50ZMaximilian Fickert
<p>Hi,</p>
<p>We are observing seemingly random crashes in the LTTng consumer daemon when tracing a C++ application with LTTng-UST. Our workload has a single <code>printf</code>-like tracepoint, where each string is in the order of 1kb and the total output is around 30MB/s. LTTng is set up with a single session and channel enabling this tracepoint, and we enabled rotation with a maximum size of 100MB or every 30 seconds. We are periodically starting new traced processes and the system runs close to 100% CPU load. This ran on an AWS Graviton2 (ARM) instance with CentOS 7 and a 5.4 kernel, using LTTng-UST 2.13.5 and LTTng-tools 2.13.8.</p>
<p>The first reported error is a write to a bad file descriptor (-1), apparently when waking up the metadata poll thread during a rotation.</p>
<p>I looked through the LTTng logs with <code>-vvv --verbose-consumer</code> and I suspect this might be caused by some sort of race condition; looking at the following log fragments (I inserted two additional debug lines in LTTng so the line numbers may not exactly match those in the 2.13.8 source code):<br /><pre>
DBG1 - 15:12:12.865621802 [Rotation]: Consumer rotate channel key 574 (in consumer_rotate_channel() at consumer.c:1694)
</pre><br />A rotation is requested for the channel with key 574. Then the metadata pipe of this channel is closed:<br /><pre>
DBG3 - 15:12:13.269805401 [UST application management]: Buffer registry per PID find id: 288 (in buffer_reg_pid_find() at buffer-registry.c:308)
DBG3 - 15:12:13.269811366 [UST application management]: No metadata to push for metadata key 574 (in ust_app_push_metadata() at ust-app.c:689)
DBG2 - 15:12:13.269816979 [UST application management]: Consumer close metadata channel key 574 (in consumer_close_metadata() at consumer.c:1394)
DBG1 - 15:12:13.269826899 [6593/6605]: Incoming command on sock (in consumer_thread_sessiond_poll() at consumer.c:3283)
DBG1 - 15:12:13.269835259 [6593/6605]: UST consumer close metadata key 574 (in close_metadata() at ust-consumer.c:804)
DBG1 - 15:12:13.269838591 [6593/6605]: Closing metadata channel key 574 (in lttng_ustconsumer_close_metadata() at ust-consumer.c:3235)
DBG3 - 15:12:13.269841167 [6593/6605]: close() fd = 603 (in lttng_ustconsumer_close_metadata() at ust-consumer.c:3250)
DBG1 - 15:12:13.269847075 [6593/6605]: Received command on sock (in consumer_thread_sessiond_poll() at consumer.c:3299)
</pre><br />But then LTTng unsuccessfully attempts to wake up the metadata poll thread by writing to the file descriptor that was just closed (the <code>channel key = 574</code> in the metadata poll thread message below is <code>channel->key</code>):<br /><pre>
DBG1 - 15:12:13.271001175 [6593/6605]: Waking up metadata poll thread (writing to pipe): channel name = 'metadata', channel key = 574 (in consumer_metadata_wakeup_pipe() at consumer.c:888)
DBG3 - 15:12:13.271010093 [6593/6605]: write() fd = -1 (in consumer_metadata_wakeup_pipe() at consumer.c:892)
PERROR - 15:12:13.271014655 [6593/6605]: Failed to write to UST metadata pipe while attempting to wake-up the metadata poll thread: Bad file descriptor (in consumer_metadata_wakeup_pipe() at consumer.c:907)
Error: Failed to dump the metadata cache
Error: Rotate channel failed
</pre><br />After this, there are a bunch more errors in the logs, but I guess they are ultimately caused by the issue above:<br /><pre>
DBG1 - 15:12:13.273572566 [6593/6604]: Consumer mmap write() ret 446464 (len 446464) (in lttng_consumer_on_read_subbuffer_mmap() at consumer.c:1726)
lttng-consumerd: consumer.c:1639: lttng_consumer_on_read_subbuffer_mmap: Assertion `stream->net_seq_idx != (uint64_t) -1ULL || stream->trace_chunk' failed.
...
DBG1 - 15:12:13.475559588 [Consumer management]: Error when receiving data from the consumer socket 61 (in consumer_socket_recv() at consumer.c:159)
Error: Handling metadata request
DBG1 - 15:12:13.476382551 [Rotation]: Error when receiving data from the consumer socket 60 (in consumer_socket_recv() at consumer.c:159)
Error: Error pushing metadata to consumer
DBG1 - 15:12:13.476418727 [Rotation]: Consumer rotate channel key 604 (in consumer_rotate_channel() at consumer.c:1694)
DBG1 - 15:12:13.476424962 [Rotation]: Setting trace chunk close command to "move to completed chunk folder" (in lttng_trace_chunk_set_close_command() at trace-chunk.c:1797)
DBG1 - 15:12:13.476428671 [Rotation]: lttng_trace_chunk_rename_path from .tmp_new_chunk to (null) (in lttng_trace_chunk_rename_path_no_lock() at trace-chunk.c:755)
DBG3 - 15:12:13.476437516 [Rotation]: renameat() old_dirfd = 946, old_name = .tmp_new_chunk, new_dirfd = 946, new_name = 20230421T151158+0000-5, uid = 1004, gid = 1004 (in run_as_renameat() at runas.c:1883)
DBG1 - 15:12:13.476444999 [Rotation]: Using run_as worker (in run_as() at runas.c:1646)
Error: Error pushing metadata to consumer
DBG2 - 15:12:13.476569797 [UST application management]: Consumer close metadata channel key 624 (in consumer_close_metadata() at consumer.c:1394)
DBG1 - 15:12:13.476585469 [UST application management]: PID 7729 unregistering with sock 1060 (in ust_app_unregister() at ust-app.c:4253)
DBG1 - 15:12:13.476589957 [UST application management]: Flushing app session buffers for ust app pid 7729 (in ust_app_flush_app_session() at ust-app.c:5315)
DBG2 - 15:12:13.476611364 [UST application management]: Consumer flush channel key 651 (in consumer_flush_channel() at consumer.c:1328)
Error: Error flushing consumer channel
DBG3 - 15:12:13.476616188 [UST application management]: Buffer registry per PID find id: 327 (in buffer_reg_pid_find() at buffer-registry.c:308)
DBG3 - 15:12:13.476620406 [UST application management]: No metadata to push for metadata key 652 (in ust_app_push_metadata() at ust-app.c:689)
DBG2 - 15:12:13.476623277 [UST application management]: Consumer close metadata channel key 652 (in consumer_close_metadata() at consumer.c:1394)
DBG1 - 15:12:13.476633312 [UST application management]: PID 11467 unregistering with sock 457 (in ust_app_unregister() at ust-app.c:4253)
DBG1 - 15:12:13.476636996 [UST application management]: Flushing app session buffers for ust app pid 11467 (in ust_app_flush_app_session() at ust-app.c:5315)
DBG2 - 15:12:13.476640508 [UST application management]: Consumer flush channel key 661 (in consumer_flush_channel() at consumer.c:1328)
Error: Error flushing consumer channel
DBG3 - 15:12:13.476643946 [UST application management]: Buffer registry per PID find id: 332 (in buffer_reg_pid_find() at buffer-registry.c:308)
DBG2 - 15:12:13.476649501 [UST application management]: Consumer push metadata to consumer socket -1 (in consumer_push_metadata() at consumer.c:1462)
Error: Error pushing metadata to consumer
...
</pre></p>
<p>We currently do not have simple steps to reliably reproduce the issue, but this happens somewhat irregularly after 5-10 minutes (sometimes earlier, sometimes later) of running our workload. I attached a longer fragment of the logs around the time the error happens, let us know if we can provide any more information to help.</p> LTTng-tools - Bug #1371 (New): Cannot have two non-root session daemons in two separate Docker co...https://bugs.lttng.org/issues/13712023-04-18T23:33:11ZChristophe Bedard
<p>I have 2 separate Docker containers. The user in each container is the same user as my non-root host user (same name, same UID, same GID). Users in the 2 Docker containers have the same $HOME, which is a directory mounted from the host.</p>
<p>To set up and configure each container:</p>
<p>1. Start container and pass all necessary info through:</p>
<pre>
host$ mkdir ~/fake-home
host$ docker run -it --env USER=$USER --env GROUP=$USER --env USER_ID=$(id -u) --env GROUP_ID=$(id -g) -v $HOME/fake-home:$HOME ubuntu:22.04 bash
</pre>
<p>2. Inside each container, create user, install lttng-tools, and start shell as $USER:</p>
<pre>
containerN$ groupdel "$GROUP" &>/dev/null || true
containerN$ groupadd -og "$GROUP_ID" "$GROUP"
containerN$ useradd -M -u "$USER_ID" -g "$GROUP_ID" -d "/home/$USER" -s /bin/bash "$USER"
containerN$ usermod -aG sudo "$USER"
containerN$ apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y lttng-tools
containerN$ su - "$USER"
</pre>
<p>Then, I can start a non-root session daemon in container 1 and trace just fine:</p>
<pre>
container1$ lttng list # No session daemon, as expected
Error: No session daemon is available
container1$ lttng-sessiond --daemonize
container1$ pgrep lttng-sessiond # As expected, we have a session daemon
459
container1$ cat $HOME/.lttng/lttng-sessiond.pid # Matches above output
459
container1$ ps -o comm= $(cat $HOME/.lttng/lttng-sessiond.pid) # That PID indeed corresponds to the session daemon
lttng-sessiond
container1$ lttng list # Now this works, as expected
Currently no available recording session
container1$ # Trace, ..., works fine
container1$ babeltrace /home/$USER/lttng-traces/trace-container1 # Shows trace data
...
</pre>
<p>However, in container 2, LTTng thinks that there is already a session daemon, but tracing doesn't collect any data:</p>
<pre>
container2$ pgrep lttng-sessiond # No output, since there is no session daemon, as expected
container2$ lttng-sessiond --daemonize # Doesn't work, which is unexpected, since there isn't supposed to be a session daemon!
Error: A session daemon is already running.
container2$ cat $HOME/.lttng/lttng-sessiond.pid # Matches output from container 1, since they have the same $HOME directory
459
container1$ ps -o comm= 459 # ...but that PID doesn't correspond to any process in this container! (no output)
container2$ lttng list # Works, which is unexpected, since there isn't supposed to be a session daemon!
Currently no available recording session
container2$ # Trace, ..., works fine
container2$ babeltrace /home/$USER/lttng-traces/trace-container2 # Error, babeltrace says it's not a trace, which is unexpected
[error] Cannot open any trace for reading.
[error] opening trace "/home/$USER/lttng-traces/trace-container2" for reading.
[error] none of the specified trace paths could be opened.
container2$ ls /home/$USER/lttng-traces/trace-container2 # The trace directory has a ust/ subdirectory, but it's empty
ust
container2$ export LTTNG_HOME=~/some-other-dir # Setting/using a different LTTNG_HOME fixes everything
</pre>
<p>As mentioned in the last line above, using a different LTTNG_HOME allows container2 to successfully start its own non-root session daemon and collect trace data. However, the issue is that this fails silently: LTTng doesn't detect this weird configuration/state and doesn't give an error.</p>
<p>I can detect this myself by checking if the PID in the $HOME/.lttng/lttng-sessiond.pid file is a 'lttng-sessiond' process:</p>
<pre>
[ $(ps -o comm= $(cat $HOME/.lttng/lttng-sessiond.pid)) = 'lttng-sessiond' ]
</pre>
<p>If not, then that means that the session daemon isn't really "available," and I can report an error in my own tool and tell my user to set LTTNG_HOME to something other than $HOME and try again. I understand that this is a weird corner case, but it would be good if LTTng itself could handle this check, if possible.</p>
<p>Note: in my case, the $HOME directory inside the containers is not the actual host user's $HOME directory. However, if it were, maybe this could be reproducible with the host + 1 Docker container?</p>