LTTng bugs repository: Issues (https://bugs.lttng.org/, feed updated 2023-11-14T15:03:06Z)
LTTng-tools - Bug #1402 (New): Some files are missing REUSE licensing and copyright information
https://bugs.lttng.org/issues/1402 | 2023-11-14T15:03:06Z | Philippe Proulx <eeppeliteloop@gmail.com>
<p>Running <code>reuse lint</code> at the root of the source tree reveals many files are missing copyright and/or license information (SPDX):</p>
<blockquote>
<p>Unfortunately, your project is not compliant with version 3.0 of the REUSE Specification :-(</p>
</blockquote>
<p>This includes pretty much all the <code>Makefile.am</code> and <code>*.m4</code> files as well as <code>src/vendor/fmt/*</code>, <code>src/vendor/msgpack/*</code>, and <code>src/vendor/optional.hpp</code>.</p>
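For illustration, a missing header can be added by hand or with the reuse tool (the copyright holder and license below are placeholders, not the project's actual ones; `annotate` is the name of the subcommand in recent reuse versions, formerly `addheader`):

```shell
# Manual SPDX header at the top of e.g. a Makefile.am:
#   SPDX-FileCopyrightText: 2023 Example Author <author@example.com>
#   SPDX-License-Identifier: GPL-2.0-only

# Or via the tool, which picks the right comment style per file type:
reuse annotate --copyright "Example Author" --license "GPL-2.0-only" Makefile.am
```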
<p>Note that to keep vendor files unmodified, you may use the <code>.license</code> or <code>.reuse/dep5</code> <a href="https://reuse.software/spec/" class="external">strategies</a>.</p>

LTTng-tools - Bug #1400 (New): Completion of error handling
https://bugs.lttng.org/issues/1400 | 2023-10-26T20:33:03Z | Markus Elfring
<p>I have looked at a few source files of your current software and noticed that some checks for return codes are missing.</p>
<p>Would you like to add more error handling for the return values of functions like the following?</p>
<ul>
<li>fprintf ⇒ usage<br /> <a class="external" href="https://github.com/lttng/lttng-tools/blob/d4333eb996dec315fb721074c2cf49adee04363b/src/bin/lttng-consumerd/lttng-consumerd.cpp#L159-L200">https://github.com/lttng/lttng-tools/blob/d4333eb996dec315fb721074c2cf49adee04363b/src/bin/lttng-consumerd/lttng-consumerd.cpp#L159-L200</a></li>
<li>pthread_mutex_lock ⇒ lttng_thread_create<br /> <a class="external" href="https://github.com/lttng/lttng-tools/blob/d4333eb996dec315fb721074c2cf49adee04363b/src/bin/lttng-sessiond/thread.cpp#L98">https://github.com/lttng/lttng-tools/blob/d4333eb996dec315fb721074c2cf49adee04363b/src/bin/lttng-sessiond/thread.cpp#L98</a></li>
</ul>

LTTng-tools - Bug #1399 (New): reserved identifier violation
https://bugs.lttng.org/issues/1399 | 2023-10-26T20:02:18Z | Markus Elfring
<p>I would like to point out that some identifiers do not conform to the naming conventions of the C++ language standard; they fall into the namespace reserved for the implementation.<br /><a class="external" href="https://wiki.sei.cmu.edu/confluence/display/cplusplus/DCL51-CPP.+Do+not+declare+or+define+a+reserved+identifier">https://wiki.sei.cmu.edu/confluence/display/cplusplus/DCL51-CPP.+Do+not+declare+or+define+a+reserved+identifier</a></p>
<p>Would you like to rename them to non-reserved, unique names?</p>
<p>Examples:</p>
<ul>
<li>_LTTNG_BUG_H<br /> <a class="external" href="https://github.com/lttng/lttng-tools/blob/d4333eb996dec315fb721074c2cf49adee04363b/src/common/bug.hpp#L8-L9">https://github.com/lttng/lttng-tools/blob/d4333eb996dec315fb721074c2cf49adee04363b/src/common/bug.hpp#L8-L9</a></li>
<li>_LTT_UTILS_H<br /> <a class="external" href="https://github.com/lttng/lttng-tools/blob/d4333eb996dec315fb721074c2cf49adee04363b/src/bin/lttng-sessiond/utils.hpp#L8-L9">https://github.com/lttng/lttng-tools/blob/d4333eb996dec315fb721074c2cf49adee04363b/src/bin/lttng-sessiond/utils.hpp#L8-L9</a></li>
</ul>

LTTng-UST - Bug #1398 (New): reserved identifier violation
https://bugs.lttng.org/issues/1398 | 2023-10-26T19:32:57Z | Markus Elfring
<p>I would like to point out that some identifiers do not conform to the naming conventions of the C language standard; they fall into the namespace reserved for the implementation.<br /><a class="external" href="https://www.securecoding.cert.org/confluence/display/c/DCL37-C.+Do+not+declare+or+define+a+reserved+identifier">https://www.securecoding.cert.org/confluence/display/c/DCL37-C.+Do+not+declare+or+define+a+reserved+identifier</a></p>
<p>Would you like to rename them to non-reserved, unique names?</p>
<p>Examples:</p>
<ul>
<li>_UST_COMMON_CLOCK_H<br /> <a class="external" href="https://github.com/lttng/lttng-ust/blob/8bc1125eb851b2c52d3263c2992e6806017e98e7/src/common/clock.h#L7-L8">https://github.com/lttng/lttng-ust/blob/8bc1125eb851b2c52d3263c2992e6806017e98e7/src/common/clock.h#L7-L8</a></li>
<li>__lttng_ust_siov<br /> <a class="external" href="https://github.com/lttng/lttng-ust/blob/8bc1125eb851b2c52d3263c2992e6806017e98e7/src/common/snprintf/fvwrite.h#L18">https://github.com/lttng/lttng-ust/blob/8bc1125eb851b2c52d3263c2992e6806017e98e7/src/common/snprintf/fvwrite.h#L18</a></li>
</ul>

Userspace RCU - Bug #1396 (New): reserved identifier violation
https://bugs.lttng.org/issues/1396 | 2023-10-26T15:17:27Z | Markus Elfring
<p>I would like to point out that a few identifiers do not conform to the naming conventions of the C language standard; they fall into the namespace reserved for the implementation.<br /><a class="external" href="https://www.securecoding.cert.org/confluence/display/c/DCL37-C.+Do+not+declare+or+define+a+reserved+identifier">https://www.securecoding.cert.org/confluence/display/c/DCL37-C.+Do+not+declare+or+define+a+reserved+identifier</a></p>
<p>Would you like to rename them to non-reserved, unique names?</p>
<p>Examples:</p>
<ul>
<li>_URCU_WAIT_H<br /> <a class="external" href="https://github.com/urcu/userspace-rcu/blob/008a71ef213f23c8d403e086563155121b286c0c/src/urcu-wait.h#L5">https://github.com/urcu/userspace-rcu/blob/008a71ef213f23c8d403e086563155121b286c0c/src/urcu-wait.h#L5</a></li>
<li>__cds_wfcq_init<br /> <a class="external" href="https://github.com/urcu/userspace-rcu/blob/008a71ef213f23c8d403e086563155121b286c0c/include/urcu/wfcqueue.h#L136">https://github.com/urcu/userspace-rcu/blob/008a71ef213f23c8d403e086563155121b286c0c/include/urcu/wfcqueue.h#L136</a></li>
</ul>

LTTng-tools - Bug #1392 (Feedback): lttng view: Stream 0 is not declared in metadata
https://bugs.lttng.org/issues/1392 | 2023-09-27T20:38:17Z | Ricardo Nabinger Sanchez <rnsanchez@gmail.com>
<p>While trying to capture UST events from a simple program, even though kernel events seem to be properly captured, UST events are not. Trace Compass cannot decode the UST data (but it can decode from the kernel channel).<br />The application is single-threaded, performs only disk and stdout I/O, nothing fancy. Compiled with <code>-finstrument-functions</code>, no optimizations, and with debugging symbols.</p>
<p>Session setup:<br /><pre>
# lttng create
# lttng enable-channel -u --subbuf-size=64M --num-subbuf=8 chan_ust
# lttng enable-channel -k --subbuf-size=64M --num-subbuf=2 chan_kernel
# lttng add-context -u -t vpid -t ip -t procname -t vtid -c chan_ust
# lttng enable-event -u -a -c chan_ust
# lttng enable-event -k -c chan_kernel --syscall --all
# lttng enable-event -k -c chan_kernel lttng_statedump_block_device,lttng_statedump_file_descriptor,lttng_statedump_process_state,mm_page_alloc,mm_page_free,net_dev_xmit,netif_receive_skb,sched_pi_setprio,sched_process_exec,sched_process_fork,sched_switch,sched_wakeup,sched_waking,softirq_entry,softirq_exit,softirq_raise
# lttng start
(possibly excessive preloads, but these are from my notes, collecting data on Varnish Cache)
# LTTNG_UST_BLOCKING_RETRY_TIMEOUT=-1 LD_PRELOAD=liblttng-ust-fd.so:liblttng-ust-fork.so:liblttng-ust-dl.so:liblttng-ust-cyg-profile.so:liblttng-ust-libc-wrapper.so:liblttng-ust-pthread-wrapper.so tests/test-playlist tests/playlist/bug_ffmpeg_1000-x20.m3u
# lttng stop
# lttng view > /dev/null
[error] Stream 0 is not declared in metadata.
[error] Stream 0 is not declared in metadata.
[error] Stream 0 is not declared in metadata.
[error] Stream 0 is not declared in metadata.
[error] Stream 0 is not declared in metadata.
[error] Stream 0 is not declared in metadata.
[error] Stream 0 is not declared in metadata.
[error] Stream 0 is not declared in metadata.
[error] Stream 0 is not declared in metadata.
[error] Stream 0 is not declared in metadata.
[error] Stream 0 is not declared in metadata.
[error] Stream 0 is not declared in metadata.
</pre></p>
<p>Environment:</p>
<ul>
<li>lttng-modules-2.13.10 (from tarball)</li>
<li>lttng-tools-2.13.11 (from tarball)</li>
<li>lttng-ust-2.13.6 (from tarball)</li>
<li>babeltrace (git; at <code>v1.5.11</code>)</li>
<li>Linux kernel 6.6.0-rc1</li>
<li>Slackware64-current, up-to-date as of 2023-09-26</li>
<li>Trace Compass 9.1.0.202309200833</li>
</ul>
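The verbose-logging run suggested in the IRC exchange below would look roughly like this (a sketch; the log path is arbitrary and privileges depend on the setup):

```shell
# Stop the already-running session daemon, then relaunch it manually in the
# foreground with full verbosity, keeping the log for analysis.
sudo pkill lttng-sessiond
sudo lttng-sessiond -vvv --verbose-consumer > sessiond.log 2>&1
```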
<p>IRC snippet:<br /><pre>
Sep/27 16:13:54 rnsanchez I'm trying to find out what I'm doing wrong.. I keep getting "[error] Stream 0 is not declared in metadata." in my captures. I guess this is making tracecompass not being able to show UST events (simple function enter-exit instrumentation)
Sep/27 16:16:46 rnsanchez using my old notes, while checking https://lttng.org/docs/v2.13/#doc-liblttng-ust-cyg-profile and other sections
Sep/27 16:18:25 jgalar rnsanchez: shot in the dark, but are you making sure the session is stopped or destroyed before reading the trace?
Sep/27 16:18:28 rnsanchez if it helps knowing, events from kernel channel are fine and I can see them in tracecompass (the trace is huge)
Sep/27 16:18:52 rnsanchez jgalar: yes, with lttng stop, then I copy the data from root to my user
Sep/27 16:19:10 rnsanchez but I get that Stream 0 regardless.. it's from lttng view
Sep/27 16:20:11 jgalar it sounds like a metadata flushing issue, any chance you can share the trace? (or at least, the metadata files)
Sep/27 16:20:42 rnsanchez yes, sure. just a moment while I upload it
Sep/27 16:20:48 jgalar okay thanks
Sep/27 16:21:43 ebugden not sure if related, but it's triggering deja vu of this issue: https://github.com/efficios/barectf/commit/d024537859c1d869bfa1cedc8abe8e3f7a648faa
Sep/27 16:22:23 ebugden maybe a similar problem in a similar area in lttng view
Sep/27 16:25:28 rnsanchez jgalar: https://rnsanchez.wait4.org/auto-20230927-161138.tar.xz
Sep/27 16:27:33 rnsanchez jgalar: this last one has incomplete events as I was trying to investigate, but I can bundle another one with full events as I used to collect
Sep/27 16:31:51 rnsanchez ebugden: is there anything on my side I can do?
Sep/27 16:34:38 ebugden rnsanchez: i don't feel i'm familiar enough with this situation to say; i was tossing out the association as possible food for thought for jgalar
Sep/27 16:35:12 rnsanchez oh ok :-)
Sep/27 16:43:03 rnsanchez jgalar: just in case, here is with full events: https://rnsanchez.wait4.org/auto-20230927-164014.tar.xz
Sep/27 16:45:53 jgalar ebugden: yup, the errors are indeed similar
Sep/27 16:50:59 jgalar hmm, the user space trace's metadata doesn't declare any stream class, that's... unexpected...
Sep/27 16:51:14 jgalar which versions of lttng-ust and lttng-tools are you running?
Sep/27 16:55:03 rnsanchez ust 2.13.6 (tarball) and tools 2.13.11 (tarball too)
Sep/27 16:55:07 rnsanchez compiled today
Sep/27 16:55:13 jgalar okay
Sep/27 17:17:57 ebugden rnsanchez (cc: jgalar): would you open a bug report on https://bugs.lttng.org/ ?
Sep/27 17:18:16 rnsanchez ebugden: sure
Sep/27 17:19:21 ebugden thanks! (we have a few hypotheses, but we'd need more information about exactly how the trace is generated)
Sep/27 17:19:58 jgalar rnsanchez: it's weird, the user space trace's stream files just have the packet headers... if you can reproduce the problem while running lttng-sessiond with the `-vvv --verbose-consumer` options to get the logs, it would be helpful
Sep/27 17:20:47 rnsanchez I can try. I suppose I have to finish the sessiond already running and manually launch another?
Sep/27 17:21:06 jgalar yep
</pre></p>

LTTng - Bug #1385 (Feedback): [lttng-live] error: receive a message in the past
https://bugs.lttng.org/issues/1385 | 2023-08-16T18:53:52Z | Bin Yuan
<p>My program works well most of the time, but in rare cases it reports the highlighted error below.<br /><img src="https://bugs.lttng.org/attachments/download/572/clipboard-202308170250-vfley.jpg" alt="" loading="lazy" /></p>
<p>What could lead to this branch? Any suggestions for debugging?</p>

Babeltrace - Bug #1384 (New): Assertion `bt_value_get_type(map) == BT_VALUE_TYPE_MAP` fails
https://bugs.lttng.org/issues/1384 | 2023-08-08T18:07:42Z | Erica Bugden
<p>Hello! I triggered an internal assertion while trying to, presumably incorrectly, query the lttng-live component class.</p>
<a name="Software"></a>
<h1 >Software<a href="#Software" class="wiki-anchor">¶</a></h1>
<ul>
<li>lttng tools: <code>lttng (LTTng Trace Control) 2.14.0-pre - O-Beer - v2.12.0-rc1-1528-g5983bb6d3</code></li>
<li>babeltrace2: <code>Babeltrace 2.1.0-rc1 "Codename TBD" [v1.2.0-3704-gdbea6be2]</code> (configured with plugins, python bindings support)</li>
</ul>
<a name="Procedure"></a>
<h1 >Procedure<a href="#Procedure" class="wiki-anchor">¶</a></h1>
<ul>
<li>Start root session daemon: <code>$ sudo lttng-sessiond --daemonize</code></li>
<li>Create live session: <code>$ lttng create my-session --live</code></li>
<li>Enable events: <code>$ lttng enable-event --kernel sched-switch,sched_process_fork</code></li>
<li>Start tracing: <code>$ lttng start</code></li>
<li>Run python script: <code>python3 lttng-live.py</code> (see below)
<ul>
<li>Assertion</li>
</ul></li>
</ul>
<a name="Python-script"></a>
<h1 >Python script<a href="#Python-script" class="wiki-anchor">¶</a></h1>
<p>Note: The script is noisy as some code/comments are irrelevant/incorrect/inconsistent. (It's originally based on the query example in the Python bindings documentation.)</p>
<pre>
import bt2
import sys
# Get the `source.ctf.fs` component class from the `ctf` plugin.
# `source.ctf.lttng-live` component class instead
# (list plugins/components with $babeltrace2 list-plugins)
comp_cls = bt2.find_plugin('ctf').source_component_classes['lttng-live']
# The `babeltrace.support-info` query operation expects a `type`
# parameter (set to `directory` here) and an `input` parameter (the
# actual path or string to check, in this case the first command-line
# argument).
#
# See `babeltrace2-query-babeltrace.support-info(7)`.
params = {
'url': 'net://localhost/host/luna/my-session'
}
'''
params = {
'type': 'string',
'input': sys.argv[1],
}
'''
# Create a query executor.
#
# This is the environment in which query operations happens. The
# queried component class has access to this executor, for example to
# retrieve the query operation's logging level.
query_exec = bt2.QueryExecutor(comp_cls, 'sessions',
'inputs=net://localhost/host/luna/my-session')
'''
query_exec = bt2.QueryExecutor(comp_cls, 'babeltrace.support-info',
params)
'''
# Query the component class through the query executor.
#
# This method returns the result.
result = query_exec.query()
# Print the result.
print(result)
# Try to iterate on the trace.
'''
for msg in bt2.TraceCollectionMessageIterator('net://localhost/host/luna/my-session'):
if type(msg) is bt2._EventMessageConst:
print(msg.event.name)
'''
</pre>
<p>If I remember correctly, <strong>the script line change that triggered the assertion was expressing the query parameters directly as a string</strong> (rather than a dictionary). As in this:</p>
<pre>
query_exec = bt2.QueryExecutor(comp_cls, 'sessions',
'inputs=net://localhost/host/luna/my-session')
</pre>
<p>instead of this:</p>
<pre>
params = {
'url': 'net://localhost/host/luna/my-session'
}
query_exec = bt2.QueryExecutor(comp_cls, 'sessions',
params)
</pre>
<a name="Output"></a>
<h1 >Output<a href="#Output" class="wiki-anchor">¶</a></h1>
<pre>
erica@luna:~$ python3 lttng-live.py
(╯°□°)╯︵ ┻━┻ param-validation.c:196: validate_map_value(): Assertion `bt_value_get_type(map) == BT_VALUE_TYPE_MAP` failed.
Aborted (core dumped)
</pre>

LTTng-tools - Bug #1383 (New): The "cpu_id" context (available for filters) is not discoverable b...
https://bugs.lttng.org/issues/1383 | 2023-07-23T16:22:58Z | Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
<p>I recently tried remembering how to filter by cpu_id when enabling an event, and found that there is no clear way to see from the lttng man pages that $ctx.cpu_id actually exists (other than a random example in the lttng-enable-event man page). AFAIU the man pages rely on <code>lttng add-context --list</code> to let the user discover the available contexts. This is likely because the cpu_id context is not available for the "add-context" command, as it would be redundant with the implicit context already sampled with the buffers.</p>
<p>We may want to have some way to let users discover those contexts which are filter-specific.</p>
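For reference, a filter-only context like $ctx.cpu_id is used through the --filter option; an illustrative command (the event name here is arbitrary):

```shell
# Record sched_switch only when it fires on logical CPUs 0-3.
lttng enable-event --kernel sched_switch --filter '$ctx.cpu_id < 4'
```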
<p>My use case is to grab traces for only a few logical CPUs (0-3) on my 384-logical-CPU machine.</p>

LTTng-tools - Bug #1381 (New): lttng list-triggers crashing with 2.13.9
https://bugs.lttng.org/issues/1381 | 2023-06-16T19:23:50Z | Heman Muresan
<p>This was discussed offline, but I am opening a ticket here for tracking purposes. With the latest 2.13.9 I'm seeing the following assert:<br /><pre>
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1 0x00007fb11ad09723 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
#2 0x00007fb11acbe876 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3 0x00007fb11aca97e3 in __GI_abort () at abort.c:79
#4 0x00007fb11aca9719 in __assert_fail_base (fmt=<optimized out>, assertion=<optimized out>, file=<optimized out>, line=<optimized out>, function=<optimized out>) at assert.c:92
#5 0x00007fb11acb78e6 in __GI___assert_fail (assertion=assertion@entry=0x475fd2 "element_index < array->size", file=file@entry=0x4832e8 "../../../lttng-tools-2.13.9/src/common/dynamic-array.h",
line=line@entry=55, function=function@entry=0x498120 <__PRETTY_FUNCTION__.1> "lttng_dynamic_array_get_element") at assert.c:101
#6 0x0000000000469311 in lttng_dynamic_array_get_element (array=0x4b7f10, element_index=0) at ../../../lttng-tools-2.13.9/src/common/dynamic-array.h:55
#7 lttng_dynamic_array_get_element (element_index=0, array=0x4b7f10) at ../../../lttng-tools-2.13.9/src/common/dynamic-array.h:52
#8 lttng_action_path_copy (src=src@entry=0x4b7f10, dst=dst@entry=0x4b8250) at ../../../lttng-tools-2.13.9/src/common/actions/path.c:109
#9 0x000000000043595c in lttng_error_query_action_create (action_path=0x4b7f10, trigger=0x4b8240) at ../../../lttng-tools-2.13.9/src/common/error-query.c:230
#10 lttng_error_query_action_create (trigger=trigger@entry=0x4b5ea0, action_path=action_path@entry=0x4b7f10) at ../../../lttng-tools-2.13.9/src/common/error-query.c:206
#11 0x000000000042c62a in print_action_errors (action=<optimized out>, action_path_length=0, action_path_indexes=0x0, trigger=0x4b5ea0) at ../../../../lttng-tools-2.13.9/src/bin/lttng/commands/list_triggers.c:766
#12 print_one_action (trigger=0x4b5ea0, action=<optimized out>, action_path_indexes=0x0, action_path_length=0) at ../../../../lttng-tools-2.13.9/src/bin/lttng/commands/list_triggers.c:966
#13 0x000000000042d105 in print_one_trigger (trigger=0x4b5ea0) at ../../../../lttng-tools-2.13.9/src/bin/lttng/commands/list_triggers.c:1124
#14 print_sorted_triggers (triggers=<optimized out>) at ../../../../lttng-tools-2.13.9/src/bin/lttng/commands/list_triggers.c:1202
#15 cmd_list_triggers (argc=<optimized out>, argv=<optimized out>) at ../../../../lttng-tools-2.13.9/src/bin/lttng/commands/list_triggers.c:1421
#16 0x0000000000410be0 in handle_command (argv=0x7ffc348fc9c0, argc=1) at ../../../../lttng-tools-2.13.9/src/bin/lttng/lttng.c:237
#17 parse_args (argv=0x7ffc348fc9b8, argc=<optimized out>) at ../../../../lttng-tools-2.13.9/src/bin/lttng/lttng.c:426
#18 main (argc=<optimized out>, argv=0x7ffc348fc9b8) at ../../../../lttng-tools-2.13.9/src/bin/lttng/lttng.c:475
</pre></p>
<p>There's nothing particularly fancy about the trigger being printed; it's just a buffer usage trigger. Here it is displayed by an older version of lttng-tools:<br /><pre>
- name: foo_bar_bufferUsageTrigger
owner uid: 0
condition: buffer usage high
session name: foo_bar
channel name: foo_bar-ust
domain: ust
threshold (ratio): 0.70
errors: none
action:notify
errors: none
errors: none
</pre></p>
<p>This trigger is being created with the following pseudocode:<br /><pre>
pCondition = lttng_condition_buffer_usage_high_create();
lttng_condition_buffer_usage_set_session_name(...);
lttng_condition_buffer_usage_set_channel_name(...);
lttng_condition_buffer_usage_set_threshold_ratio(...);
lttng_condition_buffer_usage_set_domain_type();
pAction = lttng_action_notify_create();
pTrigger = lttng_trigger_create(pCondition, pAction);
lttng_register_trigger_with_name(pTrigger, pTriggerName);
</pre></p>
<p>I think this was introduced by this commit:<br /><a class="external" href="https://github.com/lttng/lttng-tools/commit/1ad48d9678f8f123dc7d44dbbfadc9957b7e970e#diff-e9fa6863d3580e2d95137ddaab0280d5391e830657a46f47cde39e0cfabc8fb7R109">https://github.com/lttng/lttng-tools/commit/1ad48d9678f8f123dc7d44dbbfadc9957b7e970e#diff-e9fa6863d3580e2d95137ddaab0280d5391e830657a46f47cde39e0cfabc8fb7R109</a><br />In lttng_action_path_copy, the code as-is doesn't handle the case where there is no destination action path. A way to fix this is to make the code handle that scenario; attached is a patch that does so. But perhaps we shouldn't even be hitting a scenario where there is no action path, and this is just a band-aid.</p>
<p>Thanks,<br />Heman</p>

LTTng-tools - Bug #1379 (New): Document behavior of live mode with per-pid UST buffers
https://bugs.lttng.org/issues/1379 | 2023-06-01T18:48:58Z | Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
<p>There is a design limitation of per-pid UST buffers when used in live mode which should be documented.</p>
<p>Basically, because the per-pid buffers are reclaimed soon after the application exits, it may cause the events traced by a short-lived application to never appear in the output of a live trace when per-pid buffers are used.</p>
<p>This is caused by the fact that the live client periodically checks for new buffers, but there is no inherent reference kept on the streams before they disappear.</p>
<p>This means that the information will be available in the disk output of the trace, but it will be missing from the live mode output.</p>
<p>This could eventually be mitigated by having the relay daemon keep references to the streams of interest to a live client until the live client itself references them.</p>

LTTng-tools - Bug #1378 (New): Document behavior of snapshot mode with per-pid UST buffers
https://bugs.lttng.org/issues/1378 | 2023-06-01T18:01:12Z | Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
<p>We should document a design limitation of the per-pid UST buffers which can lead to unexpected results when used with the "snapshot" feature.</p>
<p>Basically, because the per-pid buffers' lifetime is bound to the application using them (and to the consumer daemon extracting data from them), they are freed almost immediately after the traced application exits. However, in a scenario where we are interested in capturing a snapshot of the content of flight recorder buffers soon after an application exits (e.g. because of a crash), those buffers are no longer available a few milliseconds after the application exits.</p>
<p>We should document this design limitation, and eventually think about improvements (feature request?) to allow a configurable delay after application exit during which the consumer daemon could keep references on the buffers so they remain available for snapshots.</p>

LTTng-tools - Bug #1372 (In Progress): Consumer crashes during rotation (write to bad file descri...
https://bugs.lttng.org/issues/1372 | 2023-04-25T11:44:50Z | Maximilian Fickert
<p>Hi,</p>
<p>We are observing seemingly random crashes in the LTTng consumer daemon when tracing a C++ application with LTTng-UST. Our workload has a single <code>printf</code>-like tracepoint, where each string is in the order of 1kb and the total output is around 30MB/s. LTTng is set up with a single session and channel enabling this tracepoint, and we enabled rotation with a maximum size of 100MB or every 30 seconds. We are periodically starting new traced processes and the system runs close to 100% CPU load. This ran on an AWS Graviton2 (ARM) instance with CentOS 7 and a 5.4 kernel, using LTTng-UST 2.13.5 and LTTng-tools 2.13.8.</p>
<p>The first reported error is a write to a bad file descriptor (-1), apparently when waking up the metadata poll thread during a rotation.</p>
<p>I looked through the LTTng logs with <code>-vvv --verbose-consumer</code> and I suspect this might be caused by some sort of race condition; looking at the following log fragments (I inserted two additional debug lines in LTTng so the line numbers may not exactly match those in the 2.13.8 source code):<br /><pre>
DBG1 - 15:12:12.865621802 [Rotation]: Consumer rotate channel key 574 (in consumer_rotate_channel() at consumer.c:1694)
</pre><br />A rotation is requested for the channel with key 574. Then the metadata pipe of this channel is closed:<br /><pre>
DBG3 - 15:12:13.269805401 [UST application management]: Buffer registry per PID find id: 288 (in buffer_reg_pid_find() at buffer-registry.c:308)
DBG3 - 15:12:13.269811366 [UST application management]: No metadata to push for metadata key 574 (in ust_app_push_metadata() at ust-app.c:689)
DBG2 - 15:12:13.269816979 [UST application management]: Consumer close metadata channel key 574 (in consumer_close_metadata() at consumer.c:1394)
DBG1 - 15:12:13.269826899 [6593/6605]: Incoming command on sock (in consumer_thread_sessiond_poll() at consumer.c:3283)
DBG1 - 15:12:13.269835259 [6593/6605]: UST consumer close metadata key 574 (in close_metadata() at ust-consumer.c:804)
DBG1 - 15:12:13.269838591 [6593/6605]: Closing metadata channel key 574 (in lttng_ustconsumer_close_metadata() at ust-consumer.c:3235)
DBG3 - 15:12:13.269841167 [6593/6605]: close() fd = 603 (in lttng_ustconsumer_close_metadata() at ust-consumer.c:3250)
DBG1 - 15:12:13.269847075 [6593/6605]: Received command on sock (in consumer_thread_sessiond_poll() at consumer.c:3299)
</pre><br />But then LTTng unsuccessfully attempts to wake up the metadata poll thread by writing to the file descriptor that was just closed (the <code>channel key = 574</code> in the metadata poll thread message below is <code>channel->key</code>):<br /><pre>
DBG1 - 15:12:13.271001175 [6593/6605]: Waking up metadata poll thread (writing to pipe): channel name = 'metadata', channel key = 574 (in consumer_metadata_wakeup_pipe() at consumer.c:888)
DBG3 - 15:12:13.271010093 [6593/6605]: write() fd = -1 (in consumer_metadata_wakeup_pipe() at consumer.c:892)
PERROR - 15:12:13.271014655 [6593/6605]: Failed to write to UST metadata pipe while attempting to wake-up the metadata poll thread: Bad file descriptor (in consumer_metadata_wakeup_pipe() at consumer.c:907)
Error: Failed to dump the metadata cache
Error: Rotate channel failed
</pre><br />After this, there are a bunch more errors in the logs, but I guess they are ultimately caused by the issue above:<br /><pre>
DBG1 - 15:12:13.273572566 [6593/6604]: Consumer mmap write() ret 446464 (len 446464) (in lttng_consumer_on_read_subbuffer_mmap() at consumer.c:1726)
lttng-consumerd: consumer.c:1639: lttng_consumer_on_read_subbuffer_mmap: Assertion `stream->net_seq_idx != (uint64_t) -1ULL || stream->trace_chunk' failed.
...
DBG1 - 15:12:13.475559588 [Consumer management]: Error when receiving data from the consumer socket 61 (in consumer_socket_recv() at consumer.c:159)
Error: Handling metadata request
DBG1 - 15:12:13.476382551 [Rotation]: Error when receiving data from the consumer socket 60 (in consumer_socket_recv() at consumer.c:159)
Error: Error pushing metadata to consumer
DBG1 - 15:12:13.476418727 [Rotation]: Consumer rotate channel key 604 (in consumer_rotate_channel() at consumer.c:1694)
DBG1 - 15:12:13.476424962 [Rotation]: Setting trace chunk close command to "move to completed chunk folder" (in lttng_trace_chunk_set_close_command() at trace-chunk.c:1797)
DBG1 - 15:12:13.476428671 [Rotation]: lttng_trace_chunk_rename_path from .tmp_new_chunk to (null) (in lttng_trace_chunk_rename_path_no_lock() at trace-chunk.c:755)
DBG3 - 15:12:13.476437516 [Rotation]: renameat() old_dirfd = 946, old_name = .tmp_new_chunk, new_dirfd = 946, new_name = 20230421T151158+0000-5, uid = 1004, gid = 1004 (in run_as_renameat() at runas.c:1883)
DBG1 - 15:12:13.476444999 [Rotation]: Using run_as worker (in run_as() at runas.c:1646)
Error: Error pushing metadata to consumer
DBG2 - 15:12:13.476569797 [UST application management]: Consumer close metadata channel key 624 (in consumer_close_metadata() at consumer.c:1394)
DBG1 - 15:12:13.476585469 [UST application management]: PID 7729 unregistering with sock 1060 (in ust_app_unregister() at ust-app.c:4253)
DBG1 - 15:12:13.476589957 [UST application management]: Flushing app session buffers for ust app pid 7729 (in ust_app_flush_app_session() at ust-app.c:5315)
DBG2 - 15:12:13.476611364 [UST application management]: Consumer flush channel key 651 (in consumer_flush_channel() at consumer.c:1328)
Error: Error flushing consumer channel
DBG3 - 15:12:13.476616188 [UST application management]: Buffer registry per PID find id: 327 (in buffer_reg_pid_find() at buffer-registry.c:308)
DBG3 - 15:12:13.476620406 [UST application management]: No metadata to push for metadata key 652 (in ust_app_push_metadata() at ust-app.c:689)
DBG2 - 15:12:13.476623277 [UST application management]: Consumer close metadata channel key 652 (in consumer_close_metadata() at consumer.c:1394)
DBG1 - 15:12:13.476633312 [UST application management]: PID 11467 unregistering with sock 457 (in ust_app_unregister() at ust-app.c:4253)
DBG1 - 15:12:13.476636996 [UST application management]: Flushing app session buffers for ust app pid 11467 (in ust_app_flush_app_session() at ust-app.c:5315)
DBG2 - 15:12:13.476640508 [UST application management]: Consumer flush channel key 661 (in consumer_flush_channel() at consumer.c:1328)
Error: Error flushing consumer channel
DBG3 - 15:12:13.476643946 [UST application management]: Buffer registry per PID find id: 332 (in buffer_reg_pid_find() at buffer-registry.c:308)
DBG2 - 15:12:13.476649501 [UST application management]: Consumer push metadata to consumer socket -1 (in consumer_push_metadata() at consumer.c:1462)
Error: Error pushing metadata to consumer
...
</pre></p>
<p>We currently do not have simple steps to reliably reproduce the issue, but it happens somewhat irregularly after 5-10 minutes (sometimes earlier, sometimes later) of running our workload. I attached a longer fragment of the logs around the time the error happens; let us know if we can provide any more information to help.</p>

LTTng-tools - Bug #1371 (New): Cannot have two non-root session daemons in two separate Docker co...
https://bugs.lttng.org/issues/1371 | 2023-04-18T23:33:11Z | Christophe Bedard
<p>I have 2 separate Docker containers. The user in each container is the same user as my non-root host user (same name, same UID, same GID). Users in the 2 Docker containers have the same $HOME, which is a directory mounted from the host.</p>
<p>To set up and configure each container:</p>
<p>1. Start container and pass all necessary info through:</p>
<pre>
host$ mkdir ~/fake-home
host$ docker run -it --env USER=$USER --env GROUP=$USER --env USER_ID=$(id -u) --env GROUP_ID=$(id -g) -v $HOME/fake-home:$HOME ubuntu:22.04 bash
</pre>
<p>2. Inside each container, create user, install lttng-tools, and start shell as $USER:</p>
<pre>
containerN$ groupdel "$GROUP" &>/dev/null || true
containerN$ groupadd -og "$GROUP_ID" "$GROUP"
containerN$ useradd -M -u "$USER_ID" -g "$GROUP_ID" -d "/home/$USER" -s /bin/bash "$USER"
containerN$ usermod -aG sudo "$USER"
containerN$ apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y lttng-tools
containerN$ su - "$USER"
</pre>
<p>Then, I can start a non-root session daemon in container 1 and trace just fine:</p>
<pre>
container1$ lttng list # No session daemon, as expected
Error: No session daemon is available
container1$ lttng-sessiond --daemonize
container1$ pgrep lttng-sessiond # As expected, we have a session daemon
459
container1$ cat $HOME/.lttng/lttng-sessiond.pid # Matches above output
459
container1$ ps -o comm= $(cat $HOME/.lttng/lttng-sessiond.pid) # That PID indeed corresponds to the session daemon
lttng-sessiond
container1$ lttng list # Now this works, as expected
Currently no available recording session
container1$ # Trace, ..., works fine
container1$ babeltrace /home/$USER/lttng-traces/trace-container1 # Shows trace data
...
</pre>
<p>However, in container 2, LTTng thinks that there is already a session daemon, but tracing doesn't collect any data:</p>
<pre>
container2$ pgrep lttng-sessiond # No output, since there is no session daemon, as expected
container2$ lttng-sessiond --daemonize # Doesn't work, which is unexpected, since there isn't supposed to be a session daemon!
Error: A session daemon is already running.
container2$ cat $HOME/.lttng/lttng-sessiond.pid # Matches output from container 1, since they have the same $HOME directory
459
container2$ ps -o comm= 459 # ...but that PID doesn't correspond to any process in this container! (no output)
container2$ lttng list # Works, which is unexpected, since there isn't supposed to be a session daemon!
Currently no available recording session
container2$ # Trace, ..., appears to complete without errors
container2$ babeltrace /home/$USER/lttng-traces/trace-container2 # Error, babeltrace says it's not a trace, which is unexpected
[error] Cannot open any trace for reading.
[error] opening trace "/home/$USER/lttng-traces/trace-container2" for reading.
[error] none of the specified trace paths could be opened.
container2$ ls /home/$USER/lttng-traces/trace-container2 # The trace directory has a ust/ subdirectory, but it's empty
ust
container2$ export LTTNG_HOME=~/some-other-dir # Setting/using a different LTTNG_HOME fixes everything
</pre>
<p>As mentioned in the last line above, using a different LTTNG_HOME allows container2 to successfully start its own non-root session daemon and collect trace data. However, the issue is that this fails silently: LTTng doesn't detect this weird configuration/state and doesn't give an error.</p>
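<p>The workaround from the last line of the transcript can be sketched as follows (a hedged sketch only: the directory naming scheme is ours, not part of LTTng; the point is simply that each container gets a distinct <code>LTTNG_HOME</code> so the daemons stop sharing PID files and socket paths):</p>
<pre>
# Hedged workaround sketch: give each container its own LTTNG_HOME so
# concurrent non-root session daemons do not share state under a common
# $HOME. The "-$(hostname)" suffix is arbitrary, any distinct path works.
export LTTNG_HOME="$HOME/.lttng-home-$(hostname)"
mkdir -p "$LTTNG_HOME"
</pre>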
<p>I can detect this myself by checking if the PID in the $HOME/.lttng/lttng-sessiond.pid file is a 'lttng-sessiond' process:</p>
<pre>
[ $(ps -o comm= $(cat $HOME/.lttng/lttng-sessiond.pid)) = 'lttng-sessiond' ]
</pre>
<p>If the command name does not match, the session daemon isn't really "available," and I can report an error in my own tool and tell my user to set LTTNG_HOME to something other than $HOME and try again. I understand that this is a weird corner case, but it would be good if LTTng itself could handle this check, if possible.</p>
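<p>A more defensive variant of that check can be sketched in shell. The function name is hypothetical (not an LTTng interface); the PID-file path follows the report. It also treats a missing PID file and a PID that is not alive in the current container as "no daemon":</p>
<pre>
# Hedged sketch: return 0 only if the PID file exists, the PID is alive
# in this container, and its command name is "lttng-sessiond".
check_sessiond_alive() {
    pidfile="${LTTNG_HOME:-$HOME}/.lttng/lttng-sessiond.pid"
    [ -r "$pidfile" ] || return 1                          # no PID file
    pid=$(cat "$pidfile")
    comm=$(ps -o comm= -p "$pid" 2>/dev/null) || return 1  # PID not alive here
    [ "$comm" = "lttng-sessiond" ]                         # PID reused?
}

# Demonstration with a fake LTTNG_HOME whose PID file points at this very
# shell, mimicking container 2 seeing container 1's foreign PID:
export LTTNG_HOME=$(mktemp -d)
mkdir -p "$LTTNG_HOME/.lttng"
echo $$ > "$LTTNG_HOME/.lttng/lttng-sessiond.pid"
check_sessiond_alive || echo "stale or foreign PID file; set LTTNG_HOME elsewhere"
</pre>
<p>In the demonstration, the PID is alive but its command name is the shell's, so the check fails, which is exactly the container-2 situation described above.</p>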
<p>Note: in my case, the $HOME directory inside the containers is not the actual host user's $HOME directory. However, if it were, maybe this could be reproducible with the host + 1 Docker container?</p> LTTng-UST - Bug #1132 (New): Fails to build with Crosscompiler https://bugs.lttng.org/issues/1132 (2017-10-27T11:46:39Z) Norbert Lange
<p>The userspace library fails to build when cross-compiling; this is easily reproducible with Buildroot, for example.</p>
<p>As far as I can tell, the transitive dependency on liblttng-ust-tracepoint.so.0 when referencing liblttng-ust.so is lost.<br />I suppose the issue would not appear if the host or sysroot contained an installed liblttng-ust-tracepoint.so.0.</p>
<p>The error message is as follows:</p>
<pre>
[ 70%] Linking CXX executable tester
/usr/bin/cmake -E cmake_link_script CMakeFiles/tester.dir/link.txt --verbose=1
/tmp/build/host/bin/x86_64-buildroot-linux-gnu-g++ -g -O2 -rdynamic CMakeFiles/tester.dir/tester.cpp.o -o tester -Wl,-rpath,/tmp/a/doc/examples/cmake-multiple-shared-libraries/build:/tmp/a/liblttng-ust/.libs libaligner-lib.so libtester-lib.so libtracepoint-provider.so /tmp/a/liblttng-ust/.libs/liblttng-ust.so -ldl
/tmp/build/host/lib/gcc/x86_64-buildroot-linux-gnu/7.2.0/../../../../x86_64-buildroot-linux-gnu/bin/ld: warning: liblttng-ust-tracepoint.so.0, needed by /tmp/a/liblttng-ust/.libs/liblttng-ust.so, not found (try using -rpath or -rpath-link)
/tmp/a/liblttng-ust/.libs/liblttng-ust.so: undefined reference to `exit_tracepoint'
/tmp/a/liblttng-ust/.libs/liblttng-ust.so: undefined reference to `__tracepoint_probe_unregister_queue_release'
/tmp/a/liblttng-ust/.libs/liblttng-ust.so: undefined reference to `__tracepoint_probe_register_queue_release'
/tmp/a/liblttng-ust/.libs/liblttng-ust.so: undefined reference to `__tracepoint_probe_prune_release_queue'
/tmp/a/liblttng-ust/.libs/liblttng-ust.so: undefined reference to `init_tracepoint'
collect2: error: ld returned 1 exit status
</pre>
<p>This issue occurs at least in 2.9.0, 2.9.1, and 2.10.0.</p>
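<p>The linker's own hint in the log above ("try using -rpath or -rpath-link") can be illustrated with a small host-toolchain mock-up. Everything below (library names, directory layout) is our own stand-in for liblttng-ust and liblttng-ust-tracepoint, not the actual Buildroot tree:</p>
<pre>
# Mock reproduction: libouter.so (stands in for liblttng-ust.so) depends on
# libinner.so (stands in for liblttng-ust-tracepoint.so.0), which lives in a
# directory the final link does not search.
set -e
d=$(mktemp -d); cd "$d"
echo 'int inner(void) { return 42; }' > inner.c
echo 'int inner(void); int outer(void) { return inner(); }' > outer.c
echo 'int outer(void); int main(void) { return outer() == 42 ? 0 : 1; }' > main.c
mkdir inner-libs
cc -shared -fPIC -Wl,-soname,libinner.so inner.c -o inner-libs/libinner.so
cc -shared -fPIC outer.c -o libouter.so -Linner-libs -linner
# As in the report, the transitive dependency cannot be resolved:
if cc main.c -o tester ./libouter.so 2>/dev/null; then
    echo "unexpected: link succeeded"
else
    echo "link fails: libinner.so not found, undefined references"
fi
# Pointing -rpath-link at the directory containing libinner.so fixes it:
cc main.c -o tester ./libouter.so -Wl,-rpath-link,inner-libs
echo "link succeeds with -rpath-link"
</pre>
<p>In the real build, the directory to pass would be wherever liblttng-ust-tracepoint.so.0 was staged (e.g. the .libs path shown in the log); whether Buildroot should inject this flag itself or the library's link metadata should carry the dependency is the open question of this bug.</p>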