LTTng bugs repository: Issues
https://bugs.lttng.org/
2023-04-25T11:44:50Z
LTTng-tools - Bug #1372 (In Progress): Consumer crashes during rotation (write to bad file descri...
https://bugs.lttng.org/issues/1372
2023-04-25T11:44:50Z
Maximilian Fickert
<p>Hi,</p>
<p>We are observing seemingly random crashes in the LTTng consumer daemon when tracing a C++ application with LTTng-UST. Our workload has a single <code>printf</code>-like tracepoint, where each string is on the order of 1 kB and the total output is around 30 MB/s. LTTng is set up with a single session and channel enabling this tracepoint, and we enabled rotation with a maximum size of 100 MB or every 30 seconds. We periodically start new traced processes, and the system runs close to 100% CPU load. This ran on an AWS Graviton2 (ARM) instance with CentOS 7 and a 5.4 kernel, using LTTng-UST 2.13.5 and LTTng-tools 2.13.8.</p>
<p>The first reported error is a write to a bad file descriptor (-1), apparently when waking up the metadata poll thread during a rotation.</p>
<p>I looked through the LTTng logs with <code>-vvv --verbose-consumer</code> and suspect this is caused by some sort of race condition, based on the following log fragments (note: I inserted two additional debug lines in LTTng, so the line numbers may not exactly match the 2.13.8 source code):<br /><pre>
DBG1 - 15:12:12.865621802 [Rotation]: Consumer rotate channel key 574 (in consumer_rotate_channel() at consumer.c:1694)
</pre><br />A rotation is requested for the channel with key 574. Then the metadata pipe of this channel is closed:<br /><pre>
DBG3 - 15:12:13.269805401 [UST application management]: Buffer registry per PID find id: 288 (in buffer_reg_pid_find() at buffer-registry.c:308)
DBG3 - 15:12:13.269811366 [UST application management]: No metadata to push for metadata key 574 (in ust_app_push_metadata() at ust-app.c:689)
DBG2 - 15:12:13.269816979 [UST application management]: Consumer close metadata channel key 574 (in consumer_close_metadata() at consumer.c:1394)
DBG1 - 15:12:13.269826899 [6593/6605]: Incoming command on sock (in consumer_thread_sessiond_poll() at consumer.c:3283)
DBG1 - 15:12:13.269835259 [6593/6605]: UST consumer close metadata key 574 (in close_metadata() at ust-consumer.c:804)
DBG1 - 15:12:13.269838591 [6593/6605]: Closing metadata channel key 574 (in lttng_ustconsumer_close_metadata() at ust-consumer.c:3235)
DBG3 - 15:12:13.269841167 [6593/6605]: close() fd = 603 (in lttng_ustconsumer_close_metadata() at ust-consumer.c:3250)
DBG1 - 15:12:13.269847075 [6593/6605]: Received command on sock (in consumer_thread_sessiond_poll() at consumer.c:3299)
</pre><br />But then LTTng unsuccessfully attempts to wake up the metadata poll thread by writing to the file descriptor that was just closed (the <code>channel key = 574</code> in the metadata poll thread message below is <code>channel->key</code>):<br /><pre>
DBG1 - 15:12:13.271001175 [6593/6605]: Waking up metadata poll thread (writing to pipe): channel name = 'metadata', channel key = 574 (in consumer_metadata_wakeup_pipe() at consumer.c:888)
DBG3 - 15:12:13.271010093 [6593/6605]: write() fd = -1 (in consumer_metadata_wakeup_pipe() at consumer.c:892)
PERROR - 15:12:13.271014655 [6593/6605]: Failed to write to UST metadata pipe while attempting to wake-up the metadata poll thread: Bad file descriptor (in consumer_metadata_wakeup_pipe() at consumer.c:907)
Error: Failed to dump the metadata cache
Error: Rotate channel failed
</pre><br />After this, there are a bunch more errors in the logs, but I guess they are ultimately caused by the issue above:<br /><pre>
DBG1 - 15:12:13.273572566 [6593/6604]: Consumer mmap write() ret 446464 (len 446464) (in lttng_consumer_on_read_subbuffer_mmap() at consumer.c:1726)
lttng-consumerd: consumer.c:1639: lttng_consumer_on_read_subbuffer_mmap: Assertion `stream->net_seq_idx != (uint64_t) -1ULL || stream->trace_chunk' failed.
...
DBG1 - 15:12:13.475559588 [Consumer management]: Error when receiving data from the consumer socket 61 (in consumer_socket_recv() at consumer.c:159)
Error: Handling metadata request
DBG1 - 15:12:13.476382551 [Rotation]: Error when receiving data from the consumer socket 60 (in consumer_socket_recv() at consumer.c:159)
Error: Error pushing metadata to consumer
DBG1 - 15:12:13.476418727 [Rotation]: Consumer rotate channel key 604 (in consumer_rotate_channel() at consumer.c:1694)
DBG1 - 15:12:13.476424962 [Rotation]: Setting trace chunk close command to "move to completed chunk folder" (in lttng_trace_chunk_set_close_command() at trace-chunk.c:1797)
DBG1 - 15:12:13.476428671 [Rotation]: lttng_trace_chunk_rename_path from .tmp_new_chunk to (null) (in lttng_trace_chunk_rename_path_no_lock() at trace-chunk.c:755)
DBG3 - 15:12:13.476437516 [Rotation]: renameat() old_dirfd = 946, old_name = .tmp_new_chunk, new_dirfd = 946, new_name = 20230421T151158+0000-5, uid = 1004, gid = 1004 (in run_as_renameat() at runas.c:1883)
DBG1 - 15:12:13.476444999 [Rotation]: Using run_as worker (in run_as() at runas.c:1646)
Error: Error pushing metadata to consumer
DBG2 - 15:12:13.476569797 [UST application management]: Consumer close metadata channel key 624 (in consumer_close_metadata() at consumer.c:1394)
DBG1 - 15:12:13.476585469 [UST application management]: PID 7729 unregistering with sock 1060 (in ust_app_unregister() at ust-app.c:4253)
DBG1 - 15:12:13.476589957 [UST application management]: Flushing app session buffers for ust app pid 7729 (in ust_app_flush_app_session() at ust-app.c:5315)
DBG2 - 15:12:13.476611364 [UST application management]: Consumer flush channel key 651 (in consumer_flush_channel() at consumer.c:1328)
Error: Error flushing consumer channel
DBG3 - 15:12:13.476616188 [UST application management]: Buffer registry per PID find id: 327 (in buffer_reg_pid_find() at buffer-registry.c:308)
DBG3 - 15:12:13.476620406 [UST application management]: No metadata to push for metadata key 652 (in ust_app_push_metadata() at ust-app.c:689)
DBG2 - 15:12:13.476623277 [UST application management]: Consumer close metadata channel key 652 (in consumer_close_metadata() at consumer.c:1394)
DBG1 - 15:12:13.476633312 [UST application management]: PID 11467 unregistering with sock 457 (in ust_app_unregister() at ust-app.c:4253)
DBG1 - 15:12:13.476636996 [UST application management]: Flushing app session buffers for ust app pid 11467 (in ust_app_flush_app_session() at ust-app.c:5315)
DBG2 - 15:12:13.476640508 [UST application management]: Consumer flush channel key 661 (in consumer_flush_channel() at consumer.c:1328)
Error: Error flushing consumer channel
DBG3 - 15:12:13.476643946 [UST application management]: Buffer registry per PID find id: 332 (in buffer_reg_pid_find() at buffer-registry.c:308)
DBG2 - 15:12:13.476649501 [UST application management]: Consumer push metadata to consumer socket -1 (in consumer_push_metadata() at consumer.c:1462)
Error: Error pushing metadata to consumer
...
</pre></p>
<p>We currently do not have simple steps to reliably reproduce the issue, but it happens somewhat irregularly after 5-10 minutes (sometimes earlier, sometimes later) of running our workload. I attached a longer fragment of the logs from around the time the error happens; let us know if we can provide any more information to help.</p>

LTTng - Bug #1370 (Confirmed): Why "lttng create --live" spawns a local relay daemon but not in ...
https://bugs.lttng.org/issues/1370
2023-04-06T15:33:05Z
Bin Yuan
<p>The relayd spawned by the lttng create command does not close file descriptors such as stdout.<br />Why not spawn the relayd with the "--daemonize" option?</p>

Babeltrace - Bug #1299 (Confirmed): babeltrace2 re-uses prior event string rather than expected e...
https://bugs.lttng.org/issues/1299
2021-03-04T18:49:50Z
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
<p>Trying to reproduce the page fault scenario for an openat() event. Note that the behavior of generating a page fault may be toolchain-specific (linker-specific):</p>
<p>Test program "do-pagefault-open":</p>
<pre>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

const char *str = "/tmp/blah";
const char *str2 = "/tmp/blah2";

int main()
{
	open(str, O_RDWR);
	perror("first");
	open(str2, O_RDWR);
	perror("second");
	return 0;
}
</pre>
<p>then trace with:</p>
<p>lttng-trace ./do-pagefault-open</p>
<p>Relevant events with babeltrace2 (grepping for 'openat'):</p>
<pre>
[13:47:12.296015570] (+0.000015421) thinkos syscall_entry_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { dfd = -100, filename = "/etc/ld.so.cache", flags = ( "O_CLOEXEC" : container = 524288 ), mode = ( <unknown> : container = 0 ) }
[13:47:12.296029004] (+0.000013434) thinkos syscall_exit_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { ret = 3 }
[13:47:12.296116608] (+0.000058899) thinkos syscall_entry_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { dfd = -100, filename = "/lib/x86_64-linux-gnu/libc.so.6", flags = ( "O_CLOEXEC" : container = 524288 ), mode = ( <unknown> : container = 0 ) }
[13:47:12.296126244] (+0.000009636) thinkos syscall_exit_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { ret = 3 }
[13:47:12.296706313] (+0.000096888) thinkos syscall_entry_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { dfd = -100, filename = "/lib/x86_64-linux-gnu/libc.so.6", flags = ( "O_RDWR" : container = 2 ), mode = ( <unknown> : container = 0 ) }
[13:47:12.296717797] (+0.000011484) thinkos syscall_exit_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { ret = -2 }
[13:47:12.296954228] (+0.000005481) thinkos syscall_entry_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { dfd = -100, filename = "/tmp/blah2", flags = ( "O_RDWR" : container = 2 ), mode = ( <unknown> : container = 0 ) }
[13:47:12.296960451] (+0.000006223) thinkos syscall_exit_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { ret = -2 }
</pre>
<p>Same trace with babeltrace 1:</p>
<pre>
[warning] Unknown value 0 in enum.
[warning] Unknown value 0 in enum.
[warning] Unknown value 3 in enum.
[warning] Unknown value 5 in enum.
[warning] Unknown value 129 in enum.
[warning] Unknown value 129 in enum.
[warning] Unknown value 3 in enum.
[warning] Unknown value 129 in enum.
[warning] Unknown value 3 in enum.
[warning] Unknown value 3 in enum.
[13:47:12.296015570] (+0.000015421) thinkos syscall_entry_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { dfd = -100, filename = "/etc/ld.so.cache", flags = ( "O_CLOEXEC" : container = 524288 ), mode = ( <unknown> : container = 0 ) }
[13:47:12.296029004] (+0.000013434) thinkos syscall_exit_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { ret = 3 }
[13:47:12.296116608] (+0.000058899) thinkos syscall_entry_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { dfd = -100, filename = "/lib/x86_64-linux-gnu/libc.so.6", flags = ( "O_CLOEXEC" : container = 524288 ), mode = ( <unknown> : container = 0 ) }
[13:47:12.296126244] (+0.000009636) thinkos syscall_exit_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { ret = 3 }
[warning] Unknown value 0 in enum.
[warning] Unknown value 0 in enum.
[13:47:12.296706313] (+0.000096888) thinkos syscall_entry_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { dfd = -100, filename = "", flags = ( "O_RDWR" : container = 2 ), mode = ( <unknown> : container = 0 ) }
[13:47:12.296717797] (+0.000011484) thinkos syscall_exit_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { ret = -2 }
[13:47:12.296954228] (+0.000005481) thinkos syscall_entry_openat: { cpu_id = 3 }, { procname = "do-pagefault-op", vpid = 31576, vtid = 31576 }, { dfd = -100, filename = "/tmp/blah2", flags = ( "O_RDWR" : container = 2 ), mode = ( <unknown> : container = 0 ) }
</pre>
<p>Notice how filename = "/lib/x86_64-linux-gnu/libc.so.6" appears twice in the bt2 output, where we would expect the empty string (as provided by bt1).</p>
<p>Tested with babeltrace2 commit c429f86d0.</p>

LTTng - Bug #1298 (On pause): LTTng build reproducibility for OE-core (yocto)
https://bugs.lttng.org/issues/1298
2021-03-01T14:52:11Z
Jonathan Rajotte Julien <jonathan.rajotte-julien@efficios.com>
<p>Currently, the installable parts of LTTng are "reproducible" for the Debian packages.</p>
<p>Still, oe-core is having issues, since it seems to also be shipping the tests for ptest.</p>
<p>See <a class="external" href="https://autobuilder.yocto.io/pub/repro-fail/2021-2-25-rp/lttng-tools/diff/">https://autobuilder.yocto.io/pub/repro-fail/2021-2-25-rp/lttng-tools/diff/</a></p>
<p>As of today this is not a high priority for us. This will be revisited after 2.13 is released.</p>

Babeltrace - Bug #1277 (Confirmed): The `ctf` plugin does not support a negative TSDL clock class...
https://bugs.lttng.org/issues/1277
2020-07-23T14:08:36Z
Seongab Kim
<p>Hi,</p>
<p>I have a trace which cannot be opened by babeltrace as below, but I can open it with Trace Compass.</p>
<pre>
skim@d54030999178:/mnt/ssd/work/skim/traces$ babeltrace2 ./kernel/
</pre>
<pre>
07-22 07:49:20.099 5264 5264 E PLUGIN/CTF/META/IR-VISITOR get_unary_unsigned@visitor-generate-ir.c:800 [auto-disc-source-ctf-fs] At line 40 in metadata stream: Invalid constant unsigned integer.
07-22 07:49:20.099 5264 5264 E PLUGIN/CTF/META/IR-VISITOR visit_clock_decl_entry@visitor-generate-ir.c:4357 [auto-disc-source-ctf-fs] At line 40 in metadata stream: Unexpected unary expression for clock class's `offset` attribute.
07-22 07:49:20.099 5264 5264 E PLUGIN/CTF/META/IR-VISITOR visit_clock_decl@visitor-generate-ir.c:4532 [auto-disc-source-ctf-fs] At line 40 in metadata stream: Cannot visit clock class's entry: ret=-22
07-22 07:49:20.099 5264 5264 E PLUGIN/CTF/META/IR-VISITOR ctf_visitor_generate_ir_visit_node@visitor-generate-ir.c:4775 [auto-disc-source-ctf-fs] At line 41 in metadata stream: Cannot visit clock class: ret=-22
07-22 07:49:20.099 5264 5264 E PLUGIN/CTF/META/DECODER ctf_metadata_decoder_append_content@decoder.c:337 [auto-disc-source-ctf-fs] Failed to visit AST node to create CTF IR objects: mdec-addr=0x22a0d90, ret=-22
07-22 07:49:20.099 5264 5264 E PLUGIN/SRC.CTF.FS/META ctf_fs_metadata_set_trace_class@metadata.c:128 [auto-disc-source-ctf-fs] Cannot update metadata decoder's content.
07-22 07:49:20.122 5264 5264 E PLUGIN/SRC.CTF.FS ctf_fs_component_create_ctf_fs_trace_one_path@fs.c:1206 [auto-disc-source-ctf-fs] Cannot create trace for `/mnt/ssd/work/skim/traces/kernel`.
07-22 07:49:20.123 5264 5264 W LIB/GRAPH add_component_with_init_method_data@graph.c:977 Component initialization method failed: status=ERROR, comp-addr=0x22a68d0, comp-name="auto-disc-source-ctf-fs", comp-log-level=WARNING, comp-class-type=SOURCE, comp-class-name="fs", comp-class-partial-descr="Read CTF traces from the file sy", comp-class-is-frozen=0, comp-class-so-handle-addr=0x22b0de0, comp-class-so-handle-path="/usr/lib/x86_64-linux-gnu/babeltrace2/plugins/babeltrace-plugin-ctf.so", comp-input-port-count=0, comp-output-port-count=0
07-22 07:49:20.123 5264 5264 E CLI cmd_run_ctx_create_components_from_config_components@babeltrace2.c:2301 Cannot create component: plugin-name="ctf", comp-cls-name="fs", comp-cls-type=1, comp-name="auto-disc-source-ctf-fs"
07-22 07:49:20.123 5264 5264 E CLI cmd_run@babeltrace2.c:2480 Cannot create components.
ERROR: [Babeltrace CLI] (babeltrace2.c:2480)
Cannot create components.
CAUSED BY [Babeltrace CLI] (babeltrace2.c:2301)
Cannot create component: plugin-name="ctf", comp-cls-name="fs", comp-cls-type=1, comp-name="auto-disc-source-ctf-fs"
CAUSED BY [libbabeltrace2] (graph.c:977)
Component initialization method failed: status=ERROR, comp-addr=0x22a68d0, comp-name="auto-disc-source-ctf-fs", comp-log-level=WARNING,
comp-class-type=SOURCE, comp-class-name="fs", comp-class-partial-descr="Read CTF traces from the file sy", comp-class-is-frozen=0,
comp-class-so-handle-addr=0x22b0de0, comp-class-so-handle-path="/usr/lib/x86_64-linux-gnu/babeltrace2/plugins/babeltrace-plugin-ctf.so",
comp-input-port-count=0, comp-output-port-count=0 CAUSED BY [auto-disc-source-ctf-fs: 'source.ctf.fs'] (fs.c:1206)
Cannot create trace for `/mnt/ssd/work/skim/traces/kernel`.
</pre>
<p>I'm using the version below.</p>
<pre>
skim@d54030999178:~/work/tmp$ babeltrace2 -V
Babeltrace 2.0.4 "Amqui"
Amqui (/_mkwi_/) is a town in eastern Québec, Canada, at the base of the Gaspé peninsula in Bas-Saint-Laurent. Located at the confluence of the Humqui and Matapédia Rivers, its proximity to woodlands makes it a great destination for outdoor activities such as camping, hiking, and mountain biking.
</pre>
<p>Here is the test result which Philippe Proulx requested.</p>
<pre>
skim@d54030999178:~/ssd_work/traces$ babeltrace2 -o ctf-metadata ./kernel | grep -A10 '^clock {'
</pre>
<pre>
clock {
name = "monotonic";
uuid = "e00bcef2-1ef1-4f02-a241-8561834511fd";
description = "Monotonic Clock";
freq = 1000000000; /* Frequency, in Hz */
/* clock value offset from Epoch is: offset * (1/freq) */
offset = -48;
};
</pre>

Babeltrace - Bug #1225 (On pause): src.ctf.lttng-live: `sessions` query error reporting
https://bugs.lttng.org/issues/1225
2020-02-17T21:07:37Z
Francis Deslauriers <francis.deslauriers@efficios.com>
<p>The right format to query the list of sessions is:<br /><pre>
babeltrace2 query -p 'url="net://127.0.0.1"' src.ctf.lttng-live sessions
</pre></p>
<p>When adding a `/` at the end of the URL (perhaps by mistake), you get the following error:<br /><pre>
10-23 10:11:25.193 2096 2096 E PLUGIN/SRC.CTF.LTTNG-LIVE/VIEWER parse_url@viewer-connection.c:127 [lttng-live] Invalid LTTng live URL format: Expecting `/host/` after hostname or port
10-23 10:11:25.193 2096 2096 E PLUGIN/SRC.CTF.LTTNG-LIVE/VIEWER lttng_live_connect_viewer@viewer-connection.c:253 [lttng-live] Failed to parse URL
10-23 10:11:25.193 2096 2096 E PLUGIN/SRC.CTF.LTTNG-LIVE/VIEWER live_viewer_connection_create@viewer-connection.c:1629 [lttng-live] Failure to establish connection: url="net://127.0.0.1/"
10-23 10:11:25.193 2096 2096 E PLUGIN/SRC.CTF.LTTNG-LIVE lttng_live_query_list_sessions@lttng-live.c:1589 [lttng-live] Failed to create viewer connection
10-23 10:11:25.193 2096 2096 W LIB/QUERY-EXECUTOR bt_query_executor_query@query-executor.c:246 Component class's "query" method failed: query-exec-addr=0x5625255fd7a0, cc-addr=0x5625255f80a0, cc-type=SOURCE, cc-name="lttng-live", cc-partial-descr="Connect to an LTTng relay daemon", cc-is-frozen=0, cc-so-handle-addr=0x5625255f7270, cc-so-handle-path="/usr/local/lib/babeltrace2/plugins/babeltrace-plugin-ctf.so", object="sessions", params-addr=0x5625255e4540, params-type=MAP, params-element-count=1, log-level=WARNING
10-23 10:11:25.193 2096 2096 E CLI cmd_query@babeltrace2.c:727 Failed to query component class: unknown error: plugin-name="ctf", comp-cls-name="lttng-live", comp-cls-type=1 object="sessions"
ERROR: [Babeltrace CLI] (babeltrace2.c:727)
Failed to query component class: unknown error: plugin-name="ctf", comp-cls-name="lttng-live", comp-cls-type=1
object="sessions"
CAUSED BY [Babeltrace library] (query-executor.c:246)
Component class's "query" method failed: query-exec-addr=0x5625255fd7a0, cc-addr=0x5625255f80a0, cc-type=SOURCE,
cc-name="lttng-live", cc-partial-descr="Connect to an LTTng relay daemon", cc-is-frozen=0,
cc-so-handle-addr=0x5625255f7270, cc-so-handle-path="/usr/local/lib/babeltrace2/plugins/babeltrace-plugin-ctf.so",
object="sessions", params-addr=0x5625255e4540, params-type=MAP, params-element-count=1, log-level=WARNING
CAUSED BY ['source.ctf.lttng-live'] (lttng-live.c:1589)
Failed to create viewer connection
CAUSED BY ['source.ctf.lttng-live'] (viewer-connection.c:1629)
Failure to establish connection: url="net://127.0.0.1/"
CAUSED BY ['source.ctf.lttng-live'] (viewer-connection.c:253)
Failed to parse URL
CAUSED BY ['source.ctf.lttng-live'] (viewer-connection.c:127)
Invalid LTTng live URL format: Expecting `/host/` after hostname or port
</pre><br />The error suggests adding the `/host/` part of the URL, as if we were trying to connect to a session. In fact, since this is a query, it should rather suggest removing the extra `/`, if anything.</p>

LTTng - Bug #1207 (Confirmed): Tools 2.11 fails on destroy for lttng-modules 2.9
https://bugs.lttng.org/issues/1207
2019-11-05T22:19:08Z
Jonathan Rajotte Julien <jonathan.rajotte-julien@efficios.com>
<p>The following test from lttng-ivc is failing: test_modules_base_tracing[lttng-modules-2.9-lttng-tools-2.11]</p>
<p>This can be reproduced with head of lttng-ivc and running:</p>
<pre>
sudo tox -- -k test_modules_base_tracing[lttng-modules-2.9-lttng-tools-2.11]
</pre>
<p>Relevant part so far from lttng-sessiond verbose mode:<br /><pre>
DEBUG1 - 16:20:40.713886603 [10006/10153]: Begin destroy session trace (id 0) (in cmd_destroy_session() at cmd.c:3176)
DEBUG1 - 16:20:40.713892241 [10006/10153]: Rotate kernel session trace started (session 0) (in kernel_rotate_session() at kernel.c:1445)
DEBUG1 - 16:20:40.713896285 [10006/10153]: Rotate kernel channel 1, session trace (in kernel_rotate_session() at kernel.c:1460)
DEBUG1 - 16:20:40.713899898 [10006/10153]: Consumer rotate channel key 1 (in consumer_rotate_channel() at consumer.c:1694)
DEBUG1 - 16:20:40.713913220 [10164/10173]: Incoming command on sock (in consumer_thread_sessiond_poll() at consumer.c:3434)
DEBUG1 - 16:20:40.713925589 [10164/10173]: Consumer rotate channel 1 (in lttng_kconsumer_recv_cmd() at kernel-consumer.c:1124)
DEBUG1 - 16:20:40.713932035 [10164/10173]: Consumer sample rotate position for channel 1 (in lttng_consumer_rotate_channel() at consumer.c:4064)
Error: Failed to sample snapshot position during channel rotation
Error: Rotate channel failed
DEBUG1 - 16:20:40.713947543 [10164/10173]: Consumer rotate ready streams in channel 1 (in lttng_consumer_rotate_ready_streams() at consumer.c:4400)
DEBUG1 - 16:20:40.713951227 [10006/10153]: Consumer ret code -121 (in consumer_recv_status_reply() at consumer.c:208)
DEBUG1 - 16:20:40.713953876 [10164/10173]: received command on sock (in consumer_thread_sessiond_poll() at consumer.c:3450)
DEBUG1 - 16:20:40.713964554 [10006/10153]: Sending consumer close trace chunk command: relayd_id = -1, session_id = 0, chunk_id = 0, close command = "none" (in consumer_close_trace_chunk() at consumer.c:1970)
DEBUG1 - 16:20:40.713976173 [10164/10173]: Incoming command on sock (in consumer_thread_sessiond_poll() at consumer.c:3434)
DEBUG1 - 16:20:40.713985029 [10164/10173]: Consumer close trace chunk command: relayd_id = (none), session_id = 0, chunk_id = 0, close command = none (in lttng_consumer_close_trace_chunk() at consumer.c:4677)
DEBUG1 - 16:20:40.713999876 [10164/10173]: received command on sock (in consumer_thread_sessiond_poll() at consumer.c:3450)
Error: Failed to perform a quiet rotation as part of the destruction of session "trace": Unknown error code
DEBUG1 - 16:20:40.714017305 [10006/10153]: Tearing down kernel session (in kernel_destroy_session() at kernel.c:1199)
DEBUG1 - 16:20:40.714021825 [10006/10153]: [trace] Closing session fd 66 (in trace_kernel_destroy_session() at trace-kernel.c:689)
DEBUG1 - 16:20:40.714026143 [10006/10153]: [trace] Closing metadata stream fd 82 (in trace_kernel_destroy_session() at trace-kernel.c:699)
DEBUG1 - 16:20:40.714030026 [10006/10153]: [trace] Closing metadata fd 81 (in trace_kernel_destroy_metadata() at trace-kernel.c:662)
DEBUG1 - 16:20:40.714038288 [10006/10153]: [trace] Closing channel fd 73 (in trace_kernel_destroy_channel() at trace-kernel.c:616)
DEBUG1 - 16:20:40.714042757 [10006/10153]: [trace] Closing stream fd 77 (in trace_kernel_destroy_stream() at trace-kernel.c:544)
DEBUG1 - 16:20:40.714046626 [10006/10153]: [trace] Closing stream fd 76 (in trace_kernel_destroy_stream() at trace-kernel.c:544)
DEBUG1 - 16:20:40.714050245 [10006/10153]: [trace] Closing stream fd 75 (in trace_kernel_destroy_stream() at trace-kernel.c:544)
DEBUG1 - 16:20:40.714053833 [10006/10153]: [trace] Closing stream fd 65 (in trace_kernel_destroy_stream() at trace-kernel.c:544)
DEBUG1 - 16:20:40.714057474 [10006/10153]: [trace] Closing event fd 74 (in trace_kernel_destroy_event() at trace-kernel.c:570)
</pre></p>
<p>See attached tar.gz for more context.</p>

LTTng - Bug #1177 (Confirmed): Document --pidfile of lttng-sessiond executable
https://bugs.lttng.org/issues/1177
2019-03-05T19:00:20Z
Jonathan Rajotte Julien <jonathan.rajotte-julien@efficios.com>
<p>Everything is in the title.</p>
<p>For anyone who receives a notification for every issue created: we had a database problem during the upgrade to Redmine 4.0, which is why I created this issue again.</p>

LTTng-tools - Bug #1102 (In Progress): Trigger conditions are not evaluated on subscription
https://bugs.lttng.org/issues/1102
2017-05-11T22:43:22Z
Jérémie Galarneau <jeremie.galarneau@efficios.com>
<p>Trigger conditions are not evaluated on subscription of a client. In the case of buffer usage conditions, this provides no way for clients to know the current state of the buffers.</p>

LTTng-tools - Bug #994 (Confirmed): Configured metadata data channel not listed for UST domain
https://bugs.lttng.org/issues/994
2016-01-27T16:17:44Z
Bernd Hufmann <Bernd.Hufmann@ericsson.com>
<p>I noticed this for LTTng 2.7 and 2.6.</p>
<p>After configuring the metadata channel for UST, the metadata channel is not listed when listing the session. By the way, it works for the kernel domain.</p>
<p>To reproduce:</p>
<pre>
lttng create my
lttng enable-channel metadata -u -s my --subbuf-size 65536
lttng list my
</pre>
<pre>
Tracing session my: [inactive]
    Trace path: /home/user/lttng-traces/my-20160127-105230

=== Domain: UST global ===

Buffer type: per UID
</pre>

LTTng-tools - Bug #987 (Confirmed): Incomplete string comparison pattern
https://bugs.lttng.org/issues/987
2016-01-06T23:23:31Z
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
<p>In a few instances in lttng-tools, a string comparison pattern involving strncmp and strlen() to limit the length of the comparison is used.</p>
<p>One can quickly find them with:</p>
<pre>
grep -r -A 3 strncmp . | less    # then search for strlen or strnlen
</pre>
<p>Many of those are OK: they indeed only want to match the beginning of the strings. However, this pattern is incorrectly used in cases where a full match is required, for instance:</p>
<pre>
ust-registry.c:ht_match_event()
./src/bin/lttng/commands/enable_channels.c: if (!strncmp(output_mmap, opt_output, strlen(output_mmap))) {
./src/bin/lttng/commands/enable_channels.c: } else if (!strncmp(output_splice, opt_output, strlen(output_splice))) {
./src/bin/lttng-sessiond/snapshot.c: if (!strncmp(output->name, name, strlen(name))) {
</pre>

LTTng-modules - Bug #975 (Confirmed): execve compat syscall exit syscall value issue
https://bugs.lttng.org/issues/975
2015-11-08T16:03:18Z
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
<p>The syscall_exit event for an execve that changes a process's 32/64-bit compat mode carries the wrong system call number on exit:</p>
<pre>
[19:36:57.188066018] (+0.000000616) sinkpad syscall_entry_execve: { cpu_id = 0 }, { filename = "/usr/bin/burnP6", argv = 0x7FFD275BBF40, envp = 0x7FFD275BBF50 }
[19:36:57.188162705] (+0.000000851) sinkpad module_get: { cpu_id = 0 }, { ip = 18446744071581118857, refcnt = 3, name = "binfmt_misc" }
[19:36:57.188170506] (+0.000000517) sinkpad module_put: { cpu_id = 0 }, { ip = 18446744071581118912, refcnt = 2, name = "binfmt_misc" }
[19:36:57.188630250] (+0.000000911) sinkpad random_get_random_bytes: { cpu_id = 0 }, { nbytes = 16, IP = 18446744071581461559 }
[19:36:57.188630781] (+0.000000531) sinkpad random_extract_entropy: { cpu_id = 0 }, { pool_name = "nonblocking", nbytes = 16, entropy_count = 0, IP = 18446744071583984742 }
[19:36:57.188634640] (+0.000001773) sinkpad sched_waking: { cpu_id = 0 }, { comm = "rngd", tid = 2177, prio = 120, target_cpu = 2 }
[19:36:57.188637855] (+0.000003215) sinkpad sched_stat_sleep: { cpu_id = 0 }, { comm = "rngd", tid = 2177, delay = 2656341 }
[19:36:57.188639681] (+0.000001826) sinkpad sched_wakeup: { cpu_id = 0 }, { comm = "rngd", tid = 2177, prio = 120, target_cpu = 2 }
[19:36:57.188641483] (+0.000000180) sinkpad power_cpu_idle: { cpu_id = 2 }, { state = 4294967295, cpu_id = 2 }
[19:36:57.188646080] (+0.000000644) sinkpad random_mix_pool_bytes_nolock: { cpu_id = 0 }, { pool_name = "nonblocking", bytes = 20, IP = 18446744071583981062 }
[19:36:57.188649456] (+0.000000788) sinkpad random_mix_pool_bytes_nolock: { cpu_id = 0 }, { pool_name = "nonblocking", bytes = 20, IP = 18446744071583981062 }
[19:36:57.188649704] (+0.000000248) sinkpad rcu_utilization: { cpu_id = 2 }, { s = "Start context switch" }
[19:36:57.188650149] (+0.000000182) sinkpad rcu_utilization: { cpu_id = 2 }, { s = "End context switch" }
[19:36:57.188652253] (+0.000001864) sinkpad sched_stat_wait: { cpu_id = 2 }, { comm = "rngd", tid = 2177, delay = 0 }
[19:36:57.188654080] (+0.000001827) sinkpad sched_switch: { cpu_id = 2 }, { prev_comm = "swapper/2", prev_tid = 0, prev_prio = 20, prev_state = 0, next_comm = "rngd", next_tid = 2177, next_prio = 20 }
[19:36:57.188658382] (+0.000000567) sinkpad sched_process_exec: { cpu_id = 0 }, { filename = "/usr/bin/burnP6", tid = 29058, old_tid = 29058 }
[19:36:57.188661040] (+0.000000020) sinkpad rcu_utilization: { cpu_id = 2 }, { s = "Start context switch" }
[19:36:57.188661415] (+0.000000375) sinkpad rcu_utilization: { cpu_id = 2 }, { s = "End context switch" }
[19:36:57.188662327] (+0.000000409) sinkpad sched_stat_runtime: { cpu_id = 2 }, { comm = "rngd", tid = 2177, runtime = 25827, vruntime = 1365908673 }
[19:36:57.188664216] (+0.000001266) sinkpad compat_syscall_exit_olduname: { cpu_id = 0 }, { ret = 0, name = 0 }
</pre>

LTTng-tools - Bug #901 (In Progress): Some liblttng-ctl functions don't return LTTNG_OK on success
https://bugs.lttng.org/issues/901
2015-08-05T18:26:10Z
Jérémie Galarneau <jeremie.galarneau@efficios.com>
<p>It appears that some liblttng-ctl functions, such as lttng_create_session_snapshot() have conflicting return code conventions.</p>
<p>In this specific case, the header under lttng/session.h asserts that the function will<br /><pre>
/*
[...]
* Return 0 on success else a negative LTTng error code.
*/
</pre></p>
<p>while the header in lttng-ctl.c affirms that it<br /><pre>
/*
[...]
* Returns LTTNG_OK on success or a negative error code.
*/
</pre></p>
<p>and the function actually returns<br /><pre>
ret = lttng_ctl_ask_sessiond_varlen(&lsm, uris, ...
</pre></p>
<p>which, itself will<br /><pre>
/*
[...]
Return size of data (only payload, not header) or a negative error code.
*/
</pre></p>
<p>This pattern is used in multiple places, which breaks code that checks for "ret == LTTNG_OK" instead of "ret < 0".</p>

LTTng-tools - Bug #822 (Confirmed): bash-completion sometimes completes too much
https://bugs.lttng.org/issues/822
2014-07-28T18:04:17Z
Simon Marchi <simon.marchi@polymtl.ca>
<p>I am not sure how to word it, but here is an example. I have a single session, named "auto-20140728-134322".</p>
<p>$ lttng des<tab> => $ lttng destroy</p>
<p>That's good, now let's press tab again:</p>
<p>$ lttng destroy <tab> => $ lttng destroy auto-20140728-134322</p>
<p>That's good, now let's press tab again:</p>
<p>$ lttng destroy auto-20140728-134322 <tab> => $ lttng destroy auto-20140728-134322 auto-20140728-134322</p>
<p>Oops! We can go on like that for a long time: every <tab> press adds another "auto-20140728-134322". It should detect that we already gave a positional argument. Since the command takes only one positional argument, subsequent <tab> presses should do nothing.</p>

LTTng-tools - Bug #752 (In Progress): Output paths need better handling than truncation
https://bugs.lttng.org/issues/752
2014-03-07T21:33:50Z
Daniel U. Thibault <daniel.thibault@drdc-rddc.gc.ca>
<p>For instance, in the aftermath of <code>lttng-tools/src/bin/lttng-sessiond/cmd.c:record_ust_snapshot</code>, the <code>msg.u.snapshot_channel.pathname</code> is limited to PATH_MAX (typically 4096) but is built with <code>"%s/%s-%s-%" PRIu64 "%s"</code>, where the successive arguments are (discounting the closing nulls):</p>
<ul>
<li><code>output->consumer->dst.trace_path</code>: PATH_MAX - 1 (no trailing /)</li>
<li><code>/</code>: 1</li>
<li><code>output->name</code>: NAME_MAX - 1</li>
<li><code>-</code>: 1</li>
<li><code>output->datetime</code>: 15</li>
<li><code>-</code>: 1</li>
<li><code>output->nb_snapshot</code>: 20 digits (unsigned 64-bit integer)</li>
<li><code>session_path</code>: PATH_MAX - 1 (including leading and trailing /)</li>
</ul>
<p>The worst-case <code>session_path</code> part is <code>'/ust/pid/<proc>-<vpid>-<datetime>/'</code> so it's actually limited to 12+15+5+15 = 47 characters (<code>/proc/PID/status.name</code> is truncated to 15 characters, and VPID is unsigned 16-bit for 5 characters) (closing null excluded). So one solution would be to limit the <code>consumer->dst.trace_path</code> to PATH_MAX - (NAME_MAX - 1 + 15 + 20 + 47 + 3) - 1 (for the null). However, if we want the path+filetitles of the channel files to fit in PATH_MAX, we need to chop another NAME_MAX off (and also limit channel names to NAME_MAX - (1 + 5 + 1 + 10 + 1) [underscore, 16-bit unsigned CPU ID, underscore, 32-bit unsigned chunk number, null] so they fit).</p>
<p>Truncation remains nevertheless possible, and would wreak havoc with the trace output tree. <code>Babeltrace</code> and the user count on proper folder and file tree structure to manage their traces. The code needs to detect instances of truncation and report them as errors.</p>
<p>As an aside, the snapshot output name should be limited to MAX_PATH - (1+10) because it gets suffixed with a hyphen and an unsigned 32-bit integer (the output set sequential ID).</p>