Bug #1335
closedlttng-relayd: `!vsession->current_trace_chunk` assertion failed
100%
Description
I am trying to reproduce the scenario 2 of https://github.com/jgalar/PyConCanada2017/
The commands used to start the session are:
# Create a live session lttng create py_syscalls --live # Enable all Python agent (logger) events lttng track --userspace --pid $1 lttng enable-event --python --all # Enable all syscalls invoked by python lttng track --kernel --pid $1 lttng enable-event --kernel --syscall --all lttng start
where $1 is normally the pid of a Python http server, but I was able to reproduce using the pid of a dummy /bin/sleep process.
After running the commands above, I connect with babeltrace or babeltrace2:
[root@simark PyConCanada2017]# babeltrace -i lttng-live net://localhost/host/simark/py_syscalls [error] Remote side has closed connection [warning] [Context] Cannot open_trace of format lttng-live at path net://localhost/host/simark/py_syscalls. [warning] [Context] cannot open trace "net://localhost/host/simark/py_syscalls" for reading. [error] opening trace "net://localhost/host/simark/py_syscalls" for reading. [error] none of the specified trace paths could be opened.
[root@simark PyConCanada2017]# babeltrace2 net://localhost/host/simark/py_syscalls 11-29 22:56:32.275 719277 719277 E PLUGIN/SRC.CTF.LTTNG-LIVE/VIEWER lttng_live_recv@viewer-connection.c:197 [auto-disc-source-ctf-lttng-live] Remote side has closed connection 11-29 22:56:32.275 719277 719277 E PLUGIN/SRC.CTF.LTTNG-LIVE/VIEWER lttng_live_session_attach@viewer-connection.c:1106 [auto-disc-source-ctf-lttng-live] Error receiving attach session reply 11-29 22:56:32.275 719277 719277 E PLUGIN/SRC.CTF.LTTNG-LIVE lttng_live_get_session@lttng-live.c:477 [auto-disc-source-ctf-lttng-live] Error attaching to LTTng live session 11-29 22:56:32.275 719277 719277 E PLUGIN/SRC.CTF.LTTNG-LIVE lttng_live_msg_iter_next@lttng-live.c:1662 [auto-disc-source-ctf-lttng-live] Error preparing the next batch of messages: live-iter-status=LTTNG_LIVE_ITERATOR_STATUS_ERROR11-29 22:56:32.275 719277 719277 W LIB/MSG-ITER bt_message_iterator_next@iterator.c:861 Component input port message iterator's "next" method failed: iter-addr=0x56536acf7250, iter-upstream-comp-name="auto-disc-source-ctf-lttng-live", iter-upstream-comp-log-level=WARNING, iter-upstream-comp-class-type=SOURCE, iter-upstream-comp-class-name="lttng-live", iter-upstream-comp-class-partial-descr="Connect to an LTTng relay daemon", iter-upstream-port-type=OUTPUT, iter-upstream-port-name="out", status=ERROR 11-29 22:56:32.275 719277 719277 E PLUGIN/FLT.UTILS.MUXER muxer_upstream_msg_iter_next@muxer.c:452 [muxer] Upstream iterator's next method returned an error: status=ERROR 11-29 22:56:32.275 719277 719277 E PLUGIN/FLT.UTILS.MUXER validate_muxer_upstream_msg_iters@muxer.c:986 [muxer] Cannot validate muxer's upstream message iterator wrapper: muxer-msg-iter-addr=0x56536acf6da0, muxer-upstream-msg-iter-wrap-addr=0x56536acf9f10 11-29 22:56:32.275 719277 719277 E PLUGIN/FLT.UTILS.MUXER muxer_msg_iter_next@muxer.c:1411 [muxer] Cannot get next message: comp-addr=0x56536acfa680, muxer-comp-addr=0x56536acfc260, muxer-msg-iter-addr=0x56536acf6da0, msg-iter-addr=0x56536acf7510, status=ERROR 11-29 22:56:32.275 719277 719277 W LIB/MSG-ITER bt_message_iterator_next@iterator.c:861 Component input port message iterator's "next" method failed: iter-addr=0x56536acf7510, iter-upstream-comp-name="muxer", iter-upstream-comp-log-level=WARNING, iter-upstream-comp-class-type=FILTER, iter-upstream-comp-class-name="muxer", iter-upstream-comp-class-partial-descr="Sort messages from multiple inpu", iter-upstream-port-type=OUTPUT, iter-upstream-port-name="out", status=ERROR 11-29 22:56:32.275 719277 719277 W LIB/GRAPH consume_graph_sink@graph.c:469 Component's "consume" method failed: status=ERROR, comp-addr=0x56536ad2a6f0, comp-name="pretty", comp-log-level=WARNING, comp-class-type=SINK, comp-class-name="pretty", comp-class-partial-descr="Pretty-print messages (`text` fo", comp-class-is-frozen=0, comp-class-so-handle-addr=0x56536ad2c4d0, comp-class-so-handle-path="/usr/lib/babeltrace2/plugins/babeltrace-plugin-text.so", comp-input-port-count=1, comp-output-port-count=0 11-29 22:56:32.275 719277 719277 E CLI cmd_run@babeltrace2.c:2547 Graph failed to complete successfully ERROR: [Babeltrace CLI] (babeltrace2.c:2547) Graph failed to complete successfully CAUSED BY [libbabeltrace2] (graph.c:469) Component's "consume" method failed: status=ERROR, comp-addr=0x56536ad2a6f0, comp-name="pretty", comp-log-level=WARNING, comp-class-type=SINK, comp-class-name="pretty", comp-class-partial-descr="Pretty-print messages (`text` fo", comp-class-is-frozen=0, comp-class-so-handle-addr=0x56536ad2c4d0, comp-class-so-handle-path="/usr/lib/babeltrace2/plugins/babeltrace-plugin-text.so", comp-input-port-count=1, comp-output-port-count=0 CAUSED BY [libbabeltrace2] (iterator.c:861) Component input port message iterator's "next" method failed: iter-addr=0x56536acf7510, iter-upstream-comp-name="muxer", iter-upstream-comp-log-level=WARNING, iter-upstream-comp-class-type=FILTER, iter-upstream-comp-class-name="muxer", iter-upstream-comp-class-partial-descr="Sort messages from multiple inpu", iter-upstream-port-type=OUTPUT, iter-upstream-port-name="out", status=ERROR CAUSED BY [muxer: 'filter.utils.muxer'] (muxer.c:986) Cannot validate muxer's upstream message iterator wrapper: muxer-msg-iter-addr=0x56536acf6da0, muxer-upstream-msg-iter-wrap-addr=0x56536acf9f10 CAUSED BY [muxer: 'filter.utils.muxer'] (muxer.c:452) Upstream iterator's next method returned an error: status=ERROR CAUSED BY [libbabeltrace2] (iterator.c:861) Component input port message iterator's "next" method failed: iter-addr=0x56536acf7250, iter-upstream-comp-name="auto-disc-source-ctf-lttng-live", iter-upstream-comp-log-level=WARNING, iter-upstream-comp-class-type=SOURCE, iter-upstream-comp-class-name="lttng-live", iter-upstream-comp-class-partial-descr="Connect to an LTTng relay daemon", iter-upstream-port-type=OUTPUT, iter-upstream-port-name="out", status=ERROR CAUSED BY [auto-disc-source-ctf-lttng-live: 'source.ctf.lttng-live'] (lttng-live.c:1662) Error preparing the next batch of messages: live-iter-status=LTTNG_LIVE_ITERATOR_STATUS_ERROR CAUSED BY [auto-disc-source-ctf-lttng-live: 'source.ctf.lttng-live'] (lttng-live.c:477) Error attaching to LTTng live session CAUSED BY [auto-disc-source-ctf-lttng-live: 'source.ctf.lttng-live'] (viewer-connection.c:1106) Error receiving attach session reply CAUSED BY [auto-disc-source-ctf-lttng-live: 'source.ctf.lttng-live'] (viewer-connection.c:197) Remote side has closed connection
In both cases, lttng-relay shows:
[root@simark ~]# lttng-relayd Warning: No tracing group detected lttng-relayd: /home/simark/src/lttng-tools/src/bin/lttng-relayd/viewer-session.c:80: viewer_session_attach: Assertion `!vsession->current_trace_chunk' failed. Aborted (core dumped)
I don't know much about lttng-relayd, so I wasn't able to debug much, but I saw that the relayd received the command LTTNG_VIEWER_ATTACH_SESSION twice, and the second time around, current_trace_chunk was not NULL (it was set the first time).
Files
Updated by Francis Deslauriers over 3 years ago
Can you reproduce this easily?
If so, can you try with this chain of commits : https://review.lttng.org/c/lttng-tools/+/6821/5
Updated by Simon Marchi over 3 years ago
Francis Deslauriers wrote in #note-1:
Can you reproduce this easily?
If so, can you try with this chain of commits : https://review.lttng.org/c/lttng-tools/+/6821/5
It still reproduces with this branch.
Updated by Francis Deslauriers over 3 years ago
I was able to reproduce this issue trivially.
Here is the backtrace of the assert:
Thread 8 (Thread 0x7fffeeffd700 (LWP 167040) "lttng-relayd"): #0 0x00007ffff7b1618b in raise () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007ffff7af5859 in abort () from /lib/x86_64-linux-gnu/libc.so.6 #2 0x00007ffff7af5729 in ?? () from /lib/x86_64-linux-gnu/libc.so.6 #3 0x00007ffff7b06f36 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6 #4 0x00005555555889bb in viewer_session_attach (vsession=0x7fffdc001400, session=session@entry=0x7fffe8001180) at viewer-session.c:80 #5 0x000055555557bcff in viewer_attach_session (conn=0x7fffd0001140) at live.c:1275 #6 process_control (conn=0x7fffd0001140, recv_hdr=0x7fffeeffcaf0) at live.c:2341 #7 thread_worker (data=<optimized out>) at live.c:2515 #8 0x00007ffff7ccd609 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #9 0x00007ffff7bf2293 in clone () from /lib/x86_64-linux-gnu/libc.so.6
Updated by Francis Deslauriers over 3 years ago
- File relayd.log relayd.log added
- File bt2.log bt2.log added
Here are the logs of the Relayd and Babeltrace2 recorded while reproducing this issue.
Updated by Francis Deslauriers over 3 years ago
The issue can be reproduce without python.
It seems that using both kernel and ust tracing is triggering the issue.
#!/bin/bash lttng create py_syscalls --live lttng enable-event -u -a lttng enable-event -k -a lttng start babeltrace2 net://localhost/host/raton/py_syscalls
Updated by Jonathan Rajotte Julien over 3 years ago
Seems like we just hit a major testing gap here.
Updated by Mathieu Desnoyers over 3 years ago
- Status changed from New to Feedback
This assert appears to be entirely wrong.
It checks that the "viewer session" has a NULL current trace chunk when attaching a session to a viewer session, but in the case where a viewer session has multiple sessions (e.g. with kernel and ust tracing combined), we are attaching each session individually to the viewer session, and we set the current trace chunk of the viewer session when we attach the first session to it.
So it is expected to be non-NULL when attaching the second session.
I suspect we just want to remove this assert:
--- a/src/bin/lttng-relayd/viewer-session.cpp +++ b/src/bin/lttng-relayd/viewer-session.cpp @@ -77,7 +77,6 @@ enum lttng_viewer_attach_return_code viewer_session_attach( } else { int ret; - LTTNG_ASSERT(!vsession->current_trace_chunk); session->viewer_attached = true; ret = viewer_session_set_trace_chunk_copy(vsession,
Updated by Mathieu Desnoyers over 3 years ago
Please review this fix on gerrit:
https://review.lttng.org/c/lttng-tools/+/6859 Fix: relayd: `!vsession->current_trace_chunk` assertion failed
Updated by Mathieu Desnoyers over 3 years ago
- Status changed from Feedback to Resolved
- % Done changed from 0 to 100
Applied in changeset lttng-tools|4381dda7ec8705f7eacf5c1b34eaea861a72b3e6.