Bug #1335

closed

lttng-relayd: `!vsession->current_trace_chunk` assertion failed

Added by Simon Marchi about 2 months ago. Updated about 1 month ago.

Status: Resolved
Priority: Normal
Assignee: -
Target version: -
Start date: 11/29/2021
Due date: -
% Done: 100%
Estimated time: -

Description

I am trying to reproduce scenario 2 of https://github.com/jgalar/PyConCanada2017/

The commands used to start the session are:

# Create a live session
lttng create py_syscalls --live

# Enable all Python agent (logger) events
lttng track --userspace --pid $1
lttng enable-event --python --all

# Enable all syscalls invoked by python
lttng track --kernel --pid $1
lttng enable-event --kernel --syscall --all

lttng start

where $1 is normally the pid of a Python http server, but I was able to reproduce using the pid of a dummy /bin/sleep process.

After running the commands above, I connect with babeltrace or babeltrace2:

[root@simark PyConCanada2017]# babeltrace -i lttng-live net://localhost/host/simark/py_syscalls
[error] Remote side has closed connection
[warning] [Context] Cannot open_trace of format lttng-live at path net://localhost/host/simark/py_syscalls.
[warning] [Context] cannot open trace "net://localhost/host/simark/py_syscalls" for reading.
[error] opening trace "net://localhost/host/simark/py_syscalls" for reading.

[error] none of the specified trace paths could be opened.
[root@simark PyConCanada2017]# babeltrace2 net://localhost/host/simark/py_syscalls                                                                           
11-29 22:56:32.275 719277 719277 E PLUGIN/SRC.CTF.LTTNG-LIVE/VIEWER lttng_live_recv@viewer-connection.c:197 [auto-disc-source-ctf-lttng-live] Remote side has closed connection
11-29 22:56:32.275 719277 719277 E PLUGIN/SRC.CTF.LTTNG-LIVE/VIEWER lttng_live_session_attach@viewer-connection.c:1106 [auto-disc-source-ctf-lttng-live] Error receiving attach session reply
11-29 22:56:32.275 719277 719277 E PLUGIN/SRC.CTF.LTTNG-LIVE lttng_live_get_session@lttng-live.c:477 [auto-disc-source-ctf-lttng-live] Error attaching to LTTng live session
11-29 22:56:32.275 719277 719277 E PLUGIN/SRC.CTF.LTTNG-LIVE lttng_live_msg_iter_next@lttng-live.c:1662 [auto-disc-source-ctf-lttng-live] Error preparing the next batch of messages: live-iter-status=LTTNG_LIVE_ITERATOR_STATUS_ERROR
11-29 22:56:32.275 719277 719277 W LIB/MSG-ITER bt_message_iterator_next@iterator.c:861 Component input port message iterator's "next" method failed: iter-addr=0x56536acf7250, iter-upstream-comp-name="auto-disc-source-ctf-lttng-live", iter-upstream-comp-log-level=WARNING, iter-upstream-comp-class-type=SOURCE, iter-upstream-comp-class-name="lttng-live", iter-upstream-comp-class-partial-descr="Connect to an LTTng relay daemon", iter-upstream-port-type=OUTPUT, iter-upstream-port-name="out", status=ERROR
11-29 22:56:32.275 719277 719277 E PLUGIN/FLT.UTILS.MUXER muxer_upstream_msg_iter_next@muxer.c:452 [muxer] Upstream iterator's next method returned an error: status=ERROR
11-29 22:56:32.275 719277 719277 E PLUGIN/FLT.UTILS.MUXER validate_muxer_upstream_msg_iters@muxer.c:986 [muxer] Cannot validate muxer's upstream message iterator wrapper: muxer-msg-iter-addr=0x56536acf6da0, muxer-upstream-msg-iter-wrap-addr=0x56536acf9f10
11-29 22:56:32.275 719277 719277 E PLUGIN/FLT.UTILS.MUXER muxer_msg_iter_next@muxer.c:1411 [muxer] Cannot get next message: comp-addr=0x56536acfa680, muxer-comp-addr=0x56536acfc260, muxer-msg-iter-addr=0x56536acf6da0, msg-iter-addr=0x56536acf7510, status=ERROR
11-29 22:56:32.275 719277 719277 W LIB/MSG-ITER bt_message_iterator_next@iterator.c:861 Component input port message iterator's "next" method failed: iter-addr=0x56536acf7510, iter-upstream-comp-name="muxer", iter-upstream-comp-log-level=WARNING, iter-upstream-comp-class-type=FILTER, iter-upstream-comp-class-name="muxer", iter-upstream-comp-class-partial-descr="Sort messages from multiple inpu", iter-upstream-port-type=OUTPUT, iter-upstream-port-name="out", status=ERROR
11-29 22:56:32.275 719277 719277 W LIB/GRAPH consume_graph_sink@graph.c:469 Component's "consume" method failed: status=ERROR, comp-addr=0x56536ad2a6f0, comp-name="pretty", comp-log-level=WARNING, comp-class-type=SINK, comp-class-name="pretty", comp-class-partial-descr="Pretty-print messages (`text` fo", comp-class-is-frozen=0, comp-class-so-handle-addr=0x56536ad2c4d0, comp-class-so-handle-path="/usr/lib/babeltrace2/plugins/babeltrace-plugin-text.so", comp-input-port-count=1, comp-output-port-count=0
11-29 22:56:32.275 719277 719277 E CLI cmd_run@babeltrace2.c:2547 Graph failed to complete successfully

ERROR:    [Babeltrace CLI] (babeltrace2.c:2547)
  Graph failed to complete successfully
CAUSED BY [libbabeltrace2] (graph.c:469)
  Component's "consume" method failed: status=ERROR, comp-addr=0x56536ad2a6f0, comp-name="pretty", comp-log-level=WARNING, comp-class-type=SINK,
  comp-class-name="pretty", comp-class-partial-descr="Pretty-print messages (`text` fo", comp-class-is-frozen=0, comp-class-so-handle-addr=0x56536ad2c4d0,
  comp-class-so-handle-path="/usr/lib/babeltrace2/plugins/babeltrace-plugin-text.so", comp-input-port-count=1, comp-output-port-count=0
CAUSED BY [libbabeltrace2] (iterator.c:861)
  Component input port message iterator's "next" method failed: iter-addr=0x56536acf7510, iter-upstream-comp-name="muxer",
  iter-upstream-comp-log-level=WARNING, iter-upstream-comp-class-type=FILTER, iter-upstream-comp-class-name="muxer",
  iter-upstream-comp-class-partial-descr="Sort messages from multiple inpu", iter-upstream-port-type=OUTPUT, iter-upstream-port-name="out", status=ERROR
CAUSED BY [muxer: 'filter.utils.muxer'] (muxer.c:986)
  Cannot validate muxer's upstream message iterator wrapper: muxer-msg-iter-addr=0x56536acf6da0, muxer-upstream-msg-iter-wrap-addr=0x56536acf9f10
CAUSED BY [muxer: 'filter.utils.muxer'] (muxer.c:452)
  Upstream iterator's next method returned an error: status=ERROR
CAUSED BY [libbabeltrace2] (iterator.c:861)
  Component input port message iterator's "next" method failed: iter-addr=0x56536acf7250, iter-upstream-comp-name="auto-disc-source-ctf-lttng-live",
  iter-upstream-comp-log-level=WARNING, iter-upstream-comp-class-type=SOURCE, iter-upstream-comp-class-name="lttng-live",
  iter-upstream-comp-class-partial-descr="Connect to an LTTng relay daemon", iter-upstream-port-type=OUTPUT, iter-upstream-port-name="out", status=ERROR
CAUSED BY [auto-disc-source-ctf-lttng-live: 'source.ctf.lttng-live'] (lttng-live.c:1662)
  Error preparing the next batch of messages: live-iter-status=LTTNG_LIVE_ITERATOR_STATUS_ERROR
CAUSED BY [auto-disc-source-ctf-lttng-live: 'source.ctf.lttng-live'] (lttng-live.c:477)
  Error attaching to LTTng live session
CAUSED BY [auto-disc-source-ctf-lttng-live: 'source.ctf.lttng-live'] (viewer-connection.c:1106)
  Error receiving attach session reply
CAUSED BY [auto-disc-source-ctf-lttng-live: 'source.ctf.lttng-live'] (viewer-connection.c:197)
  Remote side has closed connection

In both cases, lttng-relayd shows:

[root@simark ~]# lttng-relayd 
Warning: No tracing group detected
lttng-relayd: /home/simark/src/lttng-tools/src/bin/lttng-relayd/viewer-session.c:80: viewer_session_attach: Assertion `!vsession->current_trace_chunk' failed.
Aborted (core dumped)

I don't know much about lttng-relayd, so I wasn't able to debug much, but I saw that the relayd received the LTTNG_VIEWER_ATTACH_SESSION command twice, and the second time around, current_trace_chunk was no longer NULL (it had been set during the first attach).


Files

relayd.log (149 KB) relayd.log Francis Deslauriers, 11/30/2021 09:05 AM
bt2.log (1.11 MB) bt2.log Francis Deslauriers, 11/30/2021 09:05 AM
#1

Updated by Francis Deslauriers about 2 months ago

Can you reproduce this easily?

If so, can you try with this chain of commits: https://review.lttng.org/c/lttng-tools/+/6821/5

#2

Updated by Simon Marchi about 2 months ago

Francis Deslauriers wrote in #note-1:

Can you reproduce this easily?

If so, can you try with this chain of commits: https://review.lttng.org/c/lttng-tools/+/6821/5

It still reproduces with this branch.

#3

Updated by Francis Deslauriers about 2 months ago

I was able to reproduce this issue trivially.
Here is the backtrace of the assertion failure:

Thread 8 (Thread 0x7fffeeffd700 (LWP 167040) "lttng-relayd"):
#0  0x00007ffff7b1618b in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff7af5859 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007ffff7af5729 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x00007ffff7b06f36 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x00005555555889bb in viewer_session_attach (vsession=0x7fffdc001400, session=session@entry=0x7fffe8001180) at viewer-session.c:80
#5  0x000055555557bcff in viewer_attach_session (conn=0x7fffd0001140) at live.c:1275
#6  process_control (conn=0x7fffd0001140, recv_hdr=0x7fffeeffcaf0) at live.c:2341
#7  thread_worker (data=<optimized out>) at live.c:2515
#8  0x00007ffff7ccd609 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#9  0x00007ffff7bf2293 in clone () from /lib/x86_64-linux-gnu/libc.so.6

#4

Updated by Francis Deslauriers about 2 months ago

Here are the logs of lttng-relayd and Babeltrace 2 recorded while reproducing this issue.

#5

Updated by Francis Deslauriers about 2 months ago

The issue can be reproduced without Python.
It seems that combining kernel and user-space (ust) tracing triggers the issue.

#!/bin/bash

lttng create py_syscalls --live

lttng enable-event -u -a
lttng enable-event -k -a

lttng start

babeltrace2 net://localhost/host/raton/py_syscalls                                                                           

#6

Updated by Jonathan Rajotte Julien about 2 months ago

Seems like we just hit a major testing gap here.

#7

Updated by Mathieu Desnoyers about 1 month ago

  • Status changed from New to Feedback

This assert appears to be entirely wrong.

It checks that the "viewer session" has a NULL current trace chunk when attaching a session to it. However, when a viewer session has multiple sessions (e.g. with kernel and ust tracing combined), each session is attached individually, and the viewer session's current trace chunk is set when the first session is attached.

So it is expected to be non-NULL when attaching the second session.

I suspect we just want to remove this assert:

--- a/src/bin/lttng-relayd/viewer-session.cpp
+++ b/src/bin/lttng-relayd/viewer-session.cpp
@@ -77,7 +77,6 @@ enum lttng_viewer_attach_return_code viewer_session_attach(
        } else {
                int ret;

-               LTTNG_ASSERT(!vsession->current_trace_chunk);
                session->viewer_attached = true;

                ret = viewer_session_set_trace_chunk_copy(vsession,
#8

Updated by Mathieu Desnoyers about 1 month ago

Please review this fix on gerrit:

https://review.lttng.org/c/lttng-tools/+/6859 Fix: relayd: `!vsession->current_trace_chunk` assertion failed

#9

Updated by Mathieu Desnoyers about 1 month ago

  • Status changed from Feedback to Resolved
  • % Done changed from 0 to 100