Project

General

Profile

Actions

Bug #1179

closed

Crashes when using lttng-live with a per-pid session in Babeltrace 1.5

Added by Francis Deslauriers about 5 years ago. Updated about 4 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
Start date:
03/21/2019
Due date:
% Done:

0%

Estimated time:

Description

I witness multiple crashes when using the per-pid channel configuration with a live session with Babeltrace 1.5. I witnessed these bugs after running the same UST app either one time or a few times. This bug might be caused by the relayd too.

Using the following session configuration:
lttng-tools on the stable 2.10 branch.

lttng create allo --live
lttng enable-channel --buffers-pid -s allo --tracefile-size=5M --tracefile-count=4 -u chan1
lttng enable-event -u -a -c chan1
lttng start allo

Launching Babeltrace with the following command:

babeltrace -i lttng-live  net://localhost/host/raton/allo

Here is a list of some of the crashes I am witnessing:

[warning] Clock "(null)" offset differs between traces (delta 1552119489646955332 ns). Using average.
[error] get_data_packet: error
[error] get_data_packet failed

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff6f3e13f in ctf_live_packet_seek (stream_pos=0x5555557fff40, index=0, whence=0) at lttng-live-comm.c:1264
1264            ret = get_next_index(session->ctx, viewer_stream, cur_index, &stream_id);
(gdb) bt
#0  0x00007ffff6f3e13f in ctf_live_packet_seek (stream_pos=0x5555557fff40, index=0, whence=0) at lttng-live-comm.c:1264
#1  0x00007ffff79c25e3 in babeltrace_filestream_seek (file_stream=0x5555557fee80, begin_pos=0x7fffffffe1c0, stream_id=0) at iterator.c:671
#2  0x00007ffff79c26da in bt_iter_add_trace (iter=0x5555557a2030, td_read=0x555555772d10) at iterator.c:712
#3  0x00007ffff79c2888 in bt_iter_init (iter=0x5555557a2030, ctx=0x5555557675c0, begin_pos=0x7fffffffe2d0, end_pos=0x0) at iterator.c:763
#4  0x00007ffff7766e26 in bt_ctf_iter_create (ctx=0x5555557675c0, begin_pos=0x7fffffffe2d0, end_pos=0x0) at iterator.c:53
#5  0x00007ffff6f3f7f2 in lttng_live_read (ctx=0x555555765580) at lttng-live-comm.c:1903
#6  0x00007ffff6f3aa1d in lttng_live_open_trace_read (path=0x5555557614f0 "net://localhost/host/raton/allo") at lttng-live-plugin.c:287
#7  0x00007ffff6f3ab4c in lttng_live_open_trace (path=0x5555557614f0 "net://localhost/host/raton/allo", flags=0, packet_seek=0x0, metadata_fp=0x0) at lttng-live-plugin.c:331
#8  0x00007ffff79c2d42 in bt_context_add_trace (ctx=0x555555764370, path=0x5555557614f0 "net://localhost/host/raton/allo", format_name=0x555555764310 "lttng-live", packet_seek=0x0, stream_list=0x0, metadata=0x0) at context.c:95
#9  0x0000555555557472 in bt_context_add_traces_recursive (ctx=0x555555764370, path=0x5555557614f0 "net://localhost/host/raton/allo", format_str=0x555555764310 "lttng-live", packet_seek=0x0) at babeltrace.c:550
#10 0x0000555555557d42 in main (argc=4, argv=0x7fffffffe578) at babeltrace.c:807

[warning] [Context] Cannot open_trace of format lttng-live at path net://localhost/host/raton/allo.
[warning] [Context] cannot open trace "net://localhost/host/raton/allo" for reading.
[error] opening trace "net://localhost/host/raton/allo" for reading.

[error] none of the specified trace paths could be opened.

[warning] Tracer discarded 3891110078048108544 events between [03:18:10.244516310] and [03:12:54.254696918] in trace UUID cebd6ed4d91f4d284aab04e1ba4885, within stream id 0. You should consider recording a new trace with larger buffers or with fewer events enabled.
[error] get_data_packet: error
[error] get_data_packet failed
[warning] Tracer discarded 3891110078048108544 events between [03:18:10.244516310] and [03:12:54.254696918] in trace UUID cebd6ed4d91f4d284aab04e1ba4885, within stream id 0. You should consider recording a new trace with larger buffers or with fewer events enabled.
[error] get_data_packet: error
[error] get_data_packet failed
[error] get_data_packet: error
[error] get_data_packet failed
[warning] Tracer discarded 3891110078048108544 events between [03:18:10.244516310] and [03:12:54.254696918] in trace UUID cebd6ed4d91f4d284aab04e1ba4885, within stream id 0. You should consider recording a new trace with larger buffers or with fewer events enabled.
[error] get_data_packet: error
[error] get_data_packet failed


Files

Makefile (348 Bytes) Makefile Francis Deslauriers, 03/21/2019 02:23 PM
main.c (711 Bytes) main.c Francis Deslauriers, 03/21/2019 02:23 PM
tp.h (1011 Bytes) tp.h Francis Deslauriers, 03/21/2019 02:23 PM
tp.c (76 Bytes) tp.c Francis Deslauriers, 03/21/2019 02:23 PM
Actions #1

Updated by Francis Deslauriers about 5 years ago

  • Description updated (diff)

Updated by Francis Deslauriers about 5 years ago

The source code of app used to reproduce this issue.

Actions #3

Updated by Jonathan Rajotte Julien about 5 years ago

  • Status changed from New to In Progress

Further work on this resulted in investigating the relayd.

Since this is in per-pid a lot of ordering issue can happen since the viewer is polling for information and an app can "quit" between 2 poll. An app quitting in per-pid is much more complex than in per-uid since each apps have their own streams/metadata and we close them as necessary. This is in contrast with per-uid where the streams are only closed at the end of the tracing session.

Some solution are tested here: https://github.com/PSRCode/lttng-tools/commits/live-per-pid

This is a work in progress. The current state of live combined with short-lived apps in per-pid mode is disastrous. So far the problem seems to mostly lie on the relayd side of thing. Lifetime of objects and trace data (SEEK_BEGINNING/LATEST) are a big concern.

Actions #4

Updated by Francis Deslauriers about 4 years ago

babeltrace --version
BabelTrace Trace Viewer and Converter 1.5.6

with lttng-tools commit: 938bd42282f0ba31bf22ad23b0bd3e04e848e052

I witnessed what seems like the same issue when running lttng-tools' tests/regression/tools/clear/test_ust test.
More specifically, the follow test case:

# Test ust streaming live clear with viewer
# Parameters: tracing_active=1, clear_twice=1, buffer_type=pid

Program received signal SIGSEGV, Segmentation fault.
0x00007f8a0e2d913f in ctf_live_packet_seek (stream_pos=0x562f353d8f90, index=0, whence=0) at lttng-live-comm.c:1264
warning: Source file is more recent than executable.
1264            ret = get_next_index(session->ctx, viewer_stream, cur_index, &stream_id);
(gdb) bt
#0  0x00007f8a0e2d913f in ctf_live_packet_seek (stream_pos=0x562f353d8f90, index=0, whence=0) at lttng-live-comm.c:1264
#1  0x00007f8a0ed5d633 in babeltrace_filestream_seek (file_stream=0x562f353d7ed0, begin_pos=0x7ffd64b1eb10, stream_id=0) at iterator.c:671
#2  0x00007f8a0ed5d72a in bt_iter_add_trace (iter=0x562f35416190, td_read=0x562f353ce910) at iterator.c:712
#3  0x00007f8a0ed5d8d8 in bt_iter_init (iter=0x562f35416190, ctx=0x562f353c65c0, begin_pos=0x7ffd64b1ec20, end_pos=0x0) at iterator.c:763
#4  0x00007f8a0eb01e26 in bt_ctf_iter_create (ctx=0x562f353c65c0, begin_pos=0x7ffd64b1ec20, end_pos=0x0) at iterator.c:53
#5  0x00007f8a0e2da7f2 in lttng_live_read (ctx=0x562f353c4580) at lttng-live-comm.c:1903
#6  0x00007f8a0e2d5a1d in lttng_live_open_trace_read (path=0x562f353bb7c0 "net://localhost/host/raton/FtTagV6hUfhEt0u7") at lttng-live-plugin.c:287
#7  0x00007f8a0e2d5b4c in lttng_live_open_trace (path=0x562f353bb7c0 "net://localhost/host/raton/FtTagV6hUfhEt0u7", flags=0, packet_seek=0x0, metadata_fp=0x0) at lttng-live-plugin.c:331
#8  0x00007f8a0ed5dd92 in bt_context_add_trace (ctx=0x562f353c3370, path=0x562f353bb7c0 "net://localhost/host/raton/FtTagV6hUfhEt0u7", format_name=0x562f353c3310 "lttng-live", packet_seek=0x0, stream_list=0x0, metadata=0x0) at context.c:95
#9  0x0000562f346c9472 in bt_context_add_traces_recursive (ctx=0x562f353c3370, path=0x562f353bb7c0 "net://localhost/host/raton/FtTagV6hUfhEt0u7", format_str=0x562f353c3310 "lttng-live", packet_seek=0x0) at babeltrace.c:550
#10 0x0000562f346c9d42 in main (argc=4, argv=0x7ffd64b1eec8) at babeltrace.c:807

I was not able to reproduce the issue with babeltrace2.

Actions #5

Updated by Francis Deslauriers about 4 years ago

  • Status changed from In Progress to On pause

Nobody is working on this at the moment.

Actions #6

Updated by Francis Deslauriers about 4 years ago

  • Status changed from On pause to Resolved

I was running an outdated version of Babeltrace. This issue is fixed in bt1.5.8.
It's embarrassing. Let's never talk about this again.

Actions

Also available in: Atom PDF