Project

General

Profile

Bug #1179

Crashes when using lttng-live with a per-pid session in Babeltrace 1.5

Added by Francis Deslauriers 3 months ago. Updated 3 months ago.

Status:
In Progress
Priority:
Normal
Assignee:
-
Target version:
-
Start date:
03/21/2019
Due date:
% Done:

0%

Estimated time:

Description

I witness multiple crashes when using the per-pid channel configuration with a live session with Babeltrace 1.5. I witnessed these bugs after running the same UST app either one time or a few times. This bug might be caused by the relayd too.

Using the following session configuration:
lttng-tools on the stable 2.10 branch.

lttng create allo --live
lttng enable-channel --buffers-pid -s allo --tracefile-size=5M --tracefile-count=4 -u chan1
lttng enable-event -u -a -c chan1
lttng start allo

Launching Babeltrace with the following command:

babeltrace -i lttng-live  net://localhost/host/raton/allo

Here is a list of some of the crashes I am witnessing:

[warning] Clock "(null)" offset differs between traces (delta 1552119489646955332 ns). Using average.
[error] get_data_packet: error
[error] get_data_packet failed

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff6f3e13f in ctf_live_packet_seek (stream_pos=0x5555557fff40, index=0, whence=0) at lttng-live-comm.c:1264
1264            ret = get_next_index(session->ctx, viewer_stream, cur_index, &stream_id);
(gdb) bt
#0  0x00007ffff6f3e13f in ctf_live_packet_seek (stream_pos=0x5555557fff40, index=0, whence=0) at lttng-live-comm.c:1264
#1  0x00007ffff79c25e3 in babeltrace_filestream_seek (file_stream=0x5555557fee80, begin_pos=0x7fffffffe1c0, stream_id=0) at iterator.c:671
#2  0x00007ffff79c26da in bt_iter_add_trace (iter=0x5555557a2030, td_read=0x555555772d10) at iterator.c:712
#3  0x00007ffff79c2888 in bt_iter_init (iter=0x5555557a2030, ctx=0x5555557675c0, begin_pos=0x7fffffffe2d0, end_pos=0x0) at iterator.c:763
#4  0x00007ffff7766e26 in bt_ctf_iter_create (ctx=0x5555557675c0, begin_pos=0x7fffffffe2d0, end_pos=0x0) at iterator.c:53
#5  0x00007ffff6f3f7f2 in lttng_live_read (ctx=0x555555765580) at lttng-live-comm.c:1903
#6  0x00007ffff6f3aa1d in lttng_live_open_trace_read (path=0x5555557614f0 "net://localhost/host/raton/allo") at lttng-live-plugin.c:287
#7  0x00007ffff6f3ab4c in lttng_live_open_trace (path=0x5555557614f0 "net://localhost/host/raton/allo", flags=0, packet_seek=0x0, metadata_fp=0x0) at lttng-live-plugin.c:331
#8  0x00007ffff79c2d42 in bt_context_add_trace (ctx=0x555555764370, path=0x5555557614f0 "net://localhost/host/raton/allo", format_name=0x555555764310 "lttng-live", packet_seek=0x0, stream_list=0x0, metadata=0x0) at context.c:95
#9  0x0000555555557472 in bt_context_add_traces_recursive (ctx=0x555555764370, path=0x5555557614f0 "net://localhost/host/raton/allo", format_str=0x555555764310 "lttng-live", packet_seek=0x0) at babeltrace.c:550
#10 0x0000555555557d42 in main (argc=4, argv=0x7fffffffe578) at babeltrace.c:807

[warning] [Context] Cannot open_trace of format lttng-live at path net://localhost/host/raton/allo.
[warning] [Context] cannot open trace "net://localhost/host/raton/allo" for reading.
[error] opening trace "net://localhost/host/raton/allo" for reading.

[error] none of the specified trace paths could be opened.

[warning] Tracer discarded 3891110078048108544 events between [03:18:10.244516310] and [03:12:54.254696918] in trace UUID cebd6ed4d91f4d284aab04e1ba4885, within stream id 0. You should consider recording a new trace with larger buffers or with fewer events enabled.
[error] get_data_packet: error
[error] get_data_packet failed
[warning] Tracer discarded 3891110078048108544 events between [03:18:10.244516310] and [03:12:54.254696918] in trace UUID cebd6ed4d91f4d284aab04e1ba4885, within stream id 0. You should consider recording a new trace with larger buffers or with fewer events enabled.
[error] get_data_packet: error
[error] get_data_packet failed
[error] get_data_packet: error
[error] get_data_packet failed
[warning] Tracer discarded 3891110078048108544 events between [03:18:10.244516310] and [03:12:54.254696918] in trace UUID cebd6ed4d91f4d284aab04e1ba4885, within stream id 0. You should consider recording a new trace with larger buffers or with fewer events enabled.
[error] get_data_packet: error
[error] get_data_packet failed


Files

Makefile (348 Bytes) Makefile Francis Deslauriers, 03/21/2019 02:23 PM
main.c (711 Bytes) main.c Francis Deslauriers, 03/21/2019 02:23 PM
tp.h (1011 Bytes) tp.h Francis Deslauriers, 03/21/2019 02:23 PM
tp.c (76 Bytes) tp.c Francis Deslauriers, 03/21/2019 02:23 PM

History

#1

Updated by Francis Deslauriers 3 months ago

  • Description updated (diff)
#2

Updated by Francis Deslauriers 3 months ago

The source code of app used to reproduce this issue.

#3

Updated by Jonathan Rajotte Julien 3 months ago

  • Status changed from New to In Progress

Further work on this resulted in investigating the relayd.

Since this is in per-pid a lot of ordering issue can happen since the viewer is polling for information and an app can "quit" between 2 poll. An app quitting in per-pid is much more complex than in per-uid since each apps have their own streams/metadata and we close them as necessary. This is in contrast with per-uid where the streams are only closed at the end of the tracing session.

Some solution are tested here: https://github.com/PSRCode/lttng-tools/commits/live-per-pid

This is a work in progress. The current state of live combined with short-lived apps in per-pid mode is disastrous. So far the problem seems to mostly lie on the relayd side of thing. Lifetime of objects and trace data (SEEK_BEGINNING/LATEST) are a big concern.

Also available in: Atom PDF