Project

General

Profile

Bug #1078

lttng enable-event throws the error "UST create channel failed"

Added by Sébastien Boisvert over 3 years ago. Updated over 3 years ago.

Status:
Invalid
Priority:
Normal
Target version:
Start date:
11/28/2016
Due date:
12/20/2016
% Done:

0%

Estimated time:
10.00 h

Description

Hi all,

I am tracing a C++ application on a 32-core machine (2 x AMD Opteron 6376 16 cores, 256 GB RAM).

I found a bug in lttng-tools.

After 5 tracing sessions, the session number 6 (the sixth session, starting in a state where no lttng* daemons
are running, always crash (n = 10) crashes at "lttng enable-event" ("UST create channel failed").
Because of this, lttng throws an error and does not seem to do a health check or to be able to recover from this.
I know that the problem is in one of the daemons because if I kill them all the problem goes away.

I searched "UST create channel failed" on https://bugs.lttng.org/, and got 1 search result (no
open bugs):

- https://bugs.lttng.org/issues/339 ("Subbuffer size of < 4096 are accepted but unable to use the channel afterwards")

Here is my bug report:

1/4 - Software component versions
--------------------------------

lttng-tools 2.8.2
lttng-ust 2.8.1
popt 1.10.4
userspace-rcu 0.9.2

babeltrace 1.5.0-rc1
libuuid 1.0.3
libxml2 2.9.4
asciidoc 8.6.9

linux 3.13.0-101-generic
g++ 4.8.4-2ubuntu1~14.04.3
cmake 2.8.12.2
make 3.81
binutils 2.24
coreutils 8.21
Ubuntu 14.04.5

2/4: Steps to reproduce the problem
-----------------------------------

1. Make sure that no lttng daemon processes are running, to have a clean state:

bolt$ killall -s SIGTERM lttng-sessiond lttng-runas lttng-consumerd lttng-runas lttng-relayd &> /dev/null
bolt$ ps aux|grep lttng|grep -v grep|wc -l
0

2. Trace the app 10 times. Each time, it creates a new session, which is kept for later.

Repeat this command 10 times:

bolt$ lttng create alcatraz-trace-`date -Ins`; lttng track -u --pid `pidof alcatraz-backbone-builder`; lttng enable-event -u gydle_om:DeNovo*;lttng start;sleep 1;lttng stop

3/4 - Expected result
---------------------

It is expected that 10 session will be created and stopped, one after the other.

There should not be any error. lttng processes are only to be killed before (at the very beginning of the 10 sessions, not between
sessions.

The log should be similar to this (repeated 10 times, with a different session name):

bolt$ lttng create alcatraz-trace-`date -Ins`; lttng track -u --pid `pidof alcatraz-backbone-builder`; lttng enable-event -u gydle_om:DeNovo*;lttng start;sleep 1;lttng stop
Session alcatraz-trace-2016-11-28T10:12:38,791131449-0500 created.
Traces will be written in /home/sboisvert/lttng-traces/alcatraz-trace-2016-11-28T10:12:38,791131449-0500-20161128-101238
PID 20337 tracked in session alcatraz-trace-2016-11-28T10:12:38,791131449-0500
UST event gydle_om:DeNovo* created in channel channel0
Tracing started for session alcatraz-trace-2016-11-28T10:12:38,791131449-0500
Waiting for data availability.
Tracing stopped for session alcatraz-trace-2016-11-28T10:12:38,791131449-0500

4/4 - Actual result
-------------

The session number 6 crashed (5 sessions did not crash before that). The experiment was ran 5 times, and it is always the session 6 that crashes.

Tracing during 2 seconds should, in theory, generate twice the number of events, so it may impact this bug.
The experiment was ran with a "sleep 2" 5 times too, and it always crashes at the session 6 too, like with "sleep 1".

After 5 successful sessions, the session 6 (the sixth) session crashes:

bolt$ lttng create alcatraz-trace-`date -Ins`; lttng track -u --pid `pidof alcatraz-backbone-builder`; lttng enable-event -u gydle_om:DeNovo*;lttng start;sleep 1;lttng stop
Session alcatraz-trace-2016-11-28T09:49:45,434535118-0500 created.
Traces will be written in /home/sboisvert/lttng-traces/alcatraz-trace-2016-11-28T09:49:45,434535118-0500-20161128-094945
PID 20337 tracked in session alcatraz-trace-2016-11-28T09:49:45,434535118-0500
UST event gydle_om:DeNovo* created in channel channel0
Error: No session daemon is available
Error: No session daemon is available

A backtrace can be generated.

Enable the debug mode:

bolt$ export LTTNG_ABORT_ON_ERROR=1
bolt$ ulimit -c
unlimited
bolt$ rm -f core

Run this 5 times:

bolt$ lttng create alcatraz-trace-`date -Ins`; lttng track -u --pid `pidof alcatraz-backbone-builder`; lttng enable-event -u gydle_om:DeNovo*;lttng start;sleep 1;lttng stop

Then, for the session 6, run the command one by one, until it crashes.

bolt$ lttng create alcatraz-trace-`date -Ins`
Session alcatraz-trace-2016-11-28T10:49:23,480808644-0500 created.
Traces will be written in /home/sboisvert/lttng-traces/alcatraz-trace-2016-11-28T10:49:23,480808644-0500-20161128-104923
bolt$ lttng track -u --pid `pidof alcatraz-backbone-builder`
PID 20337 tracked in session alcatraz-trace-2016-11-28T10:49:23,480808644-0500
bolt$ lttng enable-event -u gydle_om:DeNovo*
Error: Event gydle_om:DeNovo*: UST create channel failed (channel channel0, session alcatraz-trace-2016-11-28T10:49:23,480808644-0500)
Aborted (core dumped)

The backtrace:

bolt$ gdb -batch -ex bt lttng core
[New LWP 28837]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `lttng enable-event -u gydle_om:DeNovo*'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00002b9eac6c6c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
#0 0x00002b9eac6c6c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00002b9eac6ca028 in __GI_abort () at abort.c:89
#2 0x000000000041ba2b in lttng_abort_on_error () at ../../../lttng-tools-2.8.2/src/common/error.c:224
#3 0x00000000004127e4 in __lttng_print_check_abort (type=PRINT_ERR) at ../../../../lttng-tools-2.8.2/src/common/error.h:131
#4 enable_events (session_name=session_name@entry=0x1d23440 "alcatraz-trace-2016-11-28T10:49:23,480808644-0500") at ../../../../lttng-tools-2.8.2/src/bin/lttng/commands/enable_events.c:1071
#5 0x0000000000413322 in cmd_enable_events (argc=<optimized out>, argv=<optimized out>) at ../../../../lttng-tools-2.8.2/src/bin/lttng/commands/enable_events.c:1362
#6 0x00000000004096bc in handle_command (argv=0x7ffff39d4df0, argc=3) at ../../../../lttng-tools-2.8.2/src/bin/lttng/lttng.c:231
#7 parse_args (argv=<optimized out>, argc=<optimized out>) at ../../../../lttng-tools-2.8.2/src/bin/lttng/lttng.c:403
#8 main (argc=<optimized out>, argv=<optimized out>) at ../../../../lttng-tools-2.8.2/src/bin/lttng/lttng.c:452

The error LTTNG_ERR_UST_CHAN_FAIL ("UST create channel failed") occurs in two places in

https://github.com/lttng/lttng-tools/blob/v2.8.2/src/bin/lttng-sessiond/channel.c#L393

https://github.com/lttng/lttng-tools/blob/v2.8.2/src/bin/lttng-sessiond/channel.c#L414

Also available in: Atom PDF