Bug #1218

Deadlock in lttng-ust upon: Restart trying to connect to sessiond

Added by Florian Walbroel 8 months ago. Updated 7 months ago.

Status: New
Priority: Normal
Assignee: -
Target version: -
Start date: 01/30/2020
Due date:
% Done: 0%
Estimated time:

Description

Operating system: Linux 4.15.0-74-generic #84-Ubuntu SMP x86_64 GNU/Linux
LTTng version: lttng (LTTng Trace Control) 2.10.2 - KeKriek

The bug can be reproduced with the following two bash scripts. Note that enabling debug output via LTTNG_UST_DEBUG seems to change the timing enough that I could not reproduce the bug with it enabled.

run.sh:

#!/bin/bash

mkdir -p ./output

lttng create --output="./output"
lttng enable-event --userspace lttng_ust_pthread:*
lttng start

LD_PRELOAD=/usr/lib/x86_64-linux-gnu/liblttng-ust-libc-wrapper.so.0 ./script.sh

lttng stop

script.sh:

#!/bin/bash

for l in $(seq 0 1000); do
    /bin/true
done

After a short time, htop shows something like this:

bash ./run.sh
bash ./script.sh
bash ./script.sh
bash ./script.sh
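
Since the hang is timing-dependent, it may help to detect it automatically rather than by watching htop. The following is only a sketch: `run_with_watchdog` is a hypothetical helper name, and the 60-second limit in the usage comment is an arbitrary assumption. It relies on the coreutils `timeout` command, which exits with status 124 when the time limit is reached. Note that `timeout` terminates the command on expiry, so use this to detect a hang, not to preserve one for debugging.

```shell
#!/bin/bash

# Hypothetical helper: run a command under a time limit.
# Returns 1 if the limit was hit (hang suspected), otherwise the
# command's own exit status.
run_with_watchdog() {
    local limit="$1"
    shift
    local status=0
    timeout "$limit" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        # coreutils timeout reports an expired limit with status 124
        echo "hang suspected: '$*' exceeded ${limit}s" >&2
        return 1
    fi
    return "$status"
}

# Example usage (assumed limit, adjust as needed):
#   run_with_watchdog 60 ./run.sh
```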

Attaching to the deadlocked process (bash ./script.sh) with gdb yields:

(gdb) t 1
[Switching to thread 1 (Thread 0x7f1c99c367c0 (LWP 15451))]
#0 0x00007f1c991276c2 in GI_waitpid (pid=-1, stat_loc=0x7fff42651250,
    options=0) at ../sysdeps/unix/sysv/linux/waitpid.c:30
30 ../sysdeps/unix/sysv/linux/waitpid.c: No such file or directory.
(gdb) bt
#0 0x00007f1c991276c2 in GI_waitpid (pid=-1, stat_loc=0x7fff42651250,
    options=0) at ../sysdeps/unix/sysv/linux/waitpid.c:30
#1 0x000055615db55c19 in ?? ()
#2 0x000055615db5734b in wait_for ()
#3 0x000055615db46b64 in execute_command_internal ()
#4 0x000055615db46bf2 in execute_command ()
#5 0x000055615db45d11 in execute_command_internal ()
#6 0x000055615db46bf2 in execute_command ()
#7 0x000055615db31274 in reader_loop ()
#8 0x000055615db2fc7f in main ()
(gdb) t 2
[Switching to thread 2 (Thread 0x7f1c9836e700 (LWP 15452))]
#0 0x00007f1c99165bd7 in __libc_recvmsg (fd=fd@entry=3,
    msg=msg@entry=0x7f1c9836d070, flags=flags@entry=0)
    at ../sysdeps/unix/sysv/linux/recvmsg.c:28
28 ../sysdeps/unix/sysv/linux/recvmsg.c: No such file or directory.
(gdb) bt
#0 0x00007f1c99165bd7 in __libc_recvmsg (fd=fd@entry=3,
    msg=msg@entry=0x7f1c9836d070, flags=flags@entry=0)
    at ../sysdeps/unix/sysv/linux/recvmsg.c:28
#1 0x00007f1c98dda1e3 in ustcomm_recv_unix_sock (sock=sock@entry=3,
    buf=buf@entry=0x7f1c9836d590, len=len@entry=612) at lttng-ust-comm.c:307
#2 0x00007f1c98de08e1 in ust_listener_thread (
    arg=0x7f1c99036a20 <global_apps>) at lttng-ust-comm.c:1675
#3 0x00007f1c983766db in start_thread (arg=0x7f1c9836e700)
    at pthread_create.c:463
#4 0x00007f1c9916488f in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) t 3
[Switching to thread 3 (Thread 0x7f1c97b6d700 (LWP 15453))]
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
38 ../sysdeps/unix/sysv/linux/x86_64/syscall.S: No such file or directory.
(gdb) bt
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1 0x00007f1c98de0149 in futex (val3=0, uaddr2=0x0, timeout=0x0, val=0, op=0,
    uaddr=<optimized out>) at /usr/include/x86_64-linux-gnu/urcu/futex.h:65
#2 futex_async (op=0, val=0, timeout=0x0, uaddr2=0x0, val3=0,
    uaddr=<optimized out>) at /usr/include/x86_64-linux-gnu/urcu/futex.h:97
#3 wait_for_sessiond (sock_info=0x7f1c990349e0 <local_apps>,
    sock_info=0x7f1c990349e0 <local_apps>) at lttng-ust-comm.c:1394
#4 ust_listener_thread (arg=0x7f1c990349e0 <local_apps>)
    at lttng-ust-comm.c:1456
#5 0x00007f1c983766db in start_thread (arg=0x7f1c97b6d700)
    at pthread_create.c:463
#6 0x00007f1c9916488f in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
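
Backtraces like the ones above could also be collected non-interactively, which makes it easier to attach them to a report. This is a minimal sketch: `dump_backtraces` is a hypothetical helper name, and the `GDB` variable is an assumption added only so the debugger binary can be overridden. It uses gdb's `-p`, `-batch` and `-ex` options with the `thread apply all bt` command. Attaching to a process that is not a child of the caller may require root, depending on the `kernel.yama.ptrace_scope` setting.

```shell
#!/bin/bash

# Hypothetical helper: print the backtrace of every thread of a
# running process, then detach and exit.
dump_backtraces() {
    local pid="$1"
    # GDB is an assumed override hook; it defaults to plain "gdb".
    "${GDB:-gdb}" -p "$pid" -batch -ex "thread apply all bt"
}

# Example usage: dump every process whose command line matches the
# hung script (pattern taken from the htop listing in this report):
#   for pid in $(pgrep -f 'bash ./script.sh'); do
#       dump_backtraces "$pid" > "backtrace-$pid.txt"
#   done
```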

#1

Updated by Mathieu Desnoyers 7 months ago

I tried with up-to-date lttng-tools (2.10.10) and lttng-ust (2.10.7) and cannot reproduce the hang.

Can you upgrade and test again?
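
To check whether an installation is at least at the versions mentioned above, something like the following could be used. It is a sketch under assumptions: `parse_version` and `version_ge` are hypothetical helper names, the parsing assumes the `lttng --version` output format quoted in the description ("lttng (LTTng Trace Control) 2.10.2 - KeKriek"), and `sort -V` requires GNU coreutils. It only covers lttng-tools; how to query the installed lttng-ust version depends on the distribution's packaging and is not shown.

```shell
#!/bin/bash

# Hypothetical helper: extract the first x.y.z version number from a string.
parse_version() {
    printf '%s\n' "$1" | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

# Hypothetical helper: succeed when version $1 >= version $2.
# The smaller of the two versions sorts first under "sort -V".
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example usage:
#   installed=$(parse_version "$(lttng --version)")
#   if version_ge "$installed" 2.10.10; then
#       echo "lttng-tools $installed is recent enough"
#   else
#       echo "lttng-tools $installed is older than 2.10.10"
#   fi
```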
