Project

General

Profile

Actions

Bug #1202

closed

Deadlock in probe record code when perf counters enabled

Added by Anton Smyk over 4 years ago. Updated about 4 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Target version:
-
Start date:
10/07/2019
Due date:
% Done:

100%

Estimated time:

Description

We experience deadlock in UST probe recording code when perf counters were enabled.

Managed to reproduce the problem in clean minimal install of Oracle Linux 7.7 with latest vanilla RPMs from Efficios site:

[root@tb97647 ~]# rpm -qa 'userspace-rcu*' 'lttng-*' | sort
lttng-tools-2.10.7-1.el7.x86_64
lttng-tools-debuginfo-2.10.7-1.el7.x86_64
lttng-tools-devel-2.10.7-1.el7.x86_64
lttng-ust-2.10.5-1.el7.x86_64
lttng-ust-debuginfo-2.10.5-1.el7.x86_64
lttng-ust-devel-2.10.5-1.el7.x86_64
lttng-ust-java-2.10.5-1.el7.x86_64
lttng-ust-java-agent-2.10.5-1.el7.x86_64
userspace-rcu-0.10.2-1.el7.x86_64
userspace-rcu-debuginfo-0.10.2-1.el7.x86_64
userspace-rcu-devel-0.10.2-1.el7.x86_64

Here are steps to reproduce:

1. In a terminal session start lttng spam program

[antons@tb97647 spam-lttng-ust]$ export LTTNG_HOME=$HOME/TB-97647/test03/lttng-home
[antons@tb97647 spam-lttng-ust]$ ~/spam-lttng-ust/spam 100 300 4
Started spamming with 100 trace events per second for 300 seconds using 4 concurrent threads...

Apparently, for the deadlock to reproduce, it is crucial to start the spam program BEFORE starting the LTTng session.

2. In another terminal window create, configure and start LTTng session:

[antons@tb97647 test03]$ export LTTNG_HOME=$HOME/TB-97647/test03/lttng-home
[antons@tb97647 test03]$ mkdir -p $LTTNG_HOME
[antons@tb97647 test03]$ lttng-sessiond -vvv --verbose-consumer > lttng-sessiond.log 2>&1 &
[antons@tb97647 test03]$ lttng create testtrace
Session testtrace created.
Traces will be written in /home/antons/TB-97647/test03/lttng-home/lttng-traces/testtrace-20191007-134454
[antons@tb97647 test03]$ lttng enable-event -u 'spam:*'
UST event spam:* created in channel channel0
[antons@tb97647 test03]$ for i in perf:thread:cycles perf:thread:stalled-cycles-frontend perf:thread:instructions perf:thread:cache-references perf:thread:cache-misses perf:thread:bus-cycles; do lttng add-context --userspace --type=$i; done
UST context perf:thread:cycles added to all channels
UST context perf:thread:stalled-cycles-frontend added to all channels
UST context perf:thread:instructions added to all channels
UST context perf:thread:cache-references added to all channels
UST context perf:thread:cache-misses added to all channels
UST context perf:thread:bus-cycles added to all channels
[antons@tb97647 test03]$ lttng start testtrace
Tracing started for session testtrace

3. In no time the spam program exhibits deadlock:

[root@tb97647 ~]# pstack `pgrep spam`
Thread 7 (Thread 0x7f6ac0915700 (LWP 22774)):
#0  0x00007f6ac1449be9 in syscall () from /lib64/libc.so.6
#1  0x00007f6ac1b5b60b in futex (val3=0, uaddr2=0x0, timeout=0x0, val=0, op=0, uaddr=0x7f6ac1fd4000) at /usr/include/urcu/futex.h:65
#2  futex_async (op=0, val=0, timeout=0x0, uaddr2=0x0, val3=0, uaddr=0x7f6ac1fd4000) at /usr/include/urcu/futex.h:97
#3  wait_for_sessiond (sock_info=0x7f6ac1da9a20 <global_apps>, sock_info=0x7f6ac1da9a20 <global_apps>) at lttng-ust-comm.c:1405
#4  ust_listener_thread (arg=0x7f6ac1da9a20 <global_apps>) at lttng-ust-comm.c:1467
#5  0x00007f6ac1726ea5 in start_thread () from /lib64/libpthread.so.0
#6  0x00007f6ac144f8cd in clone () from /lib64/libc.so.6
Thread 6 (Thread 0x7f6ac0114700 (LWP 22775)):
#0  0x00007f6ac1444c2d in poll () from /lib64/libc.so.6
#1  0x00007f6ac0d27646 in poll (__timeout=10, __nfds=0, __fds=0x0) at /usr/include/bits/poll2.h:46
#2  wait_for_readers (input_readers=input_readers@entry=0x7f6ac0113250, cur_snap_readers=cur_snap_readers@entry=0x0, qsreaders=qsreaders@entry=0x7f6ac0113260) at urcu-bp.c:250
#3  0x00007f6ac0d2772c in synchronize_rcu_bp () at urcu-bp.c:319
#4  0x00007f6ac1b64155 in synchronize_trace () at lttng-events.c:135
#5  0x00007f6ac1b5edb5 in lttng_cmd (objd=<optimized out>, cmd=<optimized out>, arg=<optimized out>, uargs=<optimized out>, owner=<optimized out>) at lttng-ust-abi.c:385
#6  0x00007f6ac1b5b3c2 in handle_message (lum=0x7f6ac01135b0, sock=3, sock_info=0x7f6ac1da79e0 <local_apps>) at lttng-ust-comm.c:1000
#7  ust_listener_thread (arg=0x7f6ac1da79e0 <local_apps>) at lttng-ust-comm.c:1709
#8  0x00007f6ac1726ea5 in start_thread () from /lib64/libpthread.so.0
#9  0x00007f6ac144f8cd in clone () from /lib64/libc.so.6
Thread 5 (Thread 0x7f6abf913700 (LWP 22776)):
#0  0x00007f6ac172d54d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f6ac1728e9b in _L_lock_883 () from /lib64/libpthread.so.0
#2  0x00007f6ac1728d68 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f6ac1b5a4cc in ust_lock_nocheck () at lttng-ust-comm.c:178
#4  0x00007f6ac1b732ef in add_thread_field (perf_thread=0x7f6aac0008c0, perf_field=0x7f6ab8012350) at lttng-context-perf-counters.c:311
#5  get_thread_field (field=0x7f6ab8012350) at lttng-context-perf-counters.c:340
#6  0x00007f6ac1b73478 in wrapper_perf_counter_read (field=<optimized out>) at lttng-context-perf-counters.c:350
#7  perf_counter_record (field=<optimized out>, ctx=0x7f6abf912d30, chan=0x7f6ab8010dc0) at lttng-context-perf-counters.c:361
#8  0x00007f6ac1b4ea40 in ctx_record (bufctx=bufctx@entry=0x7f6abf912d30, chan=chan@entry=0x7f6ab8010dc0, ctx=0x7f6ab80114a0, mode=mode@entry=APP_CTX_ENABLED) at lttng-ring-buffer-client.h:140
#9  0x00007f6ac1b76052 in ctx_record (mode=APP_CTX_ENABLED, ctx=<optimized out>, chan=0x7f6ab8010dc0, bufctx=0x7f6abf912d30) at ../libringbuffer/backend.h:91
#10 lttng_write_event_header_slow (config=<optimized out>, event_id=<optimized out>, ctx=<optimized out>) at lttng-ring-buffer-client.h:361
#11 lttng_write_event_header (config=0x7f6ac1da6640 <client_config>, event_id=<optimized out>, ctx=0x7f6abf912d30) at lttng-ring-buffer-client.h:292
#12 lttng_event_reserve (ctx=0x7f6abf912d30, event_id=<optimized out>) at lttng-ring-buffer-client.h:728
#13 0x00000000004019fc in __event_probe__spam___dummy_event ()
#14 0x0000000000400f39 in spammer_thread ()
#15 0x00007f6ac1726ea5 in start_thread () from /lib64/libpthread.so.0
#16 0x00007f6ac144f8cd in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x7f6abf112700 (LWP 22777)):
#0  0x00007f6ac172d54d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f6ac1728e9b in _L_lock_883 () from /lib64/libpthread.so.0
#2  0x00007f6ac1728d68 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f6ac1b5a4cc in ust_lock_nocheck () at lttng-ust-comm.c:178
#4  0x00007f6ac1b732ef in add_thread_field (perf_thread=0x7f6ab00008c0, perf_field=0x7f6ab8012350) at lttng-context-perf-counters.c:311
#5  get_thread_field (field=0x7f6ab8012350) at lttng-context-perf-counters.c:340
#6  0x00007f6ac1b73478 in wrapper_perf_counter_read (field=<optimized out>) at lttng-context-perf-counters.c:350
#7  perf_counter_record (field=<optimized out>, ctx=0x7f6abf111d30, chan=0x7f6ab8010dc0) at lttng-context-perf-counters.c:361
#8  0x00007f6ac1b4ea40 in ctx_record (bufctx=bufctx@entry=0x7f6abf111d30, chan=chan@entry=0x7f6ab8010dc0, ctx=0x7f6ab80114a0, mode=mode@entry=APP_CTX_ENABLED) at lttng-ring-buffer-client.h:140
#9  0x00007f6ac1b76052 in ctx_record (mode=APP_CTX_ENABLED, ctx=<optimized out>, chan=0x7f6ab8010dc0, bufctx=0x7f6abf111d30) at ../libringbuffer/backend.h:91
#10 lttng_write_event_header_slow (config=<optimized out>, event_id=<optimized out>, ctx=<optimized out>) at lttng-ring-buffer-client.h:361
#11 lttng_write_event_header (config=0x7f6ac1da6640 <client_config>, event_id=<optimized out>, ctx=0x7f6abf111d30) at lttng-ring-buffer-client.h:292
#12 lttng_event_reserve (ctx=0x7f6abf111d30, event_id=<optimized out>) at lttng-ring-buffer-client.h:728
#13 0x00000000004019fc in __event_probe__spam___dummy_event ()
#14 0x0000000000400f39 in spammer_thread ()
#15 0x00007f6ac1726ea5 in start_thread () from /lib64/libpthread.so.0
#16 0x00007f6ac144f8cd in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x7f6abe911700 (LWP 22778)):
#0  0x00007f6ac172d54d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f6ac1728e9b in _L_lock_883 () from /lib64/libpthread.so.0
#2  0x00007f6ac1728d68 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f6ac1b5a4cc in ust_lock_nocheck () at lttng-ust-comm.c:178
#4  0x00007f6ac1b732ef in add_thread_field (perf_thread=0x7f6aa80008c0, perf_field=0x7f6ab8012350) at lttng-context-perf-counters.c:311
#5  get_thread_field (field=0x7f6ab8012350) at lttng-context-perf-counters.c:340
#6  0x00007f6ac1b73478 in wrapper_perf_counter_read (field=<optimized out>) at lttng-context-perf-counters.c:350
#7  perf_counter_record (field=<optimized out>, ctx=0x7f6abe910d30, chan=0x7f6ab8010dc0) at lttng-context-perf-counters.c:361
#8  0x00007f6ac1b4ea40 in ctx_record (bufctx=bufctx@entry=0x7f6abe910d30, chan=chan@entry=0x7f6ab8010dc0, ctx=0x7f6ab80114a0, mode=mode@entry=APP_CTX_ENABLED) at lttng-ring-buffer-client.h:140
#9  0x00007f6ac1b76052 in ctx_record (mode=APP_CTX_ENABLED, ctx=<optimized out>, chan=0x7f6ab8010dc0, bufctx=0x7f6abe910d30) at ../libringbuffer/backend.h:91
#10 lttng_write_event_header_slow (config=<optimized out>, event_id=<optimized out>, ctx=<optimized out>) at lttng-ring-buffer-client.h:361
#11 lttng_write_event_header (config=0x7f6ac1da6640 <client_config>, event_id=<optimized out>, ctx=0x7f6abe910d30) at lttng-ring-buffer-client.h:292
#12 lttng_event_reserve (ctx=0x7f6abe910d30, event_id=<optimized out>) at lttng-ring-buffer-client.h:728
#13 0x00000000004019fc in __event_probe__spam___dummy_event ()
#14 0x0000000000400f39 in spammer_thread ()
#15 0x00007f6ac1726ea5 in start_thread () from /lib64/libpthread.so.0
#16 0x00007f6ac144f8cd in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7f6abe110700 (LWP 22779)):
#0  0x00007f6ac172d54d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f6ac1728e9b in _L_lock_883 () from /lib64/libpthread.so.0
#2  0x00007f6ac1728d68 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f6ac1b5a4cc in ust_lock_nocheck () at lttng-ust-comm.c:178
#4  0x00007f6ac1b732ef in add_thread_field (perf_thread=0x7f6ab40008c0, perf_field=0x7f6ab8012350) at lttng-context-perf-counters.c:311
#5  get_thread_field (field=0x7f6ab8012350) at lttng-context-perf-counters.c:340
#6  0x00007f6ac1b73478 in wrapper_perf_counter_read (field=<optimized out>) at lttng-context-perf-counters.c:350
#7  perf_counter_record (field=<optimized out>, ctx=0x7f6abe10fd30, chan=0x7f6ab8010dc0) at lttng-context-perf-counters.c:361
#8  0x00007f6ac1b4ea40 in ctx_record (bufctx=bufctx@entry=0x7f6abe10fd30, chan=chan@entry=0x7f6ab8010dc0, ctx=0x7f6ab80114a0, mode=mode@entry=APP_CTX_ENABLED) at lttng-ring-buffer-client.h:140
#9  0x00007f6ac1b76052 in ctx_record (mode=APP_CTX_ENABLED, ctx=<optimized out>, chan=0x7f6ab8010dc0, bufctx=0x7f6abe10fd30) at ../libringbuffer/backend.h:91
#10 lttng_write_event_header_slow (config=<optimized out>, event_id=<optimized out>, ctx=<optimized out>) at lttng-ring-buffer-client.h:361
#11 lttng_write_event_header (config=0x7f6ac1da6640 <client_config>, event_id=<optimized out>, ctx=0x7f6abe10fd30) at lttng-ring-buffer-client.h:292
#12 lttng_event_reserve (ctx=0x7f6abe10fd30, event_id=<optimized out>) at lttng-ring-buffer-client.h:728
#13 0x00000000004019fc in __event_probe__spam___dummy_event ()
#14 0x0000000000400f39 in spammer_thread ()
#15 0x00007f6ac1726ea5 in start_thread () from /lib64/libpthread.so.0
#16 0x00007f6ac144f8cd in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f6ac1fc77c0 (LWP 22773)):
#0  0x00007f6ac1728017 in pthread_join () from /lib64/libpthread.so.0
#1  0x0000000000401127 in main ()


Files

lttng-sessiond.log.gz (39.5 KB) lttng-sessiond.log.gz Anton Smyk, 10/07/2019 08:16 AM
0001-Fix-lttng-perf-counter-deadlock.patch (7.81 KB) 0001-Fix-lttng-perf-counter-deadlock.patch Mathieu Desnoyers, 10/09/2019 12:44 PM
Actions

Also available in: Atom PDF