Bug #1189
closed
dead loop in rculfhash with urcu-signal in userspace-rcu-0.10.2
Description
->System information:
Component: Userspace RCU
Component version: 0.10.2
System information:
x86, 24 CPUS
3.10.0-862.14.1.6_59.x86_64 GNU/Linux
->Steps to reproduce the issue:
1. install userspace-rcu-0.10.2 package
2. apply attached patch to source code of userspace-rcu-0.10.2
3. cd userspace-rcu-0.10.2/doc/examples/rculfhash/
4. make
5. ./cds_lfht_destroy
->Expected behavior: the program in step 5 should exit after 200,000 loops
->Actual behavior: the program is trapped in a dead loop, which can be observed with gdb
->There is no such issue in userspace-rcu-0.9.3
->Root Cause:
1. userspace-rcu-0.10.2 uses a separate thread (cds_lfht_workqueue) to handle hash table auto-resize, and this thread blocks all signals at initialization.
2. In the urcu-signal flavor, call_rcu_thread sends the SIGRCU signal to all registered threads to make them execute cmm_smp_mb.
3. As a result, call_rcu_thread and the workqueue thread end up in a dead loop: call_rcu_thread holds rcu_registry_lock while waiting for all registered threads to execute cmm_smp_mb, whereas the workqueue thread never executes cmm_smp_mb (SIGRCU is blocked) and waits to acquire rcu_registry_lock in order to unregister from RCU.
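The root cause above can be illustrated with a minimal standalone sketch. It is not the liburcu patch: SIGUSR1 stands in for SIGRCU (an assumption), and every `demo_*` name is hypothetical. A worker thread blocks all signals at start-up, as the 0.10.2 workqueue thread does, and optionally leaves the stand-in signal unblocked. The caller then sets a `need_mb`-style flag, signals the worker, and polls with a timeout to see whether the worker's handler clears the flag.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Assumption: SIGUSR1 stands in for SIGRCU in this standalone demo. */
#define DEMO_SIGRCU SIGUSR1

static atomic_int demo_need_mb;      /* set by the waiter, cleared by the handler */
static atomic_int demo_worker_ready; /* worker has installed its signal mask */
static atomic_int demo_worker_stop;  /* asks the worker to exit */

static void demo_sleep_1ms(void)
{
	struct timespec ts = { .tv_sec = 0, .tv_nsec = 1000000L };

	nanosleep(&ts, NULL);
}

/* Plays the role of the SIGRCU handler: acknowledge the request. */
static void demo_sigrcu_handler(int signo)
{
	(void) signo;
	atomic_store(&demo_need_mb, 0);
}

static void *demo_worker(void *arg)
{
	int unblock = *(int *) arg;
	sigset_t mask;

	/* Block every signal, like the worker described in the report. */
	sigfillset(&mask);
	/* Sketch of the fix: leave the RCU signal out of the blocked set. */
	if (unblock)
		sigdelset(&mask, DEMO_SIGRCU);
	pthread_sigmask(SIG_SETMASK, &mask, NULL);

	atomic_store(&demo_worker_ready, 1);
	while (!atomic_load(&demo_worker_stop))
		demo_sleep_1ms();
	return NULL;
}

/*
 * Returns 1 if the worker acknowledged the signal within about 200 ms,
 * 0 if it did not, -1 on setup failure.
 */
int demo_worker_acks_signal(int unblock)
{
	struct sigaction sa;
	pthread_t tid;
	int i, acked;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = demo_sigrcu_handler;
	sigemptyset(&sa.sa_mask);
	if (sigaction(DEMO_SIGRCU, &sa, NULL))
		return -1;

	atomic_store(&demo_need_mb, 0);
	atomic_store(&demo_worker_ready, 0);
	atomic_store(&demo_worker_stop, 0);
	if (pthread_create(&tid, NULL, demo_worker, &unblock))
		return -1;
	while (!atomic_load(&demo_worker_ready))
		demo_sleep_1ms();

	/* Request an acknowledgement and signal the worker thread only. */
	atomic_store(&demo_need_mb, 1);
	pthread_kill(tid, DEMO_SIGRCU);
	for (i = 0; i < 200 && atomic_load(&demo_need_mb); i++)
		demo_sleep_1ms();
	acked = !atomic_load(&demo_need_mb);

	atomic_store(&demo_worker_stop, 1);
	pthread_join(tid, NULL);
	return acked;
}
```

With `unblock` set to 0 the flag should stay set until the timeout, which corresponds to the wait that never completes in the report (there the waiter has no timeout and holds rcu_registry_lock). With `unblock` set to 1 the handler runs in the worker and the wait ends. Build with `-pthread`.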
Updated by Mathieu Desnoyers over 4 years ago
- Status changed from New to Resolved
Fixed by the following commit in master:
commit 9fd30396a597942084b007f33cc7f2c279f746e9
Author: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Date:   Thu Sep 19 10:10:31 2019 -0400

    Fix: provide errno as argument to urcu_die()

    commit 1a990de3add "Fix: rculfhash worker needs to unblock to SIGRCU"
    provides "ret" (-1) as argument to urcu_die(), but should rather provide
    errno.

    Reported by Coverity:
    ** CID 1405700: Error handling issues (NEGATIVE_RETURNS)
    /src/rculfhash.c: 2171 in cds_lfht_worker_init()

    Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

commit 1a990de3addad89fc397f57bb359175d307e6960
Author: hewenliang <hewenliang4@huawei.com>
Date:   Tue Sep 17 10:59:18 2019 -0400

    Fix: rculfhash worker needs to unblock to SIGRCU

    In urcu-signal flavor, call_rcu_thread calls synchronize_rcu which will
    send SIGRCU signal to all registed threads, and then loops to wait
    need_mb to be cleared. However, the registed workqueue_thread does not
    process the SIGRCU signal, and never clear the need_mb.

    Based on above, call_rcu_thread and workqueue_thread will wait forever
    for completion of the grace period: call_rcu_thread which holds the
    rcu_registry_lock, waits for workqueue_thread to do cmm_smp_mb. While
    workqueue thread never does cmm_smp_mb because of signal blocking, and
    it will eventually wait to get rcu_registry_lock in do_resize_cb.

    The phenomenon is as follows, which is easy to be triggered:

    (gdb) t 2
    [Switching to thread 2 (Thread 0xffff83c3b080 (LWP 27116))]
    #0 0x0000ffff845296c4 in poll () from /lib64/libc.so.6
    (gdb) bt
    #0 0x0000ffff845296c4 in poll () from /lib64/libc.so.6
    #1 0x0000ffff8461b93c in force_mb_all_readers () at urcu.c:241
    #2 0x0000ffff8461c748 in smp_mb_master () at urcu.c:249
    #3 urcu_signal_synchronize_rcu () at urcu.c:445
    #4 0x0000ffff8461d004 in call_rcu_thread at urcu-call-rcu-impl.h:364
    #5 0x0000ffff845eb8bc in start_thread () from /lib64/libpthread.so.0
    #6 0x0000ffff845335cc in thread_start () from /lib64/libc.so.6
    (gdb) t 3
    [Switching to thread 3 (Thread 0xffff8443c080 (LWP 27191))]
    #0 0x0000ffff845f51c4 in __lll_lock_wait () from /lib64/libpthread.so.0
    (gdb) bt
    #0 0x0000ffff845f51c4 in __lll_lock_wait () from /lib64/libpthread.so.0
    #1 0x0000ffff845ee048 in pthread_mutex_lock () from /lib64/libpthread.so.0
    #2 0x0000ffff8461b814 in mutex_lock ( <rcu_registry_lock>) at urcu.c:157
    #3 0x0000ffff8461b9e4 in urcu_signal_unregister_thread () at urcu.c:564
    #4 0x0000ffff8463e62c in do_resize_cb (work=0x11e2e790) at rculfhash.c:2042
    #5 0x0000ffff8463c940 in workqueue_thread (arg=0x11e1d260) at workqueue.c:228
    #6 0x0000ffff845eb8bc in start_thread () from /lib64/libpthread.so.0
    #7 0x0000ffff845335cc in thread_start () from /lib64/libc.so.6

    So we should not block SIGRCU in workqueue thread to avoid blocking
    forever in the grace period awaiting on the worker thread when using
    urcu-signal flavor.

    Signed-off-by: hewenliang <hewenliang4@huawei.com>
    Co-developed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
backported into stable-0.10 as:
commit ef728ceea316503bdfd75c386512045fc8aa8285
Author: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Date:   Thu Sep 19 10:10:31 2019 -0400

    Fix: provide errno as argument to urcu_die()

    commit 1a990de3add "Fix: rculfhash worker needs to unblock to SIGRCU"
    provides "ret" (-1) as argument to urcu_die(), but should rather provide
    errno.

    Reported by Coverity:
    ** CID 1405700: Error handling issues (NEGATIVE_RETURNS)
    /src/rculfhash.c: 2171 in cds_lfht_worker_init()

    Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

commit 23da24d2d03a22841e7f76bf25a52a803b403362
Author: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Date:   Wed Sep 18 11:13:24 2019 -0400

    Fix: include urcu-signal-nr.h

    Erroneous include file name in backport of commit 1a990de3add
    "Fix: rculfhash worker needs to unblock to SIGRCU".

    Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

commit 22e3a77f65d53b84dbe653fa6cb83181cafe2482
Author: hewenliang <hewenliang4@huawei.com>
Date:   Tue Sep 17 10:59:18 2019 -0400

    Fix: rculfhash worker needs to unblock to SIGRCU

    In urcu-signal flavor, call_rcu_thread calls synchronize_rcu which will
    send SIGRCU signal to all registed threads, and then loops to wait
    need_mb to be cleared. However, the registed workqueue_thread does not
    process the SIGRCU signal, and never clear the need_mb.

    Based on above, call_rcu_thread and workqueue_thread will wait forever
    for completion of the grace period: call_rcu_thread which holds the
    rcu_registry_lock, waits for workqueue_thread to do cmm_smp_mb. While
    workqueue thread never does cmm_smp_mb because of signal blocking, and
    it will eventually wait to get rcu_registry_lock in do_resize_cb.

    The phenomenon is as follows, which is easy to be triggered:

    (gdb) t 2
    [Switching to thread 2 (Thread 0xffff83c3b080 (LWP 27116))]
    #0 0x0000ffff845296c4 in poll () from /lib64/libc.so.6
    (gdb) bt
    #0 0x0000ffff845296c4 in poll () from /lib64/libc.so.6
    #1 0x0000ffff8461b93c in force_mb_all_readers () at urcu.c:241
    #2 0x0000ffff8461c748 in smp_mb_master () at urcu.c:249
    #3 urcu_signal_synchronize_rcu () at urcu.c:445
    #4 0x0000ffff8461d004 in call_rcu_thread at urcu-call-rcu-impl.h:364
    #5 0x0000ffff845eb8bc in start_thread () from /lib64/libpthread.so.0
    #6 0x0000ffff845335cc in thread_start () from /lib64/libc.so.6
    (gdb) t 3
    [Switching to thread 3 (Thread 0xffff8443c080 (LWP 27191))]
    #0 0x0000ffff845f51c4 in __lll_lock_wait () from /lib64/libpthread.so.0
    (gdb) bt
    #0 0x0000ffff845f51c4 in __lll_lock_wait () from /lib64/libpthread.so.0
    #1 0x0000ffff845ee048 in pthread_mutex_lock () from /lib64/libpthread.so.0
    #2 0x0000ffff8461b814 in mutex_lock ( <rcu_registry_lock>) at urcu.c:157
    #3 0x0000ffff8461b9e4 in urcu_signal_unregister_thread () at urcu.c:564
    #4 0x0000ffff8463e62c in do_resize_cb (work=0x11e2e790) at rculfhash.c:2042
    #5 0x0000ffff8463c940 in workqueue_thread (arg=0x11e1d260) at workqueue.c:228
    #6 0x0000ffff845eb8bc in start_thread () from /lib64/libpthread.so.0
    #7 0x0000ffff845335cc in thread_start () from /lib64/libc.so.6

    So we should not block SIGRCU in workqueue thread to avoid blocking
    forever in the grace period awaiting on the worker thread when using
    urcu-signal flavor.

    Signed-off-by: hewenliang <hewenliang4@huawei.com>
    Co-developed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>