Bug #1110
closedFork() test 12 from Linux Test Project fails when traced with userspace events; events are miscounted
0%
Description
While tracing fork12
test from LTP, the test failed unexpectedly.
LTTng-ust master as of d9ab33856fd7bef5b598c9b6a47309b3bf9b06f2.
Steps to reproduce:
git clone https://github.com/linux-test-project/ltp.git cd ltp make autotools ./configure make
Then, when running as root, the test will abort if the problem is triggered:
lttng create lttng enable-channel -u chan_ust lttng enable-channel -k chan_kernel lttng add-context -u -t vpid -t ip -t procname -t vtid -c chan_ust lttng enable-event -u -a -c chan_ust lttng enable-event -k -c chan_kernel --syscall --all lttng start LD_PRELOAD=liblttng-ust-dl.so:liblttng-ust-fork.so:liblttng-ust-fd.so ./fork12
After forcibly stopping the test (^C
), errors are reported and events could be miscounted when issuing lttng stop
:
[warning] 9223372037047268318 events discarded, please refer to the documentation on channel configuration.
When running with blocking, the test still fails but the counting seems possible (events could have been indeed lost during PID starvation, which is what this test does):
ltp/testcases/kernel/syscalls/fork# LTTNG_UST_BLOCKING_RETRY_TIMEOUT=-1 LD_PRELOAD=liblttng-ust-fd.so:liblttng-ust-fork.so:liblttng-ust-dl.so:liblttng-ust-libc-wrapper.so ./fork12 fork12 0 TINFO : Forking as many kids as possible fork12 0 TINFO : Number of processes forked is 10717 fork12 1 TPASS : fork() eventually failed with 11: Resource temporarily unavailable (system pauses for a while after creating forking-havoc; test seems to get stuck, so a ^C is sent) ^Cfork12 1 TBROK : fork12.c:132: Unexpected signal 2 received. fork12 2 TBROK : fork12.c:132: Unexpected signal 2 received. fork12 1 TBROK : fork12.c:132: Unexpected signal 2 received. fork12 2 TBROK : fork12.c:132: Remaining cases broken fork12 3 TBROK : fork12.c:132: Remaining cases broken fork12 1 TBROK : fork12.c:132: Unexpected signal 2 received. fork12 2 TBROK : fork12.c:132: Remaining cases broken fork12 1 TBROK : fork12.c:132: Unexpected signal 2 received. fork12 1 TBROK : fork12.c:132: Unexpected signal 2 received. fork12 2 TBROK : fork12.c:132: Remaining cases broken fork12 1 TBROK : fork12.c:132: Unexpected signal 2 received. fork12 2 TBROK : fork12.c:132: Remaining cases broken fork12 2 TBROK : fork12.c:132: Remaining cases broken (many more...)
When stopping collection:
[warning] 184368 events discarded, please refer to the documentation on channel configuration.
Updated by Jonathan Rajotte Julien over 7 years ago
- Status changed from New to Feedback
Hi,
The hanging seems to be related to the actual testing code:
/* * Introducing a sleep(3) to make sure all children are * at pause() when SIGQUIT is sent to them */ sleep(3); kill(0, SIGQUIT); while (wait(&waitstatus) > 0) ;
Changing sleep(3) to sleep(10) fixes the hang on my machine. The sleep was introduce to prevent a race condition [1]. Using LTTng incurs overhead (lttng ust thread creation, registering, etc.) hence can easily delay execution of child processes and the parent might send SIGQUIT before certain children get to the pause() statement resulting in the observed hang.
As for the value of the discarded event, I was unable to reproduce. Is lttng-modules in sync with HEAD of master ?
[1] http://marc.info/?l=ltp-list&m=117751559908566&w=2
Cheers
Updated by Ricardo Nabinger Sanchez over 7 years ago
Jonathan Rajotte Julien wrote:
Changing sleep(3) to sleep(10) fixes the hang on my machine. The sleep was introduce to prevent a race condition [1]. Using LTTng incurs overhead (lttng ust thread creation, registering, etc.) hence can easily delay execution of child processes and the parent might send SIGQUIT before certain children get to the pause() statement resulting in the observed hang.
I agree with your assessment. The test creates quite a load on the machine. I wasn't aware of this sleep()
subtlety.
Yes, all synced during the same time,As for the value of the discarded event, I was unable to reproduce. Is lttng-modules in sync with HEAD of master ?
master
branch:
- babeltrace:
v1.2.0-654-g76206d8
- lttng-modules:
v2.10.0-rc1-4-g0147ae6
- lttng-tools:
v2.7.0-rc1-1040-gd9ab3385
(lttng-tools seems to be missing a 2.10 tag) - lttng-ust:
v2.10.0-rc1-6-gdd77bd5b
- userspace-rcu:
v0.9.0-62-g2748bd6
Updated by Jonathan Rajotte Julien over 7 years ago
- Status changed from Feedback to Invalid
I was no able to reproduce the miscounted discarded events. Still this is most probably a valid bug see #1114.
Closing this bug as invalid since the problem (race) is not on lttng side.
Thansk for reporting this.