Bug #405
closed
lttng_stop_tracing() Sporadically Taking Too Much Time (more that 90 secs)
Added by Amer Alhalabi almost 12 years ago.
Updated almost 12 years ago.
Description
The issue happens sporadically.
One run of the test suite passes just fine; In another run, one random test case fails because of the amount of time taken to stop tracing.
Here is what I am doing in every test case:
create session
start tracing
enable channel
enable events
stop tracing // with wait
destroy session
I attached the logs for sessiond and relayd.
Also here is a small view of lttng-tools git tree:
- 37a86c6 (HEAD, origin/master, origin/HEAD) Fix: Add missing fct prototypes when disabling UST
- e3e57ea Fix a typo in lttng-probe-module name
- 3c82c39 Assign values to enum lttcomm_sessiond_command
- 8c861e6 Fix: run health test only if root
- 52df240 Clarify empty string/NULL filter errors
- b6bbed5 Fix: add missing padding for UST filter
- 2ad3a9a (tag: v2.1.0-rc8) Update version to v2.1.0-rc8
- 3822545 Fix: Uninit. variable in lttng view
- 5bcdda4 Add already enabled UST event error code
Files
- Description updated (diff)
- Assignee set to David Goulet
- Target version changed from 2.1 pre to 2.1 stable
I can't reproduce that but the logs confirm that there is an issue.
However I can't clearly figure out if the issue is on the tracer side or in tools.
lttng_ustconsumer_data_pending() (src/common/ust-consumer/ust-consumer.c) seems to return 1 even though everything is stopped and nothing more gets extracted from the buffers from which begins a long polling loop for data availability.
My guess would be that something went wrong on the tracer side telling us that there is still sub buffers to be extracted.
Is it possible something wrong happened with your application for this specific case ? Stopped, killed unexpectedly, UST error print...?
Does the trace is readable?
Thanks!
Hi David!
I am not doing any negative testing; meaning I am not stopping/killing any process.
Also, I did not see any error coming from the application.
Finally, the trace is readable.
Please try this patch that should fix this issue.
Apply it on lttng-tools HEAD.
git am 0001-Fix-locking-order-between-consumer-and-stream.patch
Thanks
Hi!
I tried the patch and received the following errors:
Error: Unknown consumer_data type
lttng-consumerd: consumer.c:2556: lttng_consumer_read_subbuffer: Assertion `0' failed.
Error: consumer err socket second poll error
Error: Health error occurred in thread_manage_consumer
SC-1:/home/test_apps # ps -e | grep -i lttng
15912 pts/1 00:00:00 lttng-sessiond
16289 pts/1 00:00:00 lttng-consumerd <defunct>
16430 ? 00:00:00 lttng-relayd
Lttng-tools git tree:
- 9b9db48 (HEAD, masterB) potential fix for bug 405
- f66c074 (origin/master, origin/HEAD) Fix: audit all close/fclose and check returned code
- 41039c0 Fix: update/clean lttng.h comments
- 9e2aeab Fix: install lttng health check man page
- 72f23df Fix: ship relevant documentations with tarball
- 2bd554c Remove useles AUTHORS and NEWS files
- d086eb7 Fix: update urcu version in README and configure.ac
- 48459ee (tag: v2.1.0-rc9) Update version to v2.1.0-rc9
I'm so sorry Amer, there was critical part missing in the patch, here is the new version. ;)
Again sorry, I guess I'm more tired than I thought. ;)
no worries!
so I took the latest patch. currently. The problem unfortunately have been resolved yet and I see the following errors:
PERROR: recvmsg inet: Connection reset by peer [in lttcomm_recvmsg_inet_sock() at inet.c:229]
PERROR: Error in file write: Bad file descriptor [in lttng_consumer_on_read_subbuffer_mmap() at consumer.c:1361]
Error: Error writing to tracefile (ret: -1 != len: 4096 != subbuf_size: 37)
PERROR: Error in file write: Bad file descriptor [in lttng_consumer_on_read_subbuffer_mmap() at consumer.c:1361]
Error: Error writing to tracefile (ret: -1 != len: 4096 != subbuf_size: 37)
PERROR: Error in file write: Bad file descriptor [in lttng_consumer_on_read_subbuffer_mmap() at consumer.c:1361]
Error: Error writing to tracefile (ret: -1 != len: 4096 != subbuf_size: 37)
PERROR: Error in file write: Broken pipe [in lttng_consumer_on_read_subbuffer_mmap() at consumer.c:1361]
Error: Error writing to tracefile (ret: -1 != len: 4096 != subbuf_size: 68)
Error: Error writing to tracefile (ret: 0 != len: 4096 != subbuf_size: 68)
Error: Error writing to tracefile (ret: 0 != len: 4096 != subbuf_size: 68)
Error: Error writing to tracefile (ret: 0 != len: 4096 != subbuf_size: 68)
Correction: the problem unfortunately has NOT been resolved yet
PERROR: recvmsg inet: Connection reset by peer [in lttcomm_recvmsg_inet_sock() at inet.c:229]
PERROR: Error in file write: Bad file descriptor [in lttng_consumer_on_read_subbuffer_mmap() at consumer.c:1361]
Error: Error writing to tracefile (ret: -1 != len: 4096 != subbuf_size: 37)
PERROR: Error in file write: Bad file descriptor [in lttng_consumer_on_read_subbuffer_mmap() at consumer.c:1361]
Error: Error writing to tracefile (ret: -1 != len: 4096 != subbuf_size: 37)
PERROR: Error in file write: Bad file descriptor [in lttng_consumer_on_read_subbuffer_mmap() at consumer.c:1361]
Error: Error writing to tracefile (ret: -1 != len: 4096 != subbuf_size: 37)
PERROR: Error in file write: Broken pipe [in lttng_consumer_on_read_subbuffer_mmap() at consumer.c:1361]
Error: Error writing to tracefile (ret: -1 != len: 4096 != subbuf_size: 68)
Error: Error writing to tracefile (ret: 0 != len: 4096 != subbuf_size: 68)
Error: Error writing to tracefile (ret: 0 != len: 4096 != subbuf_size: 68)
Error: Error writing to tracefile (ret: 0 != len: 4096 != subbuf_size: 68)
David, a very small issue with your patch attached: in lttng_consumer_read_subbuffer(), you should define a "ssize_t ret" rather than a "int ret". It should probably not be the culprit of your issues though.
Thanks,
Mathieu
Update:
I ran the test suite I have overnight for a longer period of time. I saw that issue happening only once, which could be for a different reason.
So I confirm that the issue I raised was fixed with the latest patch; except for those error messages I am getting.
Thanks,
Amer
Ok good to know.
We are actively (me and Mathieu) working on the problem you have and we
think we identified it so we are coming up with a fix right now.
Thanks!
- Status changed from Confirmed to Resolved
I just pushed 5 commits that fixes important bugs including the patch from this bug.
You should retest with lttng-tools HEAD and if the above issue is still happening, we'll open a new bug.
Thanks!
Also available in: Atom
PDF