Bug #405 (closed)

lttng_stop_tracing() Sporadically Taking Too Much Time (more than 90 secs)

Added by Amer Alhalabi over 11 years ago. Updated over 11 years ago.

Status: Resolved
Priority: Normal
Assignee: David Goulet
Target version: 2.1 stable
Start date: 11/30/2012
Due date:
% Done: 0%
Estimated time:

Description

The issue happens sporadically.
One run of the test suite passes just fine; in another run, one random test case fails because stopping tracing takes too long.
Here is what I am doing in every test case (a rough sketch follows the list):

create session
start tracing
enable channel
enable events
stop tracing // with wait
destroy session
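
For illustration only, here is a minimal liblttng-ctl sketch of such a test case. The session name, channel name, output URL, and the conventional enable-before-start ordering are assumptions; the actual test suite may differ. The point is the final lttng_stop_tracing() call, which corresponds to the "stop with wait" step and returns only once pending trace data has been consumed. Error handling is omitted for brevity; build with gcc test_case.c -llttng-ctl.

#include <string.h>
#include <lttng/lttng.h>

int main(void)
{
	struct lttng_domain dom;
	struct lttng_channel chan;
	struct lttng_event ev;
	struct lttng_handle *handle;

	memset(&dom, 0, sizeof(dom));
	dom.type = LTTNG_DOMAIN_UST;

	/* Hypothetical session name and local output URL. */
	lttng_create_session("bug405", "file:///tmp/bug405-traces");
	handle = lttng_create_handle("bug405", &dom);

	/* Enable a channel with default attributes, then all UST tracepoints. */
	memset(&chan, 0, sizeof(chan));
	strcpy(chan.name, "ch0");
	lttng_channel_set_default_attr(&dom, &chan.attr);
	lttng_enable_channel(handle, &chan);

	memset(&ev, 0, sizeof(ev));
	ev.type = LTTNG_EVENT_TRACEPOINT;
	strcpy(ev.name, "*");
	lttng_enable_event(handle, &ev, chan.name);

	lttng_start_tracing("bug405");
	/* ... instrumented application runs here ... */
	lttng_stop_tracing("bug405");	/* "with wait": blocks until pending data is flushed */

	lttng_destroy_session("bug405");
	lttng_destroy_handle(handle);
	return 0;
}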

I attached the logs for sessiond and relayd.

Also, here is a small view of the lttng-tools git tree:

  • 37a86c6 (HEAD, origin/master, origin/HEAD) Fix: Add missing fct prototypes when disabling UST
  • e3e57ea Fix a typo in lttng-probe-module name
  • 3c82c39 Assign values to enum lttcomm_sessiond_command
  • 8c861e6 Fix: run health test only if root
  • 52df240 Clarify empty string/NULL filter errors
  • b6bbed5 Fix: add missing padding for UST filter
  • 2ad3a9a (tag: v2.1.0-rc8) Update version to v2.1.0-rc8
  • 3822545 Fix: Uninit. variable in lttng view
  • 5bcdda4 Add already enabled UST event error code

Files

sessiond_consumerd.log (857 KB) sessiond_consumerd.log output of sessiond --vvv --verbose-consumer Amer Alhalabi, 11/30/2012 06:24 PM
relayd.log (216 KB) relayd.log output of relayd Amer Alhalabi, 11/30/2012 06:24 PM
0001-Fix-locking-order-between-consumer-and-stream.patch (5.29 KB) 0001-Fix-locking-order-between-consumer-and-stream.patch David Goulet, 12/04/2012 06:25 PM
0001-Fix-locking-order-between-consumer-and-stream.patch (5.3 KB) 0001-Fix-locking-order-between-consumer-and-stream.patch David Goulet, 12/04/2012 07:52 PM
Actions #1

Updated by David Goulet over 11 years ago

  • Description updated (diff)
  • Assignee set to David Goulet
  • Target version changed from 2.1 pre to 2.1 stable

I can't reproduce it, but the logs confirm that there is an issue.

However, I can't clearly tell whether the issue is on the tracer side or in lttng-tools.

lttng_ustconsumer_data_pending() (src/common/ust-consumer/ust-consumer.c) seems to return 1 even though everything is stopped and nothing more gets extracted from the buffers, which triggers a long polling loop waiting for data availability.

My guess is that something went wrong on the tracer side, telling us that there are still sub-buffers to be extracted.
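
To make that failure mode concrete, here is a simplified, self-contained sketch of a stop-with-wait polling loop; it is not the actual lttng-tools code, and check_data_pending() is a hypothetical stand-in for the data-pending query sent to the consumer (1 means sub-buffers still hold data, 0 means drained). If the consumer keeps answering 1 even though nothing more can be extracted, a loop like this spins until some external timeout, which matches the 90+ second stops reported here.

#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-in: pretend the buffers drain after three checks. */
static int check_data_pending(void)
{
	static int remaining = 3;

	return remaining-- > 0 ? 1 : 0;
}

int main(void)
{
	const useconds_t poll_delay_us = 100 * 1000;	/* arbitrary 100 ms between checks */

	/* Keep polling until the consumer reports that no data is pending. */
	while (check_data_pending() > 0) {
		printf("data still pending, sleeping...\n");
		usleep(poll_delay_us);
	}
	printf("all data consumed, stop can complete\n");
	return 0;
}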

Is it possible something went wrong with your application in this specific case? Was it stopped or killed unexpectedly, or did UST print an error...?

Is the trace readable?

Thanks!

Actions #2

Updated by Amer Alhalabi over 11 years ago

Hi David!

I am not doing any negative testing, meaning I am not stopping or killing any process.
Also, I did not see any error coming from the application.
Finally, the trace is readable.

Actions #3

Updated by David Goulet over 11 years ago

Please try this patch that should fix this issue.

Apply it on top of lttng-tools HEAD:

git am 0001-Fix-locking-order-between-consumer-and-stream.patch

Thanks

Actions #4

Updated by Amer Alhalabi over 11 years ago

Hi!

I tried the patch and received the following errors:

Error: Unknown consumer_data type
lttng-consumerd: consumer.c:2556: lttng_consumer_read_subbuffer: Assertion `0' failed.
Error: consumer err socket second poll error
Error: Health error occurred in thread_manage_consumer

SC-1:/home/test_apps # ps -e | grep -i lttng
15912 pts/1 00:00:00 lttng-sessiond
16289 pts/1 00:00:00 lttng-consumerd <defunct>
16430 ? 00:00:00 lttng-relayd

Lttng-tools git tree:

  • 9b9db48 (HEAD, masterB) potential fix for bug 405
  • f66c074 (origin/master, origin/HEAD) Fix: audit all close/fclose and check returned code
  • 41039c0 Fix: update/clean lttng.h comments
  • 9e2aeab Fix: install lttng health check man page
  • 72f23df Fix: ship relevant documentations with tarball
  • 2bd554c Remove useles AUTHORS and NEWS files
  • d086eb7 Fix: update urcu version in README and configure.ac
  • 48459ee (tag: v2.1.0-rc9) Update version to v2.1.0-rc9

Actions #5

Updated by David Goulet over 11 years ago

I'm so sorry, Amer, there was a critical part missing in the patch; here is the new version. ;)

Again sorry, I guess I'm more tired than I thought. ;)

Actions #6

Updated by Amer Alhalabi over 11 years ago

No worries!

So I took the latest patch. Unfortunately, the problem has not been resolved yet, and I see the following errors:

PERROR: recvmsg inet: Connection reset by peer [in lttcomm_recvmsg_inet_sock() at inet.c:229]
PERROR: Error in file write: Bad file descriptor [in lttng_consumer_on_read_subbuffer_mmap() at consumer.c:1361]
Error: Error writing to tracefile (ret: -1 != len: 4096 != subbuf_size: 37)
PERROR: Error in file write: Bad file descriptor [in lttng_consumer_on_read_subbuffer_mmap() at consumer.c:1361]
Error: Error writing to tracefile (ret: -1 != len: 4096 != subbuf_size: 37)
PERROR: Error in file write: Bad file descriptor [in lttng_consumer_on_read_subbuffer_mmap() at consumer.c:1361]
Error: Error writing to tracefile (ret: -1 != len: 4096 != subbuf_size: 37)
PERROR: Error in file write: Broken pipe [in lttng_consumer_on_read_subbuffer_mmap() at consumer.c:1361]
Error: Error writing to tracefile (ret: -1 != len: 4096 != subbuf_size: 68)
Error: Error writing to tracefile (ret: 0 != len: 4096 != subbuf_size: 68)
Error: Error writing to tracefile (ret: 0 != len: 4096 != subbuf_size: 68)
Error: Error writing to tracefile (ret: 0 != len: 4096 != subbuf_size: 68)

Actions #7

Updated by Amer Alhalabi over 11 years ago

To reiterate: the problem unfortunately has NOT been resolved yet

PERROR: recvmsg inet: Connection reset by peer [in lttcomm_recvmsg_inet_sock() at inet.c:229]
PERROR: Error in file write: Bad file descriptor [in lttng_consumer_on_read_subbuffer_mmap() at consumer.c:1361]
Error: Error writing to tracefile (ret: -1 != len: 4096 != subbuf_size: 37)
PERROR: Error in file write: Bad file descriptor [in lttng_consumer_on_read_subbuffer_mmap() at consumer.c:1361]
Error: Error writing to tracefile (ret: -1 != len: 4096 != subbuf_size: 37)
PERROR: Error in file write: Bad file descriptor [in lttng_consumer_on_read_subbuffer_mmap() at consumer.c:1361]
Error: Error writing to tracefile (ret: -1 != len: 4096 != subbuf_size: 37)
PERROR: Error in file write: Broken pipe [in lttng_consumer_on_read_subbuffer_mmap() at consumer.c:1361]
Error: Error writing to tracefile (ret: -1 != len: 4096 != subbuf_size: 68)
Error: Error writing to tracefile (ret: 0 != len: 4096 != subbuf_size: 68)
Error: Error writing to tracefile (ret: 0 != len: 4096 != subbuf_size: 68)
Error: Error writing to tracefile (ret: 0 != len: 4096 != subbuf_size: 68)

Actions #8

Updated by Mathieu Desnoyers over 11 years ago

David, a very small issue with your attached patch: in lttng_consumer_read_subbuffer(), you should define a "ssize_t ret" rather than an "int ret". It should probably not be the culprit of your issues, though.
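
A quick illustration of that point (not lttng-tools code): functions that return a byte count, like read(2) or a sub-buffer read, use ssize_t, which is wider than int on 64-bit platforms, so storing the result in an int can silently truncate it. read_subbuffer_stub() below is a hypothetical stand-in.

#include <stdio.h>
#include <sys/types.h>

/* Hypothetical stand-in for a sub-buffer read: returns bytes handled. */
static ssize_t read_subbuffer_stub(void)
{
	return (ssize_t) 5 * 1024 * 1024 * 1024;	/* pretend 5 GiB were handled */
}

int main(void)
{
	int narrow = read_subbuffer_stub();	/* silently truncated on LP64 targets */
	ssize_t wide = read_subbuffer_stub();	/* keeps the full value */

	printf("int: %d vs ssize_t: %zd\n", narrow, wide);
	return 0;
}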

Thanks,

Mathieu

Actions #9

Updated by Amer Alhalabi over 11 years ago

Update:
I ran my test suite overnight for a longer period of time and saw the issue happen only once, which could be due to a different reason.
So I confirm that the issue I raised is fixed by the latest patch, except for those error messages I am still getting.

Thanks,
Amer

Actions #10

Updated by David Goulet over 11 years ago

OK, good to know.

Mathieu and I are actively working on the problem you reported; we think we have identified it and are coming up with a fix right now.

Thanks!

Actions #11

Updated by David Goulet over 11 years ago

  • Status changed from Confirmed to Resolved

I just pushed 5 commits that fix important bugs, including the patch from this bug.

You should retest with lttng-tools HEAD, and if the above issue is still happening, we'll open a new bug.

Thanks!
