Project

General

Profile

Bug #501

consumerd defuncts - 2.2 rc1

Added by Amer Alhalabi over 7 years ago. Updated over 7 years ago.

Status:
Resolved
Priority:
High
Target version:
Start date:
04/10/2013
Due date:
% Done:

0%

Estimated time:

Description

Test Case Procedure:
1. create a Per-UID session with UST domain type and url=/tmp/<some dir>
2. create CTF channel
3. enable events with filter
4. destroy session
5. create another Per-UID session with UST domain type and url= <IP address and port numbers>
6. create CTF channel with switch timer 1 sec
7. create metadata channel with switch timer 1 sec
8. add VPID context
9. enable events with filter
10. start tracing

step 6 fails often because no UST consumer was detected.
I have attached some logs, and below there is more info about the version.

  1. ps -e | grep -i lttng
    21490 pts/0 00:00:00 lttng-sessiond
    21971 pts/0 00:00:01 lttng-consumerd <defunct>
    28781 ? 00:00:00 lttng-relayd
Lttng tools:
  • 500c239 (HEAD, tag: v2.2.0-rc1, origin/master, origin/HEAD) Update version to v2.2.0-rc1
  • 61411ca Fix: Add missing header for make dist
  • ee69440 Fix: add missing declaration for disable-lttng-ust
  • af6142c Fix: don't keep vpid, procname, nor patchlevel environment for per-uid buffers
  • 0f907de Relayd add_stream command handle tracefile rotation
  • fe4477e Move stream file rotation functions to utils
Lttng ust:
  • bdd8ca8 (HEAD, origin/master, origin/HEAD) connect: don't report EACCES
  • 256e8c9 connect: don't print error on EPERM
  • 76bacff Bump liblttng-ust-ctl lib version major number
  • 3469bbb Optimisation: implement callsite hash table in tracepoint.c
  • 5da1090 Optimisation: only update added library in tracepoint.c
  • 5f73392 Optimisation: only fix pending events once per lazy update
  • 9c915ee (tag: v2.2.0-rc1) Version 2.2.0-rc1
  • 46c881a Fix: tracepoint instrumentation constructor order issue
  • 77ca146 Documentation: update 2.0 to 2.x
  • 60302ad Add demo test back as an example
userspace-rcu:
  • d107390 (HEAD, origin/master, origin/HEAD) Add tab to output in order to allow easy nesting of tables.
  • f1e42e5 Remove urcu-api-list.sh from dist tarball
  • 10f55ae Add urcu-api-list.sh script
  • 7e5b9a4 list: implement cds_list_for_each_safe()
  • 108a92e Fix: tests/api.h use cpuset.h
  • b3231c1 Fix hurd-i386: move cpuset tests outside of sched_setaffinity conditional
  • 2953b50 Fix tests: finer-grained use of CPU_SET, CPU_ZERO and cpu_set_t
  • fe1f986 Test for CPU_SET
  • 63b495d Fix build on architectures with HAVE_SCHED_GETCPU but without HAVE_SYSCONF
  • 957be43 README: document that Clang 3.0 (based on LLVM 3.0) is supported
  • 79a8e6c clang: silence "unused expression result" warning

Files

sessiond2_1.log (852 KB) sessiond2_1.log Amer Alhalabi, 04/10/2013 06:04 PM
sessiond2_2.log (86.7 KB) sessiond2_2.log file containing the error Amer Alhalabi, 04/10/2013 06:04 PM
sessiond.log (756 KB) sessiond.log Amer Alhalabi, 04/26/2013 04:09 PM
sessiond3.log (128 KB) sessiond3.log Amer Alhalabi, 05/03/2013 06:13 PM
#1

Updated by Mathieu Desnoyers over 7 years ago

  • Project changed from LTTng to LTTng-tools
#2

Updated by Mathieu Desnoyers over 7 years ago

  • Description updated (diff)
  • Status changed from New to Feedback
  • Assignee set to Mathieu Desnoyers
  • Target version set to 2.2

Please check if the following commits fix your 2 issues (or any of the 2):

lttng-tools:
2b8f875 Cleanup: Use own mutex within timer setup/teardown
f2ad556 Fix: channel management thread should hold a refcount
fc64324 Fix: move metadata cache setup before ownership passing
4419b4f Fix: consumer metadata switch timer error handling

lttng-ust:
64bf51a Fix: add internal mutex for timer

Thanks,

Mathieu

#3

Updated by Amer Alhalabi over 7 years ago

i Mathieu,

I tried the new commits, but still having the same problem.

I have collected some logs for you (attached).

And for some reason now I see 3 relayds.

SC-1:~/temp # lttng-sessiond -vvv --verbose-consumer &> /tmp/sessiond.log &
[1] 23690

SC-1:~/temp # ps -e | grep -i lttng
23690 pts/0 00:00:00 lttng-sessiond

SC-1:~/temp # ps -e | grep -i lttng
23690 pts/0 00:00:00 lttng-sessiond

SC-1:~/temp # ps -e | grep -i lttng
23690 pts/0 00:00:00 lttng-sessiond
24167 pts/0 00:00:00 lttng-consumerd
24329 ? 00:00:00 lttng-relayd
25197 ? 00:00:00 lttng-relayd
25916 ? 00:00:00 lttng-relayd

SC-1:~/temp # ps -e | grep -i lttng
23690 pts/0 00:00:00 lttng-sessiond
24167 pts/0 00:00:00 lttng-consumerd <defunct>

/Amer

#4

Updated by Mathieu Desnoyers over 7 years ago

Can you try with the following commits ?

ea88ca2 Fix: consumerd metadata channel/cache/timer races
a0cbdd2 Fix: consumerd channel destroy race
a500c25 Cleanup: document RCU read-side lock better
f96e454 consumer relayd interaction: fix segfaults

If it does not fix your issue, I will need more info:

- detailed script triggering your issue,
- if possible, a backtrace of all threads within the defunct consumerd.

Thanks,

Mathieu

#5

Updated by Amer Alhalabi over 7 years ago

Hi Mathieu,

I confirm that those commits fix the consumerd defuncts issue. Thanks!

one minor tiny issue: at session destroy, I am getting the following error:
"Error: UST consumer close metadata 9 not found"
I have attached the log.

Thanks,
Amer

#6

Updated by Mathieu Desnoyers over 7 years ago

Please provide the script that triggers this issue.

I'm able to generate those errors if I kill the lttng-relayd by hand before issuing the destroy command, but in that case the error is expected: the remote end closed the connection.

However, if you can trigger this error in "normal" non-faulty scenario, I want to know how, because it may hide something important.

Thanks,

Mathieu

#7

Updated by David Goulet over 7 years ago

Looking at the logs, the relayd is closed before the destroy command.

DEBUG1 [31281/31285]: Consumer set delete flag on stream by idx 2 (in update_endpoint_status_by_netidx() at consumer.c:351)
[...]
DEBUG3 [31281/31294]: Relayd closing socket 20 (in relayd_close() at relayd.c:386)
DEBUG3 [31281/31294]: Relayd closing socket 23 (in relayd_close() at relayd.c:386)
[...]
DEBUG1 [30806/30810]: Processing client command 9 (in process_client_msg() at main.c:2358) <-- DESTROY command
[...]
Error: UST consumer close metadata 9 not found

So this is expected behavior. Maybe it should be a DBG instead of an ERR since this is expected in at least one use case?

#8

Updated by Mathieu Desnoyers over 7 years ago

  • Status changed from Feedback to Resolved

10:43 < Compudj> since from the POV of the sessiond, it's really an error, no ?
10:43 < Compudj> the relayd went out improperly
10:44 < dgoulet> well not if the relayd dies
10:44 < dgoulet> ah right
10:44 < dgoulet> improper shutdown, I agree, that should indicate an error

#9

Updated by Amer Alhalabi over 7 years ago

Hi guys!

I agree with both of you. It should be an error.
The TC kills relayd before destroy, so that error is expected.

We can be put this issue to bed.

Thank you guys for your support.

Amer

Also available in: Atom PDF