Feature #137

open

No more tracing possible after consumerD dies (via kill command)

Added by Tan le tran about 12 years ago. Updated over 8 years ago.

Status:
Confirmed
Priority:
Normal
Assignee:
-
Target version:
Start date:
02/29/2012
Due date:
% Done:

0%

Estimated time:

Description

There are a couple of weird behaviors after consumerD is killed via the "kill" command.

Commit used when this test got carried out:
userspace-rcu : feb-08 fcf4348721903bed57027c7eb0ea13765895cd37
lttng-ust : feb-23 (20:11) 64493e4fa019e9bdfe0d9c6a4738c9552f250f35
lttng-tools : feb-24 (14:41) 6bf73bf53464ad309fcf7f02a4dc397d280b81f8
babeltrace : feb-23 (13:59) 305c65e5d7156ae7936f07ad93dd45ac318b4ce2

Here is the scenario:
1)_ lttng-sessiond -vvv &
2)_ Run the instrumented app
3)_ lttng create feb29_ses1 -o /tmp/tdlt/feb29_ses1
lttng enable-event the_tracepoint_name -u
lttng start
Note: There are a couple of printouts, "TEST: lib_ring_buffer_reserve data_size..."
Those printf lines were added by Mathieu to troubleshoot
Issue-24 yesterday.
4)_ ps -elf |grep ltt
5)_ kill <process ID of consumerD>
6)_ ps -elf |grep ltt
get: [lttng-consumerd] <defunct>
No new consumerD has been spawned
7)_ lttng stop
8)_ lttng start feb29_ses1 (start the session again in the hope that a new
consumerD will be spawned)
Note: still no new consumerD

9)_ lttng stop
10)_ ls -lR /tmp/tdlt/feb29_ses1/ust/
get: all files with 0 size

11)_ lttng destroy

12)_ lttng create feb29_ses2 -o /tmp/tdlt/feb29_ses2 (test whether a new session is affected by the
fact that consumerD died previously)

13)_ lttng enable-event tracepoint_name -s feb29_ses2 -u

Note: This process hangs. The prompt never comes back in the terminal (from where
this command was entered).
<ctrl>+<z>
bg
ps -elf |grep ltt
get: :
[lttng-consumerd] <defunct>
lttng enable-event .... (enable-event is hanging)
lttng-consumerd --quiet -u .... (a new consumerD is spawned)
:

14)_ lttng start
lttng stop
ls -lR /tmp/tdlt/feb29_ses2/ust/
get: no files being created for this session !

15)_ lttng enable-event a_new_tracepoint -s feb29_2 -u
Note: this one also hangs!

<ctrl>+<z>
bg
ps -elf |grep ltt
get: :
[lttng-consumerd] <defunct>
lttng enable-event .... (1st enable-event is hanging)
lttng-consumerd --quiet -u .... (previous consumerD is spawned)
lttng enable-event .... (2nd enable-event is hanging)
lttng-consumerd --quiet -u .... (a new consumerD is spawned)
:

16)_ kill <process_id of sessiond>
ps -elf |grep ltt
get: :
[lttng-consumerd] <defunct>
lttng enable-event .... (1st enable-event is hanging)
lttng-consumerd --quiet -u .... (previous consumerD is spawned)
lttng enable-event .... (2nd enable-event is hanging)
lttng-consumerd --quiet -u .... (previous consumerD is spawned)
:

17)_ lttng-sessiond -vvv &

18)_ lttng list -u
Get: no apps have been registered!
That means the application that was already running (before
consumerD was killed) can no longer see the new sessionD.

Therefore, no more tracing is possible !

Please find enclosed the log file for the above steps.
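
The `<defunct>` consumer seen in the `ps` output of steps 6, 13, 15 and 16 can also be spotted programmatically. The following is only a minimal sketch, not part of lttng-tools: the function names are hypothetical, and it assumes `ps` output with three columns (PID, state, command name), where a zombie process has a state starting with `Z`.

```python
import subprocess


def find_defunct_consumerd(ps_output):
    """Return the PIDs of zombie lttng-consumerd processes.

    `ps_output` is expected to hold one "PID STAT COMMAND" line per process
    (an assumption of this sketch, not an LTTng interface).
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        pid, stat, command = fields
        # A state starting with "Z" marks a zombie (defunct) process.
        if stat.startswith("Z") and "lttng-consumerd" in command:
            pids.append(int(pid))
    return pids


def list_defunct_consumerd():
    """Run ps and report defunct consumers (hypothetical helper)."""
    result = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "stat=", "-o", "comm="],
        capture_output=True,
        text=True,
        check=True,
    )
    return find_defunct_consumerd(result.stdout)
```

Calling `list_defunct_consumerd()` after step 5 would be expected to return the PID of the killed consumer for as long as sessiond has not reaped it.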


Files

Target_feb29.log (111 KB): log file for the steps described in this report. Tan le tran, 02/29/2012 10:58 AM
Actions #1

Updated by David Goulet about 12 years ago

  • Status changed from New to In Progress
  • Assignee set to David Goulet
  • Priority changed from Critical to High
  • Target version set to 2.0 stable

Indeed, if the consumer dies, the thread managing it dies as well, and the consumer is not restarted on the start command.

Moving this bug to High and target to 2.0-stable since it's an important missing feature.

Actions #2

Updated by David Goulet about 12 years ago

  • Status changed from In Progress to Confirmed
  • Priority changed from High to Normal
  • Target version changed from 2.0 stable to 2.1 pre

After some work, this fix will be targeted for 2.1 version.

It requires a bit of code in the session daemon to reattach a new consumer to all tracing sessions. This is too big for a quick fix for 2.0-stable.

Actions #3

Updated by David Goulet about 12 years ago

I've made a fix for 2.0 stable that will report an error if the consumer for the domain is not available. Tracing still does not work after that but at least the user is informed.

commit 5c827ce0dce140b121032837510f89cb70d1650d
Author: David Goulet <>
Date: Thu Apr 19 10:58:42 2012 -0400

We will provide a better fix for 2.1 that will restart the consumer on failure. So, I'm keeping this issue open.

Actions #4

Updated by David Goulet over 11 years ago

  • Target version deleted (2.1 pre)
Actions #5

Updated by Yannick Brosseau over 11 years ago

Not a real workaround, but the health check feature at least makes it possible to detect that the consumer has died.

Actions #6

Updated by David Goulet over 11 years ago

  • Assignee deleted (David Goulet)
  • Target version set to 2.2

UPDATE: Fail on my part. Wrong comment in the wrong ticket. IGNORE this post.

Actions #7

Updated by David Goulet over 11 years ago

Alright so here is the real relevant comment :)

Right now, killing the consumer with a SIGTERM is not possible as long as there are existing working streams. Using a SIGKILL evidently stops the consumerd, followed by the manager thread in the session daemon. Once that happens, a health error occurs and tracing is basically unavailable for the domain; the user is notified of this on any subsequent command for that specific domain.
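
As a general POSIX illustration of the SIGTERM/SIGKILL difference mentioned above, here is a small sketch. It does not describe how lttng-consumerd handles signals internally: the child is a stand-in process that simply ignores SIGTERM, and every name in it is hypothetical.

```python
import signal
import subprocess
import sys

# Stand-in child (NOT lttng-consumerd): ignores SIGTERM, then sleeps.
CHILD_CODE = (
    "import signal, time\n"
    "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
    "print('ready', flush=True)\n"
    "time.sleep(60)\n"
)


def demo_sigterm_vs_sigkill():
    """Return (survived_sigterm, return_code_after_sigkill)."""
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdout=subprocess.PIPE,
        text=True,
    )
    # Wait until the child has installed its SIGTERM disposition.
    child.stdout.readline()

    # SIGTERM can be caught or ignored, so the child keeps running.
    child.send_signal(signal.SIGTERM)
    try:
        child.wait(timeout=1)
        survived_sigterm = False
    except subprocess.TimeoutExpired:
        survived_sigterm = True

    # SIGKILL cannot be caught or ignored; the child terminates.
    child.kill()
    return_code = child.wait()
    child.stdout.close()
    return survived_sigterm, return_code
```

On a POSIX system the child survives the SIGTERM and is only terminated by the SIGKILL, in which case `Popen.wait()` reports the negated signal number as the return code.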

So I guess here the correct behavior would be to respawn a consumer but we can't for now because we don't have the possibility to reopen the metadata on the tracer for a running session.

Tan, is this problem still an issue on your side with the latest lttng-tools rc7 ?

For the respawn fix, the target version is now 2.2.

Actions #8

Updated by Tan le tran over 11 years ago

I like the 2nd answer a lot better :-))

I just installed the latest commit from lttng-tools,ust,userspace.
Will try out and see...my guess is that it is kind of taken care of
by the healthcheck implementation from lttng and from our side as well.

I will let you know the outcome.
Regards,
Tan

Actions #9

Updated by Tan le tran over 11 years ago

OK, I just ran the test with the latest commit of lttng-tools (07e716e Enable additional kernel probes) and, as expected, it is taken care of by the healthcheck function, which causes us to restart sessionD. We agree that this is a limitation for now and we are looking forward to 2.2 :-)
Many Thanks,
Tan

Actions #10

Updated by David Goulet about 11 years ago

  • Tracker changed from Bug to Feature
  • Target version deleted (2.2)

This looks more to me like an important feature we would need at some point. Now, with 2.2, the metadata is generated on the session daemon side, so restarting a consumer should be easier. It is still a non-trivial change, though, because ownership of the UST buffers is now in the consumer, which makes a consumerd restart during tracing a bit more complicated.

I'm flagging this for "future target version" and as a "Feature" because restarting a consumer was never planned as a functionality of the session daemon, so it's not quite a bug per se.

Actions #11

Updated by Jérémie Galarneau over 8 years ago

  • Target version set to Wishlist