Bug #403: Continuation of Bug-386: after an instr app hang, the very first enable-channel always failed regardless of how long we wait. - LTTng-tools - LTTng bugs repository

Bug #403

closed

Continuation of Bug-386: after an instr app hang, the very first enable-channel always failed regardless of how long we wait.

Added by Tan le tran over 12 years ago. Updated over 12 years ago.

Status: Resolved
Priority: Normal
Assignee: David Goulet
Target version: LTTng - 2.1 stable
Start date: 11/22/2012
Due date:
% Done: 100%
Estimated time:

Description:
============
  Bug-386 was opened to report that consumerD hangs when the instrumented app (the
  one currently being traced) hangs (simulated via kill -STOP).
  A fix for it has since been implemented.

  However, with that fix in place, after the instrumented application hangs (via kill -STOP),
  the very first "enable-channel" always fails. Subsequent "enable-channel" commands
  succeed.

Scenario:
=========
    lttng create s1
    sleep 1
    lttng enable-event com_ericsson_cba_trace_testapp_lowtraf:OnePerSecB -u
    sleep 1
    lttng start
    sleep 1
    pkill -STOP TestApp
    sleep 60
    date; lttng create s2
    for a in $(seq 1 2); do (echo " "; echo "lttng enable-channel channel0 -s s2 -u"; \
               date +%T.%N; time lttng enable-channel channel0 -s s2 -u; date +%T.%N; sleep 1); done

From the above, the very first enable-channel always fails.

Files

StopTestApp_HaveToEnableChanTwice_noVVV.log (21.6 KB). Terminal log (without sessiond -vvv). Tan le tran, 11/22/2012 05:53 PM
StopTestApp_HaveToEnableChanTwice_withVVV.log (74.6 KB). Terminal log (with sessiond -vvv). Tan le tran, 11/22/2012 05:53 PM
fix-bug403.diff (4.2 KB). David Goulet, 11/27/2012 12:42 PM

Updated by David Goulet over 12 years ago

Status changed from New to Confirmed
Assignee set to David Goulet
Target version set to 2.1 stable

Hi Tan,

Yes... this seems to be the correct behavior. What happens here is that the session daemon does not know that an application has been stopped until its socket is used to send a command (in your example, the first enable-channel). At that point the command times out and the application is deleted.

I'm not sure how we can fix this here since we need at least an action on the socket to know its state.

Any thoughts?

Thanks!
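
The point about needing an action on the socket can be illustrated with a minimal sketch. It is not LTTng code: `send_command`, the payload, and the 0.2 s timeout are assumptions made for illustration, and a plain socket pair stands in for the connection between the session daemon and the application. The peer's end stays open but never replies, as with a process under kill -STOP, so nothing looks wrong until a command is sent and the reply times out.

```python
import socket


def send_command(sock, payload, timeout_s):
    """Send a command and wait for a reply; return None if the peer never answers."""
    sock.settimeout(timeout_s)
    try:
        sock.sendall(payload)
        return sock.recv(64)
    except socket.timeout:
        return None


# daemon_end / app_end stand in for the daemon-to-application socket (hypothetical names).
daemon_end, app_end = socket.socketpair()

# The "application" is stopped: its end stays open but nothing reads or replies,
# so the socket looks healthy until a command is actually sent on it.
reply = send_command(daemon_end, b"enable-channel", timeout_s=0.2)
app_is_responsive = reply is not None
print("application responsive:", app_is_responsive)

daemon_end.close()
app_end.close()
```

The send itself succeeds because the kernel buffers the data. Only the missing reply reveals that the peer is stopped, which is why the first command after the hang is the one that pays the timeout.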

Updated by David Goulet over 12 years ago

File fix-bug403.diff added

Hi Tan,

Finally, I have a fix. This was due to a problem in the error-handling code flow. The channel should be created even if it fails on the application side.

Can you confirm that this patch fixes your problem? It does on my side with the scenario in this bug.

Once confirmed by you, I'll push this patch upstream asap.

Thanks!
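
The behavior described in this comment can be sketched roughly as follows. This is not the content of fix-bug403.diff: `AppUnreachable`, `enable_channel`, `fake_push`, and the dict-based session are hypothetical names chosen for illustration. The idea shown is that the channel is recorded on the session side first, and a failure to reach one application drops that application instead of failing the whole command.

```python
class AppUnreachable(Exception):
    """Hypothetical error: the application did not answer on its socket."""


def enable_channel(session, name, apps, push_to_app):
    # Record the channel in the session first, so it exists whatever the apps do.
    session.setdefault("channels", []).append(name)
    unreachable = []
    for app in list(apps):
        try:
            push_to_app(app, name)
        except AppUnreachable:
            # An application-side failure drops that app, not the whole command.
            apps.remove(app)
            unreachable.append(app)
    return unreachable


def fake_push(app, channel):
    # Stand-in for sending the channel to an application; one app is stopped.
    if app == "TestApp(stopped)":
        raise AppUnreachable(app)


session = {}
apps = ["TestApp(stopped)", "OtherApp"]
dropped = enable_channel(session, "channel0", apps, fake_push)
print(session["channels"], dropped, apps)
```

With this flow the first enable-channel succeeds from the user's point of view, and the stopped application is simply no longer registered afterwards.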

Updated by Tan le tran over 12 years ago

Hi David,

I have tried your patch and it works fine.
I no longer see the issue and our healthcheck no longer complains.

Thanks for the fix,
Regards,
Tan

Updated by David Goulet over 12 years ago

This will be upstream before the stable release. We just have to decide whether to ignore all errors from the UST tracer, or to return an error when the failure is on the transport layer or session daemon side (which would stop the remaining channels from being enabled).
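
The two options under discussion can be compared with a small sketch. Everything in it is an assumption for illustration: `ErrorSource`, `enable_channels`, and the `abort_on` parameter are not LTTng names, and the thread does not say which policy was chosen. Errors whose source is in `abort_on` stop the loop, so the remaining channels are never attempted; other errors are noted and the channel is still created.

```python
from enum import Enum


class ErrorSource(Enum):
    UST_TRACER = "ust tracer"
    TRANSPORT = "transport layer"
    SESSIOND = "session daemon"


def enable_channels(channels, try_enable, abort_on):
    """Enable channels in order; stop at the first error whose source is in abort_on."""
    created, ignored = [], []
    for name in channels:
        source = try_enable(name)  # None on success, otherwise an ErrorSource
        if source in abort_on:
            return created, ignored, source
        if source is not None:
            ignored.append((name, source))
        created.append(name)
    return created, ignored, None


# Hypothetical failures: one from the tracer, one from the transport layer.
failures = {"chan_b": ErrorSource.UST_TRACER, "chan_c": ErrorSource.TRANSPORT}
channels = ["chan_a", "chan_b", "chan_c", "chan_d"]

lenient = enable_channels(channels, failures.get, abort_on=set())
strict = enable_channels(
    channels, failures.get, abort_on={ErrorSource.TRANSPORT, ErrorSource.SESSIOND}
)
print(lenient[0])
print(strict[0], strict[2])
```

Under the lenient policy all four channels are created. Under the strict one the transport error on `chan_c` stops the loop, so `chan_d` is never attempted, which is the trade-off raised in the comment.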
