Bug #403

closed

Continuation of Bug-386: after an instrumented app hangs, the very first enable-channel always fails, regardless of how long we wait.

Added by Tan le tran over 11 years ago. Updated over 11 years ago.

Status: Resolved
Priority: Normal
Assignee:
Target version:
Start date: 11/22/2012
Due date:
% Done: 100%
Estimated time:

Description

Description:
============
  Bug-386 was opened to report that consumerD hangs when the instrumented app (the
  one currently being traced) hangs (simulated via kill -STOP).
  A fix for that has since been implemented.

  However, with that fix in place, after the instrumented application hangs (via kill -STOP),
  the very first "enable-channel" always fails. Subsequent "enable-channel" commands
  succeed.

Scenario:
=========
    lttng create s1
    sleep 1
    lttng enable-event com_ericsson_cba_trace_testapp_lowtraf:OnePerSecB -u
    sleep 1
    lttng start
    sleep 1
    pkill -STOP TestApp
    sleep 60
    date; lttng create s2
    for a in $(seq 1 2); do
        echo " "
        echo "lttng enable-channel channel0 -s s2 -u"
        date +%T.%N
        time lttng enable-channel channel0 -s s2 -u
        date +%T.%N
        sleep 1
    done

In the scenario above, the very first enable-channel always fails.


Files

StopTestApp_HaveToEnableChanTwice_noVVV.log (21.6 KB) Terminal log (without sessiond -vvv) Tan le tran, 11/22/2012 05:53 PM
StopTestApp_HaveToEnableChanTwice_withVVV.log (74.6 KB) Terminal log (with sessiond -vvv) Tan le tran, 11/22/2012 05:53 PM
fix-bug403.diff (4.2 KB) David Goulet, 11/27/2012 12:42 PM
Actions #1

Updated by David Goulet over 11 years ago

  • Status changed from New to Confirmed
  • Assignee set to David Goulet
  • Target version set to 2.1 stable

Hi Tan,

Yes... this seems to be the correct behavior. What happens here is that the session daemon does not know that an application behind a socket has been stopped until it uses that socket to send a command (in your example, the first enable-channel). That command times out, and the application is then deleted.

I'm not sure how we can fix this here since we need at least an action on the socket to know its state.

Any thoughts?

Thanks!
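
The behavior described above can be reproduced in miniature. The following is only a sketch under stated assumptions, not sessiond code: two named pipes stand in for the application socket, a small background loop stands in for the instrumented app, and `send_command` is a hypothetical helper. It assumes a Linux system with `mkfifo`, `mktemp`, and GNU `timeout`. Stopping the peer does not close its pipes, so nothing looks wrong until a command is sent and the reply fails to arrive in time.

```shell
# Hypothetical stand-in for an instrumented app: it answers "ok" to every
# command line it receives. Named pipes play the role of the app socket.
dir=$(mktemp -d)
mkfifo "$dir/cmd" "$dir/reply"
( while read -r cmd; do echo "ok"; done <"$dir/cmd" >"$dir/reply" ) &
app_pid=$!
exec 3>"$dir/cmd" 4<"$dir/reply"   # the daemon's side of the "socket"

# send_command CMD SECONDS: send CMD, then wait up to SECONDS for one reply
# line. The outcome ("ok" or "timeout") is stored in $result.
send_command() {
    echo "$1" >&3
    result=$(timeout "$2" head -n 1 <&4) || result="timeout"
}

send_command "enable-channel" 1; first=$result    # app is running
kill -STOP "$app_pid"                             # app hangs, pipes stay open
send_command "enable-channel" 1; second=$result   # only now is the hang visible

# Clean up the stand-in app and the pipes.
kill -KILL "$app_pid"
wait "$app_pid" 2>/dev/null || true
exec 3>&- 4<&-
rm -rf "$dir"

echo "first: $first"
echo "second: $second"
```

The write to the stopped peer succeeds because the pipe buffers it; only the missing reply exposes the hang, which matches the point that some action on the socket is needed to learn its state.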

Actions #2

Updated by David Goulet over 11 years ago

Hi Tan,

Finally, I have a fix. The problem was in the error code flow: the channel should be created even if the request failed on the application side.

Can you confirm that this patch fixes your problem? It does on my side with the scenario in this bug.

Once confirmed by you, I'll push this patch upstream asap.

Thanks!
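
The idea behind the fix can be sketched as follows. This is an illustrative model only and does not reproduce fix-bug403.diff: the session is modeled as a plain list of channel names, and `app_enable_channel` and `enable_channel` are hypothetical functions in which any app whose name starts with `hung-` simulates a timeout. The channel is recorded first, and a failure on one application is reported but does not fail the whole command.

```shell
# Hypothetical model: the session is a space-separated list of channel names.
session_channels=""

# app_enable_channel APP CHANNEL: stand-in for the per-application request.
# Apps named hung-* simulate a timeout on a stopped application.
app_enable_channel() {
    case "$1" in
        hung-*) return 1 ;;
        *)      return 0 ;;
    esac
}

# enable_channel CHANNEL APP...: create the channel, then tell each app.
enable_channel() {
    channel=$1
    shift
    # Record the channel in the session before contacting any application.
    session_channels="$session_channels $channel"
    for app in "$@"; do
        if ! app_enable_channel "$app" "$channel"; then
            echo "warning: $app did not accept $channel, skipping it" >&2
        fi
    done
    return 0   # application-side failures do not fail the command
}

enable_channel channel0 hung-TestApp other-app
status=$?
echo "status=$status channels=$session_channels"
```

With this flow, the first enable-channel after an application hang still succeeds from the user's point of view, while the unresponsive application is simply skipped.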

Actions #3

Updated by Tan le tran over 11 years ago

Hi David,

I have tried your patch and it works fine.
I no longer see the issue and our healthcheck no longer complains.

Thanks for the fix,
Regards,
Tan

Actions #4

Updated by David Goulet over 11 years ago

This will be upstream before the stable release. We just have to decide whether to ignore all errors from the UST tracer, or to return an error when the failure is on the transport layer or session daemon side (which would stop the remaining channels from being enabled).

Actions #5

Updated by David Goulet over 11 years ago

  • Status changed from Confirmed to Resolved
  • % Done changed from 0 to 100