Bug #403
Status: Closed
Continuation of Bug-386: after an instrumented app hangs, the very first enable-channel always fails, regardless of how long we wait.
Description
Bug-386 was opened to report that consumerD hangs when the instrumented application being traced hangs (simulated via kill -STOP). A fix was then implemented. However, with that fix in place, the very first "enable-channel" issued after the instrumented application hangs (via kill -STOP) always fails. Subsequent "enable-channel" commands succeed.

Scenario:
=========

    lttng create s1
    sleep 1
    lttng enable-event com_ericsson_cba_trace_testapp_lowtraf:OnePerSecB -u
    sleep 1
    lttng start
    sleep 1
    pkill -STOP TestApp
    sleep 60
    date; lttng create s2
    for a in $(seq 1 2); do
        (echo " "; echo "lttng enable-channel channel0 -s s2 -u"; \
         date +%T.%N; time lttng enable-channel channel0 -s s2 -u; date +%T.%N; sleep 1)
    done

With the above, the very first enable-channel always fails.
Updated by David Goulet about 12 years ago
- Status changed from New to Confirmed
- Assignee set to David Goulet
- Target version set to 2.1 stable
Hi Tan,
Yes... this seems to be the correct behavior. What happens here is that the session daemon does not know that an application has been stopped until its socket is used to send a command (in your example, the first enable-channel). That command then times out and the application is deleted.
I'm not sure how we can fix this, since we need at least one action on the socket to learn its state.
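The detection problem can be imitated outside LTTng. The following is an analogy only, assuming a POSIX shell plus the coreutils `timeout` command: two named pipes stand in for the session daemon's application socket, and a hypothetical background loop stands in for the instrumented application. After the loop is stopped with SIGSTOP, `kill -0` still reports it as alive; only a command sent with a timeout reveals that it no longer answers.

```shell
#!/bin/sh
# Analogy only, not LTTng code: FIFOs stand in for the session daemon's
# application socket. Assumes a POSIX shell and coreutils `timeout`.
dir=$(mktemp -d)
mkfifo "$dir/req" "$dir/resp"

# Hypothetical stand-in for the instrumented application:
# it answers every request line with "ok".
( while read -r line < "$dir/req"; do echo ok > "$dir/resp"; done ) &
app=$!

# Send one command and wait for the reply, giving up after 2 seconds
# (arbitrary value). `timeout` exits with status 124 when it gives up.
send_cmd() {
    timeout 2 sh -c 'echo cmd > "$1" && read -r reply < "$2" && [ "$reply" = ok ]' \
        _ "$dir/req" "$dir/resp"
}

first=0;  send_cmd || first=$?          # application running: reply received
kill -STOP "$app"                       # simulate the hang, like pkill -STOP TestApp
alive=0;  kill -0 "$app" || alive=$?    # process still exists: nothing looks wrong
second=0; send_cmd || second=$?         # hang only noticed now, after the timeout

kill -KILL "$app"                       # SIGKILL also ends a stopped process
wait "$app" 2>/dev/null || true
rm -rf "$dir"
echo "first=$first alive=$alive second=$second"
```

In this sketch the stopped peer is noticed only when a command is actually sent and `send_cmd` runs into its timeout, which mirrors the first enable-channel paying that cost on behalf of the session daemon.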
Any thoughts?
Thanks!
Updated by David Goulet about 12 years ago
- File fix-bug403.diff added
Hi Tan,
Finally, I have a fix. The problem was in the error-code flow: the channel should be created even if creating it failed on the application side.
Can you confirm that this patch fixes your problem? It does on my side with the scenario in this bug.
Once confirmed by you, I'll push this patch upstream asap.
Thanks!
Updated by Tan le tran about 12 years ago
Hi David,
I have tried your patch and it works fine.
I no longer see the issue and our healthcheck no longer complains.
Thanks for the fix,
Regards,
Tan
Updated by David Goulet about 12 years ago
This will be upstream before the stable release. We just have to decide whether to ignore all errors from the UST tracer, or to return an error when the failure is on the transport layer or session daemon side (which would stop the remaining channels from being enabled).
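The open question can be made concrete with a small decision sketch. Everything in it is hypothetical: `enable_channel`, `session_channels` and the result labels are illustrative names, not lttng-sessiond code or the code from the changeset, and it shows only the second option (returning an error for transport-layer or session daemon failures). Per-application results are passed as arguments so the policy can be exercised without a tracer.

```shell
#!/bin/sh
# Hypothetical sketch of one error policy; NOT the lttng-sessiond fix.
# Arguments after the channel name simulate the per-application results:
#   ok         the application created the channel
#   app_err    the UST tracer in the application reported an error
#   anything   else is treated as a transport-layer / session daemon error
session_channels=""

enable_channel() {
    name=$1; shift
    # Record the channel in the session first, so an application-side
    # failure (e.g. a hung app) does not make the command itself fail.
    # In this sketch the record is kept even when an error is returned.
    session_channels="$session_channels $name"
    for result in "$@"; do
        case $result in
            ok) ;;
            app_err) echo "warning: application error on $name, ignored" >&2 ;;
            *) echo "error: $result error on $name" >&2; return 1 ;;
        esac
    done
    return 0
}
```

Here `enable_channel channel0 ok app_err ok` succeeds with a warning, while a `transport` result makes it return 1, which a caller looping over channels would take as a reason to stop. The first option (ignore everything) would amount to treating that last case like `app_err`.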
Updated by David Goulet about 12 years ago
- Status changed from Confirmed to Resolved
- % Done changed from 0 to 100
Applied in changeset 4d710ac2a9cffbfa9e4ebdba4162b8d6ee9020fc.