Bug #403
closed
Continuation of Bug-386: after an instr app hang, the very first enable-channel always failed regardless of how long we wait.
Added by Tan le tran almost 12 years ago.
Updated almost 12 years ago.
Description
Description:
============
Bug-386 was opened to report that consumerD hangs when the instrumented app (the one
currently being traced) hangs (simulated via kill -STOP).
A fix has since been implemented.
However, with the fix in place, after the instrumented application hangs (via kill -STOP),
the very first "enable-channel" always fails. Subsequent "enable-channel" commands are
successful.
Scenario:
=========
lttng create s1
sleep 1
lttng enable-event com_ericsson_cba_trace_testapp_lowtraf:OnePerSecB -u
sleep 1
lttng start
sleep 1
pkill -STOP TestApp
sleep 60
date; lttng create s2
for a in $(seq 1 2); do (echo " "; echo "lttng enable-channel channel0 -s s2 -u"; \
date +%T.%N; time lttng enable-channel channel0 -s s2 -u; date +%T.%N; sleep 1); done
From the above, the very first enable-channel always fails.
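To make the "first attempt fails, later attempts succeed" pattern easier to see in a script, the exit status of each attempt can be recorded. The following is only a minimal sketch: `run_attempts` is a hypothetical helper, not part of LTTng, and it assumes the `lttng` client exits with a non-zero status when a command fails.

```shell
#!/bin/sh
# Hypothetical helper (not part of LTTng): run a command a given number
# of times and print the exit status of each attempt.
# Usage: run_attempts COUNT COMMAND [ARGS...]
run_attempts() {
    count=$1
    shift
    i=1
    while [ "$i" -le "$count" ]; do
        # Capture the status without aborting under "set -e".
        if "$@" >/dev/null 2>&1; then
            status=0
        else
            status=$?
        fi
        echo "attempt $i: exit status $status"
        i=$((i + 1))
    done
}
```

Under those assumptions, it could be used in the scenario above as `run_attempts 2 lttng enable-channel channel0 -s s2 -u`, once session s2 has been created.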
- Status changed from New to Confirmed
- Assignee set to David Goulet
- Target version set to 2.1 stable
Hi Tan,
Yes... this seems to be the correct behavior. What happens here is that the session daemon does not know that an application socket is stopped until it uses that socket to send a command (in your example, the first enable-channel). That command then times out and the application is deleted.
I'm not sure how we can fix this here since we need at least an action on the socket to know its state.
Any thoughts?
Thanks!
Hi Tan,
Finally, I have a fix. This was due to a problem in the error code flow. The channel should be created even if enabling it fails on the application side.
Can you confirm that this patch fixes your problem? It does on my side with the scenario in this bug.
Once confirmed by you, I'll push this patch upstream asap.
Thanks!
Hi David,
I have tried your patch and it works fine.
I no longer see the issue and our healthcheck no longer complains.
Thanks for the fix,
Regards,
Tan
This will be upstream before the stable release. We just have to decide whether to ignore any errors from the UST tracer, or to return an error when the failure is on the transport layer or session daemon side (which would stop the remaining channels from being enabled).
- Status changed from Confirmed to Resolved
- % Done changed from 0 to 100