Actions
Bug #545
closedlttng2.2.0rc2: Instrumented apps hung (can not execute their main) when sessionD is not responsive (simulated via kill -STOP)
Status:
Resolved
Priority:
Critical
Assignee:
-
Target version:
-
Start date:
05/27/2013
Due date:
% Done:
100%
Estimated time:
Description
Not quite sure whether this should go under lttng-tools or lttng-ust. If it is registered under the wrong queue, please, forward it to the right one. Commit used: ============ userspace : 56e676d Document: rculfhash destroy and resize side-effect in 0.7 lttng-ust : 996aead Add parameter -f to rm in Makefile clean target lttng-tools : dae1096 Fix: use lttng pipe to send new stream to thread babeltrace : 9eaf254 Version 1.0.3 Problem Description: ==================== * When sessionD is hanging (simulated via a "pkill -STOP sessiond"), launched instrumented applications got stuck and can not execute their main until sessiond is killed . According to our understanding, when an instrumented app is launched, if it fails to register with sessiond, it should be able to execute its main after the LTTNG_UST_REGISTER_TIMEOUT has elapsed. This seems to be valid only if the influx of launched instr app is low. When the influx of launched instr app is high, it seems that only the first 32 instr apps can execute their main, the remaining instr app hung forever until the sessonD process is killed. Is problem reproducible ? ========================= * yes How to reproduce (if reproducible): =================================== 1)_ Kill the current sessionD and relaunch a new one 2)_ ps -e |grep lttng 3)_ pkill -STOP sessiond 4)_ Launch 1 instance of instr app (ex: ./TestApp_100perSecOnly 5 & ) After 3 sec (default value of LTTNG_UST_REGISTER_TIMEOUT), receives: libust[9890/9890]: Error: Timed out waiting for lttng-sessiond (in lttng_ust_init() at lttng-ust-comm.c:1198) Then receives the print out from TestApp_100perSecOnly (at every sec for the next 5 sec). This works as intended. 5)_ Launch about 3000 instances of instr apps in the next few seconds. ex: for a in $(seq 1 30); do (for n in $(seq 1 100); do (/home/test_apps/TestApp_100perSecOnly 3 np > /dev/null &); done; usleep 200000); done & Note: the above command launch the instr app with life span of only 3 sec for each instance. 6)_ Will see plenty of "Error: Timed out waiting for lttng-sessiond" 7)_ After a few seconds, ps -e |grep -c TestApp See a magic number "2969" . Wait for a few sec, repeat the above ps command Still see 2969. It looks like only the first 32 instances of instr app were able to execute despite the hanging of sessiond. The remaining instances of instr apps hang. 8)_ pkill -CONT sessond 9)_ ps -e |grep -c TestApp See "2928" remaining. 10)_ Launch a new instance of instr app: (ex: ./TestApp_100perSecOnly 5 &) get: "Timed out waiting for lttng-sessiond" and no periodic printout from this testapp ...that means it got hang too. 11)_ pkill -9 sessiond 12)_ Start getting printout of the instr app launched at step 10 above. 13)_ ps -e |grep -c TestApp get "0" Any other information: ====================== -
Files
Actions