Bug #539 (closed) - lttng-tools
2.2.0rc2: Possible memory leak from sessionD?
Start date: 05/22/2013
Due date:
% Done: 0%
Estimated time:
Description
Commit used:
============
userspace-rcu : 56e676d (HEAD, origin/stable-0.7) Document: rculfhash destroy and resize side-effect in 0.7
lttng-ust     : 352fce3 (HEAD, origin/master, origin/HEAD) Remove 0.x TODO
lttng-tools   : b31398b (HEAD, origin/master, origin/HEAD) Fix: increment UST channel refcount at stream creation
babeltrace    : 9eaf254 (HEAD, tag: v1.0.3, origin/stable-1.0) Version 1.0.3

Problem Description:
====================
* There seems to be a memory leak in sessionD when many short-lived instrumented apps are launched over a period of time. At any time, about 700 instances of the instrumented app are present, and each instance has a lifespan of only 3 seconds. After a run of about 3 minutes, the value under the "%MEM" column for sessionD increases from 0.0 to 0.4:

[SC-1:Xen43 Wed May 22 06:29:29/cluster/temp/tdlt]
# top -b -n 1 | head -n 7 | tail -n 1; top -b -n 1 | grep -i sessiond
  PID USER PR NI  VIRT  RES SHR S %CPU %MEM   TIME+ COMMAND
13616 root 20  0 85160  872 604 S    0  0.0 0:00.00 lttng-sessiond
:
:
[SC-1:Xen43 Wed May 22 06:32:34/cluster/temp/tdlt]
# top -b -n 1 | head -n 7 | tail -n 1; top -b -n 1 | grep -i sessiond
  PID USER PR NI  VIRT  RES SHR S %CPU %MEM   TIME+ COMMAND
13616 root 20  0  190m 7292 688 S    7  0.4 0:05.73 lttng-sessiond
:
:
[SC-1:Xen43 Wed May 22 06:37:19/cluster/temp/tdlt]
# top
top - 06:37:25 up 1 day, 16:20, 2 users, load average: 2.60, 3.14, 1.55
Tasks: 187 total, 1 running, 186 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.1%us, 0.5%sy, 0.0%ni, 99.2%id, 0.1%wa, 0.1%hi, 0.0%si, 0.0%st
Mem:  2049828k total, 210928k used, 1838900k free, 2808k buffers
Swap: 4200992k total, 784808k used, 3416184k free, 42348k cached
:

* On a different machine, we observed the following (note that the total memory size of this machine differs from the one above):

[SC-1:Node16 Tue May 21 13:22:33/cluster/temp/tdlt/Stability/May12]
# top
top - 13:23:15 up 11 days, 4:41, 2 users, load average: 0.00, 0.01, 0.05
Tasks: 288 total, 1 running, 287 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.1%sy, 0.0%ni, 99.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem:  24149M total, 3461M used, 20687M free, 372M buffers
Swap: 4102M total, 0M used, 4102M free, 2411M cached
:
[SC-1:Node16 Tue May 21 13:32:09/cluster/temp/tdlt/Stability/May12]
# top -b -n 1 | head -n 7 | tail -n 1; top -b -n 1 | egrep -i "lttng|trace"
  PID USER PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
15012 root 20  0  311m 137m  912 S    0  0.6 2:46.43 lttng-sessiond
15373 root 20  0 82128 1484  956 S    0  0.0 1:06.09 lttng-consumerd
18322 root 20  0 42264 2268 1808 S    0  0.0 1:11.81 TraceEa
18342 root 20  0 42264  912  444 S    0  0.0 0:45.82 TraceEa
20874 root 20  0  190m 3808 2596 S    0  0.0 0:40.07 trace_c
20880 root 20  0 47220 2948 2040 S    0  0.0 0:10.89 trace_p

Is problem reproducible?
========================
* Yes

How to reproduce (if reproducible):
===================================
1)_ Kill the current sessionD and relaunch a new one.
2)_ top -b -n 1 | head -n 7 | tail -n 1; top -b -n 1 | grep -i sessiond
    (a script that automates this sampling is sketched after this list)
3)_ lttng list -u; lttng list   #--- no instrumented app running, no session available
4)_ Run multiple instances of the instrumented app (each instance has only a 3 sec lifespan) so that around 700 instances are present at any time, e.g.:
    for a in $(seq 1 1234); do (for n in $(seq 1 100); do (/home/test_apps/TestApp_100perSecOnly 3 np > /dev/null &); done; usleep 200000); done &
    (a commented version of this loop is sketched at the end of this report)
5)_ Once in a while, repeat step 2. You will see that the number under "%MEM" starts increasing for sessionD.
6)_ Stop the process from step 4. No more instrumented apps should be present.
7)_ Repeat step 2. The %MEM usage still remains.
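For convenience, here is a minimal shell sketch that automates the sampling done in steps 2, 5 and 7 by recording lttng-sessiond's RSS and %MEM at a fixed interval. It assumes ps, pgrep and awk are available and that a single lttng-sessiond instance is running; the 10-second interval and the output format are arbitrary choices, not part of the original procedure.

#!/bin/sh
# Hypothetical helper: sample lttng-sessiond memory usage at a fixed interval.
# Assumes a single lttng-sessiond instance is running.
INTERVAL=10
while true; do
    pid=$(pgrep -x lttng-sessiond | head -n 1)
    if [ -z "$pid" ]; then
        echo "$(date '+%H:%M:%S') no lttng-sessiond process found"
    else
        # rss= prints the resident set size in KiB, pmem= the %MEM value,
        # both without headers so each sample stays on one line.
        ps -o rss=,pmem= -p "$pid" | awk -v t="$(date '+%H:%M:%S')" -v p="$pid" \
            '{ printf "%s pid=%s rss_kb=%s mem_pct=%s\n", t, p, $1, $2 }'
    fi
    sleep "$INTERVAL"
done

Running this alongside the load loop from step 4 should show the RSS growing while the apps are being launched, and not shrinking back after they have all exited.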
Any other information:
======================
-
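Below is the one-liner from step 4 rewritten as a commented script, so the load pattern is easier to tweak. The outer/inner counts, the 3-second lifespan, the 200 ms pacing and the /home/test_apps/TestApp_100perSecOnly path all come from the report; the script structure and the variable names are only an illustrative sketch, and the behaviour of the test binary itself is assumed, not shown here.

#!/bin/sh
# Sketch of the load generator from step 4: launch batches of short-lived
# instrumented apps so that roughly 700 instances are alive at any time.
APP=/home/test_apps/TestApp_100perSecOnly   # test binary named in the report
LIFESPAN=3                                  # each instance runs for ~3 seconds
BATCHES=1234                                # outer iterations, as in the one-liner
PER_BATCH=100                               # instances launched per batch
for a in $(seq 1 "$BATCHES"); do
    # Launch one batch of instances in the background.
    for n in $(seq 1 "$PER_BATCH"); do
        "$APP" "$LIFESPAN" np > /dev/null &
    done
    # Pace the batches as in the original one-liner: one batch every 200 ms.
    usleep 200000
done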