https://bugs.lttng.org/https://bugs.lttng.org/themes/lttng/favicon/a.ico?14249722912013-05-22T17:25:46ZLTTng bugs repositoryLTTng-tools - Bug #539: lttng-tools2.2.0rc2: Possible memory leak from sessionD ?https://bugs.lttng.org/issues/539?journal_id=14302013-05-22T17:25:46ZJérémie Galarneaujeremie.galarneau@efficios.com
<ul></ul><p>We're trying to reproduce the issue on our end.</p> LTTng-tools - Bug #539: lttng-tools2.2.0rc2: Possible memory leak from sessionD ?https://bugs.lttng.org/issues/539?journal_id=14372013-05-27T11:31:10ZTan le trantan.dung.le.tran@ericsson.com
<ul></ul><p>Hi Jérémie,</p>
<p>Did you have a chance to looks into this issue ?<br />Thanks,<br />Tan</p> LTTng-tools - Bug #539: lttng-tools2.2.0rc2: Possible memory leak from sessionD ?https://bugs.lttng.org/issues/539?journal_id=14422013-05-27T14:18:56ZJérémie Galarneaujeremie.galarneau@efficios.com
<ul></ul><p>David has tried to reproduce the issue without success... However, we have identified another bug that may be related. We are still looking into it.<br />Could you run the sessiond under valgrind in "--leak-check=full" mode?</p>
<p>David has rebased a Valgrind-support patch on the current lttng-tools HEAD. It's available on his personal repository.</p>
<p><a class="email" href="mailto:git@github.com">git@github.com</a>:dgoulet/lttng-tools-dev.git<br />branch: valgrind</p> LTTng-tools - Bug #539: lttng-tools2.2.0rc2: Possible memory leak from sessionD ?https://bugs.lttng.org/issues/539?journal_id=14432013-05-27T19:31:26ZTan le trantan.dung.le.tran@ericsson.com
<ul></ul><p>Hi Jeremie,</p>
<p>Unfortunately, I can not reproduce this with Valgrind. The reason is that when sessionD is running with Valgrind, it is too slow to reponse to the registration of the instrumented app causing too many "Error: Timed out waiting for lttng-sessiond", which in turns, causes the instrumented app to hang (ie: they can not execute their main even after the LTTNG_UST_REGISTER_TIMEOUT has elapsed). This new behaviour has been reported in bug-545.</p>
<p>I will try to rerun with a lower launch rate (ie: launch less number of instances of intr app per sec) and see<br />if I can see the increase of MEM usage from the "top" printout.<br />Regards,<br />Tan</p> LTTng-tools - Bug #539: lttng-tools2.2.0rc2: Possible memory leak from sessionD ?https://bugs.lttng.org/issues/539?journal_id=14442013-05-27T19:55:00ZJérémie Galarneaujeremie.galarneau@efficios.com
<ul></ul><p>Okay. Reproducing it under a lighter load will absolutely make it easier to debug... Keep me posted.<br />Thanks for reporting <a class="issue tracker-1 status-5 priority-6 priority-highest closed" title="Bug: lttng2.2.0rc2: Instrumented apps hung (can not execute their main) when sessionD is not responsi... (Resolved)" href="https://bugs.lttng.org/issues/545">#545</a> too!</p> LTTng-tools - Bug #539: lttng-tools2.2.0rc2: Possible memory leak from sessionD ?https://bugs.lttng.org/issues/539?journal_id=14452013-05-28T11:37:18ZTan le trantan.dung.le.tran@ericsson.com
<ul></ul><pre>
When trying with lower rate, problem did not happen.
ex:
for a in $(seq 1 28800); do (for n in $(seq 1 100); do (/home/test_apps/TestApp_100perSecOnly 3 np > /dev/null &);\
done; usleep 800000); done &
I have ran the above for 8hrs, each 0.8sec, launch 100 instances. The "top" printout before and after
the 8 hour run are pretty much the same (from RES and MEM column):
[SC-1:Node16 Mon May 27 15:40:05/cluster/temp/tdlt] # top -b -n 1 | head -n 7 | tail -n 1; top -b -n 1 |grep -i sessiond
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
712 root 20 0 157m 6892 664 S 2 0.0 0:00.28 lttng-sessiond
:
[SC-1:Node16 Tue May 28 06:51:22/cluster/temp/tdlt] # top -b -n 1 | head -n 7 | tail -n 1; top -b -n 1 |grep -i sessiond
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
712 root 20 0 236m 8720 708 S 0 0.0 7:08.17 lttng-sessiond
When changing the rate to 100 instances per 0.2 sec, I see the value under RES increase to 12MB and stay there
even after no more apps being launched (NOTE: there is no session defined, no session active):
ex:
for a in $(seq 1 28800); do (for n in $(seq 1 100); do (/home/test_apps/TestApp_100perSecOnly 3 np > /dev/null &); \
done; usleep 200000); done &
<before launching apps>
[SC-1:Node16 Tue May 28 06:51:22/cluster/temp/tdlt] # top -b -n 1 | head -n 7 | tail -n 1; top -b -n 1 |grep -i sessiond
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
712 root 20 0 236m 8720 708 S 0 0.0 7:08.17 lttng-sessiond
:
:
<after launching apps>
[SC-1:Node16 Tue May 28 06:52:45/cluster/temp/tdlt] # top -b -n 1 | head -n 7 | tail -n 1; top -b -n 1 |grep -i sessiond
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
712 root 20 0 235m 12m 708 S 7 0.1 7:08.52 lttng-sessiond
</pre> LTTng-tools - Bug #539: lttng-tools2.2.0rc2: Possible memory leak from sessionD ?https://bugs.lttng.org/issues/539?journal_id=14472013-05-28T13:33:45ZDavid Goulet
<ul></ul><p>That's interesting. I'm betting on an error path that does not clean up memory when some limit is reached.</p>
<p>During those tests with a large number of applications, do you know if any errors occur like open file limit reached for instance ?</p> LTTng-tools - Bug #539: lttng-tools2.2.0rc2: Possible memory leak from sessionD ?https://bugs.lttng.org/issues/539?journal_id=14492013-05-28T13:52:43ZTan le trantan.dung.le.tran@ericsson.com
<ul></ul><p>No, I did not observe any error while doing the test.</p> LTTng-tools - Bug #539: lttng-tools2.2.0rc2: Possible memory leak from sessionD ?https://bugs.lttng.org/issues/539?journal_id=14812013-06-03T14:20:01ZDavid Goulet
<ul><li><strong>Target version</strong> set to <i>2.2</i></li></ul> LTTng-tools - Bug #539: lttng-tools2.2.0rc2: Possible memory leak from sessionD ?https://bugs.lttng.org/issues/539?journal_id=14882013-06-04T20:01:26ZDavid Goulet
<ul></ul><p>We've been running this use case (and variations) on a 16 cores machine for more than 50 minutes and we are unable to reproduce it.</p>
<p>So, can you confirm me that during this test there is NO session(s) created and the applications just register and unregister? Also, are you running health check during that time also.</p>
<p>I'm trying to find any "variables" that can change the state of the session daemon.</p>
<p>If none of the above apply, we'll have to come on site to help you guys. We have a small library that keeps track of the malloc/free and their call sites.</p> LTTng-tools - Bug #539: lttng-tools2.2.0rc2: Possible memory leak from sessionD ?https://bugs.lttng.org/issues/539?journal_id=14902013-06-05T12:18:22ZTan le trantan.dung.le.tran@ericsson.com
<ul></ul><pre>
We confirm that during this test (as described earlier in this bug-report):
_ Our TraceEA process performs the following at every 5 sec interval:
1)_ Call the following APIs:
lttng_health_check(LTTNG_HEALTH_CMD)
lttng_health_check(LTTNG_HEALTH_APP_MANAGE)
lttng_health_check(LTTNG_HEALTH_APP_REG)
lttng_health_check(LTTNG_HEALTH_CONSUMER)
2)_ Create a dummy session via the following API calls:
lttng_create_session()
lttng_destroy_session()
(NOTE: no channel or event being enabled)
_ However, when we turn off the above TraceEA process, the same thing
is still observed (ie: after a few minutes run, we stop launching
the apps, the top or ps command printout indicate that the resident
memory for sessiond has increased by around 20 times its original
size.)
_ Other than the above, only a bunch of instrumented application being
launched and terminates. No real session (with enabled events) being
created and started.
</pre> LTTng-tools - Bug #539: lttng-tools2.2.0rc2: Possible memory leak from sessionD ?https://bugs.lttng.org/issues/539?journal_id=14942013-06-06T14:43:56ZDavid Goulet
<ul></ul><p>Is this can be reproduce with the session daemon in -vvv ?</p>
<p>Actually, only "-v" will be of great help. If not, I'll come on site to help you hunt this one.</p> LTTng-tools - Bug #539: lttng-tools2.2.0rc2: Possible memory leak from sessionD ?https://bugs.lttng.org/issues/539?journal_id=14962013-06-06T19:50:02ZDavid Goulet
<ul><li><strong>File</strong> <a href="/attachments/231">bug539.diff</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/231/bug539.diff">bug539.diff</a> added</li><li><strong>Status</strong> changed from <i>New</i> to <i>Confirmed</i></li></ul><p>I'm pretty confident that this will fix the memory leak.</p>
<p>You'll have to apply this on master HEAD at least at commit cfa9a5a2b4a96e0d6a9eeddd2622a6d7c173b7ac, I just pushed a fix that is needed for this patch to work.</p> LTTng-tools - Bug #539: lttng-tools2.2.0rc2: Possible memory leak from sessionD ?https://bugs.lttng.org/issues/539?journal_id=14982013-06-10T11:42:36ZTan le trantan.dung.le.tran@ericsson.com
<ul><li><strong>File</strong> <a href="/attachments/234">Jun10_retest.log</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/234/Jun10_retest.log">Jun10_retest.log</a> added</li></ul><pre>
Hi David,
We have loaded the latest lttng-tools and run the test. We still see the value under "RSS" column increases for lttng-sessiond .
Commit used:
============
tools : cfa9a5a (HEAD, origin/master, origin/HEAD) Fix: poll compat layer .....
ust : 844a160 (HEAD, origin/stable-2.2) Fix: liblttng-ust process startup hang....
userspace : 264716f (HEAD, origin/stable-0.7, stable-0.7) Fix: Use a filled signal ....
The following value under RSS column from command "ps auxwwwf":
RSS Description
--- -----------
912 - After "pkill sessiond" and a brand new sessiond is launched
7272 - After:
for a in $(seq 1 40); do (for n in $(seq 1 100); do (/home/test_apps/TestApp_100perSecOnly 3 np > /dev/null &); done;
110940 - After running a case (described below) where session activation is involed:
1)_ Start launching TestApp as follow to give an average of 1000 TestApp
present at any giving time.
for a in $(seq 1 100);do (date; ps -ef |grep -c TestApp; usleep 500000; \
for n in $(seq 1 300);do \
(LTTNG_UST_REGISTER_TIMEOUT=100 /home/test_apps/TestApp_100perSecOnly 3 np > /dev/null 2>&1 & );\
done); \
done
3)_ Sleep for 5 sec to have the average apps launched and terminated per sec
4)_ Create multiple sessions and trace on HundredPerSec filtering on "OnePs < 10"
(this should satisfies all PID)
5)_ Activate only ONE of the above sessions
6)_ Wait for 10 sec
7)_ deactivate the session.
8)_ Collect processor load if possible.
9)_ Stop then convert the session
Can we use the value under RSS column (from ps command) as indication of memory leak or we have mis-interpreted that value ?
</pre> LTTng-tools - Bug #539: lttng-tools2.2.0rc2: Possible memory leak from sessionD ?https://bugs.lttng.org/issues/539?journal_id=14992013-06-10T14:36:02ZDavid Goulet
<ul></ul><p>Well yes RSS is the resident size of the program. I can see in the log that it goes up to 100Mb... The head commit of tools does not show the fix I've provided (commit: adbbaabc265e354b7986b48815ee556f1a07769b). Is this because you might have forget it or you just applied the diff ?</p> LTTng-tools - Bug #539: lttng-tools2.2.0rc2: Possible memory leak from sessionD ?https://bugs.lttng.org/issues/539?journal_id=15002013-06-10T14:42:01ZTan le trantan.dung.le.tran@ericsson.com
<ul></ul><p>Ouch !</p>
<p>My Bad....my verrrry bad !</p>
<p>I thought I just need the latest commit ..... didn't see the patch :-(<br />Applying the patch now ...will update you soon .</p> LTTng-tools - Bug #539: lttng-tools2.2.0rc2: Possible memory leak from sessionD ?https://bugs.lttng.org/issues/539?journal_id=15012013-06-10T14:50:07ZDavid Goulet
<ul></ul><p>No problem. :)</p>
<p>If it is still visible after using the patch, I'll come on site hunt down this thing!</p> LTTng-tools - Bug #539: lttng-tools2.2.0rc2: Possible memory leak from sessionD ?https://bugs.lttng.org/issues/539?journal_id=15022013-06-10T15:44:32ZTan le trantan.dung.le.tran@ericsson.com
<ul><li><strong>File</strong> <a href="/attachments/235">patch539_jun10_gitdiff.log</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/235/patch539_jun10_gitdiff.log">patch539_jun10_gitdiff.log</a> added</li><li><strong>File</strong> <a href="/attachments/236">patch539_jun10_terminal.log</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/236/patch539_jun10_terminal.log">patch539_jun10_terminal.log</a> added</li></ul><p>Hi David,</p>
<p>I have applied the patch (from update#13) on top of commit cfa9a5a .<br />Unfortunately, we still observe that RSS increases after the test .</p>
<p>Please, see attached log for further detail.<br />Regards,<br />Tan</p> LTTng-tools - Bug #539: lttng-tools2.2.0rc2: Possible memory leak from sessionD ?https://bugs.lttng.org/issues/539?journal_id=16702013-08-07T13:41:22ZTan le trantan.dung.le.tran@ericsson.com
<ul></ul><p>After loading lttng-tools v2.2.3 and lttng-ust 2.2.1, the above mem leak is no longer observed.<br />Many thanks for the great support.</p> LTTng-tools - Bug #539: lttng-tools2.2.0rc2: Possible memory leak from sessionD ?https://bugs.lttng.org/issues/539?journal_id=16742013-08-08T09:13:57ZDavid Goulet
<ul><li><strong>Status</strong> changed from <i>Confirmed</i> to <i>Resolved</i></li></ul>