<p>LTTng bugs repository: Issues<br>
https://bugs.lttng.org/, updated 2018-05-15T22:54:56Z</p>
<p>LTTng-tools - Feature #1163 (New): Create an lttng-collectd skeleton for VM tracking feature<br>
https://bugs.lttng.org/issues/1163, 2018-05-15T22:54:56Z, Michael Jeanson (mjeanson@efficios.com)</p>
<p>Geneviève Bastien wants to contribute a VM-tracking statedump notifier (based on libudev). However, her current pull request spawns a "statedump" thread in the lttng-sessiond.</p>
<p>Since that thread has to use tracepoints, it is proving hard to integrate the feature inside the session daemon for various reasons:</p>
<ul>
<li>we can't link against liblttng-ust directly since the registration performed by its constructor will fail</li>
<li>we need a clear way to know when the sessiond is ready to trace itself before we allow the creation of sessions</li>
<li>it is not clear that it is the sessiond's role to interact with external programs to collect system information anyway</li>
<li><code>lttng-daemonize()</code> closes all file descriptors, lttng-ust's included...</li>
</ul>
<p>The solution we chose consists of introducing an external <code>lttng-collectd</code> process that can be linked to lttng-ust directly, thus saving us the trouble of <code>dlopen()</code>-ing the lttng-ust provider after the <code>lttng-daemonize()</code>.</p>
<p>Right now, we have this hierarchy of processes:</p>
<pre>
$ lttng-sessiond
bash
 └─── lttng-sessiond

$ lttng-sessiond -d/-b
bash
 └─── lttng-sessiond, forks and waits for SIGUSR1 from child to exit(), more of a launcher
       └─── lttng-sessiond (the &quot;real&quot; daemonized lttng-sessiond process)
</pre>
<p>The <code>SIGUSR1</code> signal is sent from the "real" process to the launcher when all threads have been launched. We consider that the threads have been launched when they have all called <code>sessiond_notify_ready()</code>.</p>
<p>The last thread to call <code>sessiond_notify_ready()</code> will be the one that actually sends that signal.</p>
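<p>The "last caller sends the signal" behaviour described above can be sketched with an atomic countdown. This is only an illustration: <code>notify_ready_sketch()</code>, <code>NR_THREADS_TO_WAIT_FOR</code> and the use of C11 atomics are assumptions made for the example, not the actual implementation of <code>sessiond_notify_ready()</code>.</p>

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical: number of threads that must report readiness. */
#define NR_THREADS_TO_WAIT_FOR 3

static atomic_int threads_left = NR_THREADS_TO_WAIT_FOR;

/*
 * Called once by each thread when it is ready. Returns 1 if this call
 * was the last one expected, 0 otherwise. launcher_pid is the PID of
 * the process waiting for SIGUSR1; a value <= 0 means that nobody has
 * to be notified (the daemon was not started with -d/-b).
 */
int notify_ready_sketch(pid_t launcher_pid)
{
	/* atomic_fetch_sub() returns the value held before the decrement. */
	if (atomic_fetch_sub(&threads_left, 1) != 1) {
		return 0;
	}

	/* Only the last thread reaches this point. */
	if (launcher_pid > 0) {
		(void) kill(launcher_pid, SIGUSR1);
	}
	return 1;
}
```

<p>Because the decrement is atomic, exactly one thread observes the counter going from 1 to 0, so the signal is sent once no matter in which order the threads become ready.</p>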
<p>We want the <code>lttng-collectd</code> to live as a child process under the session daemon.</p>
<pre>
$ lttng-sessiond
bash
 └─── lttng-sessiond
       └─── lttng-collectd

$ lttng-sessiond -d/-b
bash
 └─── lttng-sessiond, forks and waits for SIGUSR1 from child to exit(), more of a launcher
       └─── lttng-sessiond (the &quot;real&quot; daemonized lttng-sessiond process)
             └─── lttng-collectd
</pre>
<p>To make this feature reliable, we need to ensure the sessiond does not allow the creation of tracing sessions before the <code>lttng-collectd</code> has been launched <em>and</em> is ready to react to statedump commands. For now, this means that <code>lttng-collectd</code>'s <code>libudev</code> initialization has been completed and that statedump commands initiated by liblttng-ust will result in a correct statedump.</p>
<p>Since <code>lttng-collectd</code> is launched after the registration thread, the start of its process should be delayed by <code>liblttng-ust</code>'s constructor until the registration is completed.</p>
<p>To make sure we don't end up in situations where statedumps are unexpectedly not produced, we should launch <code>lttng-collectd</code> with the environment variable <code>LTTNG_UST_REGISTER_TIMEOUT=-1</code>. Otherwise, <code>lttng-collectd</code>'s registration could time out.</p>
<p>In practice, that means <a href="https://github.com/lttng/lttng-tools/blob/master/src/bin/lttng-sessiond/main.c#L2041" class="external"><code>thread_registration_apps()</code></a> has to be ready to accept the registration of <code>lttng-collectd</code> before it is launched.</p>
<p>I would add a function <a href="https://github.com/lttng/lttng-tools/blob/master/src/bin/lttng-sessiond/main.c#L6292" class="external">here</a> that:</p>
<ul>
<li>waits on a <code>registration_thread_ready</code> semaphore (see explanation below)</li>
<li>creates/opens a fifo in the rundir (see <code>mkfifo(3)</code>)</li>
<li><code>fork()</code>s and <code>execve()</code>s <code>lttng-collectd</code> with the path to the fifo as argument
<ul>
<li>in the parent, block on 1-byte <code>read()</code> on the fifo (this is similar to run-as)</li>
<li>in the child, <code>write()</code> a byte on the fifo</li>
</ul></li>
</ul>
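<p>The fifo handshake listed above could look roughly like the following. It is a minimal sketch under stated assumptions: <code>launch_and_wait_ready()</code> is a hypothetical helper, the child receives a minimal environment containing only <code>LTTNG_UST_REGISTER_TIMEOUT=-1</code> (a real launcher would probably forward more of its own environment), and the wait on the semaphore is left out.</p>

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Create a fifo at fifo_path, launch child_path with the fifo path as
 * its only argument, then block until the child writes one byte.
 * Returns the child's PID on success, -1 on error.
 *
 * Limitation of this sketch: if the child dies before opening the
 * fifo, the parent stays blocked in open(); a robust version would
 * also have to watch for the child's death.
 */
pid_t launch_and_wait_ready(const char *child_path, const char *fifo_path)
{
	char *const argv[] = { (char *) child_path, (char *) fifo_path, NULL };
	/* Assumption: minimal environment, only the variable named in the text. */
	char *const envp[] = { "LTTNG_UST_REGISTER_TIMEOUT=-1", NULL };
	char byte;
	ssize_t ret;
	pid_t pid;
	int fd;

	if (mkfifo(fifo_path, 0600) < 0 && errno != EEXIST) {
		return -1;
	}

	pid = fork();
	if (pid < 0) {
		return -1;
	}
	if (pid == 0) {
		/* Child: replace the process image; only reached on failure below. */
		execve(child_path, argv, envp);
		_exit(127);
	}

	/* Parent: open() blocks until the child opens the write end. */
	fd = open(fifo_path, O_RDONLY);
	if (fd < 0) {
		return -1;
	}

	/* Block on the 1-byte "ready" message, retrying on signals. */
	do {
		ret = read(fd, &byte, 1);
	} while (ret < 0 && errno == EINTR);
	(void) close(fd);

	return ret == 1 ? pid : -1;
}
```

<p>On the child side, the matching step is to open the path received in <code>argv[1]</code> for writing and <code>write()</code> a single byte once its initialization is complete.</p>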
<p>Look at how <code>sem_t notification_thread_ready</code> is <a href="https://github.com/lttng/lttng-tools/blob/master/src/bin/lttng-sessiond/main.c#L6146" class="external">initialized</a>, <a href="https://github.com/lttng/lttng-tools/blob/master/src/bin/lttng-sessiond/notification-thread.c#L437" class="external">posted when the notification-thread is ready</a>, <a href="https://github.com/lttng/lttng-tools/blob/master/src/bin/lttng-sessiond/rotation-thread.c#L298" class="external">waited-on by the rotation thread</a> and <a href="https://github.com/lttng/lttng-tools/blob/master/src/bin/lttng-sessiond/main.c#L6381" class="external">destroyed</a>. The semaphore should be posted <a href="https://github.com/lttng/lttng-tools/blob/5b0e3ccb033e701a4c4005d6859757652ca8897c/src/bin/lttng-sessiond/main.c#L2087" class="external">here</a> to signal that the app registration thread is ready.</p>
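<p>The same initialize, post, wait and destroy life cycle, applied to a <code>registration_thread_ready</code> semaphore, can be sketched as follows. The thread body, the <code>registration_accepting</code> flag and <code>wait_for_registration_thread()</code> are hypothetical stand-ins for the example; they do not reproduce the code in <code>main.c</code>.</p>

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

/* Semaphore name taken from the text; everything else is hypothetical. */
sem_t registration_thread_ready;
int registration_accepting;

/* Stand-in for the application registration thread. */
void *registration_thread_sketch(void *arg)
{
	(void) arg;
	/* ... create the apps socket and start listening here ... */
	registration_accepting = 1;
	/* Signal that registrations can now be accepted. */
	sem_post(&registration_thread_ready);
	return NULL;
}

/*
 * Start the thread and wait until it reports readiness. Returns 1 if
 * the thread was accepting registrations when the wait completed,
 * -1 on error.
 */
int wait_for_registration_thread(void)
{
	pthread_t tid;
	int ready;

	if (sem_init(&registration_thread_ready, 0, 0) < 0) {
		return -1;
	}
	if (pthread_create(&tid, NULL, registration_thread_sketch, NULL) != 0) {
		sem_destroy(&registration_thread_ready);
		return -1;
	}

	/* sem_wait() can be interrupted by a signal handler; retry. */
	while (sem_wait(&registration_thread_ready) < 0 && errno == EINTR) {
		continue;
	}
	/* This is the point where lttng-collectd could safely be launched. */
	ready = registration_accepting;

	pthread_join(tid, NULL);
	sem_destroy(&registration_thread_ready);
	return ready;
}
```

<p>Posting only after the socket is listening is what guarantees that a process launched after the wait returns can register right away.</p>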
<p>As far as the teardown is concerned, killing/terminating <code>lttng-sessiond</code> should result in <code>SIGPIPE</code> being received by <code>lttng-consumerd</code>. Since <code>SIGPIPE</code> is only received if a process tries to write to a closed pipe, <code>lttng-collectd</code> should just loop on <code>write()</code>. Everything we want it to do will happen in the lttng-ust thread anyway.</p>
<p>LTTng-UST - Feature #327 (On pause): Implement missing hostname context<br>
https://bugs.lttng.org/issues/327, 2012-08-26T23:22:47Z, Mathieu Desnoyers (mathieu.desnoyers@efficios.com)</p>
<p>To match features of lttng-modules.</p>