Feature #1163

Create an lttng-collectd skeleton for VM tracking feature

Added by Michael Jeanson over 1 year ago. Updated over 1 year ago.

Status: New
Priority: Normal
Target version: -
Start date: 05/15/2018
Due date:
% Done: 0%
Estimated time:

Description

Geneviève Bastien wants to contribute a VM-tracking statedump notifier (based on libudev). However, her current pull request spawns a "statedump" thread in the lttng-sessiond.

Since that thread has to use tracepoints, it is proving hard to integrate the feature inside the session daemon for various reasons:
  • we can't link against liblttng-ust directly since the registration performed in its constructor will fail
  • we need a clear way to know when the sessiond is ready to trace itself before we allow the creation of sessions
  • it is not clear that it is the sessiond's role to interact with external programs to collect system information anyway
  • lttng-daemonize() closes all file descriptors, lttng-ust's included...

The solution we chose is to introduce an external `lttng-collectd` process that can be linked against lttng-ust directly, thus saving us the trouble of dlopen()-ing the lttng-ust provider after the lttng-daemonize().

Right now, we have this hierarchy of processes:

$ lttng-sessiond

  bash
  └─── lttng-sessiond

$ lttng-sessiond -d/-b

  bash
  └─── lttng-sessiond, forks and waits for SIGUSR1 from child to exit(), more of a launcher
       └─── lttng-sessiond (the "real" daemonized lttng-sessiond process)

The SIGUSR1 signal is sent from the "real" process to the launcher when all threads have been launched. We consider that the threads have been launched when they have all called sessiond_notify_ready().

The last thread to call sessiond_notify_ready() will be the one that actually sends that signal.
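
A minimal sketch of the "last thread sends the signal" pattern described above, assuming a shared counter of threads that have not reported yet. The names `threads_pending`, `ready_counter_init()` and `notify_ready_sketch()` are hypothetical, and C11 atomics are used only for illustration; this is not the actual sessiond_notify_ready() implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdatomic.h>
#include <sys/types.h>

/* Hypothetical counter: threads that have not reported ready yet. */
static atomic_int threads_pending;

/* Record how many threads are expected to report ready. */
void ready_counter_init(int nr_threads)
{
	atomic_store(&threads_pending, nr_threads);
}

/*
 * Called once by each thread when it is ready.
 * The thread that brings the counter to zero sends SIGUSR1 to the
 * launcher (skipped when launcher_pid <= 0).
 * Returns 1 if the caller was the last thread, 0 otherwise.
 */
int notify_ready_sketch(pid_t launcher_pid)
{
	int previous = atomic_fetch_sub(&threads_pending, 1);

	/* More calls than expected threads would be a bug. */
	assert(previous > 0);
	if (previous != 1) {
		return 0;
	}
	if (launcher_pid > 0) {
		(void) kill(launcher_pid, SIGUSR1);
	}
	return 1;
}
```

In this sketch, passing a PID of 0 or less skips the signal, which would correspond to the non-daemonized case where there is no launcher to notify.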

We want lttng-collectd to live as a child process under the session daemon:

$ lttng-sessiond

  bash
  └─── lttng-sessiond
       └─── lttng-collectd

$ lttng-sessiond -d/-b

  bash
  └─── lttng-sessiond, forks and waits for SIGUSR1 from child to exit(), more of a launcher
       └─── lttng-sessiond (the "real" daemonized lttng-sessiond process)
            └─── lttng-collectd

To make this feature reliable, we need to ensure the sessiond does not allow the creation of tracing sessions before the lttng-collectd has been launched and is ready to react to statedump commands. For now, this means that lttng-collectd's libudev initialization has been completed and that statedump commands initiated by liblttng-ust will result in a correct statedump.

Since lttng-collectd is launched after the registration thread, the start of its process should be delayed by liblttng-ust's constructor until the registration is completed.

To make sure we don't end up in situations where statedumps are unexpectedly not produced, we should launch lttng-collectd with the environment variable LTTNG_UST_REGISTER_TIMEOUT=-1. Otherwise, lttng-collectd's registration could time out.
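
As a rough illustration, the variable could be set in the forked child just before the exec call. The helper name `collectd_env_setup()` is hypothetical; only the variable name and value come from the description above.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>

/*
 * Hypothetical helper, meant to run in the forked child before exec.
 * Sets LTTNG_UST_REGISTER_TIMEOUT=-1 so that liblttng-ust waits for the
 * registration to complete instead of timing out.
 * Returns 0 on success, -1 on error (see setenv(3)).
 */
int collectd_env_setup(void)
{
	return setenv("LTTNG_UST_REGISTER_TIMEOUT", "-1", 1);
}
```

Note that setenv() modifies the calling process's `environ`, so the change only reaches the new program if the exec variant used inherits `environ` (for instance execv(), or execve() called with `environ` as its third argument).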

In practice, that means thread_registration_apps() has to be ready to accept the registration of lttng-collectd before it is launched.

I would add a function here that:

  • waits on a registration_thread_ready semaphore (see explanation below)
  • creates/opens a fifo in the rundir (see mkfifo(3))
  • fork()+execve()s lttng-collectd with the path to the fifo as an argument
    • in the parent, blocks on a 1-byte read() on the fifo (this is similar to run-as)
    • in the child, write()s a byte to the fifo
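
The fifo handshake in the list above could be sketched as follows. This is an illustration under assumptions, not the proposed patch: the function names `ready_fifo_signal()`, `ready_fifo_wait()` and `spawn_and_wait_ready()` are hypothetical, execl() stands in for execve() so the environment is inherited, and the wait on the registration semaphore is left out. Both sides keep their file descriptor open after the handshake so the fifo can still be used to detect that the other end went away.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Child side (lttng-collectd): write one byte to tell the parent we are
 * ready. Returns the open write fd on success, -1 on error.
 */
int ready_fifo_signal(const char *fifo_path)
{
	char byte = 1;
	ssize_t ret;
	int fd = open(fifo_path, O_WRONLY);	/* blocks until a reader opens */

	if (fd < 0) {
		return -1;
	}
	do {
		ret = write(fd, &byte, 1);
	} while (ret < 0 && errno == EINTR);
	if (ret != 1) {
		close(fd);
		return -1;
	}
	return fd;
}

/*
 * Parent side (sessiond): block until the child has written its byte.
 * Returns the open read fd on success, -1 on error.
 */
int ready_fifo_wait(const char *fifo_path)
{
	char byte;
	ssize_t ret;
	int fd = open(fifo_path, O_RDONLY);	/* blocks until a writer opens */

	if (fd < 0) {
		return -1;
	}
	do {
		ret = read(fd, &byte, 1);
	} while (ret < 0 && errno == EINTR);
	if (ret != 1) {
		close(fd);
		return -1;
	}
	return fd;
}

/*
 * Create the fifo, fork and exec the child with the fifo path as its
 * only argument, then wait for the ready byte.
 * Returns the child's pid and stores the read fd in *fifo_fd, or
 * returns -1 on error.
 * Limitation of this sketch: if the exec fails, the parent stays
 * blocked in open(); a real implementation would have to handle that.
 */
pid_t spawn_and_wait_ready(const char *child_path, const char *fifo_path,
		int *fifo_fd)
{
	pid_t pid;

	if (mkfifo(fifo_path, 0600) < 0 && errno != EEXIST) {
		return -1;
	}
	pid = fork();
	if (pid < 0) {
		return -1;
	}
	if (pid == 0) {
		execl(child_path, child_path, fifo_path, (char *) NULL);
		_exit(127);	/* only reached if the exec failed */
	}
	*fifo_fd = ready_fifo_wait(fifo_path);
	return *fifo_fd < 0 ? -1 : pid;
}
```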

Look at how sem_t notification_thread_ready is initialized, posted when the notification thread is ready, waited on by the rotation thread, and destroyed. The new registration_thread_ready semaphore should be posted here to signal that the app registration thread is ready.
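
The semaphore life cycle mentioned above (initialize, post, wait, destroy) follows the usual POSIX pattern. The sketch below uses a hypothetical `struct ready_gate` wrapper and function names chosen for illustration; it does not reproduce how lttng-tools handles notification_thread_ready.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <semaphore.h>

/* Hypothetical wrapper around the "thread is ready" semaphore. */
struct ready_gate {
	sem_t sem;
};

/* Initialize with a count of 0: waiters block until the first post. */
int ready_gate_init(struct ready_gate *gate)
{
	return sem_init(&gate->sem, 0, 0);
}

/* Called by the registration thread once it can accept registrations. */
int ready_gate_post(struct ready_gate *gate)
{
	return sem_post(&gate->sem);
}

/* Called by the code that launches lttng-collectd; retries on EINTR. */
int ready_gate_wait(struct ready_gate *gate)
{
	int ret;

	do {
		ret = sem_wait(&gate->sem);
	} while (ret < 0 && errno == EINTR);
	return ret;
}

/* Release the semaphore once no thread can wait on it anymore. */
int ready_gate_destroy(struct ready_gate *gate)
{
	return sem_destroy(&gate->sem);
}
```

Because a post increments the count even when nobody is waiting yet, the launcher does not block if the registration thread became ready first.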

As far as teardown is concerned, killing/terminating lttng-sessiond should result in SIGPIPE being received by lttng-collectd. Since SIGPIPE is only received if a process tries to write to a pipe that no longer has a reader, lttng-collectd should just loop on write() on the fifo. Everything we want it to do will happen in the lttng-ust thread anyway.
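
A minimal sketch of such a write loop, assuming `fd` is the write end of the fifo shared with the session daemon. The function name `write_until_reader_gone()` is hypothetical. With the default SIGPIPE disposition the process is terminated by the signal and the function never returns; it only returns (with EPIPE) if SIGPIPE is ignored or handled.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>	/* SIGPIPE is raised by the failing write() */
#include <unistd.h>

/*
 * Keep writing single bytes to fd. Once the pipe buffer is full the
 * write() blocks; when the last reader closes its end (for instance
 * because the parent died), the write() fails with EPIPE and SIGPIPE is
 * raised.
 * Returns the errno value of the first non-EINTR write() failure.
 */
int write_until_reader_gone(int fd)
{
	const char byte = 0;

	for (;;) {
		ssize_t ret = write(fd, &byte, 1);

		if (ret < 0 && errno != EINTR) {
			return errno;
		}
	}
}
```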

History

#1

Updated by Michael Jeanson over 1 year ago

Here is a branch with a PoC: https://github.com/mjeanson/lttng-tools/tree/collectd

There is still some work and cleanup to do but it is working.
