Bug #411

closed

health check reporting should use TLS

Added by Mathieu Desnoyers over 11 years ago. Updated about 11 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Target version:
Start date: 12/13/2012
Due date:
% Done: 100%
Estimated time:

Description

To allow health check reporting from code shared between threads, we should store the status (counters) in TLS variables, register those memory areas through a "register" method called at the beginning of each thread's life, and remove them through an "unregister" method called when the thread exits.

As a side benefit, this ensures that a thread never updates the health check counter of another thread by mistake, which removes a class of programming errors that could prevent detecting the health state of some threads.


Related issues 1 (0 open, 1 closed)

Related to LTTng-tools - Bug #428: SessionD occasionally returns error for lttng_health_check (LTTNG_HEALTH_CMD), Resolved, David Goulet, 01/23/2013

Actions #1

Updated by David Goulet over 11 years ago

  • Target version set to 2.2
Actions #2

Updated by Mathieu Desnoyers about 11 years ago

I see two possible models for this:

1) A linked list of per-thread nodes stored in TLS, similar to what the userspace RCU library does for per-thread tracking of grace periods. The state tracking of a thread would live in the TLS of that thread, along with the "task type" of the thread (an enum passed as a parameter to registration). Health check "poke" actions would need to iterate over the linked list to find each node they are querying. Note: if iterating over the linked list eventually becomes an issue, we could put the nodes in a hash table indexed by "task type".

2) Registration could return an index into a global array. This index would then be stored in the TLS and used to access the appropriate entry in the global storage each time we need to update the state for a thread.

The advantage of (1) over (2) is that we can extend (1) to thread pools without requiring a hard limit on the size of the global storage needed for (2) (or dynamic memory reallocation). Also, (1) requires fewer levels of indirection than (2) on the fast path, the fast path being the health reporting actions executed by each thread.
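To make the trade-off concrete, here is a minimal sketch of model (2), assuming C11 `_Thread_local` and `<stdatomic.h>` rather than whatever primitives the actual fix uses. All names (`health_slot_register`, `HEALTH_MAX_SLOTS`, and so on) are hypothetical and only illustrate the idea. Note the fixed-size array and the extra indexing step on the update path.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical hard limit: the drawback of model (2). */
#define HEALTH_MAX_SLOTS 4

/* Hypothetical task types, passed at registration. */
enum slot_task_type { SLOT_TASK_CMD, SLOT_TASK_WORKER };

struct health_slot {
	atomic_ulong counter;		/* per-thread health counter */
	enum slot_task_type type;
	int in_use;			/* protected by health_slots_lock */
};

/* Global storage shared by all threads. */
static struct health_slot health_slots[HEALTH_MAX_SLOTS];
static pthread_mutex_t health_slots_lock = PTHREAD_MUTEX_INITIALIZER;

/* Only the index lives in TLS; -1 means "not registered". */
static _Thread_local int health_slot_index = -1;

/* Claim a free slot for the calling thread. Returns 0, or -1 if full. */
static int health_slot_register(enum slot_task_type type)
{
	int i, ret = -1;

	pthread_mutex_lock(&health_slots_lock);
	for (i = 0; i < HEALTH_MAX_SLOTS; i++) {
		if (health_slots[i].in_use)
			continue;
		health_slots[i].in_use = 1;
		health_slots[i].type = type;
		atomic_store(&health_slots[i].counter, 0);
		health_slot_index = i;
		ret = 0;
		break;
	}
	pthread_mutex_unlock(&health_slots_lock);
	return ret;
}

/* Release the calling thread's slot, if it has one. */
static void health_slot_unregister(void)
{
	if (health_slot_index < 0)
		return;
	pthread_mutex_lock(&health_slots_lock);
	health_slots[health_slot_index].in_use = 0;
	pthread_mutex_unlock(&health_slots_lock);
	health_slot_index = -1;
}

/* Fast path: load the index from TLS, then index the global array. */
static void health_slot_update(void)
{
	assert(health_slot_index >= 0);
	atomic_fetch_add(&health_slots[health_slot_index].counter, 1);
}
```

In this sketch a thread pool larger than `HEALTH_MAX_SLOTS` makes registration fail, which is the hard limit mentioned above. Model (1) avoids it because each thread brings its own node.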

Actions #3

Updated by Mathieu Desnoyers about 11 years ago

With approach (1), you'll need a mutex protecting list traversal against concurrent registration/unregistration (unless list traversal uses RCU, but I don't recommend going for it in the first implementation).

Each (frequent) access to update the state of per-thread health can be performed with an atomic operation, without need for any mutex.

Reading the state of each thread can be performed with an atomic read, while holding the list-protection mutex.
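The locking scheme described in this comment could look roughly like the sketch below. It is not the code from the actual fix: it assumes C11 atomics and `_Thread_local`, a hand-rolled doubly linked list, and hypothetical names (`health_register`, `health_code_update`, `health_read`). The mutex only covers list changes and traversal; counter updates stay lock-free.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical task types, passed at registration. */
enum health_task_type { HEALTH_TASK_CMD, HEALTH_TASK_WORKER };

struct health_node {
	atomic_ulong counter;			/* updated only by the owner thread */
	enum health_task_type type;
	struct health_node *prev, *next;	/* protected by health_list_lock */
};

/* Protects the list against concurrent (un)registration and traversal. */
static pthread_mutex_t health_list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct health_node *health_list_head;

/* Each thread's state lives in its own TLS. */
static _Thread_local struct health_node health_tls_node;

/* Call at the beginning of the thread's life. */
static void health_register(enum health_task_type type)
{
	struct health_node *node = &health_tls_node;

	node->type = type;
	atomic_store(&node->counter, 0);

	pthread_mutex_lock(&health_list_lock);
	node->prev = NULL;
	node->next = health_list_head;
	if (health_list_head)
		health_list_head->prev = node;
	health_list_head = node;
	pthread_mutex_unlock(&health_list_lock);
}

/* Call before the thread exits, while its TLS is still valid. */
static void health_unregister(void)
{
	struct health_node *node = &health_tls_node;

	pthread_mutex_lock(&health_list_lock);
	if (node->prev)
		node->prev->next = node->next;
	else
		health_list_head = node->next;
	if (node->next)
		node->next->prev = node->prev;
	pthread_mutex_unlock(&health_list_lock);
}

/* Fast path: one atomic increment on the thread's own node, no mutex. */
static void health_code_update(void)
{
	atomic_fetch_add_explicit(&health_tls_node.counter, 1,
				  memory_order_relaxed);
}

/*
 * Query side: sum the counters of all threads of a given type with
 * atomic reads, holding the list mutex for the whole traversal.
 */
static unsigned long health_read(enum health_task_type type, int *nr_threads)
{
	struct health_node *node;
	unsigned long sum = 0;
	int n = 0;

	pthread_mutex_lock(&health_list_lock);
	for (node = health_list_head; node; node = node->next) {
		if (node->type != type)
			continue;
		sum += atomic_load(&node->counter);
		n++;
	}
	pthread_mutex_unlock(&health_list_lock);

	*nr_threads = n;
	return sum;
}
```

Because a thread can only reach `health_tls_node` through its own TLS, it cannot bump another thread's counter by accident. How the query side turns counter values into a healthy/unhealthy verdict is left out here, since this thread does not specify it.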

Actions #4

Updated by David Goulet about 11 years ago

  • Status changed from Confirmed to Resolved
  • % Done changed from 0 to 100
Actions #5

Updated by Mathieu Desnoyers about 11 years ago

For tracking purposes: I was actively involved in developing the fix (commit 769b7d7ec9067a192a01a8d0c884256e9fa25165) and the follow-up cleanup (commit 227e824a28deb5b5d31955908827426a03f97802), and I carefully reviewed both commits before they were merged.

Reviewed-by: Mathieu Desnoyers <>

Actions #6

Updated by Christian Babeux about 11 years ago

Reviewed-by: Christian Babeux <>

Actions #7

Updated by Christian Babeux about 11 years ago

  • Subject changed from heath check reporting should use TLS to health check reporting should use TLS