Bug #7

Double PID registering and unregistering race

Added by David Goulet over 8 years ago. Updated about 8 years ago.

Status: Resolved
Priority: High
Assignee:
Target version:
Start date: 02/09/2012
Due date:
% Done: 100%
Estimated time:

Description

With strace and Python using our libc wrapper, Python somehow registers once with the session daemon. Some time later, a second registration arrives with the same PID, followed by an unregister. This triggers an assert() failure in our lttng_ht_add_unique call at the second registration, since the same PID is already in use.

You can reproduce this behaviour using lttng-tools commit 82541c3400f9568835938b7c2c6ce5e18b5817c0 and lttng-ust commit d8de13549b80d40b0c823e43e81afd55266f2fe5, with the libc wrapper installed. With this particular script (also reproducible with gwibber-service), Python loads every *-libc.so found in ldconfig -p, and therefore loads our library automatically. A bug has been reported to the Python developers.

Running:

strace /usr/bin/python /usr/lib/desktopcouch/desktopcouch-service > /tmp/output.txt 2>&1

will hang the process. On Ctrl+C, you'll hit the issue (assert).

The "real" viable solution is planned for the 2.1 or a later stable release.

We have to remove the ust_app_sock_key_map and replace it with a hash table of applications keyed by FD. Each ust_app structure will have a node pointer to a hash table indexed by PID and FD. So, when the same PID registers twice, we'll use the direct lookup per PID, use add_replace in the hash table and clean up the old node. This prevents the PID-fd lookup race when the unregister happens just after the replace and before the close(fd).

We'll have to add the lttng_ht_add_replace function to the lttng_ht internal library.
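To illustrate the add_replace semantics described above, here is a minimal sketch in C. It is not the lttng_ht implementation: `pid_table`, `app_node`, `pid_table_lookup` and `pid_table_add_replace` are hypothetical names, the table is a plain chained hash table, and no RCU or locking is involved. It only shows the intended behaviour: inserting a node whose PID already exists swaps the old node out and hands it back to the caller for cleanup.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

#define PID_TABLE_BUCKETS 64

/* Hypothetical stand-in for a registered application (not the real ust_app). */
struct app_node {
	pid_t pid;              /* lookup key */
	int sock;               /* registration socket FD */
	struct app_node *next;  /* next node in the same bucket */
};

/* Hypothetical PID-indexed table; zero-initialize before use. */
struct pid_table {
	struct app_node *buckets[PID_TABLE_BUCKETS];
};

static size_t pid_bucket(pid_t pid)
{
	return (size_t) pid % PID_TABLE_BUCKETS;
}

/* Return the node currently registered for pid, or NULL if there is none. */
static struct app_node *pid_table_lookup(struct pid_table *t, pid_t pid)
{
	struct app_node *n;

	for (n = t->buckets[pid_bucket(pid)]; n != NULL; n = n->next) {
		if (n->pid == pid) {
			return n;
		}
	}
	return NULL;
}

/*
 * Insert node. If a node with the same PID is already present, it is
 * unlinked and returned so the caller can release it (free, close(fd)).
 * Returns NULL when nothing was replaced.
 */
static struct app_node *pid_table_add_replace(struct pid_table *t,
		struct app_node *node)
{
	struct app_node **link;

	assert(node != NULL);
	link = &t->buckets[pid_bucket(node->pid)];

	for (; *link != NULL; link = &(*link)->next) {
		if ((*link)->pid == node->pid) {
			struct app_node *old = *link;

			/* Take over the old node's position in the chain. */
			node->next = old->next;
			*link = node;
			old->next = NULL;
			return old;
		}
	}

	/* No duplicate: append at the end of the bucket chain. */
	node->next = NULL;
	*link = node;
	return NULL;
}
```

With this shape, a second registration for the same PID never fails: the caller gets the stale node back and decides when to close its socket, which is where the ordering against a concurrent unregister has to be handled.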

For the 2.0 stable release, we will simply remove the assert from the add_unique, clean up the old node (free() and close(fd)) and go on. This is valid since a single thread still handles registration and unregistration (sock lookup and close sock).
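The interim approach can be sketched as follows, under the single-thread assumption stated above. This is only an illustration: `app_entry`, `registry`, `register_app` and `lookup_sock` are hypothetical names, the registry is a small fixed array rather than a hash table, and deferred cleanup in another thread is deliberately not modelled.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_APPS 32

/* Hypothetical registry entry; the real ust_app structure holds much more. */
struct app_entry {
	pid_t pid;
	int sock;
};

static struct app_entry *registry[MAX_APPS];

/*
 * Register an application. On a duplicate PID, release the stale entry
 * (close its socket, free it) instead of asserting, then store the new one.
 * Returns 0 on success, -1 if the registry is full or allocation fails.
 */
static int register_app(pid_t pid, int sock)
{
	int i, slot = -1;
	struct app_entry *entry;

	assert(sock >= 0);

	for (i = 0; i < MAX_APPS; i++) {
		if (registry[i] != NULL && registry[i]->pid == pid) {
			/* Duplicate PID: drop the old entry and reuse its slot. */
			close(registry[i]->sock);
			free(registry[i]);
			registry[i] = NULL;
			slot = i;
			break;
		}
		if (registry[i] == NULL && slot < 0) {
			slot = i;  /* remember the first free slot */
		}
	}
	if (slot < 0) {
		return -1;
	}

	entry = malloc(sizeof(*entry));
	if (entry == NULL) {
		return -1;
	}
	entry->pid = pid;
	entry->sock = sock;
	registry[slot] = entry;
	return 0;
}

/* Return the socket registered for pid, or -1 if the PID is unknown. */
static int lookup_sock(pid_t pid)
{
	int i;

	for (i = 0; i < MAX_APPS; i++) {
		if (registry[i] != NULL && registry[i]->pid == pid) {
			return registry[i]->sock;
		}
	}
	return -1;
}
```

Because the old socket is closed immediately inside the registration path, this only stays safe while nothing else can close or look up that same FD concurrently.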

#1

Updated by David Goulet over 8 years ago

We are actually not able to provide a quick fix for the 2.0 stable release, since we are hitting a possible race between the close(fd) done when unregistering and the close(fd) done in the call_rcu thread (when cleaning up the old node).

I'll proceed with the pre20 release since this bug is a big corner case that does not stop the train :).

We'll try to push a fix during the 2.0 release candidate period.

#2

Updated by David Goulet over 8 years ago

  • Assignee set to David Goulet
#3

Updated by Anonymous over 8 years ago

  • Target version set to 2.0 stable
#4

Updated by David Goulet over 8 years ago

  • Status changed from New to In Progress

Python has fixed the libc.so auto loading issue. See issue13979 at bugs.python.org.

#5

Updated by David Goulet over 8 years ago

  • Status changed from In Progress to Confirmed
#6

Updated by Yannick Brosseau about 8 years ago

  • Priority changed from Normal to High
#7

Updated by David Goulet about 8 years ago

  • Status changed from Confirmed to Resolved
  • % Done changed from 0 to 100
