Bug #374 (closed)

Streaming: when using non-default ports, ports remain hanging in CLOSE_WAIT state

Added by Tan le tran over 11 years ago. Updated over 11 years ago.

Status: Resolved
Priority: High
Assignee: David Goulet
Target version: 2.1 stable
Start date: 10/15/2012
Due date:
% Done: 100%
Estimated time:

Description


Description: 
============ 
When using non-default ports for streaming, the ports remain hanging in CLOSE_WAIT state forever, even after "lttng destroy" and even after the remote side has killed the corresponding relayd process.
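
For context, a TCP connection sits in CLOSE_WAIT when the remote end has already closed its side: the kernel keeps it in that state until the local process calls close() on the file descriptor. The sketch below is a generic illustration (not lttng code; the address and control port are simply the ones used in the scenario further down) of how a process that never closes such a descriptor leaves the connection hanging exactly as reported.

/* Generic illustration, not lttng code: a TCP socket moves to CLOSE_WAIT
 * when the remote end closes the connection, and it stays there until the
 * local process calls close() on the descriptor. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
	struct sockaddr_in addr;
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_port = htons(51234);            /* relayd control port from the scenario */
	inet_pton(AF_INET, "192.168.0.1", &addr.sin_addr);

	if (fd < 0 || connect(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
		perror("connect");
		return 1;
	}

	/* ... the peer (relayd) is killed here and sends us a FIN ... */

	/* Because close(fd) is never called, "netstat -etan" keeps showing this
	 * connection in CLOSE_WAIT for as long as this process is alive. */
	pause();
	return 0;
}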

Commit used: 
============ 
userspace-rcu: (oct12) 1fe734e wfcqueue: clarify locking usage
lttng-ust  : (oct09) 1c7b4a9 Fix: memcpy of string is larger than source
lttng-tools : (oct02) 4dbc372 ABI with support for compat 32/64 bits
babeltrace : (oct02) 052606b Document that list.h is LGPLv2.1, but entirely trivial

Scenario: 
========= 
  On Node-1:
    1a)_ netstat -etan
    1b)_ lttng-relayd -vvv -C tcp://0.0.0.0:51234 -D tcp://0.0.0.0:51235  &
    1c)_ Once the session has been started on Node-2:
         netstat -etan gives:
         Proto Recv-Q Send-Q Local Address     Foreign Address          State       User Inode
         tcp        0      0 0.0.0.0:51234      0.0.0.0:*               LISTEN      0    1625723
         tcp        0      0 0.0.0.0:51235      0.0.0.0:*               LISTEN      0    1625724
         tcp        0      0 192.168.0.1:51235  192.168.0.4:58137       ESTABLISHED 0    1625875
         tcp        0      0 192.168.0.1:51235  192.168.0.4:58139       ESTABLISHED 0    1625975
         tcp        0      0 192.168.0.1:51234  192.168.0.4:41174       ESTABLISHED 0    1625974
         tcp        0      0 192.168.0.1:51234  192.168.0.4:41172       ESTABLISHED 0    1625874

    1d)_ Once the session has been destroyed from Node-2,
         pkill lttng-relayd
         netstat -etan gives:
         Proto Recv-Q Send-Q Local Address     Foreign Address          State       User Inode
         tcp        0      0 192.168.0.1:51235  192.168.0.4:58137       FIN_WAIT2   0    1625875
         tcp        0      0 192.168.0.1:51235  192.168.0.4:58139       FIN_WAIT2   0    1625975
         tcp        0      0 192.168.0.1:51234  192.168.0.4:41174       FIN_WAIT2   0    1625974
         tcp        0      0 192.168.0.1:51234  192.168.0.4:41172       FIN_WAIT2   0    1625874

    1e)_ About 30 sec later:
         The above port pairs no longer appear in netstat.

  On Node-2:
    2a)_ <run an instrumented app>
    2b)_ netstat -etan
    2c)_ lttng create ses1
    2d)_ lttng enable-event an_event_from_the_running_instrumented_app -u
    2e)_ lttng enable-consumer -u -s ses1 -C tcp://192.168.0.1:51234 -D tcp://192.168.0.1:51235 -e
    2f)_ lttng start; sleep 20; lttng stop
    2g)_ netstat -etan
         Proto Recv-Q Send-Q Local Address     Foreign Address    State       User Inode
         tcp        0      0 192.168.0.4:1128  0.0.0.0:*          LISTEN      0    22611
         tcp        0      0 192.168.0.4:1130  0.0.0.0:*          LISTEN      0    20916
         tcp        0      0 192.168.0.4:1131  0.0.0.0:*          LISTEN      0    22141
         tcp        0      0 0.0.0.0:111       0.0.0.0:*          LISTEN      0    20739
         tcp        0      0 0.0.0.0:22        0.0.0.0:*          LISTEN      0    22237
         tcp        0      0 192.168.0.4:1022  0.0.0.0:*          LISTEN      0    20952
         tcp        0      0 192.168.0.4:41172 192.168.0.1:51234  ESTABLISHED 0    1474170
         tcp        0      0 192.168.0.4:58139 192.168.0.1:51235  ESTABLISHED 0    1477457
         tcp        0      0 192.168.0.4:41174 192.168.0.1:51234  ESTABLISHED 0    1477456
         tcp        0      0 192.168.0.4:58137 192.168.0.1:51235  ESTABLISHED 0    1477398

    2h)_ lttng destroy; lttng list 
    2i)_ From Node-1, kill the corresponding relayd.

    2j)_ netstat -etan
         Proto Recv-Q Send-Q Local Address     Foreign Address    State       User Inode
         tcp        0      0 192.168.0.4:1128  0.0.0.0:*          LISTEN      0    22611
         tcp        0      0 192.168.0.4:1130  0.0.0.0:*          LISTEN      0    20916
         tcp        0      0 192.168.0.4:1131  0.0.0.0:*          LISTEN      0    22141
         tcp        0      0 0.0.0.0:111       0.0.0.0:*          LISTEN      0    20739
         tcp        0      0 0.0.0.0:22        0.0.0.0:*          LISTEN      0    22237
         tcp        0      0 192.168.0.4:1022  0.0.0.0:*          LISTEN      0    20952
         tcp        0      0 192.168.0.4:41172 192.168.0.1:51234  CLOSE_WAIT  0    1474170
         tcp        0      0 192.168.0.4:58139 192.168.0.1:51235  CLOSE_WAIT  0    1477457
         tcp        0      0 192.168.0.4:41174 192.168.0.1:51234  CLOSE_WAIT  0    1477456
         tcp        0      0 192.168.0.4:58137 192.168.0.1:51235  CLOSE_WAIT  0    1477398

    2k)_ The above port pairs remain in CLOSE_WAIT forever (still there after 24 hrs).
         When a new session is created using the same -C and -D URLs,
         a new set of port pairs is created, and they eventually hang in CLOSE_WAIT
         as well (see the verification sketch below).
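
The hanging descriptors can also be spotted programmatically. The helper below is a hypothetical verification aid (it is not part of lttng-tools) and assumes a Linux /proc filesystem: it scans /proc/net/tcp and prints every IPv4 connection whose kernel state code is 0x08 (CLOSE_WAIT). Run on Node-2 after step 2j, it should list the four connections to 192.168.0.1:51234/51235 shown above; once the bug is fixed it should print none.

/* Hypothetical helper (not part of lttng-tools): list IPv4 sockets stuck in
 * CLOSE_WAIT by scanning /proc/net/tcp (state code 0x08 == TCP_CLOSE_WAIT). */
#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/proc/net/tcp", "r");
	char line[512];
	unsigned int lip, lport, rip, rport, state;
	int count = 0;

	if (!f) {
		perror("fopen");
		return 1;
	}
	fgets(line, sizeof(line), f);	/* skip the header line */
	while (fgets(line, sizeof(line), f)) {
		if (sscanf(line, " %*d: %8X:%4X %8X:%4X %2X",
				&lip, &lport, &rip, &rport, &state) != 5)
			continue;
		if (state != 0x08)
			continue;
		/* /proc/net/tcp stores IPv4 addresses as little-endian hex. */
		printf("CLOSE_WAIT %u.%u.%u.%u:%u -> %u.%u.%u.%u:%u\n",
			lip & 0xff, (lip >> 8) & 0xff, (lip >> 16) & 0xff, (lip >> 24) & 0xff,
			lport,
			rip & 0xff, (rip >> 8) & 0xff, (rip >> 16) & 0xff, (rip >> 24) & 0xff,
			rport);
		count++;
	}
	fclose(f);
	printf("%d socket(s) in CLOSE_WAIT\n", count);
	return 0;
}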

Actions #1

Updated by David Goulet over 11 years ago

  • Status changed from New to Confirmed
  • Assignee set to David Goulet
  • Target version set to 2.1 stable
Actions #2

Updated by David Goulet over 11 years ago

Is this a showstopper for you right now?

If so, I can work on a quick patch to fix it today. Otherwise, I'll try to fix it this week.

Thanks!

Actions #3

Updated by Tan le tran over 11 years ago

Hi David,

Thank you for looking into this issue.

We would prefer a permanent fix; it is OK if it arrives later this week.
Currently we reboot the node as a workaround to clear all the hanging ports.

Regards,
Tan

Actions #4

Updated by David Goulet over 11 years ago

  • Status changed from Confirmed to Resolved
  • % Done changed from 0 to 100