Bug #567

closed

lttng-tools 2.2.0-rc2: enable-channel failure leaves session in suspicious state

Added by Daniel U. Thibault over 11 years ago. Updated over 11 years ago.

Status: Resolved
Priority: Normal
Assignee:
Target version:
Start date: 06/25/2013
Due date:
% Done: 100%
Estimated time:
Description

The session:

$ sudo -H lttng list pid
Tracing session pid: [active]
    Trace path: net://131.132.32.77:5342/pid-20130620-161857 [data: 5343]

=== Domain: UST global ===

Buffer type: per PID

Channels:
-------------
- canalpid: [enabled]

    Attributes:
      overwrite mode: 0
      subbufers size: 4096
      number of subbufers: 4
      switch timer interval: 0
      read timer interval: 0
      output: mmap()

    Events:
      * (type: tracepoint) [enabled]

$ sudo -H lttng enable-channel k1 -k --discard --subbuf-size 1024 --num-subbuf 8
PERROR [20531/20631]: ioctl kernel create channel: Invalid argument (in kernel_create_channel() at kernel.c:144)
Error: Channel k1: Kernel create channel failed (session pid)
Warning: Some command(s) went wrong
$ sudo -H lttng list pid
Tracing session pid: [active]
    Trace path: net://131.132.32.77:5342/pid-20130620-161857 [data: 5343]

=== Domain: Kernel ===

Warning: No kernel channel
=== Domain: UST global ===

Buffer type: per PID

Channels:
-------------
- canalpid: [enabled]

    Attributes:
      overwrite mode: 0
      subbufers size: 4096
      number of subbufers: 4
      switch timer interval: 0
      read timer interval: 0
      output: mmap()

    Events:
      * (type: tracepoint) [enabled]

It is highly suspicious that the session's listing should gain this after a failed enable-channel command:

=== Domain: Kernel ===

Warning: No kernel channel

The enable-channel error is the same for an inactive session. I also note that the failed enable-channel caused the kernel lttng-consumerd daemon to start. Also, the next command ends with an error:

$ sudo -H lttng stop
Error: Stopping kernel trace failed

Despite the "error", the trace was nevertheless stopped. However, the user does not get the expected "Tracing stopped for session pid" message. Maybe the error should be downgraded to a warning?


Related issues 1 (1 open, 0 closed)

Related to LTTng-tools - Feature #573: Destroy an internal domain session on command error if created in that code flow (Confirmed, 06/25/2013)

Actions #1

Updated by David Goulet over 11 years ago

I'm not sure what the issue is here.

The enable-channel fails because the subbuf size is too small. When listing, you see no kernel channel, which is expected. Is it the "Warning" that is bugging you? Note that I would like to change that message output, but for now we are in RC stage so the output format can't be changed. For 2.3, it is something we can think of!
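
To illustrate the "too small" rejection, here is a minimal Python sketch of a size pre-check. It is not lttng-tools code: `check_subbuf_size` is a hypothetical helper, and the 4096-byte minimum is an assumption (a common page size). The only thing confirmed in this thread is that 1024 is too small for a kernel channel.

```python
# Hypothetical pre-check; ASSUMED_MIN_SUBBUF_SIZE is an assumption (a common
# page size), not a value taken from lttng-tools or the kernel tracer.
ASSUMED_MIN_SUBBUF_SIZE = 4096


def check_subbuf_size(size: int, minimum: int = ASSUMED_MIN_SUBBUF_SIZE) -> bool:
    """Return True if `size` is at least `minimum` bytes."""
    return size >= minimum


# The value from the failing command versus the UST channel's value in the listing.
print(check_subbuf_size(1024))  # the --subbuf-size passed to enable-channel
print(check_subbuf_size(4096))  # the size shown for channel canalpid
```

Under that assumed minimum, the first call reports the size as unacceptable and the second as acceptable.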

The consumer is spawned the first time a kernel command is encountered, even if it fails. This behavior is unlikely to change.

For the failed stop, did you start the session beforehand?

Actions #2

Updated by Daniel U. Thibault over 11 years ago

The nature of the failure is irrelevant (there are many ways to "achieve" a failure). As the first lttng list command shows, the trace was running when the enable-channel command was issued. I'm not worried about the consumer daemon starting up; that's par for the course.
The issue is two-fold:

1) The lttng stop command reports an error and does not state that the session was stopped, although it was indeed stopped successfully. Downgrading the error to a warning would probably fix this (I assume the confirmation message is suppressed by the error and wouldn't be by a warning). On the other hand, if a domain has no channels, there is nothing to stop, so the tracing stop command should internally report success, not failure. (Could it be the kernel part of the session is considered "already stopped"?)

2) The display of the session status is changed by the failed channel enable. Presumably this is because the kernel domain was added to the session description as part of the preparations for channel creation. It would make more sense for the channel creation error to force a rollback of the domain creation (I presume it's not possible to create the channel and only add the domain to the session once this is successful).
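
The rollback suggested in point 2 can be sketched as follows. This is a toy Python model, not the session daemon's actual code: `Session`, `ChannelError` and the `create` callable are hypothetical names introduced for illustration only.

```python
class ChannelError(Exception):
    """Raised when the (simulated) tracer refuses to create a channel."""


class Session:
    def __init__(self, name):
        self.name = name
        self.domains = {}  # domain name -> list of channel names

    def enable_channel(self, domain, channel, create):
        """Create `channel` in `domain`, undoing domain creation on failure.

        `create` is a hypothetical callable standing in for the real tracer
        call; it raises ChannelError when the channel cannot be created.
        """
        domain_was_new = domain not in self.domains
        channels = self.domains.setdefault(domain, [])
        try:
            create(channel)
        except ChannelError:
            if domain_was_new:
                # Roll back: the domain only existed to host this channel.
                del self.domains[domain]
            raise
        channels.append(channel)


def always_fails(channel):
    raise ChannelError(f"cannot create {channel}")


session = Session("pid")
session.domains["ust"] = ["canalpid"]
try:
    session.enable_channel("kernel", "k1", always_fails)
except ChannelError:
    pass
print(sorted(session.domains))  # ['ust']
```

With this approach a failed enable-channel leaves the listing exactly as it was, while a pre-existing domain is never removed by a later failure.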

Daniel U. Thibault

Actions #3

Updated by David Goulet over 11 years ago

  • Status changed from New to Confirmed
  • Assignee set to David Goulet
  • Target version set to 2.2

Oh I see the issue now!

Yes, that's a problem with the internal state of the session daemon. The first kernel enable-channel triggered the creation of an internal kernel session object, which was not destroyed when the enable failed, so the stop command then tried to stop that kernel session.

I'll push a simple patch that does not try to stop a kernel/ust session if there was no previous start. That fixes the error on the stop command here.
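
The idea behind that guard, as a rough Python sketch. This is a toy model under assumed names (`DomainSession`, `stop_tracing`), not the actual patch or the session daemon's data structures.

```python
class DomainSession:
    """Toy stand-in for a per-domain (kernel or UST) session object."""

    def __init__(self, name):
        self.name = name
        self.started = False
        self.stop_calls = 0

    def start(self):
        self.started = True

    def stop(self):
        self.stop_calls += 1
        self.started = False


def stop_tracing(domain_sessions):
    """Stop only domains that were started; return the names actually stopped."""
    stopped = []
    for dom in domain_sessions:
        if not dom.started:
            # Never started (e.g. left over from a failed enable-channel): skip.
            continue
        dom.stop()
        stopped.append(dom.name)
    return stopped


kernel = DomainSession("kernel")  # created by the failed command, never started
ust = DomainSession("ust")
ust.start()
print(stop_tracing([kernel, ust]))  # ['ust']
```

Skipping the never-started domain means the stop as a whole can succeed, so the usual confirmation message would no longer be suppressed.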

As for the rollback of the created internal domain session on error, I'll keep that for 2.3+ since it's not release critical and would require quite a bit of work. I'll open a Feature request for that.

Actions #4

Updated by David Goulet over 11 years ago

  • Status changed from Confirmed to Resolved
  • % Done changed from 0 to 100