Bug #567
lttng-tools 2.2.0-rc2: enable-channel failure leaves session in suspicious state
Description
The session:
    $ sudo -H lttng list pid
    Tracing session pid: [active]
        Trace path: net://131.132.32.77:5342/pid-20130620-161857 [data: 5343]

    === Domain: UST global ===

    Buffer type: per PID

    Channels:
    -------------
    - canalpid: [enabled]

        Attributes:
          overwrite mode: 0
          subbufers size: 4096
          number of subbufers: 4
          switch timer interval: 0
          read timer interval: 0
          output: mmap()

        Events:
          * (type: tracepoint) [enabled]

    $ sudo -H lttng enable-channel k1 -k --discard --subbuf-size 1024 --num-subbuf 8
    PERROR [20531/20631]: ioctl kernel create channel: Invalid argument (in kernel_create_channel() at kernel.c:144)
    Error: Channel k1: Kernel create channel failed (session pid)
    Warning: Some command(s) went wrong

    $ sudo -H lttng list pid
    Tracing session pid: [active]
        Trace path: net://131.132.32.77:5342/pid-20130620-161857 [data: 5343]

    === Domain: Kernel ===

    Warning: No kernel channel

    === Domain: UST global ===

    Buffer type: per PID

    Channels:
    -------------
    - canalpid: [enabled]

        Attributes:
          overwrite mode: 0
          subbufers size: 4096
          number of subbufers: 4
          switch timer interval: 0
          read timer interval: 0
          output: mmap()

        Events:
          * (type: tracepoint) [enabled]
It is highly suspicious that the session listing gains this section after a failed enable-channel command:

    === Domain: Kernel ===
    Warning: No kernel channel
The enable-channel error is the same for an inactive session. I also note that the failed enable-channel command caused the kernel lttng-consumerd daemon to start. Also, the next command ends with an error:

    $ sudo -H lttng stop
    Error: Stopping kernel trace failed
Despite the "error", the trace was nevertheless stopped. However, the user does not get the expected "Tracing stopped for session pid" message. Maybe the error should be downgraded to a warning?
Updated by David Goulet over 11 years ago
I'm not sure what the issue is here.
The enable-channel command fails because the sub-buffer size is too small. When listing, you see no kernel channel, which is expected. Is it the "Warning" that is bugging you? Note that I would like to change that message output, but we are at the RC stage, so the output format can't be changed for now. For 2.3, it is something we can think about!
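For reference, the rejected request asked for 1024-byte sub-buffers, whereas the existing UST channel uses 4096-byte ones. The C sketch below assumes, without confirmation from this thread, that the kernel tracer accepts only power-of-two sub-buffer sizes of at least one page; `kernel_subbuf_size_ok()` and `is_power_of_two()` are hypothetical helpers, not lttng-tools functions. It only illustrates the kind of pre-check that could catch such a request on the client side.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when v is a non-zero power of two. */
static bool is_power_of_two(uint64_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/*
 * ASSUMPTION (illustration only): a kernel channel sub-buffer must be a
 * power of two and no smaller than the system page size. page_size is a
 * parameter (e.g. obtained from sysconf(_SC_PAGESIZE)) so the check can be
 * exercised independently of the running machine.
 */
static bool kernel_subbuf_size_ok(uint64_t subbuf_size, uint64_t page_size)
{
    assert(is_power_of_two(page_size));
    return is_power_of_two(subbuf_size) && subbuf_size >= page_size;
}
```

With 4 KiB pages, `kernel_subbuf_size_ok(1024, 4096)` returns false, which would be consistent with the `Invalid argument` reported above, while `kernel_subbuf_size_ok(4096, 4096)` returns true.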
The consumer is spawned the first time a kernel command is encountered, even if that command fails. This behavior is unlikely to change.
For the failed stop, had you started the session beforehand?
Updated by Daniel U. Thibault over 11 years ago
The nature of the failure is irrelevant (there are many ways to "achieve" a failure). As the first lttng list command shows, the trace was running when the enable-channel commands were being issued. I'm not worried about the consumer daemon starting up; that's par for the course.
The issue is two-fold:
1) The lttng stop command reports an error and does not state that the session was stopped, although it was indeed stopped successfully. Downgrading the error to a warning would probably fix this (I assume the confirmation message is suppressed by the error and wouldn't be by a warning). On the other hand, if a domain has no channels, there is nothing to stop, so the tracing stop command should internally report success, not failure. (Could it be the kernel part of the session is considered "already stopped"?)
2) The display of the session status is changed by the failed channel enable. Presumably this is because the kernel domain was added to the session description as part of the preparations for channel creation. It would make more sense for the channel creation error to force a rollback of the domain creation (I presume it's not possible to create the channel and only add the domain to the session once this is successful).
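Point 2 describes a create-then-roll-back pattern. Below is a minimal C sketch under a deliberately simplified, hypothetical model: `struct session`, `struct kernel_domain`, `create_kernel_channel()`, `enable_kernel_channel()` and the `simulate_failure` flag are invented for illustration and are not lttng-sessiond's actual code. The kernel domain is created lazily on the first kernel command; if that same call created it and channel creation then fails, the domain is freed again so the session looks as it did before the command.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified model; not the lttng-sessiond data structures. */
struct kernel_domain {
    int nb_channels;
};

struct session {
    struct kernel_domain *kernel; /* NULL until the first kernel command */
};

/* Stand-in for the kernel channel creation; fails when asked to. */
static int create_kernel_channel(struct kernel_domain *dom, bool simulate_failure)
{
    if (simulate_failure)
        return -1;
    dom->nb_channels++;
    return 0;
}

/*
 * Enable a kernel channel, creating the kernel domain lazily. If the domain
 * was created by this call and channel creation fails, roll it back so a
 * later listing shows no "Domain: Kernel" section.
 */
static int enable_kernel_channel(struct session *s, bool simulate_failure)
{
    bool created_here = false;

    if (s->kernel == NULL) {
        s->kernel = calloc(1, sizeof(*s->kernel));
        if (s->kernel == NULL)
            return -1;
        created_here = true;
    }

    if (create_kernel_channel(s->kernel, simulate_failure) < 0) {
        if (created_here) {
            free(s->kernel); /* rollback of the domain created above */
            s->kernel = NULL;
        }
        return -1;
    }
    return 0;
}
```

Only a domain created by the failing call is rolled back; a domain that already holds channels is left intact, so one failed enable-channel cannot discard existing configuration.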
Daniel U. Thibault
Systems Protection & Countermeasures (SPC)
Mission Critical Cyber Security (MCCS)
Defence R&D Canada - Valcartier (DRDC Valcartier)
Updated by David Goulet over 11 years ago
- Status changed from New to Confirmed
- Assignee set to David Goulet
- Target version set to 2.2
Oh I see the issue now!
Yes, that's a problem with the internal state of the session daemon. The first kernel enable-channel command triggered the creation of an internal kernel session object, which was not destroyed when the enable failed; the stop command then tried to stop that kernel session.
I'll push a simple patch so that a kernel/UST session is not stopped unless it was previously started. That fixes the error on the stop command here.
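The guard described above can be sketched as follows. This is a minimal C illustration under a hypothetical model (`struct domain_session`, its `started` flag, `stop_domain()` and `session_stop()` are invented names), not the code of changeset d39829a1f4d1e656561e05bcd9c4d43d9e9a5579. Here `stop_domain()` fails on a domain that was never started, mimicking "Stopping kernel trace failed", and `session_stop()` avoids that by skipping such domains.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-domain state; not the real lttng-sessiond structures. */
struct domain_session {
    bool started; /* set only after a successful start */
};

struct session {
    struct domain_session *kernel; /* may exist without ever being started */
    struct domain_session *ust;
};

/* Stand-in for the tracer-specific stop; errors if never started. */
static int stop_domain(struct domain_session *d)
{
    if (!d->started)
        return -1; /* mimics "Stopping kernel trace failed" */
    d->started = false;
    return 0;
}

/*
 * Stop every domain that was actually started. A domain object left behind
 * by a failed enable-channel (present but never started) is skipped rather
 * than reported as an error. Returns 0 on success, -1 if any stop failed.
 */
static int session_stop(struct session *s)
{
    struct domain_session *domains[] = { s->kernel, s->ust };
    int ret = 0;

    for (size_t i = 0; i < sizeof(domains) / sizeof(domains[0]); i++) {
        struct domain_session *d = domains[i];

        if (d == NULL || !d->started)
            continue; /* nothing to stop in this domain */
        if (stop_domain(d) < 0)
            ret = -1;
    }
    return ret;
}
```

With a leftover, never-started kernel domain and a running UST domain, `session_stop()` returns 0 and stops only the UST side, so the user would get the normal confirmation instead of an error.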
As for rolling back the created internal domain session on error, I'll keep that for 2.3+, since it's not release-critical and would require quite a bit of work. I'll open a feature request for it.
Updated by David Goulet over 11 years ago
- Status changed from Confirmed to Resolved
- % Done changed from 0 to 100
Applied in changeset d39829a1f4d1e656561e05bcd9c4d43d9e9a5579.