Bug #1315

open

Kernel panics after `pkill lttng`; root session daemon has active triggers

Added by Philippe Proulx over 3 years ago. Updated over 3 years ago.

Status: Feedback
Priority: Normal
Assignee: Francis Deslauriers
Target version: -
Start date: 05/07/2021
Due date:
% Done: 0%
Estimated time:

Description

I don't know how to reproduce this bug.

I played with the lttng add-trigger command while writing usage examples for its manual page.

I started the root session daemon as such:

# lttng-sessiond --daemonize --group=eepp

I then ran commands such as:

$ lttng add-trigger --name user --condition=event-rule-matches --domain=user --action=notify
$ lttng add-trigger --condition=event-rule-matches \
                    --domain=user --action=notify \
                    --rate-policy=every:10
$ lttng add-trigger --owner-uid=33 \
                    --condition=event-rule-matches \
                    --domain=kernel --name='sched*' \
                    --action=notify
$ lttng add-trigger --condition=event-rule-matches \
                    --domain=kernel --type=syscall \
                    --filter='fd < 3' \
                    --action=start-session my-session \
                    --rate-policy=once-after:40

Note that I had no tracing session named my-session.

After a few minutes, Xorg froze. I managed to log in again from a virtual console. I ran:

# pkill lttng

and got an instant kernel panic.

Attached is a photo of what was on the screen after running pkill.

Using:

  • LTTng-tools 60860e547ce31ea629e846e00b66342425474b8d
  • LTTng-UST a0f2513af262a19822d46f84cd5e34be0badc484
  • LTTng-modules 51ef453614a6db2b778595b16d93283d25db974a
  • liburcu 5e1b7c840a2b21b8442b322cedbb70a790e49520

Files

IMG-5047.jpg (3.47 MB) - Kernel panic photo - Philippe Proulx, 05/07/2021 04:04 PM
Actions #1

Updated by Francis Deslauriers over 3 years ago

  • Assignee set to Francis Deslauriers
Actions #2

Updated by Philippe Proulx over 3 years ago

Here's something more. Might be related, but maybe not.

I start a session daemon as such:

# lttng-sessiond -v --group=eepp

Then I add a trigger which fires a lot:

$ lttng --group=eepp add-trigger --condition=event-rule-matches \
        --domain=kernel --type=syscall --action=notify

Now, when I kill lttng-sessiond, it doesn't unload any kernel module and exits with status 134.
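
A note on the exit status: by shell convention, a status above 128 usually means the process was terminated by signal (status - 128). Under that reading, 134 would be signal 6, which is SIGABRT on Linux, and would suggest that the session daemon called abort() (for example from a failed assertion). This is an interpretation, not something confirmed in this report. A minimal sketch to decode such a status; the decode_exit_status name is made up for illustration:

```shell
# Hypothetical helper: translate a shell exit status into a signal name.
# Assumes the usual "128 + signal number" convention; a program can also
# return such a value on its own, so treat the result as a hint only.
decode_exit_status() {
    if [ "$1" -gt 128 ]; then
        # `kill -l N` prints the name of signal number N.
        kill -l "$(( $1 - 128 ))"
    else
        echo "exit $1"
    fi
}

# Example: decode_exit_status 134
```

If the status does map to SIGABRT, the -v output of lttng-sessiond just before it exits is probably the most useful thing to attach.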

It doesn't seem to behave like that without -v, nor with triggers which don't fire a lot.
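
To compare the two cases, one way to check whether LTTng kernel modules are still loaded after the session daemon exits is to read /proc/modules. A small sketch, assuming the LTTng-modules names all start with lttng_ (as lttng_tracer does; not verified against the exact commits mentioned here). list_lttng_modules is a hypothetical helper, and its optional argument exists only so it can be run against a saved copy of /proc/modules:

```shell
# Hypothetical helper: print the names of loaded modules starting with "lttng_".
# Reads /proc/modules by default; pass another file to inspect a saved copy.
list_lttng_modules() {
    # Each line of /proc/modules starts with the module name.
    while read -r name rest; do
        case $name in
            lttng_*) printf '%s\n' "$name" ;;
        esac
    done < "${1:-/proc/modules}"
}

# Example: run `list_lttng_modules` after killing lttng-sessiond;
# no output means no matching module is loaded anymore.
```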

Sometimes, when I kill lttng-sessiond in these conditions, the system freezes.

Using LTTng-tools e80b715053eb21fe9139241be786afc2688c6795 and LTTng-modules 4667f7f663bcf3e8ec315fc7965893aeabd64e95 now.

Actions #3

Updated by Francis Deslauriers over 3 years ago

  • Status changed from New to Feedback

In what you described above, how much time passed between the add-trigger and the kill commands?

Please try the latest commit of the LTTng-tools master branch. The following commit fixes a leak:
commit f568738c649cb005c7d838166c864addbcc27571
Author: Francis Deslauriers <>
Date: Fri May 7 17:44:52 2021 -0400

Fix: action-executor: leak of `work_item::subitems` field

This leak can explain all 3 issues you see: the system crash, the unloading of modules, and the sessiond exit status.

Actions #4

Updated by Jérémie Galarneau over 3 years ago

Francis Deslauriers wrote in #note-3:

This leak can explain all 3 issues you see: the system crash, the unloading of modules, and the sessiond exit status.

I'm not sure I see how a user space leak can lead to a kernel panic. Am I missing something?

Actions #5

Updated by Francis Deslauriers over 3 years ago

Jérémie Galarneau wrote in #note-4:

I'm not sure I see how a user space leak can lead to a kernel panic. Am I missing something?

Last Friday, it was discussed on IRC that this commit (which Phil didn't have when he reported the issue) might fix the source of the kernel panic:

commit 6c8c025bf7552b6073c5c1884e1493badd842f42
Author: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Date:   Thu May 6 11:48:42 2021 -0400

    Introduce struct lttng_kernel_tracepoint_class, enum probe_desc field

I agree that the leak doesn't explain the kernel panic. I assumed that Phil didn't witness a kernel panic the second time around, but I might have assumed too much.

I should have said "system freeze" instead of "system crash" in my last comment.

Actions #6

Updated by Mathieu Desnoyers over 3 years ago

For the sake of discussion, here are the leaks which were fixed by commit 6c8c025bf7552b6073c5c1884e1493badd842f42 (AFAIK):

lttng_kprobes_register_event() label register_error leaked:
- event_recorder->priv->parent.desc->fields0

lttng_uprobes_register_event() label register_error leaked:
- event_recorder->priv->parent.desc->fields0
- event_recorder->priv->parent.desc->fields
