Bug #1315
Kernel panics after `pkill lttng`; root session daemon has active triggers
Description
I don't know how to reproduce this bug.
I played with the lttng add-trigger command while writing usage examples for its manual page.
I started the root session daemon as such:
# lttng-sessiond --daemonize --group=eepp
I then ran commands such as:
$ lttng add-trigger --name user --condition=event-rule-matches --domain=user --action=notify
$ lttng add-trigger --condition=event-rule-matches \
      --domain=user --action=notify \
      --rate-policy=every:10
$ lttng add-trigger --owner-uid=33 \
      --condition=event-rule-matches \
      --domain=kernel --name='sched*' \
      --action=notify
$ lttng add-trigger --condition=event-rule-matches \
      --domain=kernel --type=syscall \
      --filter='fd < 3' \
      --action=start-session my-session \
      --rate-policy=once-after:40
Note that I had no tracing session named my-session.
After a few minutes, Xorg froze. I managed to login again in a virtual console. I ran:
# pkill lttng
and got an instant kernel panic.
Attached is a photo of what was on the screen after running pkill.
Using:
- LTTng-tools 60860e547ce31ea629e846e00b66342425474b8d
- LTTng-UST a0f2513af262a19822d46f84cd5e34be0badc484
- LTTng-modules 51ef453614a6db2b778595b16d93283d25db974a
- liburcu 5e1b7c840a2b21b8442b322cedbb70a790e49520
Updated by Francis Deslauriers over 3 years ago
- Assignee set to Francis Deslauriers
Updated by Philippe Proulx over 3 years ago
Here's something more. Might be related, but maybe not.
I start a session daemon as such:
# lttng-sessiond -v --group=eepp
Then I add a trigger which fires a lot:
$ lttng --group=eepp add-trigger --condition=event-rule-matches \ --domain=kernel --type=syscall --action=notify
Now, when I kill lttng-sessiond, it doesn't unload any kernel module and exits with status 134.
It doesn't seem to behave like that without -v, or with triggers which don't fire a lot.
Sometimes, when I kill lttng-sessiond in these conditions, the system freezes.
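For reference, shells conventionally report an exit status above 128 when a process is terminated by a signal, the status being 128 plus the signal number. Status 134 therefore corresponds to signal 6, SIGABRT, which would be consistent with an abort() call or a failed assertion in the session daemon, although nothing in this report confirms the cause. A minimal sketch of the decoding follows; the helper name describe_status is hypothetical and not part of any LTTng tool:

```shell
#!/bin/sh
# Decode a shell exit status. By convention, a status above 128 means
# the process was terminated by signal number (status - 128).
# describe_status is a hypothetical helper, not an LTTng command.
describe_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        # `kill -l <number>` prints the signal name without the SIG prefix.
        echo "killed by SIG$(kill -l "$((status - 128))")"
    else
        echo "exited with code $status"
    fi
}

describe_status 134
```

The same check can be done right after the daemon exits by passing `$?` to the helper.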
Using LTTng-tools e80b715053eb21fe9139241be786afc2688c6795 and LTTng-modules 4667f7f663bcf3e8ec315fc7965893aeabd64e95 now.
Updated by Francis Deslauriers over 3 years ago
- Status changed from New to Feedback
In what you described above, how much time passed between the add-trigger and the kill commands?
Please try the latest commit of the LTTng-tools master branch. The following commit fixes a leak:
commit f568738c649cb005c7d838166c864addbcc27571
Author: Francis Deslauriers <francis.deslauriers@efficios.com>
Date: Fri May 7 17:44:52 2021 -0400
Fix: action-executor: leak of `work_item::subitems` field
This leak can explain all 3 issues you see: the system crash, the unloading of modules, and the sessiond exit status.
Updated by Jérémie Galarneau over 3 years ago
Francis Deslauriers wrote in #note-3:
This leak can explain all 3 issues you see: the system crash, the unloading of modules, and the sessiond exit status.
I'm not sure I see how a user space leak can lead to a kernel panic. Am I missing something?
Updated by Francis Deslauriers over 3 years ago
Jérémie Galarneau wrote in #note-4:
I'm not sure I see how a user space leak can lead to a kernel panic. Am I missing something?
Last Friday, it was discussed on IRC that this commit (which Phil didn't have when he reported the issue) might fix the source of the kernel panic:
commit 6c8c025bf7552b6073c5c1884e1493badd842f42
Author: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Date: Thu May 6 11:48:42 2021 -0400
Introduce struct lttng_kernel_tracepoint_class, enum probe_desc field
I agree that the leak doesn't explain the kernel panic. I assumed that Phil didn't witness a kernel panic that second time around, but I might have assumed too much.
I should have said "system freeze" instead of "system crash" in my last comment.
Updated by Mathieu Desnoyers over 3 years ago
For the sake of discussion, here are the leaks which were fixed by commit 6c8c025bf7552b6073c5c1884e1493badd842f42 (AFAIK):
lttng_kprobes_register_event(), label register_error, leaked:
- event_recorder->priv->parent.desc->fields[0]
lttng_uprobes_register_event(), label register_error, leaked:
- event_recorder->priv->parent.desc->fields[0]
- event_recorder->priv->parent.desc->fields