Bug #1315
Kernel panics after `pkill lttng`; root session daemon has active triggers
Description
I don't know how to reproduce this bug.
I played with the lttng add-trigger command while writing usage examples for its manual page.
I started the root session daemon as follows:
# lttng-sessiond --daemonize --group=eepp
I then ran commands such as:
$ lttng add-trigger --name user --condition=event-rule-matches --domain=user --action=notify
$ lttng add-trigger --condition=event-rule-matches \
--domain=user --action=notify \
--rate-policy=every:10
$ lttng add-trigger --owner-uid=33 \
--condition=event-rule-matches \
--domain=kernel --name='sched*' \
--action=notify
$ lttng add-trigger --condition=event-rule-matches \
--domain=kernel --type=syscall \
--filter='fd < 3' \
--action=start-session my-session \
--rate-policy=once-after:40
Note that I had no tracing session named my-session.
After a few minutes, Xorg froze. I managed to log in again from a virtual console, where I ran:
# pkill lttng
and got an instant kernel panic.
Attached is a photo of what was on the screen after running pkill.
Using:
- LTTng-tools 60860e547ce31ea629e846e00b66342425474b8d
- LTTng-UST a0f2513af262a19822d46f84cd5e34be0badc484
- LTTng-modules 51ef453614a6db2b778595b16d93283d25db974a
- liburcu 5e1b7c840a2b21b8442b322cedbb70a790e49520
Updated by Francis Deslauriers over 4 years ago
- Assignee set to Francis Deslauriers
Updated by Philippe Proulx over 4 years ago
Here's something else. It might be related, but maybe not.
I start a session daemon as follows:
# lttng-sessiond -v --group=eepp
Then I add a trigger which fires a lot:
$ lttng --group=eepp add-trigger --condition=event-rule-matches \
--domain=kernel --type=syscall --action=notify
Now, when I kill lttng-sessiond, it doesn't unload any kernel modules and exits with status 134.
It doesn't seem to behave like that without -v, or with triggers which don't fire often.
Sometimes, when I kill lttng-sessiond in these conditions, the system freezes.
Using LTTng-tools e80b715053eb21fe9139241be786afc2688c6795 and LTTng-modules 4667f7f663bcf3e8ec315fc7965893aeabd64e95 now.
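A note on the exit status above: assuming 134 is the value the shell reports in `$?`, it follows the shell convention (used by bash, among others) of reporting 128 plus the number of the signal that terminated the process. Since 134 - 128 = 6, which is SIGABRT on Linux, lttng-sessiond most likely called abort(), for example through a failed assert(). The sketch below only illustrates that convention: it forks a child that calls abort() and computes the status a shell would report. The function name is illustrative, not part of LTTng.

```c
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Illustrative helper: runs a child that calls abort() and returns the
 * status a shell such as bash would put in $?, that is 128 + the signal
 * number when the child was killed by a signal, otherwise its exit code.
 * Returns -1 if fork() or waitpid() fails. */
static int shell_status_of_aborting_child(void)
{
	int wstatus;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0)
		abort(); /* child: terminates itself with SIGABRT */
	if (waitpid(pid, &wstatus, 0) < 0)
		return -1;
	/* Without WUNTRACED, the child either exited or was signaled. */
	assert(WIFSIGNALED(wstatus) || WIFEXITED(wstatus));
	if (WIFSIGNALED(wstatus))
		return 128 + WTERMSIG(wstatus);
	return WEXITSTATUS(wstatus);
}
```

On Linux, calling this function returns 134, the same value reported for lttng-sessiond.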
Updated by Francis Deslauriers over 4 years ago
- Status changed from New to Feedback
In what you described above, how much time passed between the add-trigger and the kill commands?
Please try the latest commit of the LTTng-Tools master branch. The following commit is fixing a leak:
commit f568738c649cb005c7d838166c864addbcc27571
Author: Francis Deslauriers <francis.deslauriers@efficios.com>
Date: Fri May 7 17:44:52 2021 -0400
Fix: action-executor: leak of `work_item::subitems` field
This leak can explain all 3 issues you see: the system crash, the unloading of modules, and the sessiond exit status.
Updated by Jérémie Galarneau over 4 years ago
Francis Deslauriers wrote in #note-3:
This leak can explain all 3 issues you see: the system crash, the unloading of modules, and the sessiond exit status.
I'm not sure I see how a user space leak can lead to a kernel panic. Am I missing something?
Updated by Francis Deslauriers over 4 years ago
Jérémie Galarneau wrote in #note-4:
I'm not sure I see how a user space leak can lead to a kernel panic. Am I missing something?
Last Friday, it was discussed on IRC that this commit (which Phil didn't have when he reported the issue) might fix the source of the kernel panic:
commit 6c8c025bf7552b6073c5c1884e1493badd842f42
Author: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Date: Thu May 6 11:48:42 2021 -0400
Introduce struct lttng_kernel_tracepoint_class, enum probe_desc field
I agree that the leak doesn't explain the kernel panic. I assumed that Phil didn't witness a kernel panic the second time around, but I might have assumed too much.
I should have said "system freeze" instead of "system crash" in my last comment.
Updated by Mathieu Desnoyers over 4 years ago
For the sake of discussion, here are the leaks which were fixed by commit 6c8c025bf7552b6073c5c1884e1493badd842f42 (AFAIK):
In lttng_kprobes_register_event(), the register_error label leaked:
- event_recorder->priv->parent.desc->fields[0]
In lttng_uprobes_register_event(), the register_error label leaked:
- event_recorder->priv->parent.desc->fields[0]
- event_recorder->priv->parent.desc->fields
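To show why both allocations matter on such an error path, here is a minimal userspace sketch. It assumes a descriptor whose `fields` member is an array of pointers to individually allocated fields. All names (`struct event_desc`, `register_event_desc()`, `fail_late`, the `live_allocs` counter) are hypothetical, and calloc()/free() stand in for the kernel allocators. This is not the LTTng-modules code, and it is not the change made in that commit.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins: `fields` is an array of pointers
 * to individually allocated fields. */
struct event_field {
	const char *name;
};

struct event_desc {
	const struct event_field **fields;
	unsigned int nr_fields;
};

/* Number of live allocations, so a caller can detect leaks. */
static int live_allocs;

static void *tracked_calloc(size_t n, size_t size)
{
	void *p = calloc(n, size);

	if (p)
		live_allocs++;
	return p;
}

static void tracked_free(void *p)
{
	if (!p)
		return;
	assert(live_allocs > 0); /* catches unbalanced frees */
	live_allocs--;
	free(p);
}

/* Hypothetical registration: builds a one-field descriptor, then fails
 * on purpose when fail_late is nonzero so that the error path runs. */
static int register_event_desc(struct event_desc *desc, int fail_late)
{
	const struct event_field **fields;
	struct event_field *field;

	fields = tracked_calloc(1, sizeof(*fields));
	if (!fields)
		goto error_fields;
	field = tracked_calloc(1, sizeof(*field));
	if (!field)
		goto error_field;
	field->name = "ip";
	fields[0] = field;
	desc->fields = fields;
	desc->nr_fields = 1;

	if (fail_late)
		goto register_error;
	return 0;

register_error:
	/* Free the field itself, then the array pointing to it.
	 * Skipping either call leaks that allocation. */
	tracked_free((void *) fields[0]);
error_field:
	tracked_free(fields);
error_fields:
	desc->fields = NULL;
	desc->nr_fields = 0;
	return -1;
}

/* Teardown after a successful registration, mirroring the error path. */
static void unregister_event_desc(struct event_desc *desc)
{
	if (desc->fields) {
		tracked_free((void *) desc->fields[0]);
		tracked_free(desc->fields);
	}
	desc->fields = NULL;
	desc->nr_fields = 0;
}
```

When `fail_late` is set, the error path frees `fields[0]` and then `fields`, so `live_allocs` returns to zero. If either `tracked_free()` call is dropped, one allocation remains outstanding, which is the kind of leak listed above.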