Bug #1408

closed

Kernel crash when loading lttng-tracing module with IBT enabled

Added by Mohamed Khalfella 5 months ago. Updated 4 months ago.

Status: Resolved
Priority: Normal
Target version: -
Start date: 12/21/2023
Due date:
% Done: 100%
Estimated time:

Description

[   34.579608,053][ T2901] kernel BUG at arch/x86/kernel/cet.c:102!
[   34.587325,053][ T2901] invalid opcode: 0000 [#1] PREEMPT SMP
[   34.595431,053][ T2901] CPU: 53 PID: 2901 Comm: lttng-sessiond Tainted: G           O       6.6.1+ #202312192336+7a8667e5f00a.kerneltesting.gcc-11
[   34.620728,053][ T2901] RIP: 0010:exc_control_protection+0xce/0xd0
[   34.620729,053][ T2901] Code: e9 87 04 00 00 4c 89 e6 48 89 ef e8 dc 1a 4e ff 44 89 ee 48 89 ef 5d 41 5c 41 5d e9 6c 04 00 00 48 c7 45 50 00 00 00 00 eb e6 <0f> 0b 66 0f 1f 00 41 56 49 89 f6 41 55 41 54 55 48 89 fd 0f 20 d0
[   34.620731,053][ T2901] RSP: 0018:ffffc9002021bc68 EFLAGS: 00010002
[ T2901] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000fff7ffff
[ T2901] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 00000000fff7ffff
[   34.620734,053][ T2901] RBP: ffffc9002021bc88 R08: ffffffff862644a8 R09: 00000000fff7ffff
[   34.620734,053][ T2901] R10: ffffffff826644c0 R11: ffffffff826644c0 R12: 0000000000000003
[   34.620735,053][ T2901] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[   34.620736,053][ T2901] FS:  00007f4fb64cf580(0000) GS:ffff88ff7fb40000(0000) knlGS:0000000000000000
[   34.620737,053][ T2901] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   34.620737,053][ T2901] CR2: 0000559ff41c7668 CR3: 0000004bda6d4003 CR4: 0000000000f70ee0
[   34.620738,053][ T2901] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   34.620739,053][ T2901] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[   34.620739,053][ T2901] PKRU: 55555554
[   34.620740,053][ T2901] MSR 198h IA32 perf status 0x000018d300001c00
[   34.620741,053][ T2901] MSR 19Ch IA32 thermal status 0x0000000088300800
[   34.620743,053][ T2901] MSR 1B1h IA32 package thermal status 0x00000000882c0800
[   34.620743,053][ T2901] Call Trace:
[   34.620744,053][ T2901]  <TASK>
[   34.620748,053][ T2901]  ? die+0x37/0x90
[   34.620751,053][ T2901]  ? do_trap+0xe0/0x110
[   34.620752,053][ T2901]  ? exc_control_protection+0xce/0xd0
[   34.620754,053][ T2901]  ? do_error_trap+0x70/0xb0
[   34.620755,053][ T2901]  ? exc_control_protection+0xce/0xd0
[   34.620757,053][ T2901]  ? exc_control_protection+0xce/0xd0
[   34.620759,053][ T2901]  ? exc_invalid_op+0x52/0x70
[   34.868413,053][ T2901]  ? exc_control_protection+0xce/0xd0
[   34.868416,053][ T2901]  ? asm_exc_invalid_op+0x1a/0x20
[   34.894089,053][ T2901]  ? exc_control_protection+0x5a/0xd0
[   34.902365,053][ T2901]  asm_exc_control_protection+0x26/0x30
[   34.902367,053][ T2901] RIP: 0010:kallsyms_lookup_name+0x4/0xc0
[   34.902369,053][ T2901] Code: 63 ff 48 63 04 bd c8 f1 29 82 85 c0 79 0a 48 f7 d0 48 03 05 a6 d8 1b 01 c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 66 0f 1f 00 <0f> 1f 44 00 00 55 48 83 ec 10 65 48 8b 04 25 28 00 00 00 48 89 44
[   34.948612,053][ T2901] RSP: 0018:ffffc9002021bd30 EFLAGS: 00010296
[   34.957567,053][ T2901] RAX: ffffffff8114bbe4 RBX: 0000000000000000 RCX: 0000000000000002
[   34.957568,053][ T2901] RDX: ffffc9002021bc80 RSI: 0000000000000000 RDI: ffffffffa1031194
[   34.957569,053][ T2901] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88c0877e5860
[   34.957571,053][ T2901] R13: 00007f4fb9090441 R14: ffff88c9da9b1bf8 R15: ffffffff87a282f8
[   35.023742,053][ T2901]  ? 0xffffffffa12a7000
[   35.054582,053][ T2901]  lttng_events_init+0x11/0x2a0 [lttng_tracer]
[   35.063999,053][ T2901]  ? 0xffffffffa12a7000
[   35.064002,053][ T2901]  do_one_initcall+0x45/0x210
[   35.078967,053][ T2901]  ? kmalloc_trace+0x29/0x90
[   35.087025,053][ T2901]  do_init_module+0x64/0x240
[   35.094732,053][ T2901]  init_module_from_file+0x8b/0xd0
[   35.094734,053][ T2901]  idempotent_init_module+0x17d/0x230
[   35.094737,053][ T2901]  __x64_sys_finit_module+0x5e/0xb0
[   35.120981,053][ T2901]  do_syscall_64+0x39/0x80
[   35.128611,053][ T2901]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[   35.128613,053][ T2901] RIP: 0033:0x7f4fb8ca6a3d
[   35.174961,053][ T2901] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000002
[   35.231506,053][ T2901] R10: 000000000000001c R11: 0000000000000246 R12: 00007f4fb9090441
[   35.231507,053][ T2901] R13: 000055d7e121fc20 R14: 0000000000000000 R15: 000055d7e1237560
[   35.378236,053][ T2901] Dumping ftrace buffer:
[   35.394649,053][ T2901] ---[ end trace 0000000000000000 ]---
[   35.911915,053][ T2901] RIP: 0010:exc_control_protection+0xce/0xd0
[   35.953043,053][ T2901] RSP: 0018:ffffc9002021bc68 EFLAGS: 00010002
[   35.953044,053][ T2901] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000fff7ffff
[   35.953047,053][ T2901] RBP: ffffc9002021bc88 R08: ffffffff862644a8 R09: 00000000fff7ffff
[   36.008442,053][ T2901] R10: ffffffff826644c0 R11: ffffffff826644c0 R12: 0000000000000003
[   36.008443,053][ T2901] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[   36.008446,053][ T2901] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   36.065471,053][ T2901] CR2: 0000559ff41c7668 CR3: 0000004bda6d4003 CR4: 0000000000f70ee0
[   36.065473,053][ T2901] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   36.065473,053][ T2901] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[   36.065474,053][ T2901] PKRU: 55555554
[   36.065475,053][ T2901] MSR 198h IA32 perf status 0x000018d300001c00
[   36.065477,053][ T2901] MSR 19Ch IA32 thermal status 0x0000000088300800
[   36.137537,053][ T2901] MSR 1B1h IA32 package thermal status 0x00000000882e0800
[   36.137539,053][ T2901] Kernel panic - not syncing: Fatal exception
[   36.145815,053][ T2901] was inactive. started with timeout 30s.
[   36.145822,053][ T2901] Dumping ftrace buffer:
[   36.145832,053][ T2901]    (ftrace buffer empty)
[   36.145856,053][ T2901] Kernel Offset: disabled

We saw this kernel crash while the kernel was loading the lttng-tracer module, during module initialization, on a 6.6.1 Linux kernel compiled with CONFIG_X86_CET=y and CONFIG_X86_KERNEL_IBT=y. The machine has a CPU that supports IBT, and the feature is enabled.
lttng_events_init() ->
  wrapper_get_pfnblock_flags_mask_init() ->
    kallsyms_lookup_funcptr() ->
      wrapper_kallsyms_lookup_name() ->
        kallsyms_lookup_name_sym()

The call chain above is the code path that leads to the crash. I suspect that, because the address of kallsyms_lookup_name was obtained by means of a kprobe, the call was an indirect call to an instruction that is not endbr64, and this resulted in the kernel crash.
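
The Code: line that the trace prints for RIP kallsyms_lookup_name+0x4 is consistent with this suspicion, since the kernel marks the byte at the faulting RIP with angle brackets. The following is a minimal Python sketch, not part of LTTng or the kernel, that extracts those bytes and compares them with the ENDBR64 encoding (f3 0f 1e fa). The helper names bytes_at_rip and lands_on_endbr64 are hypothetical; the input string is copied from the trace.

```python
# ENDBR64 instruction encoding on x86-64.
ENDBR64 = bytes.fromhex("f30f1efa")

# "Code:" line printed for RIP kallsyms_lookup_name+0x4 in the trace.
CODE_LINE = (
    "63 ff 48 63 04 bd c8 f1 29 82 85 c0 79 0a 48 f7 d0 48 03 05 a6 d8 1b 01 "
    "c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 66 0f 1f 00 <0f> 1f 44 00 00 "
    "55 48 83 ec 10 65 48 8b 04 25 28 00 00 00 48 89 44"
)


def bytes_at_rip(code_line: str, count: int = 4) -> bytes:
    """Return `count` bytes starting at the <xx> marker (the faulting RIP)."""
    tokens = code_line.split()
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    return bytes(int(tok.strip("<>"), 16) for tok in tokens[start:start + count])


def lands_on_endbr64(code_line: str) -> bool:
    """True if the instruction at the faulting RIP is ENDBR64."""
    return bytes_at_rip(code_line) == ENDBR64


if __name__ == "__main__":
    print(bytes_at_rip(CODE_LINE).hex(" "))
    print(lands_on_endbr64(CODE_LINE))
```

For this trace the bytes at RIP are 0f 1f 44 00, the start of a NOP rather than ENDBR64, so the check returns False.
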
(gdb) x/5i 0xffffffff8114bbe0
   0xffffffff8114bbe0 <kallsyms_lookup_name>:    nopw   (%rax)
   0xffffffff8114bbe4 <kallsyms_lookup_name+4>:    nopl   0x0(%rax,%rax,1)
   0xffffffff8114bbe9 <kallsyms_lookup_name+9>:    push   %rbp
   0xffffffff8114bbea <kallsyms_lookup_name+10>:    sub    $0x10,%rsp
   0xffffffff8114bbee <kallsyms_lookup_name+14>:    mov    %gs:0x28,%rax
(gdb) x/20bx 0xffffffff8114bbe0
0xffffffff8114bbe0 <kallsyms_lookup_name>:    0x66    0x0f    0x1f    0x00    0x0f    0x1f    0x44    0x00
0xffffffff8114bbe8 <kallsyms_lookup_name+8>:    0x00    0x55    0x48    0x83    0xec    0x10    0x65    0x48
0xffffffff8114bbf0 <kallsyms_lookup_name+16>:    0x8b    0x04    0x25    0x28
(gdb) 

The instruction at 0xffffffff8114bbe4 (kallsyms_lookup_name+4, the faulting RIP reported in the trace) is a nopl, not endbr64, as the disassembly shows. Compiling the kernel with IBT disabled made the problem go away.
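
The address arithmetic and the byte comparison behind this observation can be sketched in a few lines of Python. This is an illustration only, under the assumption that the value in RAX in the trace (ffffffff8114bbe4) is the address that was called. The constant and function names are hypothetical; the bytes are copied from the gdb dump.

```python
# ENDBR64 instruction encoding on x86-64.
ENDBR64 = bytes.fromhex("f30f1efa")

# Start of kallsyms_lookup_name according to gdb.
SYMBOL_ADDR = 0xFFFFFFFF8114BBE0
# RAX in the trace; assumed here to be the indirect call target.
CALL_TARGET = 0xFFFFFFFF8114BBE4

# The 20 bytes shown by `x/20bx 0xffffffff8114bbe0`.
DUMP = bytes([
    0x66, 0x0F, 0x1F, 0x00, 0x0F, 0x1F, 0x44, 0x00,
    0x00, 0x55, 0x48, 0x83, 0xEC, 0x10, 0x65, 0x48,
    0x8B, 0x04, 0x25, 0x28,
])


def has_endbr64_at(code: bytes, offset: int) -> bool:
    """True if the four bytes at `offset` encode ENDBR64."""
    return code[offset:offset + len(ENDBR64)] == ENDBR64


# Offset of the call target inside the function.
target_offset = CALL_TARGET - SYMBOL_ADDR

if __name__ == "__main__":
    print(f"call target is kallsyms_lookup_name+{target_offset:#x}")
    print("ENDBR64 at function entry:", has_endbr64_at(DUMP, 0))
    print("ENDBR64 at call target:", has_endbr64_at(DUMP, target_offset))
```

With the values from this report, the offset is 4 and neither the function entry nor the call target holds an ENDBR64 instruction.
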
