Project

General

Profile

Bug #1167

Linux Kernel Oops when creating LTTng kernel channel while using huge pages

Added by Aleix Roca Nonell over 2 years ago. Updated about 2 years ago.

Status:
Resolved
Priority:
Normal
Target version:
-
Start date:
05/31/2018
Due date:
% Done:

0%

Estimated time:

Description

I have stumbled with a Linux Kernel Oops while using LTTng kernel tracer. The bug pops up every time I try to trace a simple application that makes use of 1G hugepages with libhugepagetlbfs. Please note that if the LTTng channel buffer size is not big enough (I have not determined exactly what enough is) the Oops does not appears.

To bug should be reproducible with the following commands:

lttng enable-channel --kernel --num-subbuf=16 --subbuf-size=128M big-krn-channel
lttng enable-event --kernel --channel=big-krn-channel --all
lttng start
hugectl --heap=1G sleep 10
lttng stop
lttng destroy

I'm using the latest lttng 2.10 ust(2.10.1)/tools(2.10.4)/modules(2.10.6) from git.

LTTng-ust: 979c384814c9fad78c1e297a129da86dda2e4573
LTTng-modules: 4c517fb46949a6dfcaf73fa2131b730432ac640b
LTTng-tools: a0b75f6cfee839d0f85fab6bb2c0d8e80db5823d

The system has two sockets with a total of 56 CPUs and 128GiB of main memory.
ThI'm using a linux kernel 4.15.9 (and I have seen the same problem on a linux 4.15.18).

The Oops message follows:

xeon08 login: [ 244.741758] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[ 244.750940] IP: _cpuhp_state_remove_instance+0xc1/0x1c0
[ 244.757246] PGD 80000020390b2067 P4D 80000020390b2067 PUD 2036ad9067 PMD 0
[ 244.765445] Oops: 0002 [#1] SMP PTI
[ 244.769713] Modules linked in: msr(E) lttng_probe_x86_exceptions(OE) lttng_probe_x86_irq_vectors(OE) lttng_probe_writeback(OE) lttng_probe_workqueue(OE) lttng_probe_vmscan(OE) lttng_probe_udp(OE) lttng_probe_timer(OE) lttng_probe_sunrpc(OE) lttng_probe_statedump(OE) lttng_probe_sock(OE) lttng_probe_skb(OE) lttng_probe_signal(OE) lttng_probe_scsi(OE) lttng_probe_sched(OE) lttng_probe_regmap(OE) lttng_probe_rcu(OE) lttng_probe_random(OE) lttng_probe_printk(OE) lttng_probe_power(OE) lttng_probe_net(OE) lttng_probe_napi(OE) lttng_probe_module(OE) lttng_probe_kvm_x86_mmu(OE) lttng_probe_kvm_x86(OE) kvm(E) irqbypass(E) lttng_probe_kvm(OE) lttng_probe_kmem(OE) lttng_probe_jbd2(OE) lttng_probe_irq(OE) lttng_probe_i2c(OE) lttng_probe_gpio(OE) lttng_probe_ext4(OE) lttng_probe_compaction(OE) lttng_probe_block(OE)
[ 244.852692] lttng_ring_buffer_metadata_mmap_client(OE) lttng_ring_buffer_client_mmap_overwrite(OE) lttng_ring_buffer_client_mmap_discard(OE) lttng_ring_buffer_metadata_client(OE) lttng_ring_buffer_client_overwrite(OE) lttng_ring_buffer_client_discard(OE) lttng_tracer(OE) lttng_statedump(OE) lttng_ftrace(OE) lttng_kprobes(OE) lttng_clock(OE) lttng_lib_ring_buffer(OE) lttng_kretprobes(OE) nfsv3(E) nfs_acl(E) nfs(E) lockd(E) grace(E) sunrpc(E) fscache(E) af_packet(E) iscsi_ibft(E) iscsi_boot_sysfs(E) intel_rapl(E) sb_edac(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) mgag200(E) hfi1(E) coretemp(E) drm_kms_helper(E) crct10dif_pclmul(E) syscopyarea(E) crc32_pclmul(E) sysfillrect(E) crc32c_intel(E) sysimgblt(E) fb_sys_fops(E) ttm(E) ixgbe(E) aesni_intel(E) drm(E) rdmavt(E) aes_x86_64(E) mdio(E) iTCO_wdt(E)
[ 244.852724] ipmi_si(E) joydev(E) ptp(E) crypto_simd(E) iTCO_vendor_support(E) i2c_algo_bit(E) ipmi_devintf(E) cryptd(E) pps_core(E) glue_helper(E) mei_me(E) ipmi_msghandler(E) ioatdma(E) pcspkr(E) lpc_ich(E) mei(E) mfd_core(E) i2c_i801(E) dca(E) shpchp(E) wmi(E) acpi_pad(E) ib_ipoib(E) acpi_cpufreq(E) rdma_ucm(E) button(E) ib_ucm(E) ib_uverbs(E) ib_umad(E) rdma_cm(E) configfs(E) ib_cm(E) iw_cm(E) ib_core(E) ext4(E) crc16(E) mbcache(E) jbd2(E) hid_generic(E) usbhid(E) sd_mod(E) xhci_pci(E) xhci_hcd(E) ehci_pci(E) ehci_hcd(E) ahci(E) libahci(E) libata(E) usbcore(E) sg(E) scsi_mod(E) autofs4(E)
[ 244.852757] CPU: 36 PID: 2401 Comm: lttng-sessiond Tainted: G OE 4.15.9-stable #4
[ 244.852758] Hardware name: Intel Corporation S2600WTTR/S2600WTTR, BIOS SE5C610.86B.01.01.0016.033120161139 03/31/2016
[ 244.852760] RIP: 0010:
_cpuhp_state_remove_instance+0xc1/0x1c0
[ 244.852761] RSP: 0018:ffffc9000996fa98 EFLAGS: 00010246
[ 244.852762] RAX: 0000000000000000 RBX: 0000000000000038 RCX: 0000000000000000
[ 244.852763] RDX: 0000000000000000 RSI: 0000000000000038 RDI: ffffffff822fbea0
[ 244.852763] RBP: 0000000000000041 R08: 0000000000000000 R09: 0000000000000000
[ 244.852764] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88202d29bc68
[ 244.852764] R13: 0000000000016aa0 R14: 0000000000000a28 R15: 0000000000000000
[ 244.852765] FS: 00007fc00effd700(0000) GS:ffff88103f200000(0000) knlGS:0000000000000000
[ 244.852766] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 244.852767] CR2: 0000000000000000 CR3: 0000002038570002 CR4: 00000000003606e0
[ 244.852768] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 244.852768] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 244.852769] Call Trace:
[ 244.852776] ? channel_backend_init+0x1dc/0x2f0 [lttng_lib_ring_buffer]
[ 244.852780] channel_backend_init+0x1dc/0x2f0 [lttng_lib_ring_buffer]
[ 244.852783] channel_create+0x6d/0x1b0 [lttng_lib_ring_buffer]
[ 244.852788] _channel_create+0x34/0x80 [lttng_ring_buffer_client_discard]
[ 244.852831] lttng_channel_create+0xff/0x1c0 [lttng_tracer]
[ 244.852843] lttng_abi_create_channel+0x119/0x220 [lttng_tracer]
[ 244.852855] lttng_session_ioctl+0x19e/0x2e0 [lttng_tracer]
[ 244.852859] ? hrtimer_try_to_cancel+0xc4/0x110
[ 244.852862] ? hrtimer_try_to_cancel+0xc4/0x110
[ 244.852864] ? hrtimer_cancel+0x15/0x20
[ 244.852867] ? futex_wait+0x177/0x220
[ 244.852872] ? sched_clock+0x5/0x10
[ 244.852875] ? sched_clock_cpu+0xc/0xa0
[ 244.852877] ? __lock_acquire.isra.31+0x165/0x700
[ 244.852886] do_vfs_ioctl+0x8f/0x5e0
[ 244.852890] ? __fget+0xb4/0xf0
[ 244.852891] ? __fget+0x5/0xf0
[ 244.852893] SyS_ioctl+0x74/0x80
[ 244.852898] do_syscall_64+0x6e/0x1a0
[ 244.852901] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 244.852902] RIP: 0033:0x7fc0154254b7
[ 244.852903] RSP: 002b:00007fc00efec5b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 244.852904] RAX: ffffffffffffffda RBX: 00007fc000005410 RCX: 00007fc0154254b7
[ 244.852905] RDX: 00007fc00efec5c0 RSI: 000000004140f655 RDI: 0000000000000030
[ 244.852905] RBP: 00007fc000005470 R08: 0000000000000000 R09: 0000000000000030
[ 244.852906] R10: fffffffffffffe80 R11: 0000000000000246 R12: 00000000ffffffff
[ 244.852907] R13: 00007fc000001b70 R14: 0000000000000050 R15: 00007fc000001b70
[ 244.852910] Code: 15 25 b8 fd 00 85 d2 0f 85 f9 00 00 00 31 f6 48 c7 c7 00 b8 05 82 e8 8f c0 63 00 84 db 75 75 49 8b 04 24 49 8b 54 24 08 48 85 c0 <48> 89 02 74 04 48 89 50 08 48 b8 00 01 00 00 00 00 ad de 48 c7
[ 244.852934] RIP: __cpuhp_state_remove_instance+0xc1/0x1c0 RSP: ffffc9000996fa98
[ 244.852935] CR2: 0000000000000000
[ 244.852937] ---[ end trace a8be70506d9817e6 ]---

Also available in: Atom PDF