Bug #1313
closedlttng-modules warnings and test hang in tools/clear/test_kernel
100%
Description
I observe the following behavior once in a while when running the kernel "clear" tests repeatedly:
1) warning in get_subbuf ioctl, which seems to point to lack of stream user-space locking in the consumer daemon:
That was with lttng-tools commit:
commit 60860e547ce31ea629e846e00b66342425474b8d
Author: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Date: Fri Apr 23 21:28:58 2021 -0400
Update version to v2.13.0-rc1
and lttng-modules commit:
commit 2dc781e02eb156a76554ada092a181ab2916db57
Author: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Date: Wed Apr 28 16:26:20 2021 -0400
Refactoring: context callbacks
517716.168856] ------------[ cut here ]------------ [517716.171559] WARNING: CPU: 1 PID: 19313 at /home/efficios/git/lttng-modules/src/lib/ringbuffer/ring_buffer_frontend.c:1263 lib_ring_buffer_get_subbuf+0x24f/0x260 [lttng_lib_ring_buffer] [517716.180096] Modules linked in: lttng_test(O) lttng_probe_x86_exceptions(O) lttng_probe_x86_irq_vectors(O) lttng_probe_writeback(O) lttng_probe_workqueue(O) lttng_probe_vmscan(O) lttng_probe_udp(O) lttng_probe_timer(O) lttng_probe_sunrpc(O) lttng_probe_statedump(O) lttng_probe_sock(O) lttng_probe_skb(O) lttng_probe_signal(O) lttng_probe_scsi(O) lttng_probe_sched(O) lttng_probe_regulator(O) lttng_probe_regmap(O) lttng_probe_rcu(O) lttng_probe_random(O) lttng_probe_printk(O) lttng_probe_power(O) lttng_probe_net(O) lttng_probe_napi(O) lttng_probe_module(O) lttng_probe_kmem(O) lttng_probe_jbd2(O) lttng_probe_irq(O) lttng_probe_i2c(O) lttng_probe_gpio(O) lttng_probe_ext4(O) lttng_probe_compaction(O) lttng_probe_btrfs(O) lttng_probe_block(O) lttng_counter_client_percpu_32_modular(O) lttng_counter_client_percpu_64_modular(O) lttng_counter(O) lttng_ring_buffer_event_notifier_client(O) lttng_ring_buffer_metadata_mmap_client(O) lttng_ring_buffer_client_mmap_overwrite(O) [517716.180815] lttng_ring_buffer_client_mmap_discard(O) lttng_ring_buffer_metadata_client(O) lttng_ring_buffer_client_overwrite(O) lttng_ring_buffer_client_discard(O) lttng_tracer(O) lttng_statedump(O) lttng_wrapper(O) lttng_uprobes(O) lttng_clock(O) lttng_kprobes(O) lttng_lib_ring_buffer(O) lttng_kretprobes(O) [last unloaded: lttng_wrapper] [517716.213228] CPU: 1 PID: 19313 Comm: lttng-consumerd Tainted: G O 5.11.2 #80 [517716.215573] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 [517716.220341] RIP: 0010:lib_ring_buffer_get_subbuf+0x24f/0x260 [lttng_lib_ring_buffer] [517716.222579] Code: 50 f0 ff 00 0f 0b 49 03 5f 28 44 8b 85 80 00 00 00 49 8b 77 30 45 85 c0 48 89 d9 0f 85 4f ff ff ff e9 25 ff ff ff f0 ff 45 00 <0f> 0b b8 f0 ff ff ff e9 a6 fe ff ff 0f 1f 44 00 00 0f 1f 44 00 00 [517716.227610] RSP: 0018:ffffbafd09023e88 EFLAGS: 00010202 [517716.229147] RAX: 0000000000000000 RBX: ffff9906ed069a00 RCX: ffff9905c4e4a400 [517716.231186] RDX: ffffdafcffc4ec90 RSI: 0000000000200000 RDI: ffffdafcffc4e9f0 [517716.233265] RBP: ffff9905c4e4a400 R08: 0000000000300000 R09: 0000000000200000 [517716.235284] R10: 0000000000200000 R11: 0000000000000000 R12: 0000000000000000 [517716.237333] R13: 0000000000000000 R14: 000000000000005e R15: 0000000000000000 [517716.239360] FS: 00007ff327fff700(0000) GS:ffff9905a7a40000(0000) knlGS:0000000000000000 [517716.241634] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [517716.243262] CR2: 000055b91cf07730 CR3: 000000076f45e003 CR4: 00000000001706e0 [517716.245316] Call Trace: [517716.246281] lib_ring_buffer_ioctl+0x181/0x300 [lttng_lib_ring_buffer] [517716.248301] lttng_stream_ring_buffer_ioctl+0x1a3/0x200 [lttng_tracer] [517716.252621] __x64_sys_ioctl+0x8e/0xd0 [517716.253931] do_syscall_64+0x33/0x80 [517716.255016] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [517716.256448] RIP: 0033:0x7ff3372f46d7 [517716.257586] Code: b3 66 90 48 8b 05 b1 47 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 47 2d 00 f7 d8 64 89 01 48 [517716.262472] RSP: 002b:00007ff327ffe2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [517716.264634] RAX: ffffffffffffffda RBX: 00007ff310002740 RCX: 00007ff3372f46d7 [517716.266674] RDX: 0000000000000000 RSI: 000000000000f605 RDI: 000000000000005e [517716.268713] RBP: 00007ff327ffe310 R08: 00007ff310002870 R09: a002000000000000 [517716.270732] R10: 000055992b0a6530 R11: 0000000000000246 R12: 000055992c47da70 [517716.272768] R13: 00007ff318005e80 R14: 00007ff310002740 R15: 000055992b0a6528 [517716.274800] irq event stamp: 4526705 [517716.275897] hardirqs last enabled at (4526713): [<ffffffff9016a474>] console_unlock+0x4b4/0x5b0 [517716.278320] hardirqs last disabled at (4526722): [<ffffffff9016a3d0>] console_unlock+0x410/0x5b0 [517716.280759] softirqs last enabled at (4526658): [<ffffffff9120030f>] __do_softirq+0x30f/0x432 [517716.285125] softirqs last disabled at (4526653): [<ffffffff91001052>] asm_call_irq_on_stack+0x12/0x20 [517716.287648] ---[ end trace 506e55b312b731bf ]---
Alongside with that warning, there were 2 tests that failed with a make check as root:
FAIL: tools/clear/test_kernel 727 - Read a total of 1 events, expected 6
FAIL: tools/clear/test_kernel 728 - Destroy session bVV9ffZJ1CoPEgd1
When re-running tools/clear/test_kernel in a loop, I notice that the test hangs once in a while at:
- Test kernel streaming live clear
- Parameters: tracing_active=0, clear_twice=0
ok 350 - Create session uJke1Ag0zql0MAtC with uri:net://localhost and opts: --live
ok 351 - Enable kernel event lttng_test_filter_event for session uJke1Ag0zql0MAtC
ok 352 - Start tracing for session uJke1Ag0zql0MAtC
ok 353 - Stop lttng tracing for session uJke1Ag0zql0MAtC
ok 354 - Clear session uJke1Ag0zql0MAtC
Once it happened on the first run, then the next it took 36 runs to hang.
Updated by Jérémie Galarneau almost 3 years ago
- Status changed from New to Resolved
- % Done changed from 0 to 100
Applied in changeset lttng-tools|741e787bf03338ec74a1df3621c5bde4b13c4e17.