Project

General

Profile

Actions

Bug #1313

closed

lttng-modules warnings and test hang in tools/clear/test_kernel

Added by Mathieu Desnoyers almost 3 years ago. Updated almost 3 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Target version:
-
Start date:
04/29/2021
Due date:
% Done:

100%

Estimated time:

Description

I observe the following behavior once in a while when running the kernel "clear" tests repeatedly:

1) warning in get_subbuf ioctl, which seems to point to lack of stream user-space locking in the consumer daemon:

That was with lttng-tools commit:

commit 60860e547ce31ea629e846e00b66342425474b8d
Author: Jérémie Galarneau <>
Date: Fri Apr 23 21:28:58 2021 -0400

Update version to v2.13.0-rc1

and lttng-modules commit:

commit 2dc781e02eb156a76554ada092a181ab2916db57
Author: Mathieu Desnoyers <>
Date: Wed Apr 28 16:26:20 2021 -0400

Refactoring: context callbacks
517716.168856] ------------[ cut here ]------------
[517716.171559] WARNING: CPU: 1 PID: 19313 at /home/efficios/git/lttng-modules/src/lib/ringbuffer/ring_buffer_frontend.c:1263 lib_ring_buffer_get_subbuf+0x24f/0x260 [lttng_lib_ring_buffer]
[517716.180096] Modules linked in: lttng_test(O) lttng_probe_x86_exceptions(O) lttng_probe_x86_irq_vectors(O) lttng_probe_writeback(O) lttng_probe_workqueue(O) lttng_probe_vmscan(O) lttng_probe_udp(O) lttng_probe_timer(O) lttng_probe_sunrpc(O) lttng_probe_statedump(O) lttng_probe_sock(O) lttng_probe_skb(O) lttng_probe_signal(O) lttng_probe_scsi(O) lttng_probe_sched(O) lttng_probe_regulator(O) lttng_probe_regmap(O) lttng_probe_rcu(O) lttng_probe_random(O) lttng_probe_printk(O) lttng_probe_power(O) lttng_probe_net(O) lttng_probe_napi(O) lttng_probe_module(O) lttng_probe_kmem(O) lttng_probe_jbd2(O) lttng_probe_irq(O) lttng_probe_i2c(O) lttng_probe_gpio(O) lttng_probe_ext4(O) lttng_probe_compaction(O) lttng_probe_btrfs(O) lttng_probe_block(O) lttng_counter_client_percpu_32_modular(O) lttng_counter_client_percpu_64_modular(O) lttng_counter(O) lttng_ring_buffer_event_notifier_client(O) lttng_ring_buffer_metadata_mmap_client(O) lttng_ring_buffer_client_mmap_overwrite(O)
[517716.180815]  lttng_ring_buffer_client_mmap_discard(O) lttng_ring_buffer_metadata_client(O) lttng_ring_buffer_client_overwrite(O) lttng_ring_buffer_client_discard(O) lttng_tracer(O) lttng_statedump(O) lttng_wrapper(O) lttng_uprobes(O) lttng_clock(O) lttng_kprobes(O) lttng_lib_ring_buffer(O) lttng_kretprobes(O) [last unloaded: lttng_wrapper]
[517716.213228] CPU: 1 PID: 19313 Comm: lttng-consumerd Tainted: G           O      5.11.2 #80
[517716.215573] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[517716.220341] RIP: 0010:lib_ring_buffer_get_subbuf+0x24f/0x260 [lttng_lib_ring_buffer]
[517716.222579] Code: 50 f0 ff 00 0f 0b 49 03 5f 28 44 8b 85 80 00 00 00 49 8b 77 30 45 85 c0 48 89 d9 0f 85 4f ff ff ff e9 25 ff ff ff f0 ff 45 00 <0f> 0b b8 f0 ff ff ff e9 a6 fe ff ff 0f 1f 44 00 00 0f 1f 44 00 00
[517716.227610] RSP: 0018:ffffbafd09023e88 EFLAGS: 00010202
[517716.229147] RAX: 0000000000000000 RBX: ffff9906ed069a00 RCX: ffff9905c4e4a400
[517716.231186] RDX: ffffdafcffc4ec90 RSI: 0000000000200000 RDI: ffffdafcffc4e9f0
[517716.233265] RBP: ffff9905c4e4a400 R08: 0000000000300000 R09: 0000000000200000
[517716.235284] R10: 0000000000200000 R11: 0000000000000000 R12: 0000000000000000
[517716.237333] R13: 0000000000000000 R14: 000000000000005e R15: 0000000000000000
[517716.239360] FS:  00007ff327fff700(0000) GS:ffff9905a7a40000(0000) knlGS:0000000000000000
[517716.241634] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[517716.243262] CR2: 000055b91cf07730 CR3: 000000076f45e003 CR4: 00000000001706e0
[517716.245316] Call Trace:
[517716.246281]  lib_ring_buffer_ioctl+0x181/0x300 [lttng_lib_ring_buffer]
[517716.248301]  lttng_stream_ring_buffer_ioctl+0x1a3/0x200 [lttng_tracer]
[517716.252621]  __x64_sys_ioctl+0x8e/0xd0
[517716.253931]  do_syscall_64+0x33/0x80
[517716.255016]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[517716.256448] RIP: 0033:0x7ff3372f46d7
[517716.257586] Code: b3 66 90 48 8b 05 b1 47 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 47 2d 00 f7 d8 64 89 01 48
[517716.262472] RSP: 002b:00007ff327ffe2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[517716.264634] RAX: ffffffffffffffda RBX: 00007ff310002740 RCX: 00007ff3372f46d7
[517716.266674] RDX: 0000000000000000 RSI: 000000000000f605 RDI: 000000000000005e
[517716.268713] RBP: 00007ff327ffe310 R08: 00007ff310002870 R09: a002000000000000
[517716.270732] R10: 000055992b0a6530 R11: 0000000000000246 R12: 000055992c47da70
[517716.272768] R13: 00007ff318005e80 R14: 00007ff310002740 R15: 000055992b0a6528
[517716.274800] irq event stamp: 4526705
[517716.275897] hardirqs last  enabled at (4526713): [<ffffffff9016a474>] console_unlock+0x4b4/0x5b0
[517716.278320] hardirqs last disabled at (4526722): [<ffffffff9016a3d0>] console_unlock+0x410/0x5b0
[517716.280759] softirqs last  enabled at (4526658): [<ffffffff9120030f>] __do_softirq+0x30f/0x432
[517716.285125] softirqs last disabled at (4526653): [<ffffffff91001052>] asm_call_irq_on_stack+0x12/0x20
[517716.287648] ---[ end trace 506e55b312b731bf ]---

Alongside with that warning, there were 2 tests that failed with a make check as root:

FAIL: tools/clear/test_kernel 727 - Read a total of 1 events, expected 6
FAIL: tools/clear/test_kernel 728 - Destroy session bVV9ffZJ1CoPEgd1

When re-running tools/clear/test_kernel in a loop, I notice that the test hangs once in a while at:

  1. Test kernel streaming live clear
  2. Parameters: tracing_active=0, clear_twice=0
    ok 350 - Create session uJke1Ag0zql0MAtC with uri:net://localhost and opts: --live
    ok 351 - Enable kernel event lttng_test_filter_event for session uJke1Ag0zql0MAtC
    ok 352 - Start tracing for session uJke1Ag0zql0MAtC
    ok 353 - Stop lttng tracing for session uJke1Ag0zql0MAtC
    ok 354 - Clear session uJke1Ag0zql0MAtC

Once it happened on the first run, then the next it took 36 runs to hang.

Actions #1

Updated by Jérémie Galarneau almost 3 years ago

  • Status changed from New to Resolved
  • % Done changed from 0 to 100
Actions

Also available in: Atom PDF