Project

General

Profile

Bug #1113

Ringbuffer looping when kernel pagefault event and userspace callstack context enabled

Added by Francis Deslauriers almost 3 years ago. Updated over 2 years ago.

Status:
Resolved
Priority:
Normal
Target version:
-
Start date:
05/25/2017
Due date:
% Done:

0%

Estimated time:

Description

I've been working on a bug occurring when testing the prototype implementation
of the callstack-user context. When enabling the
x86_exceptions_page_fault_kernel and adding the callstack-user [1], the trace would
be entirely filled with x86_exceptions_page_fault_kernel events containing the
exact same payload.

e.g.

[19:15:22.603450405] (+?.?????????) debian-amd64 x86_exceptions_page_fault_kernel: { cpu_id = 0 }, { _callstack_user_length = 2, callstack_user = [ [0] = 0x7F1910BFCF46, [1] = 0x447CE0 ] }, { address = 0x1, ip = 0xFFFFFFFF810735C8, error_code = 0x0 }
[19:15:22.603453134] (+0.000002729) debian-amd64 x86_exceptions_page_fault_kernel: { cpu_id = 0 }, { _callstack_user_length = 2, callstack_user = [ [0] = 0x7F1910BFCF46, [1] = 0x447CE0 ] }, { address = 0x1, ip = 0xFFFFFFFF810735C8, error_code = 0x0 }
[19:15:22.603454403] (+0.000001269) debian-amd64 x86_exceptions_page_fault_kernel: { cpu_id = 0 }, { _callstack_user_length = 2, callstack_user = [ [0] = 0x7F1910BFCF46, [1] = 0x447CE0 ] }, { address = 0x1, ip = 0xFFFFFFFF810735C8, error_code = 0x0 }
[19:15:22.603455434] (+0.000001031) debian-amd64 x86_exceptions_page_fault_kernel: { cpu_id = 0 }, { _callstack_user_length = 2, callstack_user = [ [0] = 0x7F1910BFCF46, [1] = 0x447CE0 ] }, { address = 0x1, ip = 0xFFFFFFFF810735C8, error_code = 0x0 }
[19:15:22.603456680] (+0.000001246) debian-amd64 x86_exceptions_page_fault_kernel: { cpu_id = 0 }, { _callstack_user_length = 2, callstack_user = [ [0] = 0x7F1910BFCF46, [1] = 0x447CE0 ] }, { address = 0x1, ip = 0xFFFFFFFF810735C8, error_code = 0x0 }
[19:15:22.603457691] (+0.000001011) debian-amd64 x86_exceptions_page_fault_kernel: { cpu_id = 0 }, { _callstack_user_length = 2, callstack_user = [ [0] = 0x7F1910BFCF46, [1] = 0x447CE0 ] }, { address = 0x1, ip = 0xFFFFFFFF810735C8, error_code = 0x0 }
[19:15:22.603458702] (+0.000001011) debian-amd64 x86_exceptions_page_fault_kernel: { cpu_id = 0 }, { _callstack_user_length = 2, callstack_user = [ [0] = 0x7F1910BFCF46, [1] = 0x447CE0 ] }, { address = 0x1, ip = 0xFFFFFFFF810735C8, error_code = 0x0 }
[19:15:22.603459698] (+0.000000996) debian-amd64 x86_exceptions_page_fault_kernel: { cpu_id = 0 }, { _callstack_user_length = 2, callstack_user = [ [0] = 0x7F1910BFCF46, [1] = 0x447CE0 ] }, { address = 0x1, ip = 0xFFFFFFFF810735C8, error_code = 0x0 }
[19:15:22.603460707] (+0.000001009) debian-amd64 x86_exceptions_page_fault_kernel: { cpu_id = 0 }, { _callstack_user_length = 2, callstack_user = [ [0] = 0x7F1910BFCF46, [1] = 0x447CE0 ] }, { address = 0x1, ip = 0xFFFFFFFF810735C8, error_code = 0x0 }
[19:15:22.603461716] (+0.000001009) debian-amd64 x86_exceptions_page_fault_kernel: { cpu_id = 0 }, { _callstack_user_length = 2, callstack_user = [ [0] = 0x7F1910BFCF46, [1] = 0x447CE0 ] }, { address = 0x1, ip = 0xFFFFFFFF810735C8, error_code = 0x0 }
[...]

I found out that the when reserving space in the buffer, the ringbuffer gets
stuck in the following compare and exchange loop.

Here is the code in question (lib/ringbuffer/ring_buffer_frontend.c:2045):

    do {
           ret = lib_ring_buffer_try_reserve_slow(buf, chan, &offsets,
                               ctx);

               if (unlikely(ret)) {
                        return ret;
               }
    } while (unlikely(v_cmpxchg(config, &buf->offset, offsets.old,
                    offsets.end)
              != offsets.old));

In this loop, we try to reserve space in the buffer and if somebody else
changed the buffer writer position during that time we try again.

At the beginning of the execution of lib_ring_buffer_try_reserve_slow the
offsets->old local variable is set to the buffer's buf->offset->begin and the
cmpxchg checks that those values still match before atomicaly moving the
pointer in the v_cmpxchg() in the above code.

The act of trying to reserve space triggers the computation of the size of the
callstack context which triggers a kernel page fault. This page fault triggers
a tracepoint which moves the write pointer in the buffer. Which makes the
cmpxchg fail at the end of the loop and forces a retry.

Here is the ftrace trace of the kernel while it's looping: http://paste.ubuntu.com/24658695/

Looking at line 1060 that the LTTng tracer started the handling of a
sched_switch event. From there it calls lib_ring_buffer_reserve_slow() and
loops forever.

The problem is that everytime the tracer needs to reserve space in the ring
buffer for the callstack-user context it generates a new event that changes
the state of the buffer, which makes the tracer retry in the hope that the
buffer won't change this time.

As a possible solution, we could disable the kernel page fault event when the
tracer is nested into itself. But this would require significant changes in the
tracepoint() macro to add custom condition checking. In this case, we could
check the value of the lib_ring_buffer_nesting variable.

[1]: https://github.com/compudj/lttng-modules-dev/tree/callstack

#1

Updated by Francis Deslauriers almost 3 years ago

  • Assignee set to Francis Deslauriers
#2

Updated by Mathieu Desnoyers over 2 years ago

  • Status changed from New to Resolved

Fixed by commit:

commit cc62f29ec5d5149ae6dba1a9acb538e06da07a94
Author: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Date:   Sat May 27 13:28:03 2017 +0200

    Calculate context length outside of retry loop

    Allow context length calculation to have side-effects (e.g. page faults)
    which trigger event tracing by moving the calculation outside of the
    buffer space reservation retry loop.

    This also paves the way to have dynamically sized contexts, which
    would expect to put their size of the internal stack. Note that the
    context length calculation is performed *after* the event payload field
    length calculation, so the stack needs to be used accordingly.

    Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

Also available in: Atom PDF