LTTng bugs repository: Issues
https://bugs.lttng.org/ | 2021-11-02T13:30:01Z
LTTng-tools - Bug #1330 (Resolved): <sys/unistd.h> Still used instead of <unistd.h> in kernel-pro...
https://bugs.lttng.org/issues/1330 | 2021-11-02T13:30:01Z | Duncan Bellamy
<p>This stops compilation on Alpine Linux without patching src/common/kernel-probe.c and src/common/userspace-probe.c to use<br /><pre><code class="c syntaxhl" data-language="c">#include <unistd.h></code></pre></p>
<p>configure checks for unistd.h, which passes, and other files already use plain unistd.h.</p>

LTTng-tools - Bug #1316 (Resolved): Adding a logging "event rule matches" trigger makes the sessi...
https://bugs.lttng.org/issues/1316 | 2021-05-11T03:32:12Z | Philippe Proulx <eeppeliteloop@gmail.com>
<p>To reproduce:</p>
<pre>
# lttng-sessiond -v
</pre>
<pre>
$ lttng add-trigger --condition=event-rule-matches --domain=python --action=notify
</pre>
<p><code>lttng-sessiond</code> exits with status 134 (aborts) while the CLI prints</p>
<pre>
Error: Failed to register trigger: No session daemon is available.
</pre>
<p>Same with <code>--domain=jul</code> and <code>--domain=log4j</code>.</p>
<p>Using LTTng-tools <code>e80b715053eb21fe9139241be786afc2688c6795</code>.</p>

LTTng - Bug #1315 (Feedback): Kernel panics after `pkill lttng`; root session daemon has active t...
https://bugs.lttng.org/issues/1315 | 2021-05-07T20:09:26Z | Philippe Proulx <eeppeliteloop@gmail.com>
<p>I don't know how to reproduce this bug.</p>
<p>I played with the <code>lttng add-trigger</code> command while writing usage examples for its manual page.</p>
<p>I started the root session daemon as follows:</p>
<pre>
# lttng-sessiond --daemonize --group=eepp
</pre>
<p>I then ran commands such as:</p>
<pre>
$ lttng add-trigger --name user --condition=event-rule-matches --domain=user --action=notify
</pre>
<pre>
$ lttng add-trigger --condition=event-rule-matches \
--domain=user --action=notify \
--rate-policy=every:10
</pre>
<pre>
$ lttng add-trigger --owner-uid=33 \
--condition=event-rule-matches \
--domain=kernel --name='sched*' \
--action=notify
</pre>
<pre>
$ lttng add-trigger --condition=event-rule-matches \
--domain=kernel --type=syscall \
--filter='fd < 3' \
--action=start-session my-session \
--rate-policy=once-after:40
</pre>
<p>Note that I had no tracing session named <code>my-session</code>.</p>
<p>After a few minutes, Xorg froze. I managed to log in again on a virtual console. I ran:</p>
<pre>
# pkill lttng
</pre>
<p>and got an instant kernel panic.</p>
<p>Attached is a photo of what was on the screen after running <code>pkill</code>.</p>
<p>Using:</p>
<ul>
<li>LTTng-tools <code>60860e547ce31ea629e846e00b66342425474b8d</code>.</li>
<li>LTTng-UST <code>a0f2513af262a19822d46f84cd5e34be0badc484</code></li>
<li>LTTng-modules <code>51ef453614a6db2b778595b16d93283d25db974a</code></li>
<li>liburcu <code>5e1b7c840a2b21b8442b322cedbb70a790e49520</code></li>
</ul>

Babeltrace - Bug #1240 (Resolved): Can't write plugins solely with `--enable-python-plugins`
https://bugs.lttng.org/issues/1240 | 2020-02-21T20:34:20Z | Francis Deslauriers <francis.deslauriers@efficios.com>
<p>If the user builds and installs the project:<br /><pre>
./configure --enable-python-plugins
make
make install
</pre></p>
<p>They won't be able to do the <code>import bt2</code> necessary to start defining their BT2 plugin.<br />To write a Python <strong>plugin</strong>, the user needs to use the Python <strong>bindings</strong> as well.</p>
<p>The user gets this:<br /><pre>
>>> import bt2
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'bt2'
</pre></p>
We could either:
<ol>
<li>turn on <code>--enable-python-bindings</code> whenever <code>--enable-python-plugins</code> is enabled, or</li>
<li>error out during <code>./configure</code> if <code>--enable-python-bindings</code> is not provided.</li>
</ol>

LTTng-tools - Bug #1217 (Resolved): regression/tools/live intermittent failure on yocto with 2.11
https://bugs.lttng.org/issues/1217 | 2020-01-24T16:40:44Z | Jonathan Rajotte-Julien <jonathan.rajotte-julien@efficios.com>
<p>While upgrading to 2.11, Alexander Kanavin observed intermittent failures in the live test.</p>
<pre>
With those issues addressed (patch is coming), I am still having two
failures:
# Failed test
(../../../../../lttng-tools-2.11.0/tests/regression/tools/live/live_test.c:main()
at line 707)
# Got first packet index with offset 0 and len 4096
not ok 6 - Get metadata, received 0 bytes
FAIL: tools/live/test_ust_tracefile_count 6 - Get metadata, received 0 bytes
# Got first packet index with offset 0 and len 4096
not ok 6 - Get metadata, received 0 bytes
FAIL: tools/live/test_ust 6 - Get metadata, received 0 bytes
What's weird is that sometimes they pass. Could there be a race or some
timing issue in the test?
</pre>
<p>Steps to reproduce:<br /><pre>
git clone https://github.com/PSRCode/poky-contrib/tree/live-test-failure-poky
then setup a default build (with qemu x86_64 as target), 'bitbake
core-image-sato-ptest', then 'runqemu kvm nographic'.
Then log in as root (no password), change to /usr/lib/lttng-tools/ptest,
and issue ./run-ptest. Then you should be able to see the failure.
To exit qemu: ctrl-a x.
</pre></p>

LTTng-tools - Feature #1180 (New): SDT tracing does not work when the probes are compiled with se...
https://bugs.lttng.org/issues/1180 | 2019-03-27T15:41:21Z | Naser Ez
<p>Tracing SDT probes does not work when the probes are compiled with semaphores.</p>
<p>For example, when we have a probe like:</p>
<pre>
Provider: nginx
Name: http__subrequest__start
Location: 0x0000000000429e9c, Base: 0x0000000000473810, Semaphore: 0x00000000006920ba
Arguments: 8@%rbx__
</pre>
<p>then running the following command:<br /><pre>
lttng enable-event -k nginx:http__subrequest__start --userspace-probe=sdt:/usr/local/nginx/sbin/nginx:nginx:http__subrequest__start
</pre></p>
<p>will generate this error:<br /><pre>
Error: Event nginx:http__subrequest__start: Invalid userspace probe location (channel channel0, session auto-20190327-105118)
</pre></p>

LTTng-modules - Bug #1113 (Resolved): Ringbuffer looping when kernel pagefault event and userspac...
https://bugs.lttng.org/issues/1113 | 2017-05-26T00:51:57Z | Francis Deslauriers <francis.deslauriers@efficios.com>
<p>I've been working on a bug that occurs when testing the prototype implementation of the callstack-user context. When enabling the x86_exceptions_page_fault_kernel event and adding the callstack-user context [1], the trace would be entirely filled with x86_exceptions_page_fault_kernel events containing the exact same payload.</p>
<p>e.g.<br /><pre>
[19:15:22.603450405] (+?.?????????) debian-amd64 x86_exceptions_page_fault_kernel: { cpu_id = 0 }, { _callstack_user_length = 2, callstack_user = [ [0] = 0x7F1910BFCF46, [1] = 0x447CE0 ] }, { address = 0x1, ip = 0xFFFFFFFF810735C8, error_code = 0x0 }
[19:15:22.603453134] (+0.000002729) debian-amd64 x86_exceptions_page_fault_kernel: { cpu_id = 0 }, { _callstack_user_length = 2, callstack_user = [ [0] = 0x7F1910BFCF46, [1] = 0x447CE0 ] }, { address = 0x1, ip = 0xFFFFFFFF810735C8, error_code = 0x0 }
[19:15:22.603454403] (+0.000001269) debian-amd64 x86_exceptions_page_fault_kernel: { cpu_id = 0 }, { _callstack_user_length = 2, callstack_user = [ [0] = 0x7F1910BFCF46, [1] = 0x447CE0 ] }, { address = 0x1, ip = 0xFFFFFFFF810735C8, error_code = 0x0 }
[19:15:22.603455434] (+0.000001031) debian-amd64 x86_exceptions_page_fault_kernel: { cpu_id = 0 }, { _callstack_user_length = 2, callstack_user = [ [0] = 0x7F1910BFCF46, [1] = 0x447CE0 ] }, { address = 0x1, ip = 0xFFFFFFFF810735C8, error_code = 0x0 }
[19:15:22.603456680] (+0.000001246) debian-amd64 x86_exceptions_page_fault_kernel: { cpu_id = 0 }, { _callstack_user_length = 2, callstack_user = [ [0] = 0x7F1910BFCF46, [1] = 0x447CE0 ] }, { address = 0x1, ip = 0xFFFFFFFF810735C8, error_code = 0x0 }
[19:15:22.603457691] (+0.000001011) debian-amd64 x86_exceptions_page_fault_kernel: { cpu_id = 0 }, { _callstack_user_length = 2, callstack_user = [ [0] = 0x7F1910BFCF46, [1] = 0x447CE0 ] }, { address = 0x1, ip = 0xFFFFFFFF810735C8, error_code = 0x0 }
[19:15:22.603458702] (+0.000001011) debian-amd64 x86_exceptions_page_fault_kernel: { cpu_id = 0 }, { _callstack_user_length = 2, callstack_user = [ [0] = 0x7F1910BFCF46, [1] = 0x447CE0 ] }, { address = 0x1, ip = 0xFFFFFFFF810735C8, error_code = 0x0 }
[19:15:22.603459698] (+0.000000996) debian-amd64 x86_exceptions_page_fault_kernel: { cpu_id = 0 }, { _callstack_user_length = 2, callstack_user = [ [0] = 0x7F1910BFCF46, [1] = 0x447CE0 ] }, { address = 0x1, ip = 0xFFFFFFFF810735C8, error_code = 0x0 }
[19:15:22.603460707] (+0.000001009) debian-amd64 x86_exceptions_page_fault_kernel: { cpu_id = 0 }, { _callstack_user_length = 2, callstack_user = [ [0] = 0x7F1910BFCF46, [1] = 0x447CE0 ] }, { address = 0x1, ip = 0xFFFFFFFF810735C8, error_code = 0x0 }
[19:15:22.603461716] (+0.000001009) debian-amd64 x86_exceptions_page_fault_kernel: { cpu_id = 0 }, { _callstack_user_length = 2, callstack_user = [ [0] = 0x7F1910BFCF46, [1] = 0x447CE0 ] }, { address = 0x1, ip = 0xFFFFFFFF810735C8, error_code = 0x0 }
[...]
</pre></p>
<p>I found out that, when reserving space in the buffer, the ring buffer gets stuck in the following compare-and-exchange loop.</p>
<p>Here is the code in question (lib/ringbuffer/ring_buffer_frontend.c:2045):<br /><pre>
do {
ret = lib_ring_buffer_try_reserve_slow(buf, chan, &offsets,
ctx);
if (unlikely(ret)) {
return ret;
}
} while (unlikely(v_cmpxchg(config, &buf->offset, offsets.old,
offsets.end)
!= offsets.old));
</pre><br />In this loop, we try to reserve space in the buffer and, if somebody else changed the buffer's writer position in the meantime, we try again.</p>
<p>At the beginning of the execution of lib_ring_buffer_try_reserve_slow(), the offsets->old local variable is set to the buffer's buf->offset->begin, and the cmpxchg checks that those values still match before atomically moving the pointer in the v_cmpxchg() in the above code.</p>
<p>The act of trying to reserve space triggers the computation of the size of the callstack context, which triggers a kernel page fault. This page fault triggers a tracepoint which moves the write pointer in the buffer, which makes the cmpxchg fail at the end of the loop and forces a retry.</p>
<p>Here is the ftrace trace of the kernel while it's looping: <a class="external" href="http://paste.ubuntu.com/24658695/">http://paste.ubuntu.com/24658695/</a></p>
<p>Looking at line 1060, we can see that the LTTng tracer started handling a sched_switch event. From there, it calls lib_ring_buffer_reserve_slow() and loops forever.</p>
<p>The problem is that every time the tracer needs to reserve space in the ring buffer for the callstack-user context, it generates a new event that changes the state of the buffer, which makes the tracer retry in the hope that the buffer won't change this time.</p>
<p>As a possible solution, we could disable the kernel page fault event when the<br />tracer is nested into itself. But this would require significant changes in the<br />tracepoint() macro to add custom condition checking. In this case, we could<br />check the value of the lib_ring_buffer_nesting variable.</p>
<p>[1]: <a class="external" href="https://github.com/compudj/lttng-modules-dev/tree/callstack">https://github.com/compudj/lttng-modules-dev/tree/callstack</a></p>

LTTng-modules - Feature #950 (New): Exclude specific kernel events when enabling all of them
https://bugs.lttng.org/issues/950 | 2015-10-09T22:55:29Z | Julien Desfossez <jdesfossez@efficios.com>
<p>It would be nice to be able to do this (as in UST):</p>
<pre>
lttng enable-event -k -a --exclude sched_switch
</pre>
<p>My current workaround:</p>
<pre>
lttng enable-event -k $(lttng list -k | grep -v -E "Kernel|---" | grep -v sched_switch | awk '{print $1}' | tr "\n" ",")
</pre>