Project

General

Profile

Bug #1174

yocto 2.6.1(pre-release) + lttng no kprobe tracing?

Added by Robert Berger over 1 year ago. Updated over 1 year ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Target version:
-
Start date:
01/22/2019
Due date:
% Done:

100%

Estimated time:

Description

After we resolved https://bugs.lttng.org/issues/1173#change-3359 I tried with exactly the same setup this:

On the target serial console:

root@multi-v7-ml:/# cp /proc/config.gz .
root@multi-v7-ml:/# gunzip config.gz
root@multi-v7-ml:/# grep -i kprobe config
CONFIG_KPROBES=y
CONFIG_HAVE_KPROBES=y
CONFIG_KPROBE_EVENTS=y
# CONFIG_KPROBES_SANITY_TEST is not set
# CONFIG_ARM_KPROBES_TEST is not set
root@multi-v7-ml:/# export SESSION="kprobe_test_session" 
root@multi-v7-ml:/# lttng create ${SESSION}
Spawning a session daemon
[  209.886591] lttng_kretprobes: loading out-of-tree module taints kernel.
[  210.158768] LTTng: Loaded modules v2.10.8 (KeKriek)

Session kprobe_test_session created.
Traces will be written in /home/root/lttng-traces/kprobe_test_session-20190122-163703
root@multi-v7-ml:/# lttng enable-event --kernel sys_open --probe sys_open+0x0
Kernel event sys_open created in channel channel0
root@multi-v7-ml:/# lttng enable-event --kernel sys_close --probe sys_close+0x0
Kernel event sys_close created in channel channel0
root@multi-v7-ml:/# [  270.977499] Internal error: Oops - undefined instruction: 0 [#1] SMP ARM
[  270.984229] Modules linked in: lttng_probe_writeback(O) lttng_probe_workqueue(O) lttng_probe_v4l2(O) lttng_probe_vmscan(O) lttng_probe_udp(O) lttng_probe_timer(O) lttng_probe_sunrpc(O) lttng_probe_statedump(O) lttng_probe_sock(O) lttng_probe_skb(O) lttng_probe_signal(O) lttng_probe_scsi(O) lttng_probe_sched(O) lttng_probe_regulator(O) lttng_probe_regmap(O) lttng_probe_rcu(O) lttng_probe_random(O) lttng_probe_printk(O) lttng_probe_power(O) lttng_probe_net(O) lttng_probe_napi(O) lttng_probe_module(O) lttng_probe_kmem(O) lttng_probe_jbd2(O) lttng_probe_irq(O) lttng_probe_i2c(O) lttng_probe_gpio(O) lttng_probe_ext4(O) lttng_probe_compaction(O) lttng_probe_block(O) lttng_probe_asoc(O) lttng_ring_buffer_metadata_mmap_client(O) lttng_ring_buffer_client_mmap_overwrite(O) lttng_ring_buffer_client_mmap_discard(O)
[  271.055627]  lttng_ring_buffer_metadata_client(O) lttng_ring_buffer_client_overwrite(O) lttng_ring_buffer_client_discard(O) lttng_tracer(O) lttng_statedump(O) lttng_ftrace(O) lttng_kprobes(O) lttng_clock(O) lttng_lib_ring_buffer(O) lttng_kretprobes(O) bluetooth ecdh_generic rtc_da9063 da9062_wdt da9062_core rtc_snvs flexcan etnaviv gpu_sched
[  271.086020] CPU: 1 PID: 566 Comm: crond Tainted: G           O      4.19.13-custom-ml-debug #1
[  271.094645] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[  271.101191] PC is at 0xbf3b0194
[  271.104351] LR is at ret_fast_syscall+0x0/0x28
[  271.108810] pc : [<bf3b0194>]    lr : [<c0301000>]    psr: 800a0013
[  271.115090] sp : ecc45f60  ip : 00000006  fp : 00000000
[  271.120326] r10: 00000006  r9 : ecc44000  r8 : c03011e4
[  271.125563] r7 : 00000006  r6 : bed5a5fc  r5 : 0049e540  r4 : 00000000
[  271.132102] r3 : ecc45fa8  r2 : b6f13d18  r1 : b6f13d18  r0 : 00000005
[  271.138643] Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[  271.145789] Control: 10c5387d  Table: 3cca004a  DAC: 00000051
[  271.151548] Process crond (pid: 566, stack limit = 0xc688ab6f)
[  271.157393] Stack: (0xecc45f60 to 0xecc46000)
[  271.161769] 5f60: b6f13d18 00008029 00000000 0049e540 bed5a5fc 00000006 c03011e4 ecc44000
[  271.169962] 5f80: 00000006 00000000 ecc45f88 ecc45fa8 c0301000 bf3b0180 800a0013 ffffffff
[  271.178154] 5fa0: 00000051 bf000000 00000000 0049e540 00000005 b6f13d18 b6f13d18 00008029
[  271.186347] 5fc0: 00000000 0049e540 bed5a5fc 00000006 0000002f 00409bf0 bed5a790 b6f63900
[  271.194538] 5fe0: 00000040 bed5a594 00403348 b6e9d400 200a0010 00000005 00000000 00000000
[  271.202748] unwind: Index not found bf3b0194
[  271.207036] Code: 418ac040 e28d3048 01230588 2034a0b2 (76025634)
[  271.213149] ---[ end trace d4b330dbb17e420b ]---
[  271.217783] BUG: sleeping function called from invalid context at include/linux/percpu-rwsem.h:34
[  271.226667] in_atomic(): 0, irqs_disabled(): 128, pid: 566, name: crond
[  271.233292] INFO: lockdep is turned off.
[  271.237226] irq event stamp: 10974
[  271.240647] hardirqs last  enabled at (10973): [<c0301bb4>] __pabt_svc+0x74/0xa0
[  271.248060] hardirqs last disabled at (10974): [<c0301ae0>] __und_svc+0x60/0x74
[  271.255393] softirqs last  enabled at (10944): [<c106e334>] rpc_wake_up_first_on_wq+0x98/0x194
[  271.264019] softirqs last disabled at (10942): [<c106e2c8>] rpc_wake_up_first_on_wq+0x2c/0x194
[  271.272645] CPU: 1 PID: 566 Comm: crond Tainted: G      D    O      4.19.13-custom-ml-debug #1
[  271.281267] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[  271.287822] [<c031726c>] (unwind_backtrace) from [<c030ffe8>] (show_stack+0x20/0x24)
[  271.295590] [<c030ffe8>] (show_stack) from [<c1099014>] (dump_stack+0xb4/0xe0)
[  271.302841] [<c1099014>] (dump_stack) from [<c03870e0>] (___might_sleep+0x1cc/0x2b8)
[  271.310606] [<c03870e0>] (___might_sleep) from [<c038723c>] (__might_sleep+0x70/0xa8)
[  271.318456] [<c038723c>] (__might_sleep) from [<c0369f38>] (exit_signals+0x48/0x2ec)
[  271.326219] [<c0369f38>] (exit_signals) from [<c0359308>] (do_exit+0x10c/0xc08)
[  271.333546] [<c0359308>] (do_exit) from [<c0310324>] (die+0x338/0x344)
[  271.340093] [<c0310324>] (die) from [<c031038c>] (arm_notify_die+0x5c/0x60)
[  271.347072] [<c031038c>] (arm_notify_die) from [<c0310568>] (do_undefinstr+0x1d8/0x2b4)
[  271.355096] [<c0310568>] (do_undefinstr) from [<c0301afc>] (__und_svc_finish+0x0/0x44)
[  271.363025] Exception stack(0xecc45ed0 to 0xecc45f18)
[  271.368091] 5ec0:                                     00000005 b6f13d18 b6f13d18 ecc45fa8
[  271.376284] 5ee0: 00000000 0049e540 bed5a5fc 00000006 c03011e4 ecc44000 00000006 00000000
[  271.384477] 5f00: 00000006 ecc45f60 c0301000 bf3b0194 800a0013 ffffffff
[  271.391110] [<c0301afc>] (__und_svc_finish) from [<bf3b0194>] (0xbf3b0194)
[  271.398004] unwind: Index not found bf3b0194
[  271.403112] Internal error: Oops - undefined instruction: 0 [#2] SMP ARM
[  271.409834] Modules linked in: lttng_probe_writeback(O) lttng_probe_workqueue(O) lttng_probe_v4l2(O) lttng_probe_vmscan(O) lttng_probe_udp(O) lttng_probe_timer(O) lttng_probe_sunrpc(O) lttng_probe_statedump(O) lttng_probe_sock(O) lttng_probe_skb(O) lttng_probe_signal(O) lttng_probe_scsi(O) lttng_probe_sched(O) lttng_probe_regulator(O) lttng_probe_regmap(O) lttng_probe_rcu(O) lttng_probe_random(O) lttng_probe_printk(O) lttng_probe_power(O) lttng_probe_net(O) lttng_probe_napi(O) lttng_probe_module(O) lttng_probe_kmem(O) lttng_probe_jbd2(O) lttng_probe_irq(O) lttng_probe_i2c(O) lttng_probe_gpio(O) lttng_probe_ext4(O) lttng_probe_compaction(O) lttng_probe_block(O) lttng_probe_asoc(O) lttng_ring_buffer_metadata_mmap_client(O) lttng_ring_buffer_client_mmap_overwrite(O) lttng_ring_buffer_client_mmap_discard(O)
[  271.481189]  lttng_ring_buffer_metadata_client(O) lttng_ring_buffer_client_overwrite(O) lttng_ring_buffer_client_discard(O) lttng_tracer(O) lttng_statedump(O) lttng_ftrace(O) lttng_kprobes(O) lttng_clock(O) lttng_lib_ring_buffer(O) lttng_kretprobes(O) bluetooth ecdh_generic rtc_da9063 da9062_wdt da9062_core rtc_snvs flexcan etnaviv gpu_sched
[  271.511574] CPU: 1 PID: 523 Comm: syslogd Tainted: G      D W  O      4.19.13-custom-ml-debug #1
[  271.520371] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[  271.526914] PC is at 0xbf3b0194
[  271.530072] LR is at ret_fast_syscall+0x0/0x28
[  271.534527] pc : [<bf3b0194>]    lr : [<c0301000>]    psr: 80070013
[  271.540804] sp : ec8f7f60  ip : 00000006  fp : 00000000
[  271.546040] r10: 00000006  r9 : ec8f6000  r8 : c03011e4
[  271.551276] r7 : 00000006  r6 : 004b65ac  r5 : 5c4746e9  r4 : 004b6150
[  271.557814] r3 : ec8f7fa8  r2 : 0000006c  r1 : 00000000  r0 : 00000003
[  271.564355] Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[  271.571501] Control: 10c5387d  Table: 3cb9004a  DAC: 00000051
[  271.577261] Process syslogd (pid: 523, stack limit = 0xc7bf2df9)
[  271.583281] Stack: (0xec8f7f60 to 0xec8f8000)
[  271.587657] 7f60: 0000006c 00000000 004b6150 5c4746e9 004b65ac 00000006 c03011e4 ec8f6000
[  271.595851] 7f80: 00000006 00000000 ec8f7f88 ec8f7fa8 c0301000 bf3b0180 80070013 ffffffff
[  271.604045] 7fa0: 00000051 bf000000 004b6150 5c4746e9 00000003 00000000 0000006c 00000000
[  271.612236] 7fc0: 004b6150 5c4746e9 004b65ac 00000006 b6fd6900 00000000 00000000 bef5ab6c
[  271.620428] 7fe0: 004b4f34 bef5aab0 0044c31c b6e80118 60070010 00000003 00000000 00000000
[  271.628629] unwind: Index not found bf3b0194
[  271.632918] Code: 418ac040 e28d3048 01230588 2034a0b2 (76025634)
[  271.639031] ---[ end trace d4b330dbb17e420c ]---
[  271.644275] Internal error: Oops - undefined instruction: 0 [#3] SMP ARM
[  271.650998] Modules linked in: lttng_probe_writeback(O) lttng_probe_workqueue(O) lttng_probe_v4l2(O) lttng_probe_vmscan(O) lttng_probe_udp(O) lttng_probe_timer(O) lttng_probe_sunrpc(O) lttng_probe_statedump(O) lttng_probe_sock(O) lttng_probe_skb(O) lttng_probe_signal(O) lttng_probe_scsi(O) lttng_probe_sched(O) lttng_probe_regulator(O) lttng_probe_regmap(O) lttng_probe_rcu(O) lttng_probe_random(O) lttng_probe_printk(O) lttng_probe_power(O) lttng_probe_net(O) lttng_probe_napi(O) lttng_probe_module(O) lttng_probe_kmem(O) lttng_probe_jbd2(O) lttng_probe_irq(O) lttng_probe_i2c(O) lttng_probe_gpio(O) lttng_probe_ext4(O) lttng_probe_compaction(O) lttng_probe_block(O) lttng_probe_asoc(O) lttng_ring_buffer_metadata_mmap_client(O) lttng_ring_buffer_client_mmap_overwrite(O) lttng_ring_buffer_client_mmap_discard(O)
[  271.722366]  lttng_ring_buffer_metadata_client(O) lttng_ring_buffer_client_overwrite(O) lttng_ring_buffer_client_discard(O) lttng_tracer(O) lttng_statedump(O) lttng_ftrace(O) lttng_kprobes(O) lttng_clock(O) lttng_lib_ring_buffer(O) lttng_kretprobes(O) bluetooth ecdh_generic rtc_da9063 da9062_wdt da9062_core rtc_snvs flexcan etnaviv gpu_sched
[  271.752744] CPU: 2 PID: 526 Comm: klogd Tainted: G      D W  O      4.19.13-custom-ml-debug #1
[  271.761369] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[  271.767910] PC is at 0xbf3b0194
[  271.771067] LR is at ret_fast_syscall+0x0/0x28
[  271.775522] pc : [<bf3b0194>]    lr : [<c0301000>]    psr: 800f0013
[  271.781800] sp : ec8f5f60  ip : 00000006  fp : 00000000
[  271.787037] r10: 00000006  r9 : ec8f4000  r8 : c03011e4
[  271.792275] r7 : 00000006  r6 : 00000014  r5 : 00000001  r4 : b6f257ac
[  271.798813] r3 : ec8f5fa8  r2 : 0000006c  r1 : 0000006f  r0 : 00000003
[  271.805352] Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[  271.812498] Control: 10c5387d  Table: 3cb9404a  DAC: 00000051
[  271.818256] Process klogd (pid: 526, stack limit = 0x4e2f341e)
[  271.824101] Stack: (0xec8f5f60 to 0xec8f6000)
[  271.828474] 5f60: 0000006c 00000000 b6f257ac 00000001 00000014 00000006 c03011e4 ec8f4000
[  271.836667] 5f80: 00000006 00000000 ec8f5f88 ec8f5fa8 c0301000 bf3b0180 800f0013 ffffffff
[  271.844858] 5fa0: 00000051 bf000000 b6f257ac 00000001 00000003 0000006f 0000006c 00000000
[  271.853050] 5fc0: b6f257ac 00000001 00000014 00000006 00000001 00000000 004bbd80 be91ec4c
[  271.861242] 5fe0: 00000000 be91eb80 b6eb3cd0 b6ea8118 600f0010 00000003 00000000 00000000
[  271.869441] unwind: Index not found bf3b0194
[  271.873728] Code: 418ac040 e28d3048 01230588 2034a0b2 (76025634)
[  271.879839] ---[ end trace d4b330dbb17e420d ]---

root@multi-v7-ml:/# 

Files

config.gz (42.5 KB) config.gz config + CONFIG_ARM_MODULE_PLTS=y Robert Berger, 01/22/2019 01:36 PM
#1

Updated by Jonathan Rajotte Julien over 1 year ago

  • Status changed from New to In Progress

Hi Robert,

Were you able to validate that base tracing work? not using probe.

I assume we can base ourselves on https://bugs.lttng.org/issues/1173#note-7 for the kernel config.

We will need to take a deeper look at this. It might take more time than the previous issues. Mathieu Desnoyers might be able to provide us with more information later this week.

Cheers

#2

Updated by Jonathan Rajotte Julien over 1 year ago

We do have hardware in our test lab for Freescale i.MX6 Quad/DualLite target but it would be easier if we can use qemu for testing.

If we base ourselves on your poky branch would we be able to build a qemu-arm image and use qemu for testing? Given that we modify the poky conf file.

Cheers

#3

Updated by Robert Berger over 1 year ago

Here are the details:

components used:

poky/branch: https://github.com/RobertBerger/poky/tree/2019-01-16-thud-2.6.1-pre-release

layer, which adds 4.19.13 kernel + new lttng components:

https://github.com/RobertBerger/meta-mainline/tree/thud-training-v4.19.x/multi-v7-ml

kernel:

target# uname -a
Linux multi-v7-ml 4.19.13-custom-ml-debug #1 SMP Wed Jan 16 20:05:50 UTC 2019 armv7l armv7l armv7l GNU/Linux
<pre>

<pre>
target# gcc --version
gcc (GCC) 8.2.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
</pre>

<pre>
host# bitbake-layers show-recipes | grep -A1 lttng
lttng-modules:
  multi-v7-ml          2.10.8
--
lttng-tools:
  multi-v7-ml          2.10.5
--
lttng-ust:
  multi-v7-ml          2:2.10.2
</pre>

The kernel is upstream 4.19.13:

repo: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
Branch: linux-4.19.y
Commit: c04c050f5bf98845bfe22164b8a1503d696a6e26

as you can see here:

https://github.com/RobertBerger/meta-mainline/blob/thud-training-v4.19.x/multi-v7-ml/recipes-kernel/linux/linux-yocto-custom_4.19.13.bb

plus this patch:

https://github.com/RobertBerger/meta-mainline/blob/thud-training-v4.19.x/multi-v7-ml/recipes-kernel/linux/patch/4.19.x/patches/phyBOARD-MIRA/patches-multi-v7-defconfig/0001-revert-patch-e7e73b10d690c5352cb11b1941a09e4f3dc4c8c.patch

The new config is attached here - basically like before plus the one CONFIG_ARM_MODULE_PLTS=y fix.

If you want me to try out things just let me know.

Thanks!
#4

Updated by Robert Berger over 1 year ago

Jonathan Rajotte Julien wrote:

We do have hardware in our test lab for Freescale i.MX6 Quad/DualLite target but it would be easier if we can use qemu for testing.

If we base ourselves on your poky branch would we be able to build a qemu-arm image and use qemu for testing? Given that we modify the poky conf file.

Cheers

If you build my stuff you'll see that it's a multi_v7_defconfig plus lots of debug stuff enabled in the kernel.
So it should run on i.mx6, sitara, zynq, socfpga,... (ARMV7, which means e.g. Cortex-A8, Cortex-A9,...)

I did not try qemu for quite some time, but it should run with the right device tree on qemu vexpressa9.

Please keep in mind, that for my multi arch kernel (and rootfs) to work on many different boards I use this as a compiler:

DEFAULTTUNE = "armv7a-neon" 
include conf/machine/include/arm/arch-armv7a.inc

Which means pretty much armv7 plus software floating point support.

#5

Updated by Robert Berger over 1 year ago

Jonathan Rajotte Julien wrote:

If we base ourselves on your poky branch would we be able to build a qemu-arm image and use qemu for testing? Given that we modify the poky conf file.

Cheers

I think you don't even need to change the poky conf file (I guess you mean local.conf).

1) You need to choose a qemu target which works with armv7 (I guess vexpressa9)
2) You need to add the device tree for it
3) There is Python config file for qemu, where you can do further adjustments how qemu should start up.

I'm sorry that I'm not much of a help with qemu, since I prefer real boards.

Having said that my rootfs and kernel should run on qemu with a vexpressa9 target. (Did not try it for quite some time, so I don't know what hacks are needed to make it work).

The default yocto project qemu arm target is arm v5 I believe.

Regards,

Robert

#6

Updated by Robert Berger over 1 year ago

I did not try it myself, but this might give you a headstart:

https://github.com/GENIVI/meta-ivi/blob/master/meta-ivi-bsp/conf/machine/vexpressa9.conf

#7

Updated by Jonathan Rajotte Julien over 1 year ago

Hi Robert,

Just letting you know that, unfortunately, we do not have a lot of time on our end. It would help us greatly if this could be reproduced in a "simpler" way or at least a straightforward way to do it. We are in no way yocto expert (yet), but normally once we can get a reproducer, we can identify the problem quickly.

Cheers.

#8

Updated by Jonathan Rajotte Julien over 1 year ago

Hi Robert,

Could you try placing a similar probe with perf/ftrace directly and validate that it work? Might help us pruning the search tree.

Cheers

#9

Updated by Robert Berger over 1 year ago

Jonathan Rajotte Julien wrote:

Hi Robert,

Just letting you know that, unfortunately, we do not have a lot of time on our end. It would help us greatly if this could be reproduced in a "simpler" way or at least a straightforward way to do it. We are in no way yocto expert (yet), but normally once we can get a reproducer, we can identify the problem quickly.

Cheers.

Would it help if I give you remote access

) to an SDK and the lttng problem + a board
) No Yocto involved. I guess I could produce an SDK (toolchain, libs, headers,..) and give you remote access to it.
) You can remotely power on/off the board.
) You can see the serial console.
) kernel+fdt loaded over tftp
*) rootfs over nfs

And of course I can assist and we can also run interactive sessions together to speed up things.
Currently I am on the road, but we can maybe schedule something for next week.

#10

Updated by Robert Berger over 1 year ago

Jonathan Rajotte Julien wrote:

Hi Robert,

Could you try placing a similar probe with perf/ftrace directly and validate that it work? Might help us pruning the search tree.

Cheers

1) I can try to set a kprobe on sys_open and sys_close and see what's going on.

2) ftrace:

something like this?

cd /sys/kernel/debug/tracing
cat available_events | grep sys_open
echo sys_enter_open > set_ftrace_filter
...

something like this?

3) perf

perf stat -e raw_syscalls:sys_enter_open -I 1000 -a
#11

Updated by Jonathan Rajotte Julien over 1 year ago

Hi Douglas,

Remote access would be pretty good, as long as we can compile lttng-modules and test easily. Next week can be good for me to at least get the workflow going. We will see from there what we can do to figure out the problem.

We can continue this part of the discussion by email: jonathan.rajotte-julien at efficios.com .

As for testing something like this would be sufficient:

perf probe -a sys_open+0x0
perf record -e probe:sys_open -aR sleep 1

We just need to make sure that we actually use the kprobe mechanism. I'm not sure if the " -e raw_syscalls:sys_enter_open" ends up using kprobe to do the work.

Cheers

#12

Updated by Robert Berger over 1 year ago

Jonathan Rajotte Julien wrote:

As for testing something like this would be sufficient:
[...]

seems to work:

ssh root@192.168.42.11
Last login: Tue Jan 22 17:05:41 2019
root@multi-v7-ml:~# perf probe -a sys_open+0x0
Added new event:
  probe:sys_open       (on sys_open)

You can now use it in all perf tools, such as:

    perf record -e probe:sys_open -aR sleep 1

root@multi-v7-ml:~# perf record -e probe:sys_open -aR sleep 1

[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.052 MB perf.data ]
root@multi-v7-ml:~# 
root@multi-v7-ml:~# 

#13

Updated by Robert Berger over 1 year ago

No oops:

[   19.635115] EXT4-fs (mmcblk0p1): recovery complete
[   19.641430] EXT4-fs (mmcblk0p1): mounted filesystem with ordered data mode. Opts: (null)
[   22.660370] random: dd: uninitialized urandom read (512 bytes read)
[   23.124330] random: crng init done
[   26.733846] Bluetooth: Core ver 2.22
[   26.737744] NET: Registered protocol family 31
[   26.742322] Bluetooth: HCI device and connection manager initialized
[   26.749113] Bluetooth: HCI socket layer initialized
[   26.754155] Bluetooth: L2CAP socket layer initialized
[   26.759956] Bluetooth: SCO socket layer initialized
root@multi-v7-ml:~# 

#14

Updated by Jonathan Rajotte Julien over 1 year ago

Okai so this most probably on our side.

Are you still be able to provide us with a remote system?

Cheers

#15

Updated by Robert Berger over 1 year ago

Hi,

Jonathan Rajotte Julien wrote:

Okai so this most probably on our side.

Are you still be able to provide us with a remote system?

Cheers

Yes we can do something like that.

At the moment I am trying to prepare the pdfs for a workbook I'll need to be printed 50 times or so for the Embedded World. In the meantime I'll try to set up something for you. I am not sure how I can cross compile your modules, since I usually do that via the Yocto project, but I guess if I provide a cross toolchain and some kernel we can fix that together.

Please stay tuned. I'll let you know.

#16

Updated by Robert Berger over 1 year ago

Here are some more tests I did:

1) I built a 4.19.13 kernel + lttng modules without Yocto, but with a Yocto SDK - the oops seems to be with sys_close and not with sys_open

root@multi-v7-ml:~# perf probe -a sys_close+0x0
Added new event:
  probe:sys_close      (on sys_close)

You can now use it in all perf tools, such as:

        perf record -e probe:sys_close -aR sleep 1

root@multi-v7-ml:~# perf record -e probe:sys_close -aR sleep 1
[  373.843167] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[  373.851446] pgd = f6950472
[  373.854203] [00000000] *pgd=00000000
[  373.857847] Internal error: Oops: 805 [#1] SMP ARM
[  373.862654] Modules linked in: bluetooth ecdh_generic rtc_da9063 da9062_wdt da9062_core i2c_imx rtc_snvs
[  373.872213] CPU: 3 PID: 732 Comm: perf Not tainted 4.19.13-custom-student+ #1
[  373.879360] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[  373.885907] PC is at 0xbf10401c
[  373.889070] LR is at ret_fast_syscall+0x0/0x28
[  373.893530] pc : [<bf10401c>]    lr : [<c0301000>]    psr: 400d0013
[  373.899810] sp : eea09f60  ip : eea09f88  fp : 00000000
[  373.905046] r10: 00000006  r9 : eea08000  r8 : c03011e4
[  373.910281] r7 : 00000006  r6 : 00000000  r5 : b6888014  r4 : 008dc770
[  373.916821] r3 : eea09fa8  r2 : 00000000  r1 : 00000444  r0 : 0000000a
[  373.923362] Flags: nZcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[  373.930510] Control: 10c5387d  Table: 3c9f404a  DAC: 00000051
[  373.936269] Process perf (pid: 732, stack limit = 0xb5a3dfa4)
[  373.942029] Stack: (0xeea09f60 to 0xeea0a000)
[  373.946405] 9f60: be854178 bd07ea02 c1c00f98 c0301100 ffffe000 00000006 eea09fac eea09f88
[  373.954599] 9f80: c045ceb0 c03bd174 008dc770 b6888014 00000000 00000006 200d0010 b6813400
[  373.962792] 9fa0: 00000000 eea09fb0 008dc770 b6888014 0000000a 00000444 b6887bd0 b67b11f0
[  373.970985] 9fc0: 008dc770 b6888014 00000000 00000006 008e2a30 be854178 00000000 be8583cc
[  373.979179] 9fe0: 00000000 be8540e4 b67b2978 b6813400 200d0010 0000000a 00000000 00000000
[  373.987376] unwind: Index not found bf10401c
[  373.991665] Code: 40112024 29805028 6014288a 20ceb280 (500b21b2)
[  373.997875] ---[ end trace 625c7523bca7991d ]---
[  374.003033] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[  374.011146] pgd = be934f00
[  374.013914] [00000000] *pgd=00000000
[  374.017540] Internal error: Oops: 805 [#2] SMP ARM
[  374.022352] Modules linked in: bluetooth ecdh_generic rtc_da9063 da9062_wdt da9062_core i2c_imx rtc_snvs
[  374.031903] CPU: 3 PID: 565 Comm: syslogd Tainted: G      D           4.19.13-custom-student+ #1
[  374.040701] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[  374.047247] PC is at 0xbf10401c
[  374.050409] LR is at ret_fast_syscall+0x0/0x28
[  374.054866] pc : [<bf10401c>]    lr : [<c0301000>]    psr: 40070013
[  374.061145] sp : ec011f60  ip : ec011f88  fp : 00000000
[  374.066381] r10: 00000006  r9 : ec010000  r8 : c03011e4
[  374.071617] r7 : 00000006  r6 : 004d55ac  r5 : 5c47afb8  r4 : 004d5150
[  374.078155] r3 : ec011fa8  r2 : 00000000  r1 : 00000000  r0 : 00000003
[  374.084696] Flags: nZcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[  374.091845] Control: 10c5387d  Table: 3c85c04a  DAC: 00000051
[  374.097606] Process syslogd (pid: 565, stack limit = 0xf8ca94ad)
[  374.103625] Stack: (0xec011f60 to 0xec012000)
[  374.108002] 1f60: c03011e4 bef48ab0 c1c00f98 c0301100 ffffe000 00000006 ec011fac ec011f88
[  374.116194] 1f80: c045ceb0 c03bd174 004d5150 5c47afb8 004d55ac 00000006 60070010 b6e8a118
[  374.124389] 1fa0: 00000000 ec011fb0 004d5150 5c47afb8 00000003 00000000 0000006c 00000000
[  374.132581] 1fc0: 004d5150 5c47afb8 004d55ac 00000006 b6ff9900 00000000 00000001 bef48b6c
[  374.140772] 1fe0: 004d3f34 bef48ab0 0046b31c b6e8a118 60070010 00000003 00000000 00000000
[  374.148968] unwind: Index not found bf10401c
[  374.153258] Code: 40112024 29805028 6014288a 20ceb280 (500b21b2)
[  374.159493] ---[ end trace 625c7523bca7991e ]---
[  374.164750] Unable to handle kernel NULL pointer dereference at virtual address 00000007
[  374.172920] pgd = 2b0540c4
[  374.175643] [00000007] *pgd=00000000
[  374.179243] Internal error: Oops: 801 [#3] SMP ARM
[  374.184048] Modules linked in: bluetooth ecdh_generic rtc_da9063 da9062_wdt da9062_core i2c_imx rtc_snvs
[  374.193605] CPU: 1 PID: 568 Comm: klogd Tainted: G      D           4.19.13-custom-student+ #1
[  374.202228] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[  374.208771] PC is at 0xbf104010
[  374.211930] LR is at ret_fast_syscall+0x0/0x28
[  374.216386] pc : [<bf104010>]    lr : [<c0301000>]    psr: 600f0013
[  374.222663] sp : ecd1bf60  ip : ecd1bf88  fp : 00000000
[  374.227898] r10: 00000006  r9 : ecd1a000  r8 : c03011e4
[  374.233134] r7 : 00000006  r6 : 00000014  r5 : 00000001  r4 : b6ed27ac
[  374.239673] r3 : ecd1bfa8  r2 : 00000000  r1 : 0000006f  r0 : 00000003
[  374.246213] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[  374.253360] Control: 10c5387d  Table: 3c91804a  DAC: 00000051
[  374.259119] Process klogd (pid: 568, stack limit = 0xfde5e4f0)
[  374.264963] Stack: (0xecd1bf60 to 0xecd1c000)
[  374.269338] bf60: ecd1bf94 bd07ea02 c1c00f98 c0301100 ffffe000 00000006 ecd1bfac ecd1bf88
[  374.277531] bf80: c045ceb0 c03bd174 b6ed27ac 00000001 00000014 00000006 600f0010 b6e55118
[  374.285724] bfa0: 00000000 ecd1bfb0 b6ed27ac 00000001 00000003 0000006f 0000006c 00000000
[  374.293917] bfc0: b6ed27ac 00000001 00000014 00000006 00000001 00000000 0053fd80 bead2c4c
[  374.302109] bfe0: 00000000 bead2b80 b6e60cd0 b6e55118 600f0010 00000003 00000000 00000000
[  374.310304] unwind: Index not found bf104010
[  374.314593] Code: e24dd048 8e712a42 e28d3048 40112024 (29805028)
[  374.320798] ---[ end trace 625c7523bca7991f ]---
Segmentation fault  
root@multi-v7-ml:~# 
root@multi-v7-ml:~# uname -a
Linux multi-v7-ml 4.19.13-custom-student+ #1 SMP Mon Feb 4 18:09:01 CET 2019 armv7l armv7l armv7l GNU/Linux
root@multi-v7-ml:~# 

So it's not you ;)

#17

Updated by Jonathan Rajotte Julien over 1 year ago

Hi,

Could you make sure that lttng-modules is not loaded at any point?

Lttng-modules might also be responsible if it is doing the same mistake as perf.

Anyhow, at this point, the best bet would be to contact the kprobe maintainers to see if they have any pointer or info regarding a similar problem.

M:    Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
M:    Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
M:    "David S. Miller" <davem@davemloft.net>
M:    Masami Hiramatsu <mhiramat@kernel.org>

Could you cc me (jonathan.rajotte-julien at efficios.com) and Mathieu Desnoyers (mathieu.desnoyers at efficios.com) on the thread?

Also, side comment from Mathieu: it would be a good thing to enable CONFIG_DEBUG_INFO if not already enabled.

Cheers

#18

Updated by Robert Berger over 1 year ago

Jonathan Rajotte Julien wrote:

Hi,

Could you make sure that lttng-modules is not loaded at any point?

I built the latest from git, but they are not loaded.

Lttng-modules might also be responsible if it is doing the same mistake as perf.

Anyhow, at this point, the best bet would be to contact the kprobe maintainers to see if they have any pointer or info regarding a similar problem.

[...]

Could you cc me (jonathan.rajotte-julien at efficios.com) and Mathieu Desnoyers (mathieu.desnoyers at efficios.com) on the thread?

Also, side comment from Mathieu: it would be a good thing to enable CONFIG_DEBUG_INFO if not already enabled.

Cheers

I also tried with 4.19.19, same problem:

root@multi-v7-ml:~# lsmod
Module                  Size  Used by
bluetooth             602112  8
ecdh_generic           28672  1 bluetooth
rtc_da9063             16384  0
da9062_wdt             16384  0
da9062_core            16384  0
i2c_imx                28672  0
rtc_snvs               16384  1
root@multi-v7-ml:~#
root@multi-v7-ml:~# perf probe -a sys_close+0x0
Added new event:
  probe:sys_close      (on sys_close)

You can now use it in all perf tools, such as:

        perf record -e probe:sys_close -aR sleep 1

root@multi-v7-ml:~# lsmod
Module                  Size  Used by
bluetooth             602112  8
ecdh_generic           28672  1 bluetooth
rtc_da9063             16384  0
da9062_wdt             16384  0
da9062_core            16384  0
i2c_imx                28672  0
rtc_snvs               16384  1
root@multi-v7-ml:~# perf record -e probe:sys_close -aR sleep 1
[  180.564162] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[  180.572459] pgd = fa34b9b1
[  180.575214] [00000000] *pgd=00000000
[  180.578856] Internal error: Oops: 805 [#1] SMP ARM
[  180.583663] Modules linked in: bluetooth ecdh_generic rtc_da9063 da9062_wdt da9062_core i2c_imx rtc_snvs
[  180.593225] CPU: 2 PID: 730 Comm: perf Not tainted 4.19.19-custom-student+ #1
[  180.600373] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[  180.606915] PC is at 0xbf10401c
[  180.610077] LR is at ret_fast_syscall+0x0/0x28
[  180.614535] pc : [<bf10401c>]    lr : [<c0301000>]    psr: 400d0013
[  180.620813] sp : ecffdf60  ip : ecffdf88  fp : 00000000
[  180.626051] r10: 00000006  r9 : ecffc000  r8 : c03011e4
[  180.631287] r7 : 00000006  r6 : 00000000  r5 : b687e014  r4 : 008f7770
[  180.637826] r3 : ecffdfa8  r2 : 00000000  r1 : 00000444  r0 : 0000000a
[  180.644367] Flags: nZcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[  180.651514] Control: 10c5387d  Table: 3b80004a  DAC: 00000051
[  180.657274] Process perf (pid: 730, stack limit = 0xecc84302)
[  180.663032] Stack: (0xecffdf60 to 0xecffe000)
[  180.667408] df60: bed1d178 bd07ea1c c1c00f98 c0301100 ffffe000 00000006 ecffdfac ecffdf88
[  180.675602] df80: c045cef0 c03bd194 008f7770 b687e014 00000000 00000006 200d0010 b6809400
[  180.683794] dfa0: 00000000 ecffdfb0 008f7770 b687e014 0000000a 00000444 b687dbd0 b67a71f0
[  180.691987] dfc0: 008f7770 b687e014 00000000 00000006 008fda30 bed1d178 00000000 bed213cc
[  180.700179] dfe0: 00000000 bed1d0e4 b67a8978 b6809400 200d0010 0000000a 00000000 00000000
[  180.708377] unwind: Index not found bf10401c
[  180.712665] Code: 40112024 29805028 6014288a 20ceb280 (500b21b2)
[  180.718841] ---[ end trace 3eb25d358167dbe4 ]---
[  180.723917] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[  180.732089] pgd = cce87ef1
[  180.734817] [00000000] *pgd=00000000
[  180.738437] Internal error: Oops: 805 [#2] SMP ARM
[  180.743254] Modules linked in: bluetooth ecdh_generic rtc_da9063 da9062_wdt da9062_core i2c_imx rtc_snvs
[  180.752807] CPU: 2 PID: 572 Comm: syslogd Tainted: G      D           4.19.19-custom-student+ #1
[  180.761605] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[  180.768149] PC is at 0xbf10401c
[  180.771308] LR is at ret_fast_syscall+0x0/0x28
[  180.775765] pc : [<bf10401c>]    lr : [<c0301000>]    psr: 40070013
[  180.782043] sp : ecdd7f60  ip : ecdd7f88  fp : 00000000
[  180.787277] r10: 00000006  r9 : ecdd6000  r8 : c03011e4
[  180.792513] r7 : 00000006  r6 : 0058f5ac  r5 : 5c47b398  r4 : 0058f150
[  180.799050] r3 : ecdd7fa8  r2 : 00000000  r1 : 00000000  r0 : 00000003
[  180.805591] Flags: nZcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[  180.812738] Control: 10c5387d  Table: 3cc7804a  DAC: 00000051
[  180.818498] Process syslogd (pid: 572, stack limit = 0xa1ca5e3b)
[  180.824516] Stack: (0xecdd7f60 to 0xecdd8000)
[  180.828891] 7f60: c03011e4 bec85ab0 c1c00f98 c0301100 ffffe000 00000006 ecdd7fac ecdd7f88
[  180.837083] 7f80: c045cef0 c03bd194 0058f150 5c47b398 0058f5ac 00000006 60070010 b6e3d118
[  180.845277] 7fa0: 00000000 ecdd7fb0 0058f150 5c47b398 00000003 00000000 0000006c 00000000
[  180.853470] 7fc0: 0058f150 5c47b398 0058f5ac 00000006 b6fac900 00000000 00000001 bec85b6c
[  180.861664] 7fe0: 0058df34 bec85ab0 0052531c b6e3d118 60070010 00000003 00000000 00000000
[  180.869859] unwind: Index not found bf10401c
[  180.874149] Code: 40112024 29805028 6014288a 20ceb280 (500b21b2)
[  180.880378] ---[ end trace 3eb25d358167dbe5 ]---
Segmentation fault  
root@multi-v7-ml:~# lsmod
Module                  Size  Used by
bluetooth             602112  8
ecdh_generic           28672  1 bluetooth
rtc_da9063             16384  0
da9062_wdt             16384  0
da9062_core            16384  0
i2c_imx                28672  0
rtc_snvs               16384  1
root@multi-v7-ml:~# uname -a
Linux multi-v7-ml 4.19.19-custom-student+ #1 SMP Mon Feb 4 19:32:04 CET 2019 armv7l armv7l armv7l GNU/Linux
root@multi-v7-ml:~# 
#19

Updated by Robert Berger over 1 year ago

Is there a kprobes mailing list?

Whom should I talk to?

#20

Updated by Jonathan Rajotte Julien over 1 year ago

Hi Robert,

The mailing would be LKML if I'm not mistaken, make sure to cc the kprobe maintainers. (see note 17).

Cheers

#21

Updated by Jonathan Rajotte Julien over 1 year ago

Hi Robert,

Mathieu seems to have identified the possible source of the problem. See https://lore.kernel.org/lkml/342740659.2887.1549307721609.JavaMail.zimbra@efficios.com/ .

Could you try the fix proposed by Mathieu?

Cheers

#22

Updated by Robert Berger over 1 year ago

Jonathan Rajotte Julien wrote:

Hi Robert,

Mathieu seems to have identified the possible source of the problem. See https://lore.kernel.org/lkml/342740659.2887.1549307721609.JavaMail.zimbra@efficios.com/ .

Could you try the fix proposed by Mathieu?

Cheers

I'm on the case.

Was digging a bit further: https://bugzilla.redhat.com/show_bug.cgi?id=1639397

This was supposed to be fixed:

perf probe -a kernel_vfs_write_return=vfs_write%return

It's oopsing as well.

Will try to add the &

#23

Updated by Robert Berger over 1 year ago

Robert Berger wrote:

Jonathan Rajotte Julien wrote:

Hi Robert,

Mathieu seems to have identified the possible source of the problem. See https://lore.kernel.org/lkml/342740659.2887.1549307721609.JavaMail.zimbra@efficios.com/ .

Could you try the fix proposed by Mathieu?

Cheers

I'm on the case.

Was digging a bit further: https://bugzilla.redhat.com/show_bug.cgi?id=1639397

This was supposed to be fixed:

[...]

It's oopsing as well.

Will try to add the &

It looks like the & fixed it.

I also added some printk, just in case, to see that it's really called.

Both testcases are fine now:

Poky (Yocto Project Reference Distro) 2.6.1 multi-v7-ml /dev/ttymxc1

multi-v7-ml login: root
root@multi-v7-ml:~# perf probe -a sys_close+0x0
[   86.733765] Killroy was here
Added new event:
  probe:sys_close      (on sys_close)

You can now use it in all perf tools, such as:

        perf record -e probe:sys_close -aR sleep 1

root@multi-v7-ml:~# perf record -e probe:sys_close -aR sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.053 MB perf.data (6 samples) ]
root@multi-v7-ml:~# perf probe -a kernel_vfs_write_return=vfs_write%return
[  117.080770] Killroy was here
Added new event:
  probe:kernel_vfs_write_return__return (on vfs_write%return)

You can now use it in all perf tools, such as:

        perf record -e probe:kernel_vfs_write_return__return -aR sleep 1

root@multi-v7-ml:~# perf record -e probe:kernel_vfs_write_return__return -aR sleep 1
[ perf record: Woken up 0 times to write data ]
[ perf record: Captured and wrote 0.931 MB perf.data (12116 samples) ]
root@multi-v7-ml:~# 
[   22.414152] urandom_read: 2 callbacks suppressed
[   22.414165] random: dd: uninitialized urandom read (512 bytes read)
[   23.542785] random: crng init done
[   27.987496] Bluetooth: Core ver 2.22
[   27.991580] NET: Registered protocol family 31
[   27.996080] Bluetooth: HCI device and connection manager initialized
[   28.002968] Bluetooth: HCI socket layer initialized
[   28.007947] Bluetooth: L2CAP socket layer initialized
[   28.013729] Bluetooth: SCO socket layer initialized
[   86.733765] Killroy was here
[  117.080770] Killroy was here
root@multi-v7-ml:~# 
#24

Updated by Jonathan Rajotte Julien over 1 year ago

Hi Robert,

We will be introducing a blacklist mechanism for the affected kernels. We are waiting for a tag release on the 4.20,4.19,4.14 branch to provide us with an upper bound for the blacklisting.

We will be backporting it to the appropriate stable branch.

Cheers

#25

Updated by Mathieu Desnoyers over 1 year ago

The fix was merged into Linux upstream before 5.0:

commit 0ac569bf6a7983c0c5747d6df8db9dc05bc92b6c
Author: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Date:   Tue Feb 5 16:37:40 2019 +0100

    ARM: 8834/1: Fix: kprobes: optimized kprobes illegal instruction

    commit e46daee53bb5 ("ARM: 8806/1: kprobes: Fix false positive with
    FORTIFY_SOURCE") introduced a regression in optimized kprobes. It
    triggers "invalid instruction" oopses when using kprobes instrumentation
    through lttng and perf. This commit was introduced in kernel v4.20, and
    has been backported to stable kernels 4.19 and 4.14.

    This crash was also reported by Hongzhi Song on the redhat bugzilla
    where the patch was originally introduced.

    Link: https://bugzilla.redhat.com/show_bug.cgi?id=1639397
    Link: https://bugs.lttng.org/issues/1174
    Link: https://lore.kernel.org/lkml/342740659.2887.1549307721609.JavaMail.zimbra@efficios.com

    Fixes: e46daee53bb5 ("ARM: 8806/1: kprobes: Fix false positive with FORTIFY_SOURCE")
    Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Reported-by: Robert Berger <Robert.Berger@ReliableEmbeddedSystems.com>
    Tested-by: Robert Berger <Robert.Berger@ReliableEmbeddedSystems.com>
    Acked-by: Kees Cook <keescook@chromium.org>
    Cc: Robert Berger <Robert.Berger@ReliableEmbeddedSystems.com>
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: William Cohen <wcohen@redhat.com>
    Cc: Laura Abbott <labbott@redhat.com>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: <stable@vger.kernel.org> # v4.14+
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: patches@armlinux.org.uk
    Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>

It was backported to the 4.20, 4.19, and 4.14 stable branches. It shows up in the respective branches starting from those tags:

v4.20.13
v4.19.26
v4.14.104

#26

Updated by Jonathan Rajotte Julien over 1 year ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100
#27

Updated by Robert Berger over 1 year ago

Jonathan Rajotte Julien wrote:

Applied in changeset lttng-modules|caa068b55f4fecb477cd370bca394fd4d232a539.

Great!

Another excellent example for my lectures why to use Open Source and why things should go upstream.

Thanks!

Also available in: Atom PDF