Bug #1174
closedyocto 2.6.1(pre-release) + lttng no kprobe tracing?
100%
Description
After we resolved https://bugs.lttng.org/issues/1173#change-3359 I tried with exactly the same setup this:
On the target serial console:
root@multi-v7-ml:/# cp /proc/config.gz . root@multi-v7-ml:/# gunzip config.gz root@multi-v7-ml:/# grep -i kprobe config CONFIG_KPROBES=y CONFIG_HAVE_KPROBES=y CONFIG_KPROBE_EVENTS=y # CONFIG_KPROBES_SANITY_TEST is not set # CONFIG_ARM_KPROBES_TEST is not set
root@multi-v7-ml:/# export SESSION="kprobe_test_session" root@multi-v7-ml:/# lttng create ${SESSION} Spawning a session daemon [ 209.886591] lttng_kretprobes: loading out-of-tree module taints kernel. [ 210.158768] LTTng: Loaded modules v2.10.8 (KeKriek) Session kprobe_test_session created. Traces will be written in /home/root/lttng-traces/kprobe_test_session-20190122-163703
root@multi-v7-ml:/# lttng enable-event --kernel sys_open --probe sys_open+0x0 Kernel event sys_open created in channel channel0 root@multi-v7-ml:/# lttng enable-event --kernel sys_close --probe sys_close+0x0 Kernel event sys_close created in channel channel0
root@multi-v7-ml:/# [ 270.977499] Internal error: Oops - undefined instruction: 0 [#1] SMP ARM [ 270.984229] Modules linked in: lttng_probe_writeback(O) lttng_probe_workqueue(O) lttng_probe_v4l2(O) lttng_probe_vmscan(O) lttng_probe_udp(O) lttng_probe_timer(O) lttng_probe_sunrpc(O) lttng_probe_statedump(O) lttng_probe_sock(O) lttng_probe_skb(O) lttng_probe_signal(O) lttng_probe_scsi(O) lttng_probe_sched(O) lttng_probe_regulator(O) lttng_probe_regmap(O) lttng_probe_rcu(O) lttng_probe_random(O) lttng_probe_printk(O) lttng_probe_power(O) lttng_probe_net(O) lttng_probe_napi(O) lttng_probe_module(O) lttng_probe_kmem(O) lttng_probe_jbd2(O) lttng_probe_irq(O) lttng_probe_i2c(O) lttng_probe_gpio(O) lttng_probe_ext4(O) lttng_probe_compaction(O) lttng_probe_block(O) lttng_probe_asoc(O) lttng_ring_buffer_metadata_mmap_client(O) lttng_ring_buffer_client_mmap_overwrite(O) lttng_ring_buffer_client_mmap_discard(O) [ 271.055627] lttng_ring_buffer_metadata_client(O) lttng_ring_buffer_client_overwrite(O) lttng_ring_buffer_client_discard(O) lttng_tracer(O) lttng_statedump(O) lttng_ftrace(O) lttng_kprobes(O) lttng_clock(O) lttng_lib_ring_buffer(O) lttng_kretprobes(O) bluetooth ecdh_generic rtc_da9063 da9062_wdt da9062_core rtc_snvs flexcan etnaviv gpu_sched [ 271.086020] CPU: 1 PID: 566 Comm: crond Tainted: G O 4.19.13-custom-ml-debug #1 [ 271.094645] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) [ 271.101191] PC is at 0xbf3b0194 [ 271.104351] LR is at ret_fast_syscall+0x0/0x28 [ 271.108810] pc : [<bf3b0194>] lr : [<c0301000>] psr: 800a0013 [ 271.115090] sp : ecc45f60 ip : 00000006 fp : 00000000 [ 271.120326] r10: 00000006 r9 : ecc44000 r8 : c03011e4 [ 271.125563] r7 : 00000006 r6 : bed5a5fc r5 : 0049e540 r4 : 00000000 [ 271.132102] r3 : ecc45fa8 r2 : b6f13d18 r1 : b6f13d18 r0 : 00000005 [ 271.138643] Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none [ 271.145789] Control: 10c5387d Table: 3cca004a DAC: 00000051 [ 271.151548] Process crond (pid: 566, stack limit = 0xc688ab6f) [ 271.157393] Stack: (0xecc45f60 to 0xecc46000) [ 271.161769] 5f60: b6f13d18 00008029 00000000 0049e540 bed5a5fc 00000006 c03011e4 ecc44000 [ 271.169962] 5f80: 00000006 00000000 ecc45f88 ecc45fa8 c0301000 bf3b0180 800a0013 ffffffff [ 271.178154] 5fa0: 00000051 bf000000 00000000 0049e540 00000005 b6f13d18 b6f13d18 00008029 [ 271.186347] 5fc0: 00000000 0049e540 bed5a5fc 00000006 0000002f 00409bf0 bed5a790 b6f63900 [ 271.194538] 5fe0: 00000040 bed5a594 00403348 b6e9d400 200a0010 00000005 00000000 00000000 [ 271.202748] unwind: Index not found bf3b0194 [ 271.207036] Code: 418ac040 e28d3048 01230588 2034a0b2 (76025634) [ 271.213149] ---[ end trace d4b330dbb17e420b ]--- [ 271.217783] BUG: sleeping function called from invalid context at include/linux/percpu-rwsem.h:34 [ 271.226667] in_atomic(): 0, irqs_disabled(): 128, pid: 566, name: crond [ 271.233292] INFO: lockdep is turned off. [ 271.237226] irq event stamp: 10974 [ 271.240647] hardirqs last enabled at (10973): [<c0301bb4>] __pabt_svc+0x74/0xa0 [ 271.248060] hardirqs last disabled at (10974): [<c0301ae0>] __und_svc+0x60/0x74 [ 271.255393] softirqs last enabled at (10944): [<c106e334>] rpc_wake_up_first_on_wq+0x98/0x194 [ 271.264019] softirqs last disabled at (10942): [<c106e2c8>] rpc_wake_up_first_on_wq+0x2c/0x194 [ 271.272645] CPU: 1 PID: 566 Comm: crond Tainted: G D O 4.19.13-custom-ml-debug #1 [ 271.281267] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) [ 271.287822] [<c031726c>] (unwind_backtrace) from [<c030ffe8>] (show_stack+0x20/0x24) [ 271.295590] [<c030ffe8>] (show_stack) from [<c1099014>] (dump_stack+0xb4/0xe0) [ 271.302841] [<c1099014>] (dump_stack) from [<c03870e0>] (___might_sleep+0x1cc/0x2b8) [ 271.310606] [<c03870e0>] (___might_sleep) from [<c038723c>] (__might_sleep+0x70/0xa8) [ 271.318456] [<c038723c>] (__might_sleep) from [<c0369f38>] (exit_signals+0x48/0x2ec) [ 271.326219] [<c0369f38>] (exit_signals) from [<c0359308>] (do_exit+0x10c/0xc08) [ 271.333546] [<c0359308>] (do_exit) from [<c0310324>] (die+0x338/0x344) [ 271.340093] [<c0310324>] (die) from [<c031038c>] (arm_notify_die+0x5c/0x60) [ 271.347072] [<c031038c>] (arm_notify_die) from [<c0310568>] (do_undefinstr+0x1d8/0x2b4) [ 271.355096] [<c0310568>] (do_undefinstr) from [<c0301afc>] (__und_svc_finish+0x0/0x44) [ 271.363025] Exception stack(0xecc45ed0 to 0xecc45f18) [ 271.368091] 5ec0: 00000005 b6f13d18 b6f13d18 ecc45fa8 [ 271.376284] 5ee0: 00000000 0049e540 bed5a5fc 00000006 c03011e4 ecc44000 00000006 00000000 [ 271.384477] 5f00: 00000006 ecc45f60 c0301000 bf3b0194 800a0013 ffffffff [ 271.391110] [<c0301afc>] (__und_svc_finish) from [<bf3b0194>] (0xbf3b0194) [ 271.398004] unwind: Index not found bf3b0194 [ 271.403112] Internal error: Oops - undefined instruction: 0 [#2] SMP ARM [ 271.409834] Modules linked in: lttng_probe_writeback(O) lttng_probe_workqueue(O) lttng_probe_v4l2(O) lttng_probe_vmscan(O) lttng_probe_udp(O) lttng_probe_timer(O) lttng_probe_sunrpc(O) lttng_probe_statedump(O) lttng_probe_sock(O) lttng_probe_skb(O) lttng_probe_signal(O) lttng_probe_scsi(O) lttng_probe_sched(O) lttng_probe_regulator(O) lttng_probe_regmap(O) lttng_probe_rcu(O) lttng_probe_random(O) lttng_probe_printk(O) lttng_probe_power(O) lttng_probe_net(O) lttng_probe_napi(O) lttng_probe_module(O) lttng_probe_kmem(O) lttng_probe_jbd2(O) lttng_probe_irq(O) lttng_probe_i2c(O) lttng_probe_gpio(O) lttng_probe_ext4(O) lttng_probe_compaction(O) lttng_probe_block(O) lttng_probe_asoc(O) lttng_ring_buffer_metadata_mmap_client(O) lttng_ring_buffer_client_mmap_overwrite(O) lttng_ring_buffer_client_mmap_discard(O) [ 271.481189] lttng_ring_buffer_metadata_client(O) lttng_ring_buffer_client_overwrite(O) lttng_ring_buffer_client_discard(O) lttng_tracer(O) lttng_statedump(O) lttng_ftrace(O) lttng_kprobes(O) lttng_clock(O) lttng_lib_ring_buffer(O) lttng_kretprobes(O) bluetooth ecdh_generic rtc_da9063 da9062_wdt da9062_core rtc_snvs flexcan etnaviv gpu_sched [ 271.511574] CPU: 1 PID: 523 Comm: syslogd Tainted: G D W O 4.19.13-custom-ml-debug #1 [ 271.520371] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) [ 271.526914] PC is at 0xbf3b0194 [ 271.530072] LR is at ret_fast_syscall+0x0/0x28 [ 271.534527] pc : [<bf3b0194>] lr : [<c0301000>] psr: 80070013 [ 271.540804] sp : ec8f7f60 ip : 00000006 fp : 00000000 [ 271.546040] r10: 00000006 r9 : ec8f6000 r8 : c03011e4 [ 271.551276] r7 : 00000006 r6 : 004b65ac r5 : 5c4746e9 r4 : 004b6150 [ 271.557814] r3 : ec8f7fa8 r2 : 0000006c r1 : 00000000 r0 : 00000003 [ 271.564355] Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none [ 271.571501] Control: 10c5387d Table: 3cb9004a DAC: 00000051 [ 271.577261] Process syslogd (pid: 523, stack limit = 0xc7bf2df9) [ 271.583281] Stack: (0xec8f7f60 to 0xec8f8000) [ 271.587657] 7f60: 0000006c 00000000 004b6150 5c4746e9 004b65ac 00000006 c03011e4 ec8f6000 [ 271.595851] 7f80: 00000006 00000000 ec8f7f88 ec8f7fa8 c0301000 bf3b0180 80070013 ffffffff [ 271.604045] 7fa0: 00000051 bf000000 004b6150 5c4746e9 00000003 00000000 0000006c 00000000 [ 271.612236] 7fc0: 004b6150 5c4746e9 004b65ac 00000006 b6fd6900 00000000 00000000 bef5ab6c [ 271.620428] 7fe0: 004b4f34 bef5aab0 0044c31c b6e80118 60070010 00000003 00000000 00000000 [ 271.628629] unwind: Index not found bf3b0194 [ 271.632918] Code: 418ac040 e28d3048 01230588 2034a0b2 (76025634) [ 271.639031] ---[ end trace d4b330dbb17e420c ]--- [ 271.644275] Internal error: Oops - undefined instruction: 0 [#3] SMP ARM [ 271.650998] Modules linked in: lttng_probe_writeback(O) lttng_probe_workqueue(O) lttng_probe_v4l2(O) lttng_probe_vmscan(O) lttng_probe_udp(O) lttng_probe_timer(O) lttng_probe_sunrpc(O) lttng_probe_statedump(O) lttng_probe_sock(O) lttng_probe_skb(O) lttng_probe_signal(O) lttng_probe_scsi(O) lttng_probe_sched(O) lttng_probe_regulator(O) lttng_probe_regmap(O) lttng_probe_rcu(O) lttng_probe_random(O) lttng_probe_printk(O) lttng_probe_power(O) lttng_probe_net(O) lttng_probe_napi(O) lttng_probe_module(O) lttng_probe_kmem(O) lttng_probe_jbd2(O) lttng_probe_irq(O) lttng_probe_i2c(O) lttng_probe_gpio(O) lttng_probe_ext4(O) lttng_probe_compaction(O) lttng_probe_block(O) lttng_probe_asoc(O) lttng_ring_buffer_metadata_mmap_client(O) lttng_ring_buffer_client_mmap_overwrite(O) lttng_ring_buffer_client_mmap_discard(O) [ 271.722366] lttng_ring_buffer_metadata_client(O) lttng_ring_buffer_client_overwrite(O) lttng_ring_buffer_client_discard(O) lttng_tracer(O) lttng_statedump(O) lttng_ftrace(O) lttng_kprobes(O) lttng_clock(O) lttng_lib_ring_buffer(O) lttng_kretprobes(O) bluetooth ecdh_generic rtc_da9063 da9062_wdt da9062_core rtc_snvs flexcan etnaviv gpu_sched [ 271.752744] CPU: 2 PID: 526 Comm: klogd Tainted: G D W O 4.19.13-custom-ml-debug #1 [ 271.761369] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) [ 271.767910] PC is at 0xbf3b0194 [ 271.771067] LR is at ret_fast_syscall+0x0/0x28 [ 271.775522] pc : [<bf3b0194>] lr : [<c0301000>] psr: 800f0013 [ 271.781800] sp : ec8f5f60 ip : 00000006 fp : 00000000 [ 271.787037] r10: 00000006 r9 : ec8f4000 r8 : c03011e4 [ 271.792275] r7 : 00000006 r6 : 00000014 r5 : 00000001 r4 : b6f257ac [ 271.798813] r3 : ec8f5fa8 r2 : 0000006c r1 : 0000006f r0 : 00000003 [ 271.805352] Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none [ 271.812498] Control: 10c5387d Table: 3cb9404a DAC: 00000051 [ 271.818256] Process klogd (pid: 526, stack limit = 0x4e2f341e) [ 271.824101] Stack: (0xec8f5f60 to 0xec8f6000) [ 271.828474] 5f60: 0000006c 00000000 b6f257ac 00000001 00000014 00000006 c03011e4 ec8f4000 [ 271.836667] 5f80: 00000006 00000000 ec8f5f88 ec8f5fa8 c0301000 bf3b0180 800f0013 ffffffff [ 271.844858] 5fa0: 00000051 bf000000 b6f257ac 00000001 00000003 0000006f 0000006c 00000000 [ 271.853050] 5fc0: b6f257ac 00000001 00000014 00000006 00000001 00000000 004bbd80 be91ec4c [ 271.861242] 5fe0: 00000000 be91eb80 b6eb3cd0 b6ea8118 600f0010 00000003 00000000 00000000 [ 271.869441] unwind: Index not found bf3b0194 [ 271.873728] Code: 418ac040 e28d3048 01230588 2034a0b2 (76025634) [ 271.879839] ---[ end trace d4b330dbb17e420d ]--- root@multi-v7-ml:/#
Files
Updated by Jonathan Rajotte Julien almost 6 years ago
- Status changed from New to In Progress
Hi Robert,
Were you able to validate that base tracing work? not using probe.
I assume we can base ourselves on https://bugs.lttng.org/issues/1173#note-7 for the kernel config.
We will need to take a deeper look at this. It might take more time than the previous issues. Mathieu Desnoyers might be able to provide us with more information later this week.
Cheers
Updated by Jonathan Rajotte Julien almost 6 years ago
We do have hardware in our test lab for Freescale i.MX6 Quad/DualLite target but it would be easier if we can use qemu for testing.
If we base ourselves on your poky branch would we be able to build a qemu-arm image and use qemu for testing? Given that we modify the poky conf file.
Cheers
Updated by Robert Berger almost 6 years ago
Here are the details:
components used:
poky/branch: https://github.com/RobertBerger/poky/tree/2019-01-16-thud-2.6.1-pre-release
layer, which adds 4.19.13 kernel + new lttng components:
https://github.com/RobertBerger/meta-mainline/tree/thud-training-v4.19.x/multi-v7-ml
kernel:
target# uname -a Linux multi-v7-ml 4.19.13-custom-ml-debug #1 SMP Wed Jan 16 20:05:50 UTC 2019 armv7l armv7l armv7l GNU/Linux <pre> <pre> target# gcc --version gcc (GCC) 8.2.0 Copyright (C) 2018 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. </pre> <pre> host# bitbake-layers show-recipes | grep -A1 lttng lttng-modules: multi-v7-ml 2.10.8 -- lttng-tools: multi-v7-ml 2.10.5 -- lttng-ust: multi-v7-ml 2:2.10.2 </pre> The kernel is upstream 4.19.13: repo: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git Branch: linux-4.19.y Commit: c04c050f5bf98845bfe22164b8a1503d696a6e26 as you can see here: https://github.com/RobertBerger/meta-mainline/blob/thud-training-v4.19.x/multi-v7-ml/recipes-kernel/linux/linux-yocto-custom_4.19.13.bb plus this patch: https://github.com/RobertBerger/meta-mainline/blob/thud-training-v4.19.x/multi-v7-ml/recipes-kernel/linux/patch/4.19.x/patches/phyBOARD-MIRA/patches-multi-v7-defconfig/0001-revert-patch-e7e73b10d690c5352cb11b1941a09e4f3dc4c8c.patch The new config is attached here - basically like before plus the one CONFIG_ARM_MODULE_PLTS=y fix. If you want me to try out things just let me know. Thanks!
Updated by Robert Berger almost 6 years ago
Jonathan Rajotte Julien wrote:
We do have hardware in our test lab for Freescale i.MX6 Quad/DualLite target but it would be easier if we can use qemu for testing.
If we base ourselves on your poky branch would we be able to build a qemu-arm image and use qemu for testing? Given that we modify the poky conf file.
Cheers
If you build my stuff you'll see that it's a multi_v7_defconfig plus lots of debug stuff enabled in the kernel.
So it should run on i.mx6, sitara, zynq, socfpga,... (ARMV7, which means e.g. Cortex-A8, Cortex-A9,...)
I did not try qemu for quite some time, but it should run with the right device tree on qemu vexpressa9.
Please keep in mind, that for my multi arch kernel (and rootfs) to work on many different boards I use this as a compiler:
DEFAULTTUNE = "armv7a-neon" include conf/machine/include/arm/arch-armv7a.inc
Which means pretty much armv7 plus software floating point support.
Updated by Robert Berger almost 6 years ago
Jonathan Rajotte Julien wrote:
If we base ourselves on your poky branch would we be able to build a qemu-arm image and use qemu for testing? Given that we modify the poky conf file.
Cheers
I think you don't even need to change the poky conf file (I guess you mean local.conf).
1) You need to choose a qemu target which works with armv7 (I guess vexpressa9)
2) You need to add the device tree for it
3) There is Python config file for qemu, where you can do further adjustments how qemu should start up.
I'm sorry that I'm not much of a help with qemu, since I prefer real boards.
Having said that my rootfs and kernel should run on qemu with a vexpressa9 target. (Did not try it for quite some time, so I don't know what hacks are needed to make it work).
The default yocto project qemu arm target is arm v5 I believe.
Regards,
Robert
Updated by Robert Berger almost 6 years ago
I did not try it myself, but this might give you a headstart:
https://github.com/GENIVI/meta-ivi/blob/master/meta-ivi-bsp/conf/machine/vexpressa9.conf
Updated by Jonathan Rajotte Julien almost 6 years ago
Hi Robert,
Just letting you know that, unfortunately, we do not have a lot of time on our end. It would help us greatly if this could be reproduced in a "simpler" way or at least a straightforward way to do it. We are in no way yocto expert (yet), but normally once we can get a reproducer, we can identify the problem quickly.
Cheers.
Updated by Jonathan Rajotte Julien almost 6 years ago
Hi Robert,
Could you try placing a similar probe with perf/ftrace directly and validate that it work? Might help us pruning the search tree.
Cheers
Updated by Robert Berger almost 6 years ago
Jonathan Rajotte Julien wrote:
Hi Robert,
Just letting you know that, unfortunately, we do not have a lot of time on our end. It would help us greatly if this could be reproduced in a "simpler" way or at least a straightforward way to do it. We are in no way yocto expert (yet), but normally once we can get a reproducer, we can identify the problem quickly.
Cheers.
Would it help if I give you remote access
) to an SDK and the lttng problem + a board
) No Yocto involved. I guess I could produce an SDK (toolchain, libs, headers,..) and give you remote access to it.
) You can remotely power on/off the board.
) You can see the serial console.
) kernel+fdt loaded over tftp
*) rootfs over nfs
And of course I can assist and we can also run interactive sessions together to speed up things.
Currently I am on the road, but we can maybe schedule something for next week.
Updated by Robert Berger almost 6 years ago
Jonathan Rajotte Julien wrote:
Hi Robert,
Could you try placing a similar probe with perf/ftrace directly and validate that it work? Might help us pruning the search tree.
Cheers
1) I can try to set a kprobe on sys_open and sys_close and see what's going on.
2) ftrace:
something like this?
cd /sys/kernel/debug/tracing cat available_events | grep sys_open echo sys_enter_open > set_ftrace_filter ...
something like this?
3) perf
perf stat -e raw_syscalls:sys_enter_open -I 1000 -a
Updated by Jonathan Rajotte Julien almost 6 years ago
Hi Douglas,
Remote access would be pretty good, as long as we can compile lttng-modules and test easily. Next week can be good for me to at least get the workflow going. We will see from there what we can do to figure out the problem.
We can continue this part of the discussion by email: jonathan.rajotte-julien at efficios.com .
As for testing something like this would be sufficient:
perf probe -a sys_open+0x0 perf record -e probe:sys_open -aR sleep 1
We just need to make sure that we actually use the kprobe mechanism. I'm not sure if the " -e raw_syscalls:sys_enter_open" ends up using kprobe to do the work.
Cheers
Updated by Robert Berger almost 6 years ago
Jonathan Rajotte Julien wrote:
As for testing something like this would be sufficient:
[...]
seems to work:
ssh root@192.168.42.11 Last login: Tue Jan 22 17:05:41 2019 root@multi-v7-ml:~# perf probe -a sys_open+0x0 Added new event: probe:sys_open (on sys_open) You can now use it in all perf tools, such as: perf record -e probe:sys_open -aR sleep 1 root@multi-v7-ml:~# perf record -e probe:sys_open -aR sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.052 MB perf.data ] root@multi-v7-ml:~# root@multi-v7-ml:~#
Updated by Robert Berger almost 6 years ago
No oops:
[ 19.635115] EXT4-fs (mmcblk0p1): recovery complete [ 19.641430] EXT4-fs (mmcblk0p1): mounted filesystem with ordered data mode. Opts: (null) [ 22.660370] random: dd: uninitialized urandom read (512 bytes read) [ 23.124330] random: crng init done [ 26.733846] Bluetooth: Core ver 2.22 [ 26.737744] NET: Registered protocol family 31 [ 26.742322] Bluetooth: HCI device and connection manager initialized [ 26.749113] Bluetooth: HCI socket layer initialized [ 26.754155] Bluetooth: L2CAP socket layer initialized [ 26.759956] Bluetooth: SCO socket layer initialized root@multi-v7-ml:~#
Updated by Jonathan Rajotte Julien almost 6 years ago
Okai so this most probably on our side.
Are you still be able to provide us with a remote system?
Cheers
Updated by Robert Berger almost 6 years ago
Hi,
Jonathan Rajotte Julien wrote:
Okai so this most probably on our side.
Are you still be able to provide us with a remote system?
Cheers
Yes we can do something like that.
At the moment I am trying to prepare the pdfs for a workbook I'll need to be printed 50 times or so for the Embedded World. In the meantime I'll try to set up something for you. I am not sure how I can cross compile your modules, since I usually do that via the Yocto project, but I guess if I provide a cross toolchain and some kernel we can fix that together.
Please stay tuned. I'll let you know.
Updated by Robert Berger almost 6 years ago
Here are some more tests I did:
1) I built a 4.19.13 kernel + lttng modules without Yocto, but with a Yocto SDK - the oops seems to be with sys_close and not with sys_open
root@multi-v7-ml:~# perf probe -a sys_close+0x0 Added new event: probe:sys_close (on sys_close) You can now use it in all perf tools, such as: perf record -e probe:sys_close -aR sleep 1 root@multi-v7-ml:~# perf record -e probe:sys_close -aR sleep 1 [ 373.843167] Unable to handle kernel NULL pointer dereference at virtual address 00000000 [ 373.851446] pgd = f6950472 [ 373.854203] [00000000] *pgd=00000000 [ 373.857847] Internal error: Oops: 805 [#1] SMP ARM [ 373.862654] Modules linked in: bluetooth ecdh_generic rtc_da9063 da9062_wdt da9062_core i2c_imx rtc_snvs [ 373.872213] CPU: 3 PID: 732 Comm: perf Not tainted 4.19.13-custom-student+ #1 [ 373.879360] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) [ 373.885907] PC is at 0xbf10401c [ 373.889070] LR is at ret_fast_syscall+0x0/0x28 [ 373.893530] pc : [<bf10401c>] lr : [<c0301000>] psr: 400d0013 [ 373.899810] sp : eea09f60 ip : eea09f88 fp : 00000000 [ 373.905046] r10: 00000006 r9 : eea08000 r8 : c03011e4 [ 373.910281] r7 : 00000006 r6 : 00000000 r5 : b6888014 r4 : 008dc770 [ 373.916821] r3 : eea09fa8 r2 : 00000000 r1 : 00000444 r0 : 0000000a [ 373.923362] Flags: nZcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none [ 373.930510] Control: 10c5387d Table: 3c9f404a DAC: 00000051 [ 373.936269] Process perf (pid: 732, stack limit = 0xb5a3dfa4) [ 373.942029] Stack: (0xeea09f60 to 0xeea0a000) [ 373.946405] 9f60: be854178 bd07ea02 c1c00f98 c0301100 ffffe000 00000006 eea09fac eea09f88 [ 373.954599] 9f80: c045ceb0 c03bd174 008dc770 b6888014 00000000 00000006 200d0010 b6813400 [ 373.962792] 9fa0: 00000000 eea09fb0 008dc770 b6888014 0000000a 00000444 b6887bd0 b67b11f0 [ 373.970985] 9fc0: 008dc770 b6888014 00000000 00000006 008e2a30 be854178 00000000 be8583cc [ 373.979179] 9fe0: 00000000 be8540e4 b67b2978 b6813400 200d0010 0000000a 00000000 00000000 [ 373.987376] unwind: Index not found bf10401c [ 373.991665] Code: 40112024 29805028 6014288a 20ceb280 (500b21b2) [ 373.997875] ---[ end trace 625c7523bca7991d ]--- [ 374.003033] Unable to handle kernel NULL pointer dereference at virtual address 00000000 [ 374.011146] pgd = be934f00 [ 374.013914] [00000000] *pgd=00000000 [ 374.017540] Internal error: Oops: 805 [#2] SMP ARM [ 374.022352] Modules linked in: bluetooth ecdh_generic rtc_da9063 da9062_wdt da9062_core i2c_imx rtc_snvs [ 374.031903] CPU: 3 PID: 565 Comm: syslogd Tainted: G D 4.19.13-custom-student+ #1 [ 374.040701] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) [ 374.047247] PC is at 0xbf10401c [ 374.050409] LR is at ret_fast_syscall+0x0/0x28 [ 374.054866] pc : [<bf10401c>] lr : [<c0301000>] psr: 40070013 [ 374.061145] sp : ec011f60 ip : ec011f88 fp : 00000000 [ 374.066381] r10: 00000006 r9 : ec010000 r8 : c03011e4 [ 374.071617] r7 : 00000006 r6 : 004d55ac r5 : 5c47afb8 r4 : 004d5150 [ 374.078155] r3 : ec011fa8 r2 : 00000000 r1 : 00000000 r0 : 00000003 [ 374.084696] Flags: nZcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none [ 374.091845] Control: 10c5387d Table: 3c85c04a DAC: 00000051 [ 374.097606] Process syslogd (pid: 565, stack limit = 0xf8ca94ad) [ 374.103625] Stack: (0xec011f60 to 0xec012000) [ 374.108002] 1f60: c03011e4 bef48ab0 c1c00f98 c0301100 ffffe000 00000006 ec011fac ec011f88 [ 374.116194] 1f80: c045ceb0 c03bd174 004d5150 5c47afb8 004d55ac 00000006 60070010 b6e8a118 [ 374.124389] 1fa0: 00000000 ec011fb0 004d5150 5c47afb8 00000003 00000000 0000006c 00000000 [ 374.132581] 1fc0: 004d5150 5c47afb8 004d55ac 00000006 b6ff9900 00000000 00000001 bef48b6c [ 374.140772] 1fe0: 004d3f34 bef48ab0 0046b31c b6e8a118 60070010 00000003 00000000 00000000 [ 374.148968] unwind: Index not found bf10401c [ 374.153258] Code: 40112024 29805028 6014288a 20ceb280 (500b21b2) [ 374.159493] ---[ end trace 625c7523bca7991e ]--- [ 374.164750] Unable to handle kernel NULL pointer dereference at virtual address 00000007 [ 374.172920] pgd = 2b0540c4 [ 374.175643] [00000007] *pgd=00000000 [ 374.179243] Internal error: Oops: 801 [#3] SMP ARM [ 374.184048] Modules linked in: bluetooth ecdh_generic rtc_da9063 da9062_wdt da9062_core i2c_imx rtc_snvs [ 374.193605] CPU: 1 PID: 568 Comm: klogd Tainted: G D 4.19.13-custom-student+ #1 [ 374.202228] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) [ 374.208771] PC is at 0xbf104010 [ 374.211930] LR is at ret_fast_syscall+0x0/0x28 [ 374.216386] pc : [<bf104010>] lr : [<c0301000>] psr: 600f0013 [ 374.222663] sp : ecd1bf60 ip : ecd1bf88 fp : 00000000 [ 374.227898] r10: 00000006 r9 : ecd1a000 r8 : c03011e4 [ 374.233134] r7 : 00000006 r6 : 00000014 r5 : 00000001 r4 : b6ed27ac [ 374.239673] r3 : ecd1bfa8 r2 : 00000000 r1 : 0000006f r0 : 00000003 [ 374.246213] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none [ 374.253360] Control: 10c5387d Table: 3c91804a DAC: 00000051 [ 374.259119] Process klogd (pid: 568, stack limit = 0xfde5e4f0) [ 374.264963] Stack: (0xecd1bf60 to 0xecd1c000) [ 374.269338] bf60: ecd1bf94 bd07ea02 c1c00f98 c0301100 ffffe000 00000006 ecd1bfac ecd1bf88 [ 374.277531] bf80: c045ceb0 c03bd174 b6ed27ac 00000001 00000014 00000006 600f0010 b6e55118 [ 374.285724] bfa0: 00000000 ecd1bfb0 b6ed27ac 00000001 00000003 0000006f 0000006c 00000000 [ 374.293917] bfc0: b6ed27ac 00000001 00000014 00000006 00000001 00000000 0053fd80 bead2c4c [ 374.302109] bfe0: 00000000 bead2b80 b6e60cd0 b6e55118 600f0010 00000003 00000000 00000000 [ 374.310304] unwind: Index not found bf104010 [ 374.314593] Code: e24dd048 8e712a42 e28d3048 40112024 (29805028) [ 374.320798] ---[ end trace 625c7523bca7991f ]--- Segmentation fault root@multi-v7-ml:~# root@multi-v7-ml:~# uname -a Linux multi-v7-ml 4.19.13-custom-student+ #1 SMP Mon Feb 4 18:09:01 CET 2019 armv7l armv7l armv7l GNU/Linux root@multi-v7-ml:~#
So it's not you ;)
Updated by Jonathan Rajotte Julien almost 6 years ago
Hi,
Could you make sure that lttng-modules is not loaded at any point?
Lttng-modules might also be responsible if it is doing the same mistake as perf.
Anyhow, at this point, the best bet would be to contact the kprobe maintainers to see if they have any pointer or info regarding a similar problem.
M: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> M: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> M: "David S. Miller" <davem@davemloft.net> M: Masami Hiramatsu <mhiramat@kernel.org>
Could you cc me (jonathan.rajotte-julien at efficios.com) and Mathieu Desnoyers (mathieu.desnoyers at efficios.com) on the thread?
Also, side comment from Mathieu: it would be a good thing to enable CONFIG_DEBUG_INFO if not already enabled.
Cheers
Updated by Robert Berger almost 6 years ago
Jonathan Rajotte Julien wrote:
Hi,
Could you make sure that lttng-modules is not loaded at any point?
I built the latest from git, but they are not loaded.
Lttng-modules might also be responsible if it is doing the same mistake as perf.
Anyhow, at this point, the best bet would be to contact the kprobe maintainers to see if they have any pointer or info regarding a similar problem.
[...]
Could you cc me (jonathan.rajotte-julien at efficios.com) and Mathieu Desnoyers (mathieu.desnoyers at efficios.com) on the thread?
Also, side comment from Mathieu: it would be a good thing to enable CONFIG_DEBUG_INFO if not already enabled.
Cheers
I also tried with 4.19.19, same problem:
root@multi-v7-ml:~# lsmod Module Size Used by bluetooth 602112 8 ecdh_generic 28672 1 bluetooth rtc_da9063 16384 0 da9062_wdt 16384 0 da9062_core 16384 0 i2c_imx 28672 0 rtc_snvs 16384 1 root@multi-v7-ml:~# root@multi-v7-ml:~# perf probe -a sys_close+0x0 Added new event: probe:sys_close (on sys_close) You can now use it in all perf tools, such as: perf record -e probe:sys_close -aR sleep 1 root@multi-v7-ml:~# lsmod Module Size Used by bluetooth 602112 8 ecdh_generic 28672 1 bluetooth rtc_da9063 16384 0 da9062_wdt 16384 0 da9062_core 16384 0 i2c_imx 28672 0 rtc_snvs 16384 1 root@multi-v7-ml:~# perf record -e probe:sys_close -aR sleep 1 [ 180.564162] Unable to handle kernel NULL pointer dereference at virtual address 00000000 [ 180.572459] pgd = fa34b9b1 [ 180.575214] [00000000] *pgd=00000000 [ 180.578856] Internal error: Oops: 805 [#1] SMP ARM [ 180.583663] Modules linked in: bluetooth ecdh_generic rtc_da9063 da9062_wdt da9062_core i2c_imx rtc_snvs [ 180.593225] CPU: 2 PID: 730 Comm: perf Not tainted 4.19.19-custom-student+ #1 [ 180.600373] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) [ 180.606915] PC is at 0xbf10401c [ 180.610077] LR is at ret_fast_syscall+0x0/0x28 [ 180.614535] pc : [<bf10401c>] lr : [<c0301000>] psr: 400d0013 [ 180.620813] sp : ecffdf60 ip : ecffdf88 fp : 00000000 [ 180.626051] r10: 00000006 r9 : ecffc000 r8 : c03011e4 [ 180.631287] r7 : 00000006 r6 : 00000000 r5 : b687e014 r4 : 008f7770 [ 180.637826] r3 : ecffdfa8 r2 : 00000000 r1 : 00000444 r0 : 0000000a [ 180.644367] Flags: nZcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none [ 180.651514] Control: 10c5387d Table: 3b80004a DAC: 00000051 [ 180.657274] Process perf (pid: 730, stack limit = 0xecc84302) [ 180.663032] Stack: (0xecffdf60 to 0xecffe000) [ 180.667408] df60: bed1d178 bd07ea1c c1c00f98 c0301100 ffffe000 00000006 ecffdfac ecffdf88 [ 180.675602] df80: c045cef0 c03bd194 008f7770 b687e014 00000000 00000006 200d0010 b6809400 [ 180.683794] dfa0: 00000000 ecffdfb0 008f7770 b687e014 0000000a 00000444 b687dbd0 b67a71f0 [ 180.691987] dfc0: 008f7770 b687e014 00000000 00000006 008fda30 bed1d178 00000000 bed213cc [ 180.700179] dfe0: 00000000 bed1d0e4 b67a8978 b6809400 200d0010 0000000a 00000000 00000000 [ 180.708377] unwind: Index not found bf10401c [ 180.712665] Code: 40112024 29805028 6014288a 20ceb280 (500b21b2) [ 180.718841] ---[ end trace 3eb25d358167dbe4 ]--- [ 180.723917] Unable to handle kernel NULL pointer dereference at virtual address 00000000 [ 180.732089] pgd = cce87ef1 [ 180.734817] [00000000] *pgd=00000000 [ 180.738437] Internal error: Oops: 805 [#2] SMP ARM [ 180.743254] Modules linked in: bluetooth ecdh_generic rtc_da9063 da9062_wdt da9062_core i2c_imx rtc_snvs [ 180.752807] CPU: 2 PID: 572 Comm: syslogd Tainted: G D 4.19.19-custom-student+ #1 [ 180.761605] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) [ 180.768149] PC is at 0xbf10401c [ 180.771308] LR is at ret_fast_syscall+0x0/0x28 [ 180.775765] pc : [<bf10401c>] lr : [<c0301000>] psr: 40070013 [ 180.782043] sp : ecdd7f60 ip : ecdd7f88 fp : 00000000 [ 180.787277] r10: 00000006 r9 : ecdd6000 r8 : c03011e4 [ 180.792513] r7 : 00000006 r6 : 0058f5ac r5 : 5c47b398 r4 : 0058f150 [ 180.799050] r3 : ecdd7fa8 r2 : 00000000 r1 : 00000000 r0 : 00000003 [ 180.805591] Flags: nZcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none [ 180.812738] Control: 10c5387d Table: 3cc7804a DAC: 00000051 [ 180.818498] Process syslogd (pid: 572, stack limit = 0xa1ca5e3b) [ 180.824516] Stack: (0xecdd7f60 to 0xecdd8000) [ 180.828891] 7f60: c03011e4 bec85ab0 c1c00f98 c0301100 ffffe000 00000006 ecdd7fac ecdd7f88 [ 180.837083] 7f80: c045cef0 c03bd194 0058f150 5c47b398 0058f5ac 00000006 60070010 b6e3d118 [ 180.845277] 7fa0: 00000000 ecdd7fb0 0058f150 5c47b398 00000003 00000000 0000006c 00000000 [ 180.853470] 7fc0: 0058f150 5c47b398 0058f5ac 00000006 b6fac900 00000000 00000001 bec85b6c [ 180.861664] 7fe0: 0058df34 bec85ab0 0052531c b6e3d118 60070010 00000003 00000000 00000000 [ 180.869859] unwind: Index not found bf10401c [ 180.874149] Code: 40112024 29805028 6014288a 20ceb280 (500b21b2) [ 180.880378] ---[ end trace 3eb25d358167dbe5 ]--- Segmentation fault root@multi-v7-ml:~# lsmod Module Size Used by bluetooth 602112 8 ecdh_generic 28672 1 bluetooth rtc_da9063 16384 0 da9062_wdt 16384 0 da9062_core 16384 0 i2c_imx 28672 0 rtc_snvs 16384 1 root@multi-v7-ml:~# uname -a Linux multi-v7-ml 4.19.19-custom-student+ #1 SMP Mon Feb 4 19:32:04 CET 2019 armv7l armv7l armv7l GNU/Linux root@multi-v7-ml:~#
Updated by Robert Berger almost 6 years ago
Is there a kprobes mailing list?
Whom should I talk to?
Updated by Jonathan Rajotte Julien almost 6 years ago
Hi Robert,
The mailing would be LKML if I'm not mistaken, make sure to cc the kprobe maintainers. (see note 17).
Cheers
Updated by Jonathan Rajotte Julien almost 6 years ago
Hi Robert,
Mathieu seems to have identified the possible source of the problem. See https://lore.kernel.org/lkml/342740659.2887.1549307721609.JavaMail.zimbra@efficios.com/ .
Could you try the fix proposed by Mathieu?
Cheers
Updated by Robert Berger almost 6 years ago
Jonathan Rajotte Julien wrote:
Hi Robert,
Mathieu seems to have identified the possible source of the problem. See https://lore.kernel.org/lkml/342740659.2887.1549307721609.JavaMail.zimbra@efficios.com/ .
Could you try the fix proposed by Mathieu?
Cheers
I'm on the case.
Was digging a bit further: https://bugzilla.redhat.com/show_bug.cgi?id=1639397
This was supposed to be fixed:
perf probe -a kernel_vfs_write_return=vfs_write%return
It's oopsing as well.
Will try to add the &
Updated by Robert Berger almost 6 years ago
Robert Berger wrote:
Jonathan Rajotte Julien wrote:
Hi Robert,
Mathieu seems to have identified the possible source of the problem. See https://lore.kernel.org/lkml/342740659.2887.1549307721609.JavaMail.zimbra@efficios.com/ .
Could you try the fix proposed by Mathieu?
Cheers
I'm on the case.
Was digging a bit further: https://bugzilla.redhat.com/show_bug.cgi?id=1639397
This was supposed to be fixed:
[...]
It's oopsing as well.
Will try to add the &
It looks like the & fixed it.
I also added some printk, just in case, to see that it's really called.
Both testcases are fine now:
Poky (Yocto Project Reference Distro) 2.6.1 multi-v7-ml /dev/ttymxc1 multi-v7-ml login: root root@multi-v7-ml:~# perf probe -a sys_close+0x0 [ 86.733765] Killroy was here Added new event: probe:sys_close (on sys_close) You can now use it in all perf tools, such as: perf record -e probe:sys_close -aR sleep 1 root@multi-v7-ml:~# perf record -e probe:sys_close -aR sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.053 MB perf.data (6 samples) ] root@multi-v7-ml:~# perf probe -a kernel_vfs_write_return=vfs_write%return [ 117.080770] Killroy was here Added new event: probe:kernel_vfs_write_return__return (on vfs_write%return) You can now use it in all perf tools, such as: perf record -e probe:kernel_vfs_write_return__return -aR sleep 1 root@multi-v7-ml:~# perf record -e probe:kernel_vfs_write_return__return -aR sleep 1 [ perf record: Woken up 0 times to write data ] [ perf record: Captured and wrote 0.931 MB perf.data (12116 samples) ] root@multi-v7-ml:~#
[ 22.414152] urandom_read: 2 callbacks suppressed [ 22.414165] random: dd: uninitialized urandom read (512 bytes read) [ 23.542785] random: crng init done [ 27.987496] Bluetooth: Core ver 2.22 [ 27.991580] NET: Registered protocol family 31 [ 27.996080] Bluetooth: HCI device and connection manager initialized [ 28.002968] Bluetooth: HCI socket layer initialized [ 28.007947] Bluetooth: L2CAP socket layer initialized [ 28.013729] Bluetooth: SCO socket layer initialized [ 86.733765] Killroy was here [ 117.080770] Killroy was here root@multi-v7-ml:~#
Updated by Jonathan Rajotte Julien almost 6 years ago
Hi Robert,
We will be introducing a blacklist mechanism for the affected kernels. We are waiting for a tag release on the 4.20,4.19,4.14 branch to provide us with an upper bound for the blacklisting.
We will be backporting it to the appropriate stable branch.
Cheers
Updated by Mathieu Desnoyers over 5 years ago
The fix was merged into Linux upstream before 5.0:
commit 0ac569bf6a7983c0c5747d6df8db9dc05bc92b6c Author: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Date: Tue Feb 5 16:37:40 2019 +0100 ARM: 8834/1: Fix: kprobes: optimized kprobes illegal instruction commit e46daee53bb5 ("ARM: 8806/1: kprobes: Fix false positive with FORTIFY_SOURCE") introduced a regression in optimized kprobes. It triggers "invalid instruction" oopses when using kprobes instrumentation through lttng and perf. This commit was introduced in kernel v4.20, and has been backported to stable kernels 4.19 and 4.14. This crash was also reported by Hongzhi Song on the redhat bugzilla where the patch was originally introduced. Link: https://bugzilla.redhat.com/show_bug.cgi?id=1639397 Link: https://bugs.lttng.org/issues/1174 Link: https://lore.kernel.org/lkml/342740659.2887.1549307721609.JavaMail.zimbra@efficios.com Fixes: e46daee53bb5 ("ARM: 8806/1: kprobes: Fix false positive with FORTIFY_SOURCE") Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reported-by: Robert Berger <Robert.Berger@ReliableEmbeddedSystems.com> Tested-by: Robert Berger <Robert.Berger@ReliableEmbeddedSystems.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Robert Berger <Robert.Berger@ReliableEmbeddedSystems.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: William Cohen <wcohen@redhat.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: <stable@vger.kernel.org> # v4.14+ Cc: linux-arm-kernel@lists.infradead.org Cc: patches@armlinux.org.uk Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
It was backported to the 4.20, 4.19, and 4.14 stable branches. It shows up in the respective branches starting from those tags:
v4.20.13
v4.19.26
v4.14.104
Updated by Jonathan Rajotte Julien over 5 years ago
- Status changed from In Progress to Resolved
- % Done changed from 0 to 100
Applied in changeset lttng-modules|caa068b55f4fecb477cd370bca394fd4d232a539.
Updated by Robert Berger over 5 years ago
Jonathan Rajotte Julien wrote:
Applied in changeset lttng-modules|caa068b55f4fecb477cd370bca394fd4d232a539.
Great!
Another excellent example for my lectures why to use Open Source and why things should go upstream.
Thanks!