Bug #646
closed
crash when trying to take snapshot
Added by Matthew Khouzam about 11 years ago. Updated over 10 years ago.
0%
Description
kernel: 3.8.0
3.8.0-30-generic #46-Ubuntu SMP Tue Sep 10 20:03:44 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
The crash occurs when I try to take a kernel snapshot.
lttng 2.3.0 from the Ubuntu PPAs
Steps to reproduce:
lttng create --snapshot
lttng enable-event -a -k
lttng enable-event -a -u
lttng start
lttng snapshot record
=====CRASH!=====
It has happened twice so far. I'm now testing with 3.8.0-31.
Updated by Anonymous about 11 years ago
Was it with 3.8.0-30.43 or 3.8.0-30.44? You can check the package version with "apt-cache policy <package_name>".
Updated by Mathieu Desnoyers about 11 years ago
- Status changed from New to Feedback
Can this be reproduced with a vanilla Linux kernel, or is it specific to one Ubuntu kernel sub-version?
Updated by Matthew Khouzam about 11 years ago
I can confirm it happens more often now; we just need to take snapshots frequently.
proof: http://imgur.com/EG2wPNU
Updated by Matthew Khouzam about 11 years ago
I have not tried with vanilla kernels; I am only using Ubuntu.
Right now I am using:
uname --all
Linux moya 3.8.0-32-generic #47-Ubuntu SMP Tue Oct 1 22:35:23 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
Updated by Matthew Khouzam about 11 years ago
To reproduce this more reliably, I was in Eclipse over a remote SSH connection. I had 3 sessions going and was taking snapshots of them round-robin. To do that in Eclipse, select the 3 sessions and click the camera button repeatedly.
Updated by Mathieu Desnoyers about 11 years ago
- Status changed from Feedback to In Progress
Updated by Matthew Khouzam about 11 years ago
LTTng kernel modules are now lttng-modules-dkms:amd64 (2.4~pre-0+bzr545+pack17+201310151711~ubuntu13.04.1, 2.4~pre-0+bzr547+pack17+201310281547~ubuntu13.04.1) according to my install log.
Hope this helps
Updated by Anonymous about 11 years ago
As we mentioned offline, you can use this page:
https://code.launchpad.net/~lttng/lttng-modules/trunk
to map which upstream commit "bzr547" represents.
We would also need to check whether those modules are actually loaded; it seems DKMS doesn't always overwrite the lttng-modules that come with the Ubuntu kernel.
Updated by Mathieu Desnoyers about 11 years ago
- Assignee set to Jérémie Galarneau
Updated by Francis Giraldeau almost 11 years ago
I confirm this bug with kernel 3.8.0-31-generic, lttng (LTTng Trace Control) 2.3.0 - Dominus Vobiscum and the latest HEAD lttng-modules. The crash occurs in register_cpu_notifier() and also in unregister_cpu_notifier().
[11150.403985] BUG: unable to handle kernel paging request at 0000000000001010
[11150.404041] IP: [<ffffffff810839a0>] raw_notifier_chain_register+0x20/0x40
[11150.404075] PGD 0
[11150.404095] Oops: 0000 [#1] SMP
[11150.404120] Modules linked in: lttng_probe_writeback(OF) lttng_probe_workqueue(OF) lttng_probe_vmscan(OF) lttng_probe_udp(OF) lttng_probe_timer(OF) lttng_probe_sunrpc(OF) lttng_probe_statedump(OF) lttng_probe_sock(OF) lttng_probe_skb(OF) lttng_probe_signal(OF) lttng_probe_scsi(OF) lttng_probe_sched(OF) lttng_probe_rpm(OF) lttng_probe_regulator(OF) lttng_probe_regmap(OF) lttng_probe_rcu(OF) lttng_probe_random(OF) lttng_probe_printk(OF) lttng_probe_power(OF) lttng_probe_net(OF) lttng_probe_napi(OF) lttng_probe_module(OF) lttng_probe_kvm(OF) lttng_probe_kmem(OF) lttng_probe_jbd2(OF) lttng_probe_jbd(OF) lttng_probe_irq(OF) lttng_probe_gpio(OF) lttng_probe_compaction(OF) lttng_probe_block(OF) lttng_probe_asoc(OF) lttng_types(OF) lttng_ring_buffer_metadata_mmap_client(OF) lttng_ring_buffer_client_mmap_overwrite(OF) lttng_ring_buffer_client_mmap_discard(OF) lttng_ring_buffer_metadata_client(OF) lttng_ring_buffer_client_overwrite(OF) lttng_ring_buffer_client_discard(OF) lttng_tracer(OF) lttng_statedump(OF) lttng_ftrace(OF) lttng_kprobes(OF) lttng_lib_ring_buffer(OF) lttng_kretprobes(OF) ip6table_filter(F) ip6_tables(F) ebtable_nat(F) ebtables(F) ipt_MASQUERADE(F) iptable_nat(F) nf_nat_ipv4(F) nf_nat(F) nf_conntrack_ipv4(F) nf_defrag_ipv4(F) xt_state(F) nf_conntrack(F) ipt_REJECT(F) xt_CHECKSUM(F) iptable_mangle(F) xt_tcpudp(F) iptable_filter(F) ip_tables(F) x_tables(F) bridge(F) stp(F) llc(F) nvidia(POF) rfcomm bnep bluetooth snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep(F) snd_pcm(F) snd_page_alloc(F) snd_seq_midi(F) snd_seq_midi_event(F) snd_rawmidi(F) snd_seq(F) snd_seq_device(F) kvm_intel(F) snd_timer(F) kvm(F) mei snd(F) mac_hid wmi microcode(F) soundcore(F) ppdev(F) lpc_ich parport_pc(F) binfmt_misc(F) w83627ehf hwmon_vid coretemp lp(F) ext2(F) parport(F) btrfs(F) zlib_deflate(F) libcrc32c(F) dm_crypt(F) usb_storage(F) hid_generic video(F) usbhid ghash_clmulni_intel(F) aesni_intel(F) aes_x86_64(F) xts(F) lrw(F) gf128mul(F) ablk_helper(F) hid cryptd(F) ahci(F) libahci(F) e1000e(F) [last unloaded: lttng_statedump]
[11150.410227] CPU 2
[11150.410235] Pid: 6829, comm: lttng-sessiond Tainted: PF O 3.8.0-31-generic #46-Ubuntu /DH77EB
[11150.410294] RIP: 0010:[<ffffffff810839a0>] [<ffffffff810839a0>] raw_notifier_chain_register+0x20/0x40
[11150.410338] RSP: 0018:ffff8803d30dbaa0 EFLAGS: 00010206
[11150.410362] RAX: 0000000000001000 RBX: ffff88023454dc50 RCX: 0000000000000008
[11150.410388] RDX: 0000000000000005 RSI: ffff88023454dc50 RDI: ffff88023454bc58
[11150.410414] RBP: ffff8803d30dbaa0 R08: ffffffff81ce7060 R09: 0000000000000100
[11150.410440] R10: 0000000000000111 R11: 0000000000000000 R12: ffffffffa02171e0
[11150.410466] R13: 00000000ffffffff R14: 0000000000000001 R15: 0000000000000000
[11150.410493] FS: 00007fa966ef2700(0000) GS:ffff88041ec80000(0000) knlGS:0000000000000000
[11150.410533] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[11150.410557] CR2: 0000000000001010 CR3: 0000000401e4c000 CR4: 00000000001407e0
[11150.410583] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[11150.410609] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[11150.410636] Process lttng-sessiond (pid: 6829, threadinfo ffff8803d30da000, task ffff88040210dd00)
[11150.410676] Stack:
[11150.410694] ffff8803d30dbab8 ffffffff816ac7b1 ffff88023454dc10 ffff8803d30dbae8
[11150.410741] ffffffffa01e77b4 ffff88023454dc00 ffffffffa02171e0 ffffffffa027969a
[11150.410787] ffff88023454dc10 ffff8803d30dbb38 ffffffffa01e7f4e 0000000000020000
[11150.410833] Call Trace:
[11150.410857] [<ffffffff816ac7b1>] register_cpu_notifier+0x21/0x30
[11150.410888] [<ffffffffa01e77b4>] channel_backend_init+0x2c4/0x380 [lttng_lib_ring_buffer]
[11150.410934] [<ffffffffa01e7f4e>] channel_create+0x7e/0x230 [lttng_lib_ring_buffer]
[11150.410978] [<ffffffffa02152a3>] _channel_create+0x33/0x40 [lttng_ring_buffer_client_discard]
[11150.411034] [<ffffffffa022c9e4>] lttng_channel_create+0x104/0x1c0 [lttng_tracer]
[11150.411093] [<ffffffffa022dceb>] lttng_abi_create_channel+0xab/0x1b0 [lttng_tracer]
[11150.411152] [<ffffffffa022e6d7>] lttng_session_ioctl+0x127/0x2c0 [lttng_tracer]
[11150.411202] [<ffffffff8118aa83>] ? __mem_cgroup_uncharge_common+0xe3/0x2d0
[11150.411230] [<ffffffff8114cd33>] ? __dec_zone_page_state+0x33/0x40
[11150.411257] [<ffffffff81163462>] ? page_remove_rmap+0xa2/0x180
[11150.411284] [<ffffffff8112f743>] ? unlock_page+0x23/0x30
[11150.411310] [<ffffffff811561d3>] ? do_wp_page+0x393/0x7f0
[11150.411337] [<ffffffff8113bfd0>] ? release_pages+0x1e0/0x220
[11150.411365] [<ffffffff811b37a6>] ? mntput+0x26/0x40
[11150.411390] [<ffffffff811a61a9>] do_vfs_ioctl+0x99/0x570
[11150.411415] [<ffffffff81195eae>] ? ____fput+0xe/0x10
[11150.411441] [<ffffffff8107a2fc>] ? task_work_run+0xac/0xe0
[11150.411466] [<ffffffff811a6711>] sys_ioctl+0x91/0xb0
[11150.411492] [<ffffffff816d13be>] ? do_page_fault+0xe/0x10
[11150.411517] [<ffffffff816d59dd>] system_call_fastpath+0x1a/0x1f
[11150.411541] Code: f0 eb df 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 8b 07 48 89 e5 48 85 c0 74 21 8b 56 10 3b 50 10 7e 0c eb 17 0f 1f 44 00 00 <39> 50 10 7c 0d 48 8d 78 08 48 8b 40 08 48 85 c0 75 ee 48 89 46
[11150.411774] RIP [<ffffffff810839a0>] raw_notifier_chain_register+0x20/0x40
[11150.411803] RSP <ffff8803d30dbaa0>
[11150.411824] CR2: 0000000000001010
[11150.412275] ---[ end trace d23ffee18c624aab ]---
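For anyone reading the trace, here is a rough sketch of how the faulting address lines up with the register dump. It hand-decodes only the three bytes at the `<39>` marker in the Code: line and checks them against RAX and CR2 from the oops. The helper names (`decode_cmp_disp8`, `fault_address`) are made up for this illustration, and the decoder handles just this one instruction form. It is not a general disassembler.

```python
# Values copied from the oops: RAX, CR2, and the bytes at the <39> marker.
RAX = 0x0000000000001000
CR2 = 0x0000000000001010
FAULTING_BYTES = bytes([0x39, 0x50, 0x10])

REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_cmp_disp8(insn: bytes):
    """Decode `39 /r` (cmp r/m32, r32) with mod=01, i.e. an 8-bit displacement.

    Hypothetical helper: covers only this narrow encoding, no prefixes, no SIB.
    Returns (source register, base register, signed displacement).
    """
    opcode, modrm, disp8 = insn
    if opcode != 0x39:
        raise ValueError("only opcode 0x39 is handled")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b01 or rm == 0b100:
        raise ValueError("only [reg + disp8] addressing is handled")
    disp = disp8 - 256 if disp8 >= 128 else disp8
    return REG32[reg], REG64[rm], disp


def fault_address(base_value: int, disp: int) -> int:
    """Effective address of the memory operand, given the base register value."""
    return base_value + disp


src, base, disp = decode_cmp_disp8(FAULTING_BYTES)
addr = fault_address(RAX, disp)
print(f"cmp %{src}, {disp:#x}(%{base}) -> access at {addr:#018x}")
print("matches CR2:", addr == CR2)
```

So the faulting access is a 32-bit compare at offset 0x10 from RAX, and RAX holds 0x1000, which is not a valid kernel pointer. On x86_64, offset 0x10 would correspond to the priority field of a struct notifier_block, which seems consistent with a stale or corrupted entry in the CPU notifier chain. This is only a reading of the register dump, not a confirmed root cause.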
Updated by Mathieu Desnoyers over 10 years ago
This is very likely an Ubuntu-specific kernel issue. Please let us know if you can still reproduce it with a newer Ubuntu kernel. We've been unable to reproduce it on our side.
Please also try reproducing it in a VM environment. I suspect this issue might be caused by an Ubuntu driver module.
Thanks,
Mathieu
Updated by Mathieu Desnoyers over 10 years ago
- Status changed from In Progress to Feedback
Updated by Christian Babeux over 10 years ago
- Status changed from Feedback to Resolved
Marked as resolved since we did not receive feedback on this issue.
You can re-open it if you are still experiencing the problem.