Bug #646

closed

crash when trying to take snapshot

Added by Matthew Khouzam over 10 years ago. Updated about 10 years ago.

Status: Resolved
Priority: Normal
Target version: -
Start date: 09/27/2013
Due date:
% Done: 0%
Estimated time:
Description

kernel: 3.8.0
3.8.0-30-generic #46-Ubuntu SMP Tue Sep 10 20:03:44 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

The crash occurs when I try to take a kernel snapshot.

I am using lttng 2.3.0 from the Ubuntu PPAs. Steps to reproduce:
lttng create --snapshot
lttng enable-event -a -k
lttng enable-event -a -u
lttng start
lttng snapshot record
=====CRASH!=====

It has happened twice so far. I'm now testing with 3.8.0-31.
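
The reproduction steps above can be wrapped in a loop so that the snapshot is taken repeatedly. This is only a sketch: the session name `bug646`, the snapshot count, and the `DRY_RUN` switch are assumptions for illustration, and `enable-event` is spelled out in full.

```shell
#!/bin/sh
# Hedged reproduction sketch, not an official LTTng test.
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# repro_snapshot_crash [SESSION] [COUNT]
# SESSION and COUNT are hypothetical defaults chosen for this sketch.
repro_snapshot_crash() {
    session="${1:-bug646}"
    count="${2:-100}"
    run lttng create "$session" --snapshot || return 1
    run lttng enable-event -a -k || return 1
    run lttng enable-event -a -u || return 1
    run lttng start || return 1
    i=0
    while [ "$i" -lt "$count" ]; do
        # The crash was reported at this step.
        run lttng snapshot record || return 1
        i=$((i + 1))
    done
    run lttng stop
    run lttng destroy "$session"
}
```

Calling `repro_snapshot_crash bug646 50` as root would take 50 snapshots in a row. Setting `DRY_RUN=1` first only prints the commands.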


Files

crash.jpg (3.3 MB), Matthew Khouzam, 09/27/2013 02:24 PM

Actions #1

Updated by Anonymous over 10 years ago

Was it with 3.8.0-30.43 or 3.8.0-30.44? You can check the package version with "apt-cache policy <package_name>".
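
As a sketch of that check: on Ubuntu the kernel image package is normally named after the running release, so the package name can be derived from `uname -r`. The helper name `kernel_image_pkg` is hypothetical, and the `linux-image-` prefix is an assumption about the packaging convention.

```shell
# Build the (assumed) Ubuntu kernel image package name for a kernel release.
# Defaults to the running kernel when no argument is given.
kernel_image_pkg() {
    printf 'linux-image-%s\n' "${1:-$(uname -r)}"
}
```

Running `apt-cache policy "$(kernel_image_pkg)"` should then print an "Installed:" line carrying the full package version.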

Actions #2

Updated by Mathieu Desnoyers over 10 years ago

  • Status changed from New to Feedback

Can this be reproduced with a vanilla Linux kernel, or is it specific to one Ubuntu kernel sub-version?

Actions #3

Updated by Matthew Khouzam over 10 years ago

I can confirm it happens more often now; we just need to take snapshots often.
Proof: http://imgur.com/EG2wPNU

Actions #4

Updated by Matthew Khouzam over 10 years ago

I have not tried with vanilla kernels; I am only using Ubuntu.

Right now I am using:
uname --all
Linux moya 3.8.0-32-generic #47-Ubuntu SMP Tue Oct 1 22:35:23 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

Actions #5

Updated by Matthew Khouzam over 10 years ago

Matthew Khouzam wrote:

I can confirm it happens more often now, we just need to snapshot often
proof:

Old: http://imgur.com/EG2wPNU
New: http://imgur.com/r2oRJuP

Actions #6

Updated by Matthew Khouzam over 10 years ago

To reproduce this more reliably, I used Eclipse over a remote SSH connection. I had 3 sessions going and was snapshotting them round-robin. To do that in Eclipse, highlight the 3 sessions and click the camera button repeatedly.
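
The same round-robin pattern can be sketched from the command line, without Eclipse. This assumes the sessions already exist in snapshot mode and are started; the function name and the `DRY_RUN` switch are hypothetical.

```shell
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# snapshot_round_robin ROUNDS SESSION...
# Takes one snapshot of each named session per round, in order.
snapshot_round_robin() {
    rounds="$1"
    shift
    r=0
    while [ "$r" -lt "$rounds" ]; do
        for s in "$@"; do
            run lttng snapshot record -s "$s" || return 1
        done
        r=$((r + 1))
    done
}
```

For example, `snapshot_round_robin 200 s1 s2 s3` would cycle through three sessions 200 times; the session names are placeholders.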

Actions #7

Updated by Mathieu Desnoyers over 10 years ago

  • Status changed from Feedback to In Progress
Actions #8

Updated by Matthew Khouzam over 10 years ago

LTTng kernel modules are now lttng-modules-dkms:amd64 (2.4~pre-0+bzr545+pack17+201310151711~ubuntu13.04.1, 2.4~pre-0+bzr547+pack17+201310281547~ubuntu13.04.1) according to my install log.

Hope this helps
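
The package version strings above embed a bzr revision number (`bzr545`, `bzr547`). A small sketch for pulling that number out of a version string follows; the helper name `bzr_revision` is hypothetical.

```shell
# Print the bzr revision number embedded in a PPA package version string.
# Prints nothing when the string has no "+bzrNNN" component.
bzr_revision() {
    printf '%s\n' "$1" | sed -n 's/.*+bzr\([0-9][0-9]*\).*/\1/p'
}
```

It could be fed from the installed package, for instance `bzr_revision "$(dpkg-query -W -f='${Version}' lttng-modules-dkms)"`.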

Actions #9

Updated by Anonymous over 10 years ago

As we mentioned offline, you can use this page:
https://code.launchpad.net/~lttng/lttng-modules/trunk
to map which upstream commit "bzr547" represents.

We would also need to check whether those modules are actually the ones loaded. It seems DKMS does not always overwrite the lttng-modules that come with the Ubuntu kernel.
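
One way to check this could look like the sketch below. It assumes DKMS installs its builds under an `updates/dkms` directory, which is the usual Ubuntu layout but is not confirmed in this thread; both function names are hypothetical. Note that `modinfo -n` reports the file modprobe would pick, which is not necessarily the code already loaded.

```shell
# Classify a module file path by where it is installed.
# Assumption: DKMS-built modules live under ".../updates/dkms/".
module_origin() {
    case "$1" in
        */updates/dkms/*) echo dkms ;;
        *) echo other ;;
    esac
}

# Report which lttng-tracer file modprobe would use and whether
# the module is currently loaded.
check_lttng_tracer() {
    path=$(modinfo -n lttng-tracer 2>/dev/null) || {
        echo "lttng-tracer: module file not found"
        return 1
    }
    echo "file:   $path ($(module_origin "$path"))"
    if [ -d /sys/module/lttng_tracer ]; then
        echo "loaded: yes"
    else
        echo "loaded: no"
    fi
}
```

Where the kernel exposes `/sys/module/lttng_tracer/srcversion`, comparing it with `modinfo -F srcversion lttng-tracer` gives a stronger hint about whether the loaded module matches the file on disk.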

Actions #10

Updated by Mathieu Desnoyers over 10 years ago

  • Assignee set to Jérémie Galarneau
Actions #11

Updated by Francis Giraldeau over 10 years ago

I confirm this bug with kernel 3.8.0-31-generic, lttng (LTTng Trace Control) 2.3.0 - Dominus Vobiscum and the latest HEAD lttng-modules. The crash occurs in register_cpu_notifier() and also from unregister_cpu_notifier().

[11150.403985] BUG: unable to handle kernel paging request at 0000000000001010
[11150.404041] IP: [<ffffffff810839a0>] raw_notifier_chain_register+0x20/0x40
[11150.404075] PGD 0 
[11150.404095] Oops: 0000 [#1] SMP 
[11150.404120] Modules linked in: lttng_probe_writeback(OF) lttng_probe_workqueue(OF) lttng_probe_vmscan(OF) lttng_probe_udp(OF) lttng_probe_timer(OF) lttng_probe_sunrpc(OF) lttng_probe_statedump(OF) lttng_probe_sock(OF) lttng_probe_skb(OF) lttng_probe_signal(OF) lttng_probe_scsi(OF) lttng_probe_sched(OF) lttng_probe_rpm(OF) lttng_probe_regulator(OF) lttng_probe_regmap(OF) lttng_probe_rcu(OF) lttng_probe_random(OF) lttng_probe_printk(OF) lttng_probe_power(OF) lttng_probe_net(OF) lttng_probe_napi(OF) lttng_probe_module(OF) lttng_probe_kvm(OF) lttng_probe_kmem(OF) lttng_probe_jbd2(OF) lttng_probe_jbd(OF) lttng_probe_irq(OF) lttng_probe_gpio(OF) lttng_probe_compaction(OF) lttng_probe_block(OF) lttng_probe_asoc(OF) lttng_types(OF) lttng_ring_buffer_metadata_mmap_client(OF) lttng_ring_buffer_client_mmap_overwrite(OF) lttng_ring_buffer_client_mmap_discard(OF) lttng_ring_buffer_metadata_client(OF) lttng_ring_buffer_client_overwrite(OF) lttng_ring_buffer_client_discard(OF) lttng_tracer(OF) lttng_statedump(OF) lttng_ftrace(OF) lttng_kprobes(OF) lttng_lib_ring_buffer(OF) lttng_kretprobes(OF) ip6table_filter(F) ip6_tables(F) ebtable_nat(F) ebtables(F) ipt_MASQUERADE(F) iptable_nat(F) nf_nat_ipv4(F) nf_nat(F) nf_conntrack_ipv4(F) nf_defrag_ipv4(F) xt_state(F) nf_conntrack(F) ipt_REJECT(F) xt_CHECKSUM(F) iptable_mangle(F) xt_tcpudp(F) iptable_filter(F) ip_tables(F) x_tables(F) bridge(F) stp(F) llc(F) nvidia(POF) rfcomm bnep bluetooth snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep(F) snd_pcm(F) snd_page_alloc(F) snd_seq_midi(F) snd_seq_midi_event(F) snd_rawmidi(F) snd_seq(F) snd_seq_device(F) kvm_intel(F) snd_timer(F) kvm(F) mei snd(F) mac_hid wmi microcode(F) soundcore(F) ppdev(F) lpc_ich parport_pc(F) binfmt_misc(F) w83627ehf hwmon_vid coretemp lp(F) ext2(F) parport(F) btrfs(F) zlib_deflate(F) libcrc32c(F) dm_crypt(F) usb_storage(F) hid_generic video(F) usbhid ghash_clmulni_intel(F) aesni_intel(F) aes_x86_64(F) xts(F) lrw(F) gf128mul(F) 
ablk_helper(F) hid cryptd(F) ahci(F) libahci(F) e1000e(F) [last unloaded: lttng_statedump]
[11150.410227] CPU 2 
[11150.410235] Pid: 6829, comm: lttng-sessiond Tainted: PF          O 3.8.0-31-generic #46-Ubuntu                  /DH77EB
[11150.410294] RIP: 0010:[<ffffffff810839a0>]  [<ffffffff810839a0>] raw_notifier_chain_register+0x20/0x40
[11150.410338] RSP: 0018:ffff8803d30dbaa0  EFLAGS: 00010206
[11150.410362] RAX: 0000000000001000 RBX: ffff88023454dc50 RCX: 0000000000000008
[11150.410388] RDX: 0000000000000005 RSI: ffff88023454dc50 RDI: ffff88023454bc58
[11150.410414] RBP: ffff8803d30dbaa0 R08: ffffffff81ce7060 R09: 0000000000000100
[11150.410440] R10: 0000000000000111 R11: 0000000000000000 R12: ffffffffa02171e0
[11150.410466] R13: 00000000ffffffff R14: 0000000000000001 R15: 0000000000000000
[11150.410493] FS:  00007fa966ef2700(0000) GS:ffff88041ec80000(0000) knlGS:0000000000000000
[11150.410533] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[11150.410557] CR2: 0000000000001010 CR3: 0000000401e4c000 CR4: 00000000001407e0
[11150.410583] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[11150.410609] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[11150.410636] Process lttng-sessiond (pid: 6829, threadinfo ffff8803d30da000, task ffff88040210dd00)
[11150.410676] Stack:
[11150.410694]  ffff8803d30dbab8 ffffffff816ac7b1 ffff88023454dc10 ffff8803d30dbae8
[11150.410741]  ffffffffa01e77b4 ffff88023454dc00 ffffffffa02171e0 ffffffffa027969a
[11150.410787]  ffff88023454dc10 ffff8803d30dbb38 ffffffffa01e7f4e 0000000000020000
[11150.410833] Call Trace:
[11150.410857]  [<ffffffff816ac7b1>] register_cpu_notifier+0x21/0x30
[11150.410888]  [<ffffffffa01e77b4>] channel_backend_init+0x2c4/0x380 [lttng_lib_ring_buffer]
[11150.410934]  [<ffffffffa01e7f4e>] channel_create+0x7e/0x230 [lttng_lib_ring_buffer]
[11150.410978]  [<ffffffffa02152a3>] _channel_create+0x33/0x40 [lttng_ring_buffer_client_discard]
[11150.411034]  [<ffffffffa022c9e4>] lttng_channel_create+0x104/0x1c0 [lttng_tracer]
[11150.411093]  [<ffffffffa022dceb>] lttng_abi_create_channel+0xab/0x1b0 [lttng_tracer]
[11150.411152]  [<ffffffffa022e6d7>] lttng_session_ioctl+0x127/0x2c0 [lttng_tracer]
[11150.411202]  [<ffffffff8118aa83>] ? __mem_cgroup_uncharge_common+0xe3/0x2d0
[11150.411230]  [<ffffffff8114cd33>] ? __dec_zone_page_state+0x33/0x40
[11150.411257]  [<ffffffff81163462>] ? page_remove_rmap+0xa2/0x180
[11150.411284]  [<ffffffff8112f743>] ? unlock_page+0x23/0x30
[11150.411310]  [<ffffffff811561d3>] ? do_wp_page+0x393/0x7f0
[11150.411337]  [<ffffffff8113bfd0>] ? release_pages+0x1e0/0x220
[11150.411365]  [<ffffffff811b37a6>] ? mntput+0x26/0x40
[11150.411390]  [<ffffffff811a61a9>] do_vfs_ioctl+0x99/0x570
[11150.411415]  [<ffffffff81195eae>] ? ____fput+0xe/0x10
[11150.411441]  [<ffffffff8107a2fc>] ? task_work_run+0xac/0xe0
[11150.411466]  [<ffffffff811a6711>] sys_ioctl+0x91/0xb0
[11150.411492]  [<ffffffff816d13be>] ? do_page_fault+0xe/0x10
[11150.411517]  [<ffffffff816d59dd>] system_call_fastpath+0x1a/0x1f
[11150.411541] Code: f0 eb df 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 8b 07 48 89 e5 48 85 c0 74 21 8b 56 10 3b 50 10 7e 0c eb 17 0f 1f 44 00 00 <39> 50 10 7c 0d 48 8d 78 08 48 8b 40 08 48 85 c0 75 ee 48 89 46 
[11150.411774] RIP  [<ffffffff810839a0>] raw_notifier_chain_register+0x20/0x40
[11150.411803]  RSP <ffff8803d30dbaa0>
[11150.411824] CR2: 0000000000001010
[11150.412275] ---[ end trace d23ffee18c624aab ]---
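
A quick consistency check on the oops above: RAX is 0x1000 and CR2 (the faulting address) is 0x1010, and the marked instruction bytes `<39> 50 10` decode to a compare against memory at offset 0x10 from RAX. Read this way, raw_notifier_chain_register() was walking the CPU notifier chain and hit an entry at 0x1000, which is not a valid kernel pointer. That reading is an interpretation and is not confirmed in this thread. The helper names below are hypothetical.

```shell
# Print the first value of register NAME (e.g. RAX, CR2) found in
# oops text read from stdin.
oops_reg() {
    sed -n "s/.*$1: \([0-9a-f]\{16\}\).*/\1/p" | head -n 1
}

# fault_offset RAX CR2: print CR2 - RAX in hex.
fault_offset() {
    printf '0x%x\n' $(( 0x$2 - 0x$1 ))
}
```

With the oops saved to a file (here a placeholder name, `oops.txt`), `fault_offset "$(oops_reg RAX < oops.txt)" "$(oops_reg CR2 < oops.txt)"` gives the offset between the two registers.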
Actions #12

Updated by Mathieu Desnoyers over 10 years ago

This is very likely an Ubuntu-specific kernel issue. Please let us know if you can still reproduce it with a newer Ubuntu kernel. We have been unable to reproduce it on our side.

Please also try reproducing it in a VM environment. I suspect this issue might be caused by an Ubuntu driver module.

Thanks,

Mathieu
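
To narrow down which driver modules might be worth ruling out, the "Modules linked in" line of the oops can be filtered for non-LTTng modules carrying the O (out-of-tree) or P (proprietary) taint flag. This is only a sketch with a hypothetical function name; a flagged module is a candidate to test without, not a confirmed cause.

```shell
# Read a "Modules linked in:" line on stdin and print the names of
# non-LTTng modules whose taint flags include O or P.
external_modules() {
    tr ' ' '\n' \
        | grep -E '\([A-Z]*[OP][A-Z]*\)$' \
        | grep -v '^lttng_' \
        | sed 's/(.*//'
}
```

On a live system the same idea could be applied to the output of `dmesg` after a crash, or the modules could simply be unloaded one at a time before retrying.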

Actions #13

Updated by Mathieu Desnoyers over 10 years ago

  • Status changed from In Progress to Feedback
Actions #14

Updated by Christian Babeux about 10 years ago

  • Status changed from Feedback to Resolved

Marked as resolved since we did not receive feedback on this issue.
You can re-open it if you are still experiencing the problem.
