Project

General

Profile

Bug #230

Bus error in applications when using big subbuf size and number

Added by Raphaël Beamonte over 8 years ago. Updated over 8 years ago.

Status:
Resolved
Priority:
Critical
Target version:
-
Start date:
04/26/2012
Due date:
% Done:

100%

Estimated time:

Description

When using LTTng UST with big subbuf size (4194304) and number (256), the application is crashing with the following gdb report :

Program received signal SIGBUS, Bus error.
[Switching to Thread 0x7ffff61db700 (LWP 17504)]
0x00007ffff79c2acf in channel_create (config=0x7ffff7bd35a0, name=0x7ffff79c91af "relay-discard-mmap", priv_data=0x7ffff61da0b8, priv_data_align=8, priv_data_size=112, priv_data_init=0x7ffff61da3d0, 
    buf_addr=0x0, subbuf_size=4194304, num_subbuf=256, switch_timer_interval=0, read_timer_interval=200, shm_fd=0x7ffff61da7e0, wait_fd=0x7ffff61da7e8, memory_map_size=0x7ffff61da7f0)
    at ring_buffer_frontend.c:492
492            chan->priv_data_offset = shmobj->allocated_len;

The lttng-sessiond daemon is giving those information when run with -vvv modes :

DEBUG1: Futex n to 1 prepare done [in futex_nto1_prepare() at futex.c:73]
DEBUG1: Dispatching UST registration pid:17500 ppid:17498 uid:1000 gid:1000 sock:19 name:sample (version 2.0) [in thread_dispatch_ust_registration() at main.c:1346]
DEBUG1: Woken up but nothing in the UST command queue [in thread_dispatch_ust_registration() at main.c:1334]
DEBUG3: Created hashtable size 4 at 0x1d4b470 of type 1 [in lttng_ht_new() at hashtable.c:96]
DEBUG1: App registered with pid:17500 ppid:17498 uid:1000 gid:1000 sock:19 name:sample (version 2.0) [in ust_app_register() at ust-app.c:1365]
DEBUG2: UST app PID 17500 is compatible with major version 2 (supporting <= 2) [in ust_app_validate_version() at ust-app.c:2625]
DEBUG2: UST app global update for app sock 19 for session id 1 [in ust_app_global_update() at ust-app.c:2295]
DEBUG2: UST app pid: 17500 session id 1 not found, creating it [in create_ust_app_session() at ust-app.c:883]
DEBUG3: Created hashtable size 4 at 0x1d4c770 of type 0 [in lttng_ht_new() at hashtable.c:96]
DEBUG2: Shadow copy of session handle -1 [in shadow_copy_session() at ust-app.c:794]
DEBUG2: Channel chanpouet not found on shadow session copy, creating it [in shadow_copy_session() at ust-app.c:823]
DEBUG3: Created hashtable size 4 at 0x1d521a0 of type 1 [in lttng_ht_new() at hashtable.c:96]
DEBUG3: Created hashtable size 4 at 0x1d52440 of type 0 [in lttng_ht_new() at hashtable.c:96]
DEBUG3: UST app channel chanpouet allocated [in alloc_ust_app_channel() at ust-app.c:298]
DEBUG2: UST app shadow copy of channel chanpouet started [in shadow_copy_channel() at ust-app.c:734]
DEBUG2: UST event * not found on shadow copy channel [in shadow_copy_channel() at ust-app.c:761]
DEBUG3: Created hashtable size 4 at 0x1d52a70 of type 1 [in lttng_ht_new() at hashtable.c:96]
DEBUG3: UST app event * allocated [in alloc_ust_app_event() at ust-app.c:333]
DEBUG3: UST app shadow copy of channel chanpouet done [in shadow_copy_channel() at ust-app.c:771]
DEBUG2: UST app session created successfully with handle 2 [in create_ust_app_session() at ust-app.c:907]
Error: Creating channel chanpouet for app (pid: 17500, sock: 19) and session handle 2 with ret -22
DEBUG1: Starting tracing for ust app pid 17500 [in ust_app_start_trace() at ust-app.c:1970]
Error: UST app open metadata failed for app pid:17500 with ret -1
DEBUG3: Opening metadata failed. Cleaning up memory [in create_ust_app_metadata() at ust-app.c:1206]
DEBUG1: PID 17500 unregistering with sock 19 [in ust_app_unregister() at ust-app.c:1395]
DEBUG1: Apps thread polling on 2 fds [in thread_manage_apps() at main.c:1200]

The commands used to reproduce the error are the following :

$ sudo lttng create pouet
Session pouet created.
Traces will be written in /root/lttng-traces/pouet-20120426-182501
$ sudo lttng enable-channel chanpouet -u --subbuf-size 4194304 --num-subbuf 256
UST channel chanpouet enabled for session pouet
$ sudo lttng enable-event * -u -c chanpouet -a
All UST events are enabled in channel chanpouet
$ sudo lttng start
Tracing started for session pouet

The application returning the error in the example found here is the easy-ust one, in doc/examples/.

#1

Updated by Mathieu Desnoyers over 8 years ago

  • Status changed from New to Feedback
  • Assignee set to Mathieu Desnoyers

Is this on a 32 or 64-bit user-space ?

#2

Updated by Raphaël Beamonte over 8 years ago

64-bit

#3

Updated by Raphaël Beamonte over 8 years ago

More information:

About sample bin :

$ file sample
sample: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.26, BuildID[sha1]=0x8e5e3e3c417639b2c6076a7cd4c5df419649dde9, not stripped

About memory :

$ cat /proc/meminfo
MemTotal:        6111856 kB
MemFree:         1374464 kB
Buffers:          285184 kB
Cached:          3329964 kB
SwapCached:         1104 kB
Active:          2104200 kB
Inactive:        1863988 kB
Active(anon):     260516 kB
Inactive(anon):    93096 kB
Active(file):    1843684 kB
Inactive(file):  1770892 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       3938300 kB
SwapFree:        3902840 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:        351936 kB
Mapped:            55712 kB
Shmem:               572 kB
Slab:             469540 kB
SReclaimable:     437160 kB
SUnreclaim:        32380 kB
KernelStack:        3056 kB
PageTables:        18184 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     6994228 kB
Committed_AS:    1190564 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      156132 kB
VmallocChunk:   34359571960 kB
HardwareCorrupted:     0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      907264 kB
DirectMap2M:     5376000 kB
#4

Updated by Raphaël Beamonte over 8 years ago

After new tests, here is the situation :
- I rebooted the computer where the bug occurs
- I tried to run easy-ust in the same conditions, at first, no bugs occured
- After a short period of time and a test with xhp (which returned the bug), I tried a new time to run easy-ust and I had the error, but not at the same place

- Nothing appears in "dmesg"

uname of the computer:

Linux USBnux 3.0.10-rt27+ #1 SMP PREEMPT RT Wed Nov 23 17:24:50 EST 2011 x86_64 GNU/Linux

Here is the gdb log with full bt :

(gdb) run
Starting program: /home/xaf/git/lttng-ust/doc/examples/easy-ust/sample 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff69d9700 (LWP 9689)]
[New Thread 0x7ffff61d8700 (LWP 9690)]

Program received signal SIGBUS, Bus error.
[Switching to Thread 0x7ffff61d8700 (LWP 9690)]
0x00007ffff79c2127 in channel_create (config=0x7ffff7bd3220, name=0x7ffff79c907f "relay-discard-mmap", priv_data=0x7ffff61d70b8, priv_data_align=8, priv_data_size=112, priv_data_init=0x7ffff61d73d0, 
    buf_addr=0x0, subbuf_size=4194304, num_subbuf=256, switch_timer_interval=0, read_timer_interval=200, shm_fd=0x7ffff61d77e0, wait_fd=0x7ffff61d77e8, memory_map_size=0x7ffff61d77f0)
    at ring_buffer_frontend.c:492
492            chan->priv_data_offset = shmobj->allocated_len;
(gdb) bt full
#0  0x00007ffff79c2127 in channel_create (config=0x7ffff7bd3220, name=0x7ffff79c907f "relay-discard-mmap", priv_data=0x7ffff61d70b8, priv_data_align=8, priv_data_size=112, priv_data_init=0x7ffff61d73d0, 
    buf_addr=0x0, subbuf_size=4194304, num_subbuf=256, switch_timer_interval=0, read_timer_interval=200, shm_fd=0x7ffff61d77e0, wait_fd=0x7ffff61d77e8, memory_map_size=0x7ffff61d77f0)
    at ring_buffer_frontend.c:492
        priv_data_alloc = {_ref = {index = 140737347420160, offset = 140737351952194}, _type = 0x7ffff7998000}
        ret = 32767
        cpu = 32767
        shmsize = 880
        chansize = 768
        chan = 0x7ffff7ff7000
        handle = 0x602400
        shmobj = 0x602440
        ref = 0x7ffff7620af8
        __PRETTY_FUNCTION__ = "channel_create" 
#1  0x00007ffff79b5250 in _channel_create (name=0x7ffff79c907f "relay-discard-mmap", buf_addr=0x0, subbuf_size=4194304, num_subbuf=256, switch_timer_interval=0, read_timer_interval=200, shm_fd=0x7ffff61d77e0, 
    wait_fd=0x7ffff61d77e8, memory_map_size=0x7ffff61d77f0, chan_priv_init=0x7ffff61d73d0) at ltt-ring-buffer-client.h:424
        priv = 0x7ffff79c907f
        ltt_chan = 0x0
        handle = 0x7ffff79b0196
#2  0x00007ffff79ad7c8 in ltt_channel_create (session=0x6022d0, transport_name=0x7ffff79c907f "relay-discard-mmap", buf_addr=0x0, subbuf_size=4194304, num_subbuf=256, switch_timer_interval=0, 
    read_timer_interval=200, shm_fd=0x7ffff61d77e0, wait_fd=0x7ffff61d77e8, memory_map_size=0x7ffff61d77f0, chan_priv_init=0x7ffff61d73d0) at ltt-events.c:470
        chan = 0x0
        transport = 0x7ffff7bd82a0
        __func__ = "ltt_channel_create" 
#3  0x00007ffff79a854c in lttng_abi_create_channel (session_objd=1, chan_param=0x7ffff61d7998, channel_type=PER_CPU_CHANNEL, uargs=0x7ffff61d77e0) at lttng-ust-abi.c:415
        session = 0x6022d0
        ops = 0x7ffff7bd3080
        transport_name = 0x7ffff79c907f "relay-discard-mmap" 
        chan = 0x1
        chan_objd = 2
        ret = 0
        chan_priv_init = {chan = 0x0, enabled = 0, ctx = 0x0, session = 0x6022d0, objd = 0, free_event_id = 0, used_event_id = 0, list = {next = 0x0, prev = 0x0}, ops = 0x0, header_type = 0, handle = 0x0, 
          metadata_dumped = 0, id = 0, uuid = "\000fM\023J\276C>\265\274", <incomplete sequence \375\201\243\262\230>}
        __PRETTY_FUNCTION__ = "lttng_abi_create_channel" 
#4  0x00007ffff79a8658 in lttng_session_cmd (objd=1, cmd=81, arg=140737322514840, uargs=0x7ffff61d77e0) at lttng-ust-abi.c:478
        session = 0x6022d0
#5  0x00007ffff79a49be in handle_message (sock_info=0x7ffff7bd3f00, sock=8, lum=0x7ffff61d7990) at lttng-ust-comm.c:270
        ret = 0
        ops = 0x7ffff7bd3070
        lur = {handle = 0, cmd = 0, ret_code = 0, ret_val = 0, u = {channel = {memory_map_size = 0}, stream = {memory_map_size = 0}, version = {major = 0, minor = 0, patchlevel = 0}, tracepoint = {
              name = '\000' <repeats 255 times>, loglevel = 0, padding = '\000' <repeats 15 times>}}}
        shm_fd = 32767
        wait_fd = -134332416
        args = {channel = {shm_fd = 0x7ffff61d7870, wait_fd = 0x7ffff61d7888, memory_map_size = 0x7ffff7620b40}, stream = {shm_fd = 0x7ffff61d7870, wait_fd = 0x7ffff61d7888, memory_map_size = 0x7ffff7620b40}}
        __func__ = "handle_message" 
#6  0x00007ffff79a6a7c in ust_listener_thread (arg=0x7ffff7bd3f00) at lttng-ust-comm.c:762
        len = 584
        lum = {handle = 1, cmd = 81, u = {channel = {overwrite = 0, subbuf_size = 4194304, num_subbuf = 256, switch_timer_interval = 0, read_timer_interval = 200, output = LTTNG_UST_MMAP, 
              padding = '\000' <repeats 287 times>}, stream = {padding = "\000\000\000\000\000\000\000\000\000\000@\000\000\000\000", u = {
                padding = "\000\001\000\000\000\000\000\000\000\000\000\000\310", '\000' <repeats 274 times>}}, event = {instrumentation = LTTNG_UST_TRACEPOINT, 
              name = "\000\000\000\000\000\000@\000\000\000\000\000\000\001\000\000\000\000\000\000\000\000\000\000\310", '\000' <repeats 230 times>, loglevel_type = LTTNG_UST_LOGLEVEL_ALL, loglevel = 0, 
              padding = '\000' <repeats 15 times>, u = {padding = '\000' <repeats 287 times>}}, context = {ctx = LTTNG_UST_CONTEXT_VTID, padding = "\000\000\000\000\000\000@\000\000\000\000\000\000\001\000", 
              u = {padding = "\000\000\000\000\000\000\000\000\310", '\000' <repeats 278 times>}}, version = {major = 0, minor = 0, patchlevel = 4194304}, tracepoint = {
              name = "\000\000\000\000\000\000\000\000\000\000@\000\000\000\000\000\000\001\000\000\000\000\000\000\000\000\000\000\310", '\000' <repeats 226 times>, loglevel = 0, 
              padding = '\000' <repeats 15 times>}}}
        sock_info = 0x7ffff7bd3f00
        sock = 8
        ret = 0
        prev_connect_failed = 0
---Type <return> to continue, or q <return> to quit---
        has_waited = 0
        __func__ = "ust_listener_thread" 
        __PRETTY_FUNCTION__ = "ust_listener_thread" 
#7  0x00007ffff6be2b50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#8  0x00007ffff76e990d in clone () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#9  0x0000000000000000 in ?? ()
No symbol table info available.
#5

Updated by Raphaël Beamonte over 8 years ago

New try :
- The previous tries was made with CPU isolation through cgroups, this one is not
- After boot, I directly tried xhp : bus error with shm_index line pointed out by gdb
- I tried sample just after and got a bus error, but with a third file (previously it was ring_buffer_backend.c and ring_buffer_frontend.c, now it's lttng-ust-comm.c)

- Nothing in the "dmesg"

Here is the gdb log with full bt of the run of xhp :

Program received signal SIGBUS, Bus error.
[Switching to Thread 0x7ffff5916700 (LWP 2782)]
0x00007ffff7737c48 in lib_ring_buffer_backend_allocate (config=0x7ffff7ff6070, bufb=0x7fff73204030, 
    size=1073741824, num_subbuf=256, extra_reader_sb=0, handle=0x6046c0, shmobj=0x6052c0)
    at ring_buffer_backend.c:94
94            shmp_index(handle, bufb->buf_wsb, i)->id = subbuffer_id(config, 0, 1, i);
(gdb) bt full
#0  0x00007ffff7737c48 in lib_ring_buffer_backend_allocate (config=0x7ffff7ff6070, 
    bufb=0x7fff73204030, size=1073741824, num_subbuf=256, extra_reader_sb=0, handle=0x6046c0, 
    shmobj=0x6052c0) at ring_buffer_backend.c:94
        chanb = 0x7ffff7ff6030
        subbuf_size = 4194304
        mmap_offset = 0
        num_subbuf_alloc = 256
        i = 0
#1  0x00007ffff7737ec8 in lib_ring_buffer_backend_create (bufb=0x7fff73204030, chanb=0x7ffff7ff6030, 
    cpu=1, handle=0x6046c0, shmobj=0x6052c0) at ring_buffer_backend.c:139
        config = 0x7ffff7ff6070
#2  0x00007ffff773b7d6 in lib_ring_buffer_create (buf=0x7fff73204000, chanb=0x7ffff7ff6030, cpu=1, 
    handle=0x6046c0, shmobj=0x6052c0) at ring_buffer_frontend.c:197
        config = 0x7ffff7ff6070
        chan = 0x7ffff7ff6000
        priv = 0x7ffff7ff6300
        subbuf_header_size = 56
        tsc = 142562578180
        ret = 0
#3  0x00007ffff7738707 in channel_backend_init (chanb=0x7ffff7ff6030, 
    name=0x7ffff774307f "relay-discard-mmap", config=0x7ffff794d220, subbuf_size=4194304, 
    num_subbuf=256, handle=0x6046c0) at ring_buffer_backend.c:292
        shmobj = 0x6052c0
        buf = 0x7fff73204000
        chan = 0x7ffff7ff6000
        i = 1
        ret = 0
        shmsize = 1078028416
        num_subbuf_alloc = 257
#4  0x00007ffff773c23e in channel_create (config=0x7ffff794d220, 
    name=0x7ffff774307f "relay-discard-mmap", priv_data=0x7ffff59150b8, priv_data_align=8, 
    priv_data_size=112, priv_data_init=0x7ffff59153d0, buf_addr=0x0, subbuf_size=4194304, 
    num_subbuf=256, switch_timer_interval=0, read_timer_interval=200, shm_fd=0x7ffff59157e0, 
    wait_fd=0x7ffff59157e8, memory_map_size=0x7ffff59157f0) at ring_buffer_frontend.c:503
        ret = 32767
        cpu = 32767
        shmsize = 880
        chansize = 768
        chan = 0x7ffff7ff6000
        handle = 0x6046c0
        shmobj = 0x605250
        ref = 0x0
        __PRETTY_FUNCTION__ = "channel_create" 
#5  0x00007ffff772f250 in _channel_create (name=0x7ffff774307f "relay-discard-mmap", buf_addr=0x0, 
    subbuf_size=4194304, num_subbuf=256, switch_timer_interval=0, read_timer_interval=200, 
    shm_fd=0x7ffff59157e0, wait_fd=0x7ffff59157e8, memory_map_size=0x7ffff59157f0, 
    chan_priv_init=0x7ffff59153d0) at ltt-ring-buffer-client.h:424
        priv = 0x7ffff7ff6300
        ltt_chan = 0x0
        handle = 0x7ffff772a196
#6  0x00007ffff77277c8 in ltt_channel_create (session=0x604590, 
    transport_name=0x7ffff774307f "relay-discard-mmap", buf_addr=0x0, subbuf_size=4194304, 
    num_subbuf=256, switch_timer_interval=0, read_timer_interval=200, shm_fd=0x7ffff59157e0, 
    wait_fd=0x7ffff59157e8, memory_map_size=0x7ffff59157f0, chan_priv_init=0x7ffff59153d0)
    at ltt-events.c:470
        chan = 0x0
        transport = 0x7ffff79522a0
        __func__ = "ltt_channel_create" 
#7  0x00007ffff772254c in lttng_abi_create_channel (session_objd=1, chan_param=0x7ffff5915998, 
    channel_type=PER_CPU_CHANNEL, uargs=0x7ffff59157e0) at lttng-ust-abi.c:415
---Type <return> to continue, or q <return> to quit---
        session = 0x604590
        ops = 0x7ffff794d080
        transport_name = 0x7ffff774307f "relay-discard-mmap" 
        chan = 0x1
        chan_objd = 2
        ret = 0
        chan_priv_init = {chan = 0x0, enabled = 0, ctx = 0x0, session = 0x604590, objd = 0, 
          free_event_id = 0, used_event_id = 0, list = {next = 0x0, prev = 0x0}, ops = 0x0, 
          header_type = 0, handle = 0x0, metadata_dumped = 0, id = 0, 
          uuid = "\252n\365\201\373\033@\n\235#\213\367\026\327\306\027"}
        __PRETTY_FUNCTION__ = "lttng_abi_create_channel" 
#8  0x00007ffff7722658 in lttng_session_cmd (objd=1, cmd=81, arg=140737313331608, 
    uargs=0x7ffff59157e0) at lttng-ust-abi.c:478
        session = 0x604590
#9  0x00007ffff771e9be in handle_message (sock_info=0x7ffff794df00, sock=8, lum=0x7ffff5915990)
    at lttng-ust-comm.c:270
        ret = 0
        ops = 0x7ffff794d070
        lur = {handle = 0, cmd = 0, ret_code = 0, ret_val = 0, u = {channel = {memory_map_size = 0}, 
            stream = {memory_map_size = 0}, version = {major = 0, minor = 0, patchlevel = 0}, 
            tracepoint = {name = '\000' <repeats 255 times>, loglevel = 0, 
              padding = '\000' <repeats 15 times>}}}
        shm_fd = 32767
        wait_fd = -134336512
        args = {channel = {shm_fd = 0x7ffff5915870, wait_fd = 0x7ffff5915888, 
            memory_map_size = 0x7ffff6f8fb40}, stream = {shm_fd = 0x7ffff5915870, 
            wait_fd = 0x7ffff5915888, memory_map_size = 0x7ffff6f8fb40}}
        __func__ = "handle_message" 
#10 0x00007ffff7720a7c in ust_listener_thread (arg=0x7ffff794df00) at lttng-ust-comm.c:762
        len = 584
        lum = {handle = 1, cmd = 81, u = {channel = {overwrite = 0, subbuf_size = 4194304, 
              num_subbuf = 256, switch_timer_interval = 0, read_timer_interval = 200, 
              output = LTTNG_UST_MMAP, padding = '\000' <repeats 287 times>}, stream = {
              padding = "\000\000\000\000\000\000\000\000\000\000@\000\000\000\000", u = {
                padding = "\000\001\000\000\000\000\000\000\000\000\000\000\310", '\000' <repeats 274 times>}}, event = {instrumentation = LTTNG_UST_TRACEPOINT, 
              name = "\000\000\000\000\000\000@\000\000\000\000\000\000\001\000\000\000\000\000\000\000\000\000\000\310", '\000' <repeats 230 times>, loglevel_type = LTTNG_UST_LOGLEVEL_ALL, loglevel = 0, 
              padding = '\000' <repeats 15 times>, u = {padding = '\000' <repeats 287 times>}}, 
            context = {ctx = LTTNG_UST_CONTEXT_VTID, 
              padding = "\000\000\000\000\000\000@\000\000\000\000\000\000\001\000", u = {
                padding = "\000\000\000\000\000\000\000\000\310", '\000' <repeats 278 times>}}, 
            version = {major = 0, minor = 0, patchlevel = 4194304}, tracepoint = {
              name = "\000\000\000\000\000\000\000\000\000\000@\000\000\000\000\000\000\001\000\000\000\000\000\000\000\000\000\000\310", '\000' <repeats 226 times>, loglevel = 0, 
              padding = '\000' <repeats 15 times>}}}
        sock_info = 0x7ffff794df00
        sock = 8
        ret = 0
        prev_connect_failed = 0
        has_waited = 0
        __func__ = "ust_listener_thread" 
        __PRETTY_FUNCTION__ = "ust_listener_thread" 
#11 0x00007ffff6d6ab50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#12 0x00007ffff705890d in clone () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#13 0x0000000000000000 in ?? ()
No symbol table info available.
(gdb)

And here is the gdb log with full bt of the run of sample :

Program received signal SIGBUS, Bus error.
[Switching to Thread 0x7ffff69d9700 (LWP 3004)]
0x00007ffff79a6127 in wait_for_sessiond (sock_info=0x7ffff7bd5f40) at lttng-ust-comm.c:618
618        if (uatomic_read((int32_t *) sock_info->wait_shm_mmap) == 0) {
(gdb) bt full
#0  0x00007ffff79a6127 in wait_for_sessiond (sock_info=0x7ffff7bd5f40) at lttng-ust-comm.c:618
        ret = -134239824
        __func__ = "wait_for_sessiond" 
#1  0x00007ffff79a63ef in ust_listener_thread (arg=0x7ffff7bd5f40) at lttng-ust-comm.c:663
        sock_info = 0x7ffff7bd5f40
        sock = -157442632
        ret = 0
        prev_connect_failed = 1
        has_waited = 0
        __func__ = "ust_listener_thread" 
        __PRETTY_FUNCTION__ = "ust_listener_thread" 
#2  0x00007ffff6be2b50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#3  0x00007ffff76e990d in clone () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#4  0x0000000000000000 in ?? ()
No symbol table info available.
(gdb)
#6

Updated by Raphaël Beamonte over 8 years ago

The bug has been reproduced with xhp on my desktop (the previous tests was on my test computer).

uname of desktop:

Linux station9 3.2.0-2-amd64 #1 SMP Tue Mar 20 18:36:37 UTC 2012 x86_64 GNU/Linux

df -h on desktop:

Sys. fich.                       Taille Util. Dispo Uti% Monté sur
rootfs                             134G  114G   14G  90% /
udev                               4,0G     0  4,0G   0% /dev
tmpfs                              801M  756K  800M   1% /run
/dev/mapper/sda3_crypt             134G  114G   14G  90% /
tmpfs                              5,0M     0  5,0M   0% /run/lock
tmpfs                              1,6G   12M  1,6G   1% /tmp
tmpfs                              1,6G  7,4M  1,6G   1% /run/shm
/dev/sda1                          9,3G  393M  8,5G   5% /boot

ls -al /run/shm on desktop:

total 6744
drwxrwxrwt  2 root       root            960 avr 27 17:44 .
drwxr-xr-x 25 root       root           1060 avr 27 17:29 ..
-rw-rw-r--  1 root       root           4096 avr 27 17:29 lttng-ust-apps-wait
-rw-r-----  1 root       root           4096 avr 27 17:18 lttng-ust-apps-wait-0
-rw-r-----  1 xaf        xaf            4096 avr 27 17:19 lttng-ust-apps-wait-1000
-r--------  1 xaf        xaf        67108904 avr 27 17:24 pulse-shm-1029511361
-r--------  1 xaf        xaf        67108904 avr 17 17:14 pulse-shm-104721547
-r--------  1 xaf        xaf        67108904 avr 26 16:17 pulse-shm-1099476367
-r--------  1 xaf        xaf        67108904 avr 27 15:48 pulse-shm-1124991714
-r--------  1 xaf        xaf        67108904 avr 26 15:50 pulse-shm-1170178806
-r--------  1 xaf        xaf        67108904 avr 26 16:57 pulse-shm-1379906063
-r--------  1 xaf        xaf        67108904 avr 27 17:15 pulse-shm-1746028421
-r--------  1 xaf        xaf        67108904 avr 25 13:04 pulse-shm-178697491
-r--------  1 xaf        xaf        67108904 avr  2 15:47 pulse-shm-1816843589
-r--------  1 xaf        xaf        67108904 avr 26 15:29 pulse-shm-1846849105
-r--------  1 xaf        xaf        67108904 avr 27 14:57 pulse-shm-1880463890
-r--------  1 xaf        xaf        67108904 avr 27 03:15 pulse-shm-1959361426
-r--------  1 xaf        xaf        67108904 avr 27 17:15 pulse-shm-1965411736
-r--------  1 xaf        xaf        67108904 avr 27 16:42 pulse-shm-1990550816
-r--------  1 xaf        xaf        67108904 avr 26 15:28 pulse-shm-2145683102
-r--------  1 xaf        xaf        67108904 avr 26 15:44 pulse-shm-2160408385
-r--------  1 xaf        xaf        67108904 avr 26 17:21 pulse-shm-2167738855
-r--------  1 xaf        xaf        67108904 avr 27 12:46 pulse-shm-2252384463
-r--------  1 xaf        xaf        67108904 avr  2 15:47 pulse-shm-2277612673
-r--------  1 xaf        xaf        67108904 avr 27 17:25 pulse-shm-2377803952
-r--------  1 xaf        xaf        67108904 avr 27 17:14 pulse-shm-239422023
-r--------  1 xaf        xaf        67108904 avr 26 15:28 pulse-shm-2515604763
-r--------  1 xaf        xaf        67108904 avr 27 17:19 pulse-shm-2515863523
-r--------  1 xaf        xaf        67108904 avr 27 17:24 pulse-shm-2687077960
-r--------  1 xaf        xaf        67108904 avr 27 17:14 pulse-shm-2773630829
-r--------  1 xaf        xaf        67108904 avr 26 17:05 pulse-shm-291251503
-r--------  1 xaf        xaf        67108904 avr 26 15:47 pulse-shm-293328172
-r--------  1 xaf        xaf        67108904 avr 26 17:34 pulse-shm-2974796099
-r--------  1 xaf        xaf        67108904 avr 27 17:17 pulse-shm-3009096199
-r--------  1 xaf        xaf        67108904 avr 27 17:25 pulse-shm-3075027986
-r--------  1 Debian-gdm Debian-gdm 67108904 avr  2 15:47 pulse-shm-3167935449
-r--------  1 xaf        xaf        67108904 avr 27 15:08 pulse-shm-3247770080
-r--------  1 xaf        xaf        67108904 avr 27 17:14 pulse-shm-326463393
-r--------  1 xaf        xaf        67108904 avr 27 17:24 pulse-shm-3309688731
-r--------  1 xaf        xaf        67108904 avr 27 17:21 pulse-shm-3396719529
-r--------  1 xaf        xaf        67108904 avr  2 15:47 pulse-shm-3649908245
-r--------  1 xaf        xaf        67108904 avr 26 15:48 pulse-shm-3840420906
-r--------  1 xaf        xaf        67108904 avr 26 17:02 pulse-shm-4141615345
-r--------  1 xaf        xaf        67108904 avr 27 15:50 pulse-shm-4261104286
-r--------  1 xaf        xaf        67108904 avr 27 15:50 pulse-shm-930790983
-rw-rw-rw-  1 xaf        xaf              16 avr 27 14:26 sem.ADBE_ReadPrefs_xaf
-rw-rw-rw-  1 xaf        xaf              16 avr 27 14:26 sem.ADBE_REL_xaf
-rw-rw-rw-  1 xaf        xaf              16 avr 27 14:26 sem.ADBE_WritePrefs_xaf

df -h on test:

Sys. fich.                                             Taille Util. Dispo Uti% Monté sur
rootfs                                                    54G   27G   24G  53% /
udev                                                     2,9G     0  2,9G   0% /dev
tmpfs                                                    597M  708K  597M   1% /run
/dev/disk/by-uuid/f611f368-60ec-49d2-8d27-44c25298a1bf    54G   27G   24G  53% /
tmpfs                                                    5,0M     0  5,0M   0% /run/lock
tmpfs                                                    1,2G   16K  1,2G   1% /tmp
tmpfs                                                    1,2G  1,2G     0 100% /run/shm

ls -al /run/shm on test:

total 260
drwxrwxrwt  2 root       root            160 avril 27 17:32 .
drwxr-xr-x 21 root       root            940 avril 27 16:55 ..
-rw-rw-r--  1 root       root           4096 avril 27 17:32 lttng-ust-apps-wait
-rw-r-----  1 root       root           4096 avril 27 16:54 lttng-ust-apps-wait-0
-rw-r-----  1 xaf        xaf            4096 avril 27 16:55 lttng-ust-apps-wait-1000
-r--------  1 Debian-gdm Debian-gdm 67108904 avril 27 16:52 pulse-shm-1157656584
-r--------  1 Debian-gdm Debian-gdm 67108904 avril 27 16:52 pulse-shm-2086060342
-r--------  1 Debian-gdm Debian-gdm 67108904 avril 27 16:52 pulse-shm-2217588501

#7

Updated by Raphaël Beamonte over 8 years ago

desktop /proc/cpuinfo:


processor    : 0
vendor_id    : GenuineIntel
cpu family    : 6
model        : 23
model name    : Intel(R) Xeon(R) CPU           E5405  @ 2.00GHz
stepping    : 10
microcode    : 0xa07
cpu MHz        : 2000.062
cache size    : 6144 KB
physical id    : 0
siblings    : 4
core id        : 0
cpu cores    : 4
apicid        : 0
initial apicid    : 0
fpu        : yes
fpu_exception    : yes
cpuid level    : 13
wp        : yes
flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx tm2 ssse3 cx16 xtpr pdcm dca sse4_1 xsave lahf_lm dts tpr_shadow vnmi flexpriority
bogomips    : 4000.12
clflush size    : 64
cache_alignment    : 64
address sizes    : 38 bits physical, 48 bits virtual
power management:

processor    : 1
vendor_id    : GenuineIntel
cpu family    : 6
model        : 23
model name    : Intel(R) Xeon(R) CPU           E5405  @ 2.00GHz
stepping    : 10
microcode    : 0xa07
cpu MHz        : 2000.062
cache size    : 6144 KB
physical id    : 1
siblings    : 4
core id        : 0
cpu cores    : 4
apicid        : 4
initial apicid    : 4
fpu        : yes
fpu_exception    : yes
cpuid level    : 13
wp        : yes
flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx tm2 ssse3 cx16 xtpr pdcm dca sse4_1 xsave lahf_lm dts tpr_shadow vnmi flexpriority
bogomips    : 4000.18
clflush size    : 64
cache_alignment    : 64
address sizes    : 38 bits physical, 48 bits virtual
power management:

processor    : 2
vendor_id    : GenuineIntel
cpu family    : 6
model        : 23
model name    : Intel(R) Xeon(R) CPU           E5405  @ 2.00GHz
stepping    : 10
microcode    : 0xa07
cpu MHz        : 2000.062
cache size    : 6144 KB
physical id    : 0
siblings    : 4
core id        : 1
cpu cores    : 4
apicid        : 1
initial apicid    : 1
fpu        : yes
fpu_exception    : yes
cpuid level    : 13
wp        : yes
flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx tm2 ssse3 cx16 xtpr pdcm dca sse4_1 xsave lahf_lm dts tpr_shadow vnmi flexpriority
bogomips    : 4000.16
clflush size    : 64
cache_alignment    : 64
address sizes    : 38 bits physical, 48 bits virtual
power management:

processor    : 3
vendor_id    : GenuineIntel
cpu family    : 6
model        : 23
model name    : Intel(R) Xeon(R) CPU           E5405  @ 2.00GHz
stepping    : 10
microcode    : 0xa07
cpu MHz        : 2000.062
cache size    : 6144 KB
physical id    : 1
siblings    : 4
core id        : 1
cpu cores    : 4
apicid        : 5
initial apicid    : 5
fpu        : yes
fpu_exception    : yes
cpuid level    : 13
wp        : yes
flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx tm2 ssse3 cx16 xtpr pdcm dca sse4_1 xsave lahf_lm dts tpr_shadow vnmi flexpriority
bogomips    : 4000.17
clflush size    : 64
cache_alignment    : 64
address sizes    : 38 bits physical, 48 bits virtual
power management:

processor    : 4
vendor_id    : GenuineIntel
cpu family    : 6
model        : 23
model name    : Intel(R) Xeon(R) CPU           E5405  @ 2.00GHz
stepping    : 10
microcode    : 0xa07
cpu MHz        : 2000.062
cache size    : 6144 KB
physical id    : 0
siblings    : 4
core id        : 2
cpu cores    : 4
apicid        : 2
initial apicid    : 2
fpu        : yes
fpu_exception    : yes
cpuid level    : 13
wp        : yes
flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx tm2 ssse3 cx16 xtpr pdcm dca sse4_1 xsave lahf_lm dts tpr_shadow vnmi flexpriority
bogomips    : 4000.17
clflush size    : 64
cache_alignment    : 64
address sizes    : 38 bits physical, 48 bits virtual
power management:

processor    : 5
vendor_id    : GenuineIntel
cpu family    : 6
model        : 23
model name    : Intel(R) Xeon(R) CPU           E5405  @ 2.00GHz
stepping    : 10
microcode    : 0xa07
cpu MHz        : 2000.062
cache size    : 6144 KB
physical id    : 1
siblings    : 4
core id        : 2
cpu cores    : 4
apicid        : 6
initial apicid    : 6
fpu        : yes
fpu_exception    : yes
cpuid level    : 13
wp        : yes
flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx tm2 ssse3 cx16 xtpr pdcm dca sse4_1 xsave lahf_lm dts tpr_shadow vnmi flexpriority
bogomips    : 4000.17
clflush size    : 64
cache_alignment    : 64
address sizes    : 38 bits physical, 48 bits virtual
power management:

processor    : 6
vendor_id    : GenuineIntel
cpu family    : 6
model        : 23
model name    : Intel(R) Xeon(R) CPU           E5405  @ 2.00GHz
stepping    : 10
microcode    : 0xa07
cpu MHz        : 2000.062
cache size    : 6144 KB
physical id    : 0
siblings    : 4
core id        : 3
cpu cores    : 4
apicid        : 3
initial apicid    : 3
fpu        : yes
fpu_exception    : yes
cpuid level    : 13
wp        : yes
flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx tm2 ssse3 cx16 xtpr pdcm dca sse4_1 xsave lahf_lm dts tpr_shadow vnmi flexpriority
bogomips    : 4000.15
clflush size    : 64
cache_alignment    : 64
address sizes    : 38 bits physical, 48 bits virtual
power management:

processor    : 7
vendor_id    : GenuineIntel
cpu family    : 6
model        : 23
model name    : Intel(R) Xeon(R) CPU           E5405  @ 2.00GHz
stepping    : 10
microcode    : 0xa07
cpu MHz        : 2000.062
cache size    : 6144 KB
physical id    : 1
siblings    : 4
core id        : 3
cpu cores    : 4
apicid        : 7
initial apicid    : 7
fpu        : yes
fpu_exception    : yes
cpuid level    : 13
wp        : yes
flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx tm2 ssse3 cx16 xtpr pdcm dca sse4_1 xsave lahf_lm dts tpr_shadow vnmi flexpriority
bogomips    : 4000.18
clflush size    : 64
cache_alignment    : 64
address sizes    : 38 bits physical, 48 bits virtual
power management:

desktop /proc/meminfo:

MemTotal:        8199244 kB
MemFree:         1823544 kB
Buffers:          218120 kB
Cached:          2045400 kB
SwapCached:       244672 kB
Active:          3856716 kB
Inactive:        1878488 kB
Active(anon):    2705576 kB
Inactive(anon):   826740 kB
Active(file):    1151140 kB
Inactive(file):  1051748 kB
Unevictable:          16 kB
Mlocked:              16 kB
SwapTotal:       6368248 kB
SwapFree:        5748948 kB
Dirty:                44 kB
Writeback:             0 kB
AnonPages:       3304896 kB
Mapped:           205080 kB
Shmem:             60632 kB
Slab:             471948 kB
SReclaimable:     424264 kB
SUnreclaim:        47684 kB
KernelStack:        5400 kB
PageTables:        61404 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    10467868 kB
Committed_AS:    8031208 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       76952 kB
VmallocChunk:   34359649248 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:     4857216 kB
DirectMap2M:     3530752 kB

#8

Updated by Mathieu Desnoyers over 8 years ago

I suspect we'll have to cap the max size of UST buffers we can allocate from a sessiond point of view. I suspect that the cause of your SIGBUS is that you create SHM that are larger than /dev/shm (or /run/shm) on your system. Can you double-check this by playing with various subbuffer size, and by modifying your system's /dev/shm or /run/shm tmpfs size ?

Thanks,

Mathieu

#9

Updated by Raphaël Beamonte over 8 years ago

On test, the following configurations are working : (subbuf_num * subbuf_size)
- 32 * 4194304
- 64 * 2097152
- 128 * 1048576
- 256 * 524288
- 512 * 262144

The max allowed size seems to be 134 217 728.

After doing the same tests on desktop, the same result appears, with exactly the same sizes.
If we multiplicate by 2 the number of subbuffers or the size of the subbuffers that are listed above, the bus error appears on the two computers.

#10

Updated by Mathieu Desnoyers over 8 years ago

  • Status changed from Feedback to Resolved
  • % Done changed from 0 to 100

Also available in: Atom PDF