Bug #1392


lttng view: Stream 0 is not declared in metadata

Added by Ricardo Nabinger Sanchez about 1 year ago. Updated 9 months ago.

Status: Feedback
Priority: Normal
Target version: -
Start date: 09/27/2023
Due date: -
% Done: 0%
Estimated time: -
Description

While trying to capture UST events from a simple program, kernel events are captured properly but UST events are not: Trace Compass cannot decode the UST data, although it can decode the kernel channel.
The application is single-threaded and performs only disk and stdout I/O, nothing fancy. It is compiled with -finstrument-functions, without optimizations, and with debugging symbols.
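
For reference, "-finstrument-functions, without optimizations, and with debugging symbols" corresponds to a build along these lines (an illustrative sketch; the compiler and source file name are assumptions, the actual program is the tests/test-playlist binary used below):

$ gcc -O0 -g -finstrument-functions -o tests/test-playlist tests/test-playlist.c

The function entry/exit events themselves are emitted by liblttng-ust-cyg-profile.so, which is LD_PRELOADed in the run command further down.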

Session setup:

# lttng create
# lttng enable-channel -u --subbuf-size=64M --num-subbuf=8 chan_ust
# lttng enable-channel -k --subbuf-size=64M --num-subbuf=2 chan_kernel
# lttng add-context -u -t vpid -t ip -t procname -t vtid -c chan_ust
# lttng enable-event -u -a -c chan_ust
# lttng enable-event -k -c chan_kernel --syscall --all
# lttng enable-event -k -c chan_kernel lttng_statedump_block_device,lttng_statedump_file_descriptor,lttng_statedump_process_state,mm_page_alloc,mm_page_free,net_dev_xmit,netif_receive_skb,sched_pi_setprio,sched_process_exec,sched_process_fork,sched_switch,sched_wakeup,sched_waking,softirq_entry,softirq_exit,softirq_raise

# lttng start

  (The preloads below are possibly excessive, but they come straight from my notes from collecting data on Varnish Cache.)
# LTTNG_UST_BLOCKING_RETRY_TIMEOUT=-1 LD_PRELOAD=liblttng-ust-fd.so:liblttng-ust-fork.so:liblttng-ust-dl.so:liblttng-ust-cyg-profile.so:liblttng-ust-libc-wrapper.so:liblttng-ust-pthread-wrapper.so tests/test-playlist tests/playlist/bug_ffmpeg_1000-x20.m3u

# lttng stop

# lttng view > /dev/null
[error] Stream 0 is not declared in metadata.
[error] Stream 0 is not declared in metadata.
[error] Stream 0 is not declared in metadata.
[error] Stream 0 is not declared in metadata.
[error] Stream 0 is not declared in metadata.
[error] Stream 0 is not declared in metadata.
[error] Stream 0 is not declared in metadata.
[error] Stream 0 is not declared in metadata.
[error] Stream 0 is not declared in metadata.
[error] Stream 0 is not declared in metadata.
[error] Stream 0 is not declared in metadata.
[error] Stream 0 is not declared in metadata.

Environment:
  • lttng-modules-2.13.10 (from tarball)
  • lttng-tools-2.13.11 (from tarball)
  • lttng-ust-2.13.6 (from tarball)
  • babeltrace (git; at v1.5.11)
  • Linux kernel 6.6.0-rc1
  • Slackware64-current, up-to-date as of 2023-09-26
  • Trace Compass 9.1.0.202309200833

IRC snippet:

Sep/27 16:13:54 rnsanchez I'm trying to find out what I'm doing wrong..  I keep getting "[error] Stream 0 is not declared in metadata." in my captures.  I guess this is making tracecompass not being able to show UST events (simple function enter-exit instrumentation)
Sep/27 16:16:46 rnsanchez using my old notes, while checking https://lttng.org/docs/v2.13/#doc-liblttng-ust-cyg-profile and other sections
Sep/27 16:18:25 jgalar rnsanchez: shot in the dark, but are you making sure the session is stopped or destroyed before reading the trace?
Sep/27 16:18:28 rnsanchez if it helps knowing, events from kernel channel are fine and I can see them in tracecompass (the trace is huge)
Sep/27 16:18:52 rnsanchez jgalar: yes, with lttng stop, then I copy the data from root to my user
Sep/27 16:19:10 rnsanchez but I get that Stream 0 regardless.. it's from lttng view
Sep/27 16:20:11 jgalar it sounds like a metadata flushing issue, any chance you can share the trace? (or at least, the metadata files)
Sep/27 16:20:42 rnsanchez yes, sure.  just a moment while I upload it
Sep/27 16:20:48 jgalar okay thanks
Sep/27 16:21:43 ebugden not sure if related, but it's triggering deja vu of this issue: https://github.com/efficios/barectf/commit/d024537859c1d869bfa1cedc8abe8e3f7a648faa
Sep/27 16:22:23 ebugden maybe a similar problem in a similar area in lttng view
Sep/27 16:25:28 rnsanchez jgalar: https://rnsanchez.wait4.org/auto-20230927-161138.tar.xz
Sep/27 16:27:33 rnsanchez jgalar: this last one has incomplete events as I was trying to investigate, but I can bundle another one with full events as I used to collect
Sep/27 16:31:51 rnsanchez ebugden: is there anything on my side I can do?
Sep/27 16:34:38 ebugden rnsanchez: i don't feel i'm familiar enough with this situation to say; i was tossing out the association as possible food for thought for jgalar
Sep/27 16:35:12 rnsanchez oh ok  :-)
Sep/27 16:43:03 rnsanchez jgalar: just in case, here is with full events: https://rnsanchez.wait4.org/auto-20230927-164014.tar.xz
Sep/27 16:45:53 jgalar ebugden: yup, the errors are indeed similar
Sep/27 16:50:59 jgalar hmm, the user space trace's metadata doesn't declare any stream class, that's... unexpected...
Sep/27 16:51:14 jgalar which versions of lttng-ust and lttng-tools are you running?
Sep/27 16:55:03 rnsanchez ust 2.13.6 (tarball) and tools 2.13.11 (tarball too)
Sep/27 16:55:07 rnsanchez compiled today
Sep/27 16:55:13 jgalar okay
Sep/27 17:17:57 ebugden rnsanchez (cc: jgalar): would you open a bug report on https://bugs.lttng.org/ ?
Sep/27 17:18:16 rnsanchez ebugden: sure
Sep/27 17:19:21 ebugden thanks! (we have a few hypotheses, but we'd need more information about exactly how the trace is generated)
Sep/27 17:19:58 jgalar rnsanchez: it's weird, the user space trace's stream files just have the packet headers... if you can reproduce the problem while running lttng-sessiond with the `-vvv --verbose-consumer` options to get the logs, it would be helpful
Sep/27 17:20:47 rnsanchez I can try.  I suppose I have to finish the sessiond already running and manually launch another?
Sep/27 17:21:06 jgalar yep


Files

sessiond.log (1.13 MB), lttng-sessiond -vvv --verbose-consumer &> sessiond.lo, Ricardo Nabinger Sanchez, 09/27/2023 04:33 PM
auto-20230927-164014.tar.xz (11 MB), Kienan Stewart, 02/28/2024 07:26 AM
stream0_not_declared.tgz (540 KB), Arnaud Fiorini, 03/04/2024 02:52 PM
1392-reproducer.sh (942 Bytes), Kienan Stewart, 03/11/2024 03:58 PM
err.txt (111 KB), Arnaud Fiorini, 03/22/2024 12:25 PM
err3.txt (145 KB), Arnaud Fiorini, 03/26/2024 10:47 AM
err2.txt (130 KB), Arnaud Fiorini, 03/26/2024 10:47 AM
err5.txt (172 KB), Arnaud Fiorini, 03/27/2024 11:40 AM
Actions #1

Updated by Arnaud Fiorini 10 months ago

I am running into the same issue. It appears to happen more often with bigger subbuffer sizes.
Some additional information:
LTTng 2.13.10
liblttng-dev 2.13.7-1~ubuntu20.04.1
Ubuntu 20.04.6 LTS

My /home is on NFS, but I am storing the trace in a local folder.

Actions #2

Updated by Jérémie Galarneau 10 months ago

  • Status changed from New to Feedback

Hi Arnaud,

  1. Can you share the trace?
  2. Does the trace become readable if you wait a while before using lttng view?

Thanks!
Jérémie

Actions #4

Updated by Arnaud Fiorini 9 months ago

Hi Jérémie,

1. Here is one example generated with 64 MiB subbuffers.
2. I added a sleep of 10 seconds between lttng stop and lttng view. I still get the error.

Thank you !
Arnaud
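
For reference, the sequence described in point 2 amounts to the following (the commands come from the original report, the 10-second delay from the comment above):

# lttng stop
# sleep 10
# lttng view > /dev/null

and it still produces the same "Stream 0 is not declared in metadata" errors as in the original report.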

Actions #5

Updated by Jérémie Galarneau 9 months ago · Edited

  • Assignee set to Jérémie Galarneau

Hi Arnaud,

Some quick questions:
  1. Can you attach the output of the df -H command? I'd like to see if you are running out of space on /dev/shm.
  2. How many CPUs are there on your machine? (What are the contents of /sys/devices/system/cpu/possible?)
  3. Can you give me an idea of what this application does?
    1. Does it run for a while?
    2. Does it get killed or does it exit normally?
    3. Is it crashing?
  4. Do you have a reproducer that you can share?
  5. Is this running in a containerized environment by any chance? (is /proc visible to the application and the LTTng daemons)

Thanks!
Jérémie
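
For reference, the information for questions 1 and 2 can be collected directly with:

$ df -H
$ cat /sys/devices/system/cpu/possible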

Actions #6

Updated by Arnaud Fiorini 9 months ago

1.

Filesystem                 Size  Used Avail Use% Mounted on
udev                       271G     0  271G   0% /dev
tmpfs                       55G   25M   55G   1% /run
/dev/nvme3n1p2             944G   50G  846G   6% /
tmpfs                      271G   25k  271G   1% /dev/shm
tmpfs                      5.3M     0  5.3M   0% /run/lock
tmpfs                      271G     0  271G   0% /sys/fs/cgroup
/dev/loop0                  68M   68M     0 100% /snap/core20/2105
/dev/nvme3n1p1             536M  6.4M  530M   2% /boot/efi
/dev/loop1                  68M   68M     0 100% /snap/core20/2182
/dev/md0                   2.9T  210G  2.7T   8% /scratch
/dev/loop2                  97M   97M     0 100% /snap/lxd/24061
/dev/loop3                  43M   43M     0 100% /snap/snapd/20290
/dev/loop4                  97M   97M     0 100% /snap/lxd/23991
/dev/loop5                  43M   43M     0 100% /snap/snapd/20671
192.168.63.160:/mnt/disk1  2.0T  148G  1.9T   8% /home
tmpfs                       55G     0   55G   0% /run/user/1000

2. 48 cores / 96 threads (0-95)
3. This application runs a couple of GPU kernel functions a number of times (20000) and returns.
3.1. Runtime is about 80 ms, and about 0.5 s for the whole application.
3.2. It exits normally.
3.3. No.
4. It is running in a ROCm RC environment but I can provide the program that I am using.
5. No.

Thanks,
Arnaud

Actions #7

Updated by Kienan Stewart 9 months ago

Hi Arnaud, Ricardo,

do you see the issue when repeating the test procedure with a different user space application, e.g. https://github.com/lttng/lttng-ust/tree/master/doc/examples/hello-static-lib?

thanks,
kienan

Actions #8

Updated by Arnaud Fiorini 9 months ago

Hi Kienan,

I have the same issue running the hello example.

Thank you,
Arnaud

Actions #9

Updated by Kienan Stewart 9 months ago

Hi Arnaud,

thanks for the confirmation that you experience the issue using the hello-static-lib application.

on a fresh focal VM with the PPA debs (lttng-tools 2.13.10-1~ubuntu20.04.1, lttng-modules-dkms 2.13.11-1~ubuntu20.04.1, lttng-ust 2.13.7-1~ubuntu20.04.1) I'm still not able to reproduce the issue you're seeing using the attached script.

I've also tried source-built versions from the 2.13 branch, matching the versions Ricardo used.

Would you be comfortable sharing the output of `dpkg -l` and the relevant environment variables, e.g. PATH, LD_LIBRARY_PATH, MODPROBE_OPTIONS?

thanks,
kienan

Actions #10

Updated by Arnaud Fiorini 9 months ago

Hi Kienan,

I tested your script on a freshly installed server with the PPA debs and was able to reproduce the issue. It happens only on the UST side; the kernel trace is fine.

Here is the environment information:

$ dpkg -l | grep lttng
ii  liblttng-ctl0:amd64                         2.13.10-1~ubuntu22.04.2                 amd64        LTTng control and utility library
ii  liblttng-ust-common1:amd64                  2.13.7-1~ubuntu22.04.1                  amd64        LTTng 2.0 Userspace Tracer (common library)
ii  liblttng-ust-ctl5:amd64                     2.13.7-1~ubuntu22.04.1                  amd64        LTTng 2.0 Userspace Tracer (trace control library)
ii  liblttng-ust-dev:amd64                      2.13.7-1~ubuntu22.04.1                  amd64        LTTng 2.0 Userspace Tracer (development files)
ii  liblttng-ust-python-agent1:amd64            2.13.7-1~ubuntu22.04.1                  amd64        LTTng 2.0 Userspace Tracer (Python agent native library)
ii  liblttng-ust1:amd64                         2.13.7-1~ubuntu22.04.1                  amd64        LTTng 2.0 Userspace Tracer (tracing libraries)
ii  lttng-modules-dkms                          2.13.11-1~ubuntu22.04.1                 all          Linux Trace Toolkit (LTTng) kernel modules (DKMS)
ii  lttng-tools                                 2.13.10-1~ubuntu22.04.2                 amd64        LTTng control and utility programs

$ echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
$ echo $LD_LIBRARY_PATH

$ echo $MODPROBE_OPTIONS

I used MAAS to install the server. Let me know if you need other information.

Best,
Arnaud

Actions #11

Updated by Arnaud Fiorini 9 months ago

Hi Kienan,

I tested the same script in a VM running Ubuntu 22.04 and updated the LTTng versions. The trace is now empty, but some errors are output to the lttng-sessiond log:

Mar 22 12:19:01 dorsal-vm2 lttng-sessiond[48513]: Error: ask_channel_creation consumer command failed
Mar 22 12:19:01 dorsal-vm2 lttng-sessiond[48513]: Error: Error creating UST channel "chan_ust" on the consumer daemon

I attached the verbose output of lttng-sessiond.

Thanks,
Arnaud

Actions #12

Updated by Arnaud Fiorini 9 months ago

Forgot to add the lttng versions I am using:

ii  liblttng-ctl0:amd64                         2.13.x+stable+git6349+202401232214~ubuntu22.04.1 amd64        LTTng control and utility library
ii  liblttng-ust-common1:amd64                  2.13.x+stable+git3026+202403220132~ubuntu22.04.1 amd64        LTTng 2.0 Userspace Tracer (common library)
ii  liblttng-ust-ctl5:amd64                     2.13.x+stable+git3026+202403220132~ubuntu22.04.1 amd64        LTTng 2.0 Userspace Tracer (trace control library)
ii  liblttng-ust-dev:amd64                      2.13.x+stable+git3026+202403220132~ubuntu22.04.1 amd64        LTTng 2.0 Userspace Tracer (development files)
ii  liblttng-ust-python-agent1:amd64            2.13.x+stable+git3026+202403220132~ubuntu22.04.1 amd64        LTTng 2.0 Userspace Tracer (Python agent native library)
ii  liblttng-ust1:amd64                         2.13.x+stable+git3026+202403220132~ubuntu22.04.1 amd64        LTTng 2.0 Userspace Tracer (tracing libraries)
ii  lttng-tools                                 2.13.x+stable+git6349+202401232214~ubuntu22.04.1 amd64        LTTng control and utility programs

Actions #13

Updated by Kienan Stewart 9 months ago

Hi Arnaud,

thanks for the update!

I'm able to reproduce what you're seeing in #1392-12 with the stable-2.13 daily. I'll be able to investigate more next week.

thanks,
kienan

Actions #14

Updated by Jérémie Galarneau 9 months ago

Hi Arnaud,

Can you collect the session daemon logs when it is launched with the LTTNG_UST_DEBUG=1 environment variable?

Also, please pass -vvv --verbose-consumer to lttng-sessiond.

Thanks!
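
Combining the two requests above, the session daemon would be launched roughly like this (redirecting its output to a file; the file name here is just an example):

# LTTNG_UST_DEBUG=1 lttng-sessiond -vvv --verbose-consumer &> /tmp/sessiond.log

As noted later in the thread, LTTNG_UST_DEBUG=1 should also be set in the environment of the traced application to capture its side of the logging.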

Actions #15

Updated by Arnaud Fiorini 9 months ago

Hi Kienan, Jérémie,

I collected the logs twice, once with the same version as listed in my message 4 days ago (err2.txt) and once with stable-2.13 (err3.txt).

I saw the following log message:

libringbuffer[54528/54537]: Error: zero_file: No space left on device (in _shm_object_table_alloc_shm() at ringbuffer/shm.c:131)

However, I have the following on the VM:

~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           794M  1.1M  793M   1% /run
/dev/vda2        92G  7.8G   79G  10% /
tmpfs           3.9G   28K  3.9G   1% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           794M  4.0K  794M   1% /run/user/1000
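
A quick way to watch /dev/shm usage while the session is running (assuming the watch utility is available; otherwise re-run df periodically):

$ watch -n 1 df -h /dev/shm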

Actions #16

Updated by Kienan Stewart 9 months ago

How many processors does the VM have? The bare-metal instance you were talking about earlier had 96.

The shared-memory estimate for a given channel is roughly nCPU * nUIDs * nSubbuffers * subbufSize. With chan_ust, there are 8 subbuffers of 64M, so 512M * nCPU * nUIDs. If you have 8 or more cores, the allocation will probably fail, as that will exceed the 3.9G available in /dev/shm on your VM. While the documentation does not give a summary calculation, the buffering scheme is described at https://lttng.org/docs/v2.13/#doc-channel-buffering-schemes

Note that channels that are snapshot enabled use an additional subbuffer, but that isn't the case here.
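
As a quick sanity check, plugging in the numbers from this thread (8 CPUs, 1 tracing UID, 8 subbuffers of 64 MiB) gives the /dev/shm requirement for chan_ust alone:

$ echo "$(( 8 * 1 * 8 * 64 )) MiB"
4096 MiB

which exceeds the 3.9G tmpfs mounted on /dev/shm in the previous comment, consistent with the zero_file: No space left on device error.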

Actions #17

Updated by Kienan Stewart 9 months ago

My impression is that the issue in #1392-12 with the empty trace is a shared-memory sizing problem, distinct from the issue originally reported. We still haven't been able to reproduce the `[error] Stream 0 is not declared in metadata.` error.

Arnaud, would you be willing to re-run the test on the initial machine with the LTTNG_UST_DEBUG=1 environment variable set for both the sessiond and the traced application, and with -vvv --verbose-consumer passed as additional arguments to the sessiond?

thanks,
kienan

Actions #18

Updated by Arnaud Fiorini 9 months ago

Hi Kienan,

You are right: the VM has 8 CPUs, so with 8 subbuffers the allocation exceeds the available capacity. I lowered the number of subbuffers and the original issue reappeared, even with the latest daily builds.

Thank you,
Arnaud
