Bug #1392
openlttng view: Stream 0 is not declared in metadata
0%
Description
While trying to capture UST events from a simple program, even though kernel events seem to be properly captured, UST events are not. Trace Compass cannot decode the UST data (but it can decode from the kernel channel).
The application is single-threaded, performs only disk and stdout I/O, nothing fancy. Compiled with -finstrument-functions
, no optimizations, and with debugging symbols.
Session setup:
# lttng create # lttng enable-channel -u --subbuf-size=64M --num-subbuf=8 chan_ust # lttng enable-channel -k --subbuf-size=64M --num-subbuf=2 chan_kernel # lttng add-context -u -t vpid -t ip -t procname -t vtid -c chan_ust # lttng enable-event -u -a -c chan_ust # lttng enable-event -k -c chan_kernel --syscall --all # lttng enable-event -k -c chan_kernel lttng_statedump_block_device,lttng_statedump_file_descriptor,lttng_statedump_process_state,mm_page_alloc,mm_page_free,net_dev_xmit,netif_receive_skb,sched_pi_setprio,sched_process_exec,sched_process_fork,sched_switch,sched_wakeup,sched_waking,softirq_entry,softirq_exit,softirq_raise # lttng start (possibly excessive preloads, but these are from my notes, collecting data on Varnish Cache) # LTTNG_UST_BLOCKING_RETRY_TIMEOUT=-1 LD_PRELOAD=liblttng-ust-fd.so:liblttng-ust-fork.so:liblttng-ust-dl.so:liblttng-ust-cyg-profile.so:liblttng-ust-libc-wrapper.so:liblttng-ust-pthread-wrapper.so tests/test-playlist tests/playlist/bug_ffmpeg_1000-x20.m3u # lttng stop # lttng view > /dev/null [error] Stream 0 is not declared in metadata. [error] Stream 0 is not declared in metadata. [error] Stream 0 is not declared in metadata. [error] Stream 0 is not declared in metadata. [error] Stream 0 is not declared in metadata. [error] Stream 0 is not declared in metadata. [error] Stream 0 is not declared in metadata. [error] Stream 0 is not declared in metadata. [error] Stream 0 is not declared in metadata. [error] Stream 0 is not declared in metadata. [error] Stream 0 is not declared in metadata. [error] Stream 0 is not declared in metadata.Environment:
- lttng-modules-2.13.10 (from tarball)
- lttng-tools-2.13.11 (from tarball)
- lttng-ust-2.13.6 (from tarball)
- babeltrace (git; at
v1.5.11
) - Linux kernel 6.6.0-rc1
- Slackware64-current, up-to-date as of 2023-09-26
- Trace Compass 9.1.0.202309200833
IRC snippet:
Sep/27 16:13:54 rnsanchez I'm trying to find out what I'm doing wrong.. I keep getting "[error] Stream 0 is not declared in metadata." in my captures. I guess this is making tracecompass not being able to show UST events (simple function enter-exit instrumentation) Sep/27 16:16:46 rnsanchez using my old notes, while checking https://lttng.org/docs/v2.13/#doc-liblttng-ust-cyg-profile and other sections Sep/27 16:18:25 jgalar rnsanchez: shot in the dark, but are you making sure the session is stopped or destroyed before reading the trace? Sep/27 16:18:28 rnsanchez if it helps knowing, events from kernel channel are fine and I can see them in tracecompass (the trace is huge) Sep/27 16:18:52 rnsanchez jgalar: yes, with lttng stop, then I copy the data from root to my user Sep/27 16:19:10 rnsanchez but I get that Stream 0 regardless.. it's from lttng view Sep/27 16:20:11 jgalar it sounds like a metadata flushing issue, any chance you can share the trace? (or at least, the metadata files) Sep/27 16:20:42 rnsanchez yes, sure. just a moment while I upload it Sep/27 16:20:48 jgalar okay thanks Sep/27 16:21:43 ebugden not sure if related, but it's triggering deja vu of this issue: https://github.com/efficios/barectf/commit/d024537859c1d869bfa1cedc8abe8e3f7a648faa Sep/27 16:22:23 ebugden maybe a similar problem in a similar area in lttng view Sep/27 16:25:28 rnsanchez jgalar: https://rnsanchez.wait4.org/auto-20230927-161138.tar.xz Sep/27 16:27:33 rnsanchez jgalar: this last one has incomplete events as I was trying to investigate, but I can bundle another one with full events as I used to collect Sep/27 16:31:51 rnsanchez ebugden: is there anything on my side I can do? Sep/27 16:34:38 ebugden rnsanchez: i don't feel i'm familiar enough with this situation to say; i was tossing out the association as possible food for thought for jgalar Sep/27 16:35:12 rnsanchez oh ok :-) Sep/27 16:43:03 rnsanchez jgalar: just in case, here is with full events: https://rnsanchez.wait4.org/auto-20230927-164014.tar.xz Sep/27 16:45:53 jgalar ebugden: yup, the errors are indeed similar Sep/27 16:50:59 jgalar hmm, the user space trace's metadata doesn't declare any stream class, that's... unexpected... Sep/27 16:51:14 jgalar which versions of lttng-ust and lttng-tools are you running? Sep/27 16:55:03 rnsanchez ust 2.13.6 (tarball) and tools 2.13.11 (tarball too) Sep/27 16:55:07 rnsanchez compiled today Sep/27 16:55:13 jgalar okay Sep/27 17:17:57 ebugden rnsanchez (cc: jgalar): would you open a bug report on https://bugs.lttng.org/ ? Sep/27 17:18:16 rnsanchez ebugden: sure Sep/27 17:19:21 ebugden thanks! (we have a few hypotheses, but we'd need more information about exactly how the trace is generated) Sep/27 17:19:58 jgalar rnsanchez: it's weird, the user space trace's stream files just have the packet headers... if you can reproduce the problem while running lttng-sessiond with the `-vvv --verbose-consumer` options to get the logs, it would be helpful Sep/27 17:20:47 rnsanchez I can try. I suppose I have to finish the sessiond already running and manually launch another? Sep/27 17:21:06 jgalar yep
Files
Updated by Arnaud Fiorini 9 months ago
I am running in to the same issue. It appears to happen more often with bigger subbuffer sizes.
Some additional information:
Lttng 2.13.10
liblttng-dev 2.13.7-1~ubuntu20.04.1
Ubuntu 20.04.6 LTS
My /home is stored on an NFS but I am storing the trace in a local folder.
Updated by Jérémie Galarneau 9 months ago
- Status changed from New to Feedback
Hi Arnaud,
- Can you share the trace?
- Does the trace become readable if you wait a while before using
lttng view
?
Thanks!
Jérémie
Updated by Kienan Stewart 9 months ago
Updated by Arnaud Fiorini 9 months ago
- File stream0_not_declared.tgz stream0_not_declared.tgz added
Hi Jérémie,
1. Here is one example generated with 64 MiB subbuffers.
2. I did a sleep for 10 seconds between lttng stop
and lttng view
. I still get the error.
Thank you !
Arnaud
Updated by Jérémie Galarneau 9 months ago · Edited
- Assignee set to Jérémie Galarneau
Hi Arnaud,
Some quick questions:- Can you join the output of the
df -H
comment? I'd like to see if you are running out of space on/dev/shm
- How many CPUs are there on your machine? (What are the contents of
/sys/devices/system/cpu/possible
?) - Can you give me an idea of what this application does?
- Does it run for a while?
- Does it get killed or does it exit normally?
- Is it crashing?
- Do you have a reproducer that you can share?
- Is this running in a containerized environment by any chance? (is
/proc
visible to the application and the LTTng daemons)
Thanks!
Jérémie
Updated by Arnaud Fiorini 9 months ago
1.
Filesystem Size Used Avail Use% Mounted on udev 271G 0 271G 0% /dev tmpfs 55G 25M 55G 1% /run /dev/nvme3n1p2 944G 50G 846G 6% / tmpfs 271G 25k 271G 1% /dev/shm tmpfs 5.3M 0 5.3M 0% /run/lock tmpfs 271G 0 271G 0% /sys/fs/cgroup /dev/loop0 68M 68M 0 100% /snap/core20/2105 /dev/nvme3n1p1 536M 6.4M 530M 2% /boot/efi /dev/loop1 68M 68M 0 100% /snap/core20/2182 /dev/md0 2.9T 210G 2.7T 8% /scratch /dev/loop2 97M 97M 0 100% /snap/lxd/24061 /dev/loop3 43M 43M 0 100% /snap/snapd/20290 /dev/loop4 97M 97M 0 100% /snap/lxd/23991 /dev/loop5 43M 43M 0 100% /snap/snapd/20671 192.168.63.160:/mnt/disk1 2.0T 148G 1.9T 8% /home tmpfs 55G 0 55G 0% /run/user/1000
2. 48 cores / 96 threads (0-95)
3. This application runs a couple of GPU kernel function a number of times (20000) and returns.
3.1. runtime is 80ms and .5s for the whole application.
3.2. It exits normally.
3.3. No.
4. It is running in a ROCm RC environment but I can provide the program that I am using.
5. No.
Thanks,
Arnaud
Updated by Kienan Stewart 9 months ago
Hi Arnaud, Ricardo,
do you see the issue when repeating the test procedure using different userspace applications, eg. https://github.com/lttng/lttng-ust/tree/master/doc/examples/hello-static-lib ?
thanks,
kienan
Updated by Arnaud Fiorini 9 months ago
Hi Kienan,
I have the same issue running the hello example.
Thank you,
Arnaud
Updated by Kienan Stewart 9 months ago
- File 1392-reproducer.sh 1392-reproducer.sh added
Hi Arnaud,
thanks for the confirmation that you experience the issue using the hello-static-lib application.
on a fresh focal VM with the PPA debs (lttng-tools 2.13.10-1~ubuntu20.04.1, lttng-modules-dkms 2.13.11-1~ubuntu20.04.1, lttng-ust 2.13.7-1~ubuntu20.04.1) I'm still not able to reproduce the issue you're seeing using the attached script.
I've also tried using source built versions in the 2.13 branch following the versions Ricardo used.
Would you be comfortable sharing the output of `dpkg -l` and the relevant environment eg. PATH, LD_LIBRARY_PATH, MODPROBE_OPTIONS.
thanks,
kienan
Updated by Arnaud Fiorini 9 months ago
Hi Kienan,
I tested your script on a freshly installed server with the PPA debs. I was able to reproduce the issue. It was happening only on the ust side, the kernel trace was fine.
Here is the environment information:
$ dpkg -l | grep lttng ii liblttng-ctl0:amd64 2.13.10-1~ubuntu22.04.2 amd64 LTTng control and utility library ii liblttng-ust-common1:amd64 2.13.7-1~ubuntu22.04.1 amd64 LTTng 2.0 Userspace Tracer (common library) ii liblttng-ust-ctl5:amd64 2.13.7-1~ubuntu22.04.1 amd64 LTTng 2.0 Userspace Tracer (trace control library) ii liblttng-ust-dev:amd64 2.13.7-1~ubuntu22.04.1 amd64 LTTng 2.0 Userspace Tracer (development files) ii liblttng-ust-python-agent1:amd64 2.13.7-1~ubuntu22.04.1 amd64 LTTng 2.0 Userspace Tracer (Python agent native library) ii liblttng-ust1:amd64 2.13.7-1~ubuntu22.04.1 amd64 LTTng 2.0 Userspace Tracer (tracing libraries) ii lttng-modules-dkms 2.13.11-1~ubuntu22.04.1 all Linux Trace Toolkit (LTTng) kernel modules (DKMS) ii lttng-tools 2.13.10-1~ubuntu22.04.2 amd64 LTTng control and utility programs $ echo $PATH /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin $ echo $LD_LIBRARY_PATH $ echo $MODPROBE_OPTIONS
I used MAAS to install the server. Let me know if you need other information.
Best,
Arnaud
Updated by Arnaud Fiorini 8 months ago
Hi Kienan,
I tested the same script with a VM running Ubuntu 22 and I updated the lttng versions. The trace is now empty but some errors are output to the lttng-sessiond log:
Mar 22 12:19:01 dorsal-vm2 lttng-sessiond[48513]: Error: ask_channel_creation consumer command failed Mar 22 12:19:01 dorsal-vm2 lttng-sessiond[48513]: Error: Error creating UST channel "chan_ust" on the consumer daemon
I joined the verbose output of lttng-sessiond.
Thanks,
Arnaud
Updated by Arnaud Fiorini 8 months ago
Forgot to add the lttng versions I am using:
ii liblttng-ctl0:amd64 2.13.x+stable+git6349+202401232214~ubuntu22.04.1 amd64 LTTng control and utility library ii liblttng-ust-common1:amd64 2.13.x+stable+git3026+202403220132~ubuntu22.04.1 amd64 LTTng 2.0 Userspace Tracer (common library) ii liblttng-ust-ctl5:amd64 2.13.x+stable+git3026+202403220132~ubuntu22.04.1 amd64 LTTng 2.0 Userspace Tracer (trace control library) ii liblttng-ust-dev:amd64 2.13.x+stable+git3026+202403220132~ubuntu22.04.1 amd64 LTTng 2.0 Userspace Tracer (development files) ii liblttng-ust-python-agent1:amd64 2.13.x+stable+git3026+202403220132~ubuntu22.04.1 amd64 LTTng 2.0 Userspace Tracer (Python agent native library) ii liblttng-ust1:amd64 2.13.x+stable+git3026+202403220132~ubuntu22.04.1 amd64 LTTng 2.0 Userspace Tracer (tracing libraries) ii lttng-tools 2.13.x+stable+git6349+202401232214~ubuntu22.04.1 amd64 LTTng control and utility programs
Updated by Kienan Stewart 8 months ago
Hi Arnaud,
thanks for the update!
I'm able to reproduce what you're seeing in #1392-12 with the stable-2.13 daily. I`ll be able to investigate more next week.
thanks,
kienan
Updated by Jérémie Galarneau 8 months ago
Hi Arnaud,
Can you collect the session daemon logs when it is launched with the LTTNG_UST_DEBUG=1
environment variable?
Also, please pass -vvv --verbose-consumer
to lttng-sessiond
.
Thanks!
Updated by Arnaud Fiorini 8 months ago
Hi Kienan, Jérémie,
I collected the logs twice, once with the same version as listed in my message 4 days ago (err2.txt) and once with stable-2.13 (err3.txt).
I saw the current log message: libringbuffer[54528/54537]: Error: zero_file: No space left on device (in _shm_object_table_alloc_shm() at ringbuffer/shm.c:131)
However I have the following on the VM:
~$ df -h Filesystem Size Used Avail Use% Mounted on tmpfs 794M 1.1M 793M 1% /run /dev/vda2 92G 7.8G 79G 10% / tmpfs 3.9G 28K 3.9G 1% /dev/shm tmpfs 5.0M 0 5.0M 0% /run/lock tmpfs 794M 4.0K 794M 1% /run/user/1000
Updated by Kienan Stewart 8 months ago
How many processors are on the VM? The baremetal instance you were talking about had 96.
The memory estimate for a given channel is something like: nCPU * nUIDs * nSubbuffers * nSubbuffersize
. With chan_ust
, there are 8 buffers of 64M, so 512M * nCPU * nUIDs. If you have 8 or more cores, the allocation will probably fail as that will exceed the available capacity of /dev/shm
. While not detailed with a summary calculation, the scheme is documented at https://lttng.org/docs/v2.13/#doc-channel-buffering-schemes
Note that channels that are snapshot enabled use an additional subbuffer, but that isn't the case here.
Updated by Kienan Stewart 8 months ago
My impression is that the issues in #1392-12 with the empty trace are shared memory size things and different than the original issue reported. We still haven't been able to reproduce the [error] Stream 0 is not declared in metadata.
Arnaud, would you be willing to re-run the test on the initial machine with both the environment variable LTTNG_UST_DEBUG=1
for both the sessiond and the traced application and with -vvv --verbose-consumer
as additional arguments to the sessiond?
thanks,
kienan
Updated by Arnaud Fiorini 8 months ago
Hi Kienan,
You are right, the VM has 8 CPUs, so with 8 subbuffers, it will exceed the available capacity. I lowered the number of subbuffers and the other issue reappeared, even with the latest daily builds.
Thank you,
Arnaud