Bug #1339
.NET built against LTTng v2.13.0 crashes on startup
Description
When we build .NET on Fedora Rawhide (36), it segfaults on startup while initializing LTTng.
* frame #0: 0x00007f8885a47b82 liblttng-ust.so.1`check_event_provider + 162
  frame #1: 0x00007f8885a4d4d1 liblttng-ust.so.1`lttng_ust_probe_register + 33
  frame #2: 0x00007f8885b007b5 libcoreclrtraceptprovider.so`lttng_ust__events_init__DotNETRuntime() at ust-tracepoint-event.h:1198:14
  frame #3: 0x00007f888683fa2e ld-linux-x86-64.so.2`call_init(l=<unavailable>, argc=10, argv=0x00007ffcd00cfd88, env=0x00007ffcd00cfde0) at dl-init.c:70:3
  frame #4: 0x00007f888683fb1c ld-linux-x86-64.so.2`_dl_init(main_map=0x0000556bd608a290, argc=10, argv=0x00007ffcd00cfd88, env=0x00007ffcd00cfde0) at dl-init.c:117:5
  frame #5: 0x00007f88864534c5 libc.so.6`_dl_catch_exception + 229
  frame #6: 0x00007f88868437de ld-linux-x86-64.so.2`dl_open_worker at dl-open.c:821:5
  frame #7: 0x00007f8886453468 libc.so.6`_dl_catch_exception + 136
  frame #8: 0x00007f8886843b5c ld-linux-x86-64.so.2`_dl_open at dl-open.c:896:17
  frame #9: 0x00007f888638294c libc.so.6`dlopen_doit + 92
  frame #10: 0x00007f8886453468 libc.so.6`_dl_catch_exception + 136
  frame #11: 0x00007f8886453533 libc.so.6`_dl_catch_error + 51
  frame #12: 0x00007f888638244e libc.so.6`_dlerror_run + 142
  frame #13: 0x00007f88863829d8 libc.so.6`dlopen@GLIBC_2.2.5 + 72
  frame #14: 0x00007f8885fd6893 libcoreclr.so`PAL_InitializeTracing() at tracepointprovider.cpp:116:9
  frame #15: 0x00007f888683fa2e ld-linux-x86-64.so.2`call_init(l=<unavailable>, argc=10, argv=0x00007ffcd00cfd88, env=0x00007ffcd00cfde0) at dl-init.c:70:3
  frame #16: 0x00007f888683fb1c ld-linux-x86-64.so.2`_dl_init(main_map=0x0000556bd6060050, argc=10, argv=0x00007ffcd00cfd88, env=0x00007ffcd00cfde0) at dl-init.c:117:5
  frame #17: 0x00007f88864534c5 libc.so.6`_dl_catch_exception + 229
  frame #18: 0x00007f88868437de ld-linux-x86-64.so.2`dl_open_worker at dl-open.c:821:5
  frame #19: 0x00007f8886453468 libc.so.6`_dl_catch_exception + 136
  frame #20: 0x00007f8886843b5c ld-linux-x86-64.so.2`_dl_open at dl-open.c:896:17
  frame #21: 0x00007f888638294c libc.so.6`dlopen_doit + 92
  frame #22: 0x00007f8886453468 libc.so.6`_dl_catch_exception + 136
  frame #23: 0x00007f8886453533 libc.so.6`_dl_catch_error + 51
  frame #24: 0x00007f888638244e libc.so.6`_dlerror_run + 142
  frame #25: 0x00007f88863829d8 libc.so.6`dlopen@GLIBC_2.2.5 + 72
  frame #26: 0x00007f8886274ead libhostpolicy.so`pal::load_library(path="/home/tmds/rpmbuild/BUILD/dotnet-9e8b04bbff820c93c142f99a507a46b976f5c14c-x64-bootstrap/src/aspnetcore.ae1a6cbe225b99c0bf38b7e31bf60cb653b73a52/artifacts/source-build/self/package-cache/microsoft.netcore.app.crossgen2.linux-x64/6.0.0/tools/libcoreclr.so", dll=0x00007f888629e0a0) at pal.unix.cpp:230:12
  ...
The crash happens at this line: https://github.com/lttng/lttng-ust/blob/4c155a06d838e1ab5d385abd1d73ae56e71b7d5e/src/lib/lttng-ust/lttng-probes.c#L153.
The field being accessed there is null:
(gdb) p *tp_class
$3 = {struct_size = 48, fields = 0x7ffff73ab2e0 <lttng_ust__event_fields___DotNETRuntime___GCStart>, nr_fields = 2, probe_callback = 0x7ffff7364820 <lttng_ust__event_probe__DotNETRuntime___GCStart(void*, unsigned int, unsigned int)>, signature = 0x7ffff738e720 <__tp_event_signature___DotNETRuntime___GCStart> "const unsigned int, Count, const unsigned int, Reason", probe_desc = 0x7ffff73a1470 <lttng_ust__probe_desc___DotNETRuntime>}
(gdb) p tp_class->fields[0]
$4 = (const struct lttng_ust_event_field * const) 0x0
(gdb) p tp_class->fields[1]
$5 = (const struct lttng_ust_event_field * const) 0x0
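To make the failure mode concrete, here is a minimal sketch of a validation loop that walks an array of field pointers and reports a null entry instead of dereferencing it. The types and the function `sketch_check_fields` are hypothetical stand-ins written for this illustration; they are not the lttng-ust definitions, and this is not the fix proposed later in this ticket.

```c
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the lttng-ust definitions. */
struct sketch_field {
    const char *name;
};

struct sketch_class {
    const struct sketch_field *const *fields; /* array of pointers to fields */
    size_t nr_fields;
};

/*
 * Returns 0 when every field pointer is set, or -1 when the array itself or
 * any entry is NULL, rather than dereferencing the entry and faulting.
 */
int sketch_check_fields(const struct sketch_class *cls)
{
    size_t i;

    if (cls->nr_fields > 0 && cls->fields == NULL)
        return -1;
    for (i = 0; i < cls->nr_fields; i++) {
        if (cls->fields[i] == NULL)
            return -1;
    }
    return 0;
}
```

With a class shaped like the gdb output above (nr_fields = 2, both entries 0x0), such a check would return -1 on the first entry. A loop without the null test would read through a null pointer at that point.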
It seems the fields have not been initialized (yet):
(gdb) p lttng_ust__event_fields___DotNETRuntime___GCStart
$1 = {0x0, 0x0, 0x0}
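One general way a table can still be zero-filled when a registration function reads it is an initialization-order problem: the registering constructor runs before the code that fills the table. The sketch below reproduces that ordering with the GCC/Clang `constructor` attribute and explicit priorities. All names are hypothetical, and this only illustrates the hazard; it is not a confirmed diagnosis of the .NET tracepoint provider.

```c
#include <stddef.h>

/* Hypothetical table standing in for an event's field array. */
static const char *sketch_fields[2];

/* Set to 1 if the "registration" step sees an entry that is still NULL. */
int sketch_saw_null;

/* Runs first: lower priority numbers run earlier (GCC/Clang extension). */
__attribute__((constructor(101)))
static void sketch_register(void)
{
    size_t i;

    for (i = 0; i < 2; i++) {
        if (sketch_fields[i] == NULL)
            sketch_saw_null = 1;
    }
}

/* Runs second: fills the table, too late for sketch_register(). */
__attribute__((constructor(102)))
static void sketch_fill(void)
{
    sketch_fields[0] = "Count";
    sketch_fields[1] = "Reason";
}

/* Accessor so callers can observe the table after both constructors ran. */
const char *sketch_field_name(size_t i)
{
    return sketch_fields[i];
}
```

By the time `main` runs, the table is populated, yet `sketch_saw_null` records that the registration step observed it empty. This matches the shape of the symptom above, where the array exists but its entries are still 0x0 at registration time.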
.NET tracepoints are defined in a separate .so file that is loaded using dlopen: https://github.com/dotnet/runtime/blob/b0159122f3570decd8ced6228681a210e2711de6/src/coreclr/pal/src/misc/tracepointprovider.cpp#L120-L122.
I've attached the files that are responsible for defining the tracepoints.
This is the .NET ticket for this issue: https://github.com/dotnet/runtime/issues/62398.
Files
Updated by Mathieu Desnoyers over 2 years ago
- Status changed from New to Feedback
Can you try the following fix and let us know whether it resolves the issue on your end?
https://review.lttng.org/c/lttng-ust/+/6870
You will need to recompile the tracepoint probe provider with the updated header.
Updated by Jérémie Galarneau over 2 years ago
- Status changed from Feedback to Resolved
- % Done changed from 0 to 100
Applied in lttng-ust changeset 05bfa3dc3a6e6b2ece3686a5f384b6645c2a5010.