Project

General

Profile

Actions

Bug #1339

closed

.NET built against LTTng v2.13.0 crashes on startup

Added by Tom Deseyn over 2 years ago. Updated over 2 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Target version:
-
Start date:
12/06/2021
Due date:
% Done:

100%

Estimated time:

Description

When we build .NET on Fedora Rawhide (36), it segfaults on startup while initializing LTTng.

  * frame #0: 0x00007f8885a47b82 liblttng-ust.so.1`check_event_provider + 162
    frame #1: 0x00007f8885a4d4d1 liblttng-ust.so.1`lttng_ust_probe_register + 33
    frame #2: 0x00007f8885b007b5 libcoreclrtraceptprovider.so`lttng_ust__events_init__DotNETRuntime() at ust-tracepoint-event.h:1198:14
    frame #3: 0x00007f888683fa2e ld-linux-x86-64.so.2`call_init(l=<unavailable>, argc=10, argv=0x00007ffcd00cfd88, env=0x00007ffcd00cfde0) at dl-init.c:70:3
    frame #4: 0x00007f888683fb1c ld-linux-x86-64.so.2`_dl_init(main_map=0x0000556bd608a290, argc=10, argv=0x00007ffcd00cfd88, env=0x00007ffcd00cfde0) at dl-init.c:117:5
    frame #5: 0x00007f88864534c5 libc.so.6`_dl_catch_exception + 229
    frame #6: 0x00007f88868437de ld-linux-x86-64.so.2`dl_open_worker at dl-open.c:821:5
    frame #7: 0x00007f8886453468 libc.so.6`_dl_catch_exception + 136
    frame #8: 0x00007f8886843b5c ld-linux-x86-64.so.2`_dl_open at dl-open.c:896:17
    frame #9: 0x00007f888638294c libc.so.6`dlopen_doit + 92
    frame #10: 0x00007f8886453468 libc.so.6`_dl_catch_exception + 136
    frame #11: 0x00007f8886453533 libc.so.6`_dl_catch_error + 51
    frame #12: 0x00007f888638244e libc.so.6`_dlerror_run + 142
    frame #13: 0x00007f88863829d8 libc.so.6`dlopen@GLIBC_2.2.5 + 72
    frame #14: 0x00007f8885fd6893 libcoreclr.so`PAL_InitializeTracing() at tracepointprovider.cpp:116:9
    frame #15: 0x00007f888683fa2e ld-linux-x86-64.so.2`call_init(l=<unavailable>, argc=10, argv=0x00007ffcd00cfd88, env=0x00007ffcd00cfde0) at dl-init.c:70:3
    frame #16: 0x00007f888683fb1c ld-linux-x86-64.so.2`_dl_init(main_map=0x0000556bd6060050, argc=10, argv=0x00007ffcd00cfd88, env=0x00007ffcd00cfde0) at dl-init.c:117:5
    frame #17: 0x00007f88864534c5 libc.so.6`_dl_catch_exception + 229
    frame #18: 0x00007f88868437de ld-linux-x86-64.so.2`dl_open_worker at dl-open.c:821:5
    frame #19: 0x00007f8886453468 libc.so.6`_dl_catch_exception + 136
    frame #20: 0x00007f8886843b5c ld-linux-x86-64.so.2`_dl_open at dl-open.c:896:17
    frame #21: 0x00007f888638294c libc.so.6`dlopen_doit + 92
    frame #22: 0x00007f8886453468 libc.so.6`_dl_catch_exception + 136
    frame #23: 0x00007f8886453533 libc.so.6`_dl_catch_error + 51
    frame #24: 0x00007f888638244e libc.so.6`_dlerror_run + 142
    frame #25: 0x00007f88863829d8 libc.so.6`dlopen@GLIBC_2.2.5 + 72
    frame #26: 0x00007f8886274ead libhostpolicy.so`pal::load_library(path="/home/tmds/rpmbuild/BUILD/dotnet-9e8b04bbff820c93c142f99a507a46b976f5c14c-x64-bootstrap/src/aspnetcore.ae1a6cbe225b99c0bf38b7e31bf60cb653b73a52/artifacts/source-build/self/package-cache/microsoft.netcore.app.crossgen2.linux-x64/6.0.0/tools/libcoreclr.so", dll=0x00007f888629e0a0) at pal.unix.cpp:230:12
...

The crash happens at this line: https://github.com/lttng/lttng-ust/blob/4c155a06d838e1ab5d385abd1d73ae56e71b7d5e/src/lib/lttng-ust/lttng-probes.c#L153.

The field is null.

(gdb) p *tp_class
$3 = {struct_size = 48, fields = 0x7ffff73ab2e0 <lttng_ust__event_fields___DotNETRuntime___GCStart>, nr_fields = 2, 
  probe_callback = 0x7ffff7364820 <lttng_ust__event_probe__DotNETRuntime___GCStart(void*, unsigned int, unsigned int)>, 
  signature = 0x7ffff738e720 <__tp_event_signature___DotNETRuntime___GCStart> "const unsigned int, Count, const unsigned int, Reason", probe_desc = 0x7ffff73a1470 <lttng_ust__probe_desc___DotNETRuntime>}
(gdb) p tp_class->fields[0]
$4 = (const struct lttng_ust_event_field * const) 0x0
(gdb) p tp_class->fields[1]
$5 = (const struct lttng_ust_event_field * const) 0x0

It seems the fields have not been initialized (yet):

(gdb) p lttng_ust__event_fields___DotNETRuntime___GCStart
$1 = {0x0, 0x0, 0x0}

.NET tracepoints are defined in a separate .so file that is loaded using dlopen: https://github.com/dotnet/runtime/blob/b0159122f3570decd8ced6228681a210e2711de6/src/coreclr/pal/src/misc/tracepointprovider.cpp#L120-L122.

I've attached the files that are responsible for defining the tracepoints.

This is the .NET ticket for this issue: https://github.com/dotnet/runtime/issues/62398.


Files

traceptprovdotnetruntime.cpp (461 Bytes) traceptprovdotnetruntime.cpp Tom Deseyn, 12/06/2021 09:38 AM
tpdotnetruntime.h (107 KB) tpdotnetruntime.h Tom Deseyn, 12/06/2021 09:38 AM
Actions #1

Updated by Mathieu Desnoyers over 2 years ago

  • Status changed from New to Feedback

Can you try the following fix and let us know if it fixes the issue on your end ?

https://review.lttng.org/c/lttng-ust/+/6870

You will need to recompile the tracepoint probe provider with the updated header.

Actions #2

Updated by Jérémie Galarneau over 2 years ago

  • Status changed from Feedback to Resolved
  • % Done changed from 0 to 100
Actions

Also available in: Atom PDF