LTTng bugs repository: Issues https://bugs.lttng.org/ 2014-09-09T13:17:10Z Redmine
LTTng-tools - Bug #833 (Confirmed): memcpy of non-packed struct into packed struct (possible layo... https://bugs.lttng.org/issues/833 2014-09-09T13:17:10Z Mathieu Desnoyers mathieu.desnoyers@efficios.com
<p>lttng-tools src/lib/lttng-ctl/lttng-ctl.c:</p>
<p>lttng_enable_event_with_exclusions()</p>
<p>memcpy(&lsm.u.enable.event, ev, sizeof(lsm.u.enable.event));</p>
<p>This copies "ev" (a non-packed structure) into a packed structure, so the two layouts may not match.</p>
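<p>A minimal sketch of the layout concern, using hypothetical stand-in structures (the struct names and fields below are illustrative assumptions, not the real lttng-tools definitions). It shows a field-by-field copy that does not depend on the padding the compiler inserts in the non-packed source:</p>

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the public, non-packed event structure. */
struct event {
    char name[5];
    int32_t type;      /* the compiler may insert padding before this field */
    char enabled;
    int32_t loglevel;  /* ... and before this one */
};

/* Hypothetical stand-in for the packed structure sent over the socket. */
struct event_packed {
    char name[5];
    int32_t type;
    char enabled;
    int32_t loglevel;
} __attribute__((packed));

/*
 * Copy each field individually, so that padding bytes in the source
 * never shift the fields of the destination.
 */
static void copy_event_to_event_packed(struct event_packed *dst,
        const struct event *src)
{
    memset(dst, 0, sizeof(*dst));
    memcpy(dst->name, src->name, sizeof(dst->name));
    dst->type = src->type;
    dst->enabled = src->enabled;
    dst->loglevel = src->loglevel;
}
```

<p>With GCC or Clang on common ABIs, the non-packed stand-in is larger than the packed one, so a raw memcpy() of sizeof(packed) bytes would read padding where the packed layout expects field data.</p>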
<p>We should copy each field one by one (create a copy_event_to_event_packed() helper to do so).</p>
LTTng-tools - Bug #822 (Confirmed): bash-completion sometimes completes too much https://bugs.lttng.org/issues/822 2014-07-28T18:04:17Z Simon Marchi simon.marchi@polymtl.ca
<p>I am not sure how to word it, but here is an example. I have a single session, named "auto-20140728-134322".</p>
<p>$ lttng des<tab> => $ lttng destroy</p>
<p>That's good, now let's press tab again:</p>
<p>$ lttng destroy <tab> => $ lttng destroy auto-20140728-134322</p>
<p>That's good, now let's press tab again:</p>
<p>$ lttng destroy auto-20140728-134322 <tab> => $ lttng destroy auto-20140728-134322 auto-20140728-134322</p>
<p>Oops, we can go on like that for a long time. Every <tab> press adds another "auto-20140728-134322". It should detect that we already gave a positional argument. Since the command takes only one positional argument, subsequent <tab> presses should do nothing.</p>
LTTng-tools - Bug #797 (Feedback): Add more test for epoll in configure https://bugs.lttng.org/issues/797 2014-05-21T19:26:05Z Yannick Brosseau yannick.brosseau@polymtl.ca
<p>I was trying to compile lttng-tools on CentOS 5 (yes, I know, too old), but configure did not complain.</p>
<p>We should test for the presence of the symbols below and either fail configure or add a compat layer.</p>
<p>In file included from compat-epoll.c:33:<br />poll.h:72: error: 'EPOLLRDHUP' undeclared here (not in a function)<br />poll.h:74: error: 'EPOLL_CLOEXEC' undeclared here (not in a function)<br />compat-epoll.c: In function 'compat_epoll_create':<br />compat-epoll.c:84: warning: implicit declaration of function 'epoll_create1'</p>
LTTng-tools - Bug #752 (In Progress): Output paths need better handling than truncation https://bugs.lttng.org/issues/752 2014-03-07T21:33:50Z Daniel U. Thibault daniel.thibault@drdc-rddc.gc.ca
For instance, in the aftermath of <code>lttng-tools/src/bin/lttng-sessiond/cmd.c:record_ust_snapshot</code>, the <code>msg.u.snapshot_channel.pathname</code> is limited to PATH_MAX (typically 4096) but is built with <code>"%s/%s-%s-%" PRIu64 "%s"</code>, where the successive arguments are (discounting the closing nulls):
<ul>
<li><code>output->consumer->dst.trace_path</code> PATH_MAX - 1 (no trailing /)</li>
<li>/ 1</li>
<li><code>output->name</code> NAME_MAX - 1</li>
<li>- 1</li>
<li><code>output->datetime</code> 15</li>
<li>- 1</li>
<li><code>output->nb_snapshot</code> 20 digits (unsigned 64-bit integer)</li>
<li><code>session_path</code> PATH_MAX - 1 (including leading and trailing /)</li>
</ul>
<p>The worst-case <code>session_path</code> part is <code>'/ust/pid/<proc>-<vpid>-<datetime>/'</code> so it's actually limited to 12+15+5+15 = 47 characters (<code>/proc/PID/status.name</code> is truncated to 15 characters, and VPID is unsigned 16-bit for 5 characters) (closing null excluded). So one solution would be to limit the <code>consumer->dst.trace_path</code> to PATH_MAX - (NAME_MAX - 1 + 15 + 20 + 47 + 3) - 1 (for the null). However, if we want the path+filetitles of the channel files to fit in PATH_MAX, we need to chop another NAME_MAX off (and also limit channel names to NAME_MAX - (1 + 5 + 1 + 10 + 1) [underscore, 16-bit unsigned CPU ID, underscore, 32-bit unsigned chunk number, null] so they fit).</p>
<p>Truncation remains nevertheless possible, and would wreak havoc with the trace output tree. <code>Babeltrace</code> and the user count on proper folder and file tree structure to manage their traces. The code needs to detect instances of truncation and report them as errors.</p>
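<p>Detecting truncation can be sketched as follows. The helper name and its parameters are hypothetical; only the format string comes from the report. The check relies on snprintf() returning the length the full string would have had:</p>

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: build the snapshot path and fail instead of
 * silently truncating. Returns 0 on success, -1 if the result would
 * not fit in buf or if snprintf() reports an error.
 */
static int build_snapshot_path(char *buf, size_t len,
        const char *trace_path, const char *name,
        const char *datetime, uint64_t nb_snapshot,
        const char *session_path)
{
    int ret = snprintf(buf, len, "%s/%s-%s-%" PRIu64 "%s",
            trace_path, name, datetime, nb_snapshot, session_path);

    if (ret < 0 || (size_t) ret >= len) {
        return -1;  /* output error, or the path was truncated */
    }
    return 0;
}
```

<p>A caller would pass a PATH_MAX-sized buffer and turn a -1 return into an error reported to the client rather than using the shortened path.</p>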
<p>As an aside, the snapshot output name should be limited to MAX_PATH - (1+10) because it gets suffixed with a hyphen and an unsigned 32-bit integer (the output set sequential ID).</p>
LTTng-tools - Bug #720 (Confirmed): Disambiguating fully qualified event names, timestamp-sensiti... https://bugs.lttng.org/issues/720 2014-01-16T14:11:33Z Daniel U. Thibault daniel.thibault@drdc-rddc.gc.ca
<p>In user-space, two applications may be using different tracepoint providers that register their events under the same provider name and event name (I guess this could also happen in kernel space if two modules are loaded with user-designed providers that have similarly clashing events). Granted, this is not a usual occurrence, but it could happen, for instance if a long-running trace sees an application undergoing an upgrade between executions.</p>
<p>The problem is that unless the buffering scheme is per-processID, the metadata will only capture the first event registration. The trace will be captured correctly, but <code>babeltrace</code> won't be able to correctly decode the second application's events. This can either lead to <code>babeltrace</code> quitting (typically with "<code>[error] Event id nn is outside range</code>") or generating incorrectly labelled event payloads, even spurious additional events.</p>
<p>This is not an easy bug to solve. At first glance it would seem the metadata should be enriched with timestamped event definitions, but by itself this won't be enough because the trace would still need to have some way to correctly disambiguate same-name events as they come in.</p>
<p>The bug should not be dismissed as "user error" because it could be used as part of an attack to disable system monitoring (consider what would happen if two same-name-but-different-payloads events were fed to a live trace analyzer).</p>
LTTng-tools - Bug #653 (Confirmed): LTTng memory allocation failure goes unreported https://bugs.lttng.org/issues/653 2013-10-21T13:32:43Z Daniel U. Thibault daniel.thibault@drdc-rddc.gc.ca
<p>Consider the following (LTTng 2.3.0 running in a single-processor 1 GiB virtual machine):</p>
<pre>
$ lttng create test
Session test created.
Traces will be written in /home/daniel/lttng-traces/test-20131018-122002
$ lttng enable-channel ch -u --discard --num-subbuf 0x800
UST channel ch enabled for session test
$ lttng enable-event -c ch -u -a
All UST events are enabled in channel ch
$ lttng start
Tracing started for session test
$ lttng list test
Tracing session test: [active]
Trace path: /home/daniel/lttng-traces/test-20131018-122002
=== Domain: UST global ===
Buffer type: per UID
Channels:
-------------
- ch: [enabled]
Attributes:
overwrite mode: 0
subbufers size: 131072
number of subbufers: 2048
switch timer interval: 0
read timer interval: 0
output: mmap()
Events:
* (type: tracepoint) [enabled]
$ lttng destroy
Session test destroyed
$ lttng create test
Session test created.
Traces will be written in /home/daniel/lttng-traces/test-20131018-122241
$ lttng enable-channel ch -u --discard --num-subbuf 0x1000
UST channel ch enabled for session test
$ lttng enable-event -c ch -u -a
All UST events are enabled in channel ch
$ lttng start
Tracing started for session test
$ lttng list test
Tracing session test: [active]
Trace path: /home/daniel/lttng-traces/test-20131018-122241
=== Domain: UST global ===
Buffer type: per UID
Channels:
-------------
- ch: [enabled]
Attributes:
overwrite mode: 0
subbufers size: 131072
number of subbufers: 4096
switch timer interval: 0
read timer interval: 0
output: mmap()
Events:
* (type: tracepoint) [enabled]
$ lttng destroy
Session test destroyed
</pre>
<p>The first session generates a trace as expected. The second does not: the trace folder remains stubbornly empty.</p>
<p>This is puzzling from a memory management point of view (the first session grabs 256 MiB of buffers, the second should grab 512 MiB), since the system's capacity was apparently not reached.</p>
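<p>The buffer totals quoted above are simply sub-buffer size times sub-buffer count. The sketch below reproduces that arithmetic and adds a hypothetical pre-allocation check. It assumes the buffers are backed by a size-limited file system, which the report does not state; both function names are made up for illustration:</p>

```c
#include <stdint.h>
#include <sys/statvfs.h>

/* Total bytes of one ring buffer: sub-buffer size times sub-buffer count. */
static uint64_t channel_buffer_bytes(uint64_t subbuf_size, uint64_t num_subbuf)
{
    return subbuf_size * num_subbuf;
}

/*
 * Hypothetical check: returns 1 if the file system holding `path` has at
 * least `needed` bytes available to unprivileged users, 0 if it does not,
 * and -1 if statvfs() fails.
 */
static int fs_has_room(const char *path, uint64_t needed)
{
    struct statvfs st;

    if (statvfs(path, &st) != 0) {
        return -1;
    }
    return ((uint64_t) st.f_bavail * st.f_frsize >= needed) ? 1 : 0;
}
```

<p>For the two sessions shown, channel_buffer_bytes(131072, 2048) is 268435456 bytes (256 MiB) and channel_buffer_bytes(131072, 4096) is 536870912 bytes (512 MiB). A check of this kind against the backing file system would let the failure be reported at channel creation time.</p>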
<p>The one error captured in the session log (attached) is:<br /><pre>
libringbuffer[14690/14695]: Error: zero_file: No space left on device (in _shm_object_table_alloc_shm() at shm.c:173)
</pre></p>
<p>The error needs to be properly captured and passed on to the lttng client.</p>
LTTng-tools - Bug #592 (Confirmed): Potential trace process subdirectory name collision with PID ... https://bugs.lttng.org/issues/592 2013-07-10T16:14:42Z Daniel U. Thibault daniel.thibault@drdc-rddc.gc.ca
<p>Per-PID trace subdirectories are named according to the “name-vpid-timestamp” scheme, where "name" is a process name (truncated to 15 characters by the system), "vpid" is a virtual process ID (process ID within a PID namespace), and "timestamp" has a one-second resolution and matches the time when the first events are recorded for that particular process.</p>
<p>It is possible, within a one-second window (same timestamp), to spawn two copies of a given process (same name) into two different PID namespaces (allowing the same VPIDs). There could thus be a collision in the trace output directory structure, since the “name-vpid-timestamp” process subdirectory names could match.</p>
<p>This bug is very similar to <a class="issue tracker-1 status-7 priority-3 priority-lowest" title="Bug: Under certain conditions, a user-space trace may overwrite itself (Confirmed)" href="https://bugs.lttng.org/issues/561">#561</a>.</p>
LTTng-tools - Bug #561 (Confirmed): Under certain conditions, a user-space trace may overwrite it... https://bugs.lttng.org/issues/561 2013-06-13T18:14:30Z Daniel U. Thibault daniel.thibault@drdc-rddc.gc.ca
<p>Suppose we do this:</p>
<pre>
$ sudo -H lttng create asession
$ sudo -H lttng enable-event -a -u
$ sudo -H lttng start
</pre>
<p>And suppose we have an application that has been instrumented with some user-space tracepoint provider. Suppose the application's main loop is something like this (borrowed from easy-ust):</p>
<pre>
#include <dlfcn.h>
#include <stdio.h>
#include <unistd.h>
/* The tracepoint provider header (from easy-ust) must also be included here. */

int main(int argc, char **argv)
{
    int i = 0;
    char themessage[20]; /* Can hold up to "Hello World 9999999\0" */
    void *libtp_handle;

    libtp_handle = dlopen("./libtp.so", RTLD_LAZY);
    fprintf(stderr, "sample starting\n");
    for (i = 0; i < 10000; i++) {
        /* Unload the provider a third of the way in, reload it at two thirds. */
        if ((i == 3333) && (libtp_handle)) dlclose(libtp_handle);
        if (i == 6666) libtp_handle = dlopen("./libtp.so", RTLD_LAZY);
        snprintf(themessage, sizeof(themessage), "Hello World %d", i);
        tracepoint(sample_component, event, themessage);
        usleep(1);
    }
    fprintf(stderr, "sample done\n");
    if (libtp_handle) return dlclose(libtp_handle);
    return 0;
}
</pre>
<p>The trace produced will capture two separate processes: the first one for the app's first 3333 loops, the second for the app's last 3333 loops. This is because the app will register itself as a user-space event source, then withdraw its registration only to later re-register.</p>
<p>As it happens, most of the time the two traced thirds of the run will be a second apart, resulting in the trace holding two pid subdirectories: say <code>sample-17541-20130613-135148</code> and <code>sample-17541-20130613-135149</code>. But now and again both processes will be within the same one-second window, and the trace will thus contain only one pid subdirectory, say <code>sample-17541-20130613-135151</code>. The problem is that the app's first 3333 loops were written to disk and then the last 3333 loops were written to the same file.</p>
<p>This only gets worse if the dlopen/dlclose calls are more tightly packed in time.</p>
<p>The bug boils down to this: once a tracing session detects a new process client, lttng should detect path collisions and correct for them. One solution would be to have a trace's path be:</p>
<pre>
tracepath/ust/pid/process_name-VPID-yyyymmdd-hhmmss[-n]/
</pre>
<p>In my example, the first 3333 loops would go to <code>tracepath/ust/pid/sample-17541-20130613-135151</code> and the last 3333 loops to <code>tracepath/ust/pid/sample-17541-20130613-135151-1</code></p>
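<p>The proposed <code>[-n]</code> suffix could be sketched like this. The helper name and the retry limit are assumptions made for illustration; the idea is to let mkdir() itself detect the collision, since it fails with EEXIST when the directory is already there:</p>

```c
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: create "base", or "base-1", "base-2", ... if the
 * earlier names are already taken. The chosen path is written to out.
 * Returns 0 on success, -1 on failure.
 */
static int create_unique_dir(const char *base, char *out, size_t len)
{
    unsigned int n;

    for (n = 0; n < 1000; n++) {
        int ret = (n == 0) ? snprintf(out, len, "%s", base)
                           : snprintf(out, len, "%s-%u", base, n);

        if (ret < 0 || (size_t) ret >= len) {
            return -1;  /* the path would be truncated */
        }
        if (mkdir(out, 0770) == 0) {
            return 0;   /* created: this caller owns the directory */
        }
        if (errno != EEXIST) {
            return -1;  /* real error: missing parent, permissions, ... */
        }
    }
    return -1;  /* too many collisions */
}
```

<p>Called twice with the same base within one second, the first call yields the base path and the second yields the base path with <code>-1</code> appended, which matches the example above.</p>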
<p>I suppose a similar problem can happen with per-uid traces.</p>
LTTng-tools - Bug #531 (New): Event name scoping seems ill-defined https://bugs.lttng.org/issues/531 2013-05-14T20:51:39Z Daniel U. Thibault daniel.thibault@drdc-rddc.gc.ca
<p>(Filing this bug report as per Mathieu Desnoyers's request of 6 May 2013)</p>
<p>A natural assumption concerning an event's "fully qualified" identifier is that it is something like session:domain:channel:name. Tracepoints and syscalls have system-defined names, and of course the domain can only be 'kernel' or 'userspace', but everything else is pretty much arbitrary. It follows thus that, as long as they're in different channels (or domains or possibly sessions), two events may bear the same name even though they represent completely different occurrences.</p>
<p>But that's not what lttng seems to be doing.</p>
<p>Here's what I did (lttng list output has been abridged), with comments interspersed:</p>
<pre>
$ sudo -H lttng create asession
Session asession created.
Traces will be written in /root/lttng-traces/asession-20130214-145626
$ sudo -H lttng enable-event sched_switch -k --tracepoint
kernel event sched_switch created in channel channel0
$ sudo -H lttng enable-event sched_switch -k --function lttng_calibrate_kretprobe --channel channel1
Error: Event sched_switch: Enable kernel event failed (channel channel1, session asession)
Warning: Some command(s) went wrong
/*
Okay, so maybe event names have session:domain scope. Let's test this.
*/
$ sudo -H lttng enable-event fevent -k --function lttng_calibrate_kretprobe --channel channel1
kernel event fevent created in channel channel1
$ sudo -H lttng enable-event fevent -k --function lttng_calibrate_kretprobe --channel channel2
kernel event fevent created in channel channel2
/*
Apparently not.
*/
$ sudo -H lttng list asession
Tracing session asession: [inactive]
=== Domain: Kernel ===
- channel2: [enabled]
Events:
fevent (type: probe) [enabled]
offset: 0x0
symbol: lttng_calibrate_kretprobe
- channel1: [enabled]
Events:
fevent (type: probe) [enabled]
offset: 0x0
symbol: lttng_calibrate_kretprobe
- channel0: [enabled]
Events:
sched_switch (loglevel: TRACE_EMERG (0)) (type: tracepoint) [enabled]
/*
Instead of tracepoint then function, let's try function then tracepoint
*/
$ sudo -H lttng enable-event sched_process_fork -k --function lttng_calibrate_kretprobe --channel channel3
kernel event sched_process_fork created in channel channel3
$ sudo -H lttng enable-event sched_process_fork -k
kernel event sched_process_fork created in channel channel0
$ sudo -H lttng enable-event sched_process_fork -k --function lttng_calibrate_kretprobe --channel channel4
Error: Event sched_process_fork: Enable kernel event failed (channel channel4, session asession)
Warning: Some command(s) went wrong
/*
Fascinating.
*/
$ sudo -H lttng list asession
Tracing session asession: [inactive]
- channel3: [enabled]
Events:
sched_process_fork (type: probe) [enabled]
offset: 0x0
symbol: lttng_calibrate_kretprobe
- channel2: [enabled]
Events:
fevent (type: probe) [enabled]
offset: 0x0
symbol: lttng_calibrate_kretprobe
- channel1: [enabled]
Events:
fevent (type: probe) [enabled]
offset: 0x0
symbol: lttng_calibrate_kretprobe
- channel0: [enabled]
Events:
sched_process_fork (loglevel: TRACE_EMERG (0)) (type: tracepoint) [enabled]
sched_switch (loglevel: TRACE_EMERG (0)) (type: tracepoint) [enabled]
</pre>
<p>At this point it looks like defining a kernel tracepoint prevents that tracepoint's name from being used in other channels to define other events (or even the same event). But nothing apparently prevents these other events from being set up <em>before</em> the tracepoint. (Not shown here is the case where a tracepoint is defined in one channel, and then in another: the second tracepoint event definition fails as well)</p>
<p>(Note that I haven't even tried yet to see what happens between simultaneous sessions...)</p>
<p>The control flow of the error-causing commands is something like:</p>
<ul>
<li>event.c's event_kernel_enable_tracepoint returns LTTNG_ERR_KERN_ENABLE_FAIL because</li>
<li>kernel.c's kernel_create_event gets something other than ENOSYS or EEXIST from</li>
<li>kernel-ctl.c's kernctl_create_event.</li>
<li>Beyond kernctl_create_event is debugfs, inside of which occurs...whatever.</li>
</ul>
LTTng-UST - Bug #525 (Confirmed): new "notifications" from UST do not strictly respect LTTNG_UST_... https://bugs.lttng.org/issues/525 2013-05-07T15:25:54Z Mathieu Desnoyers mathieu.desnoyers@efficios.com
<p>We should eventually find a way to improve notifications from UST to sessiond so they don't delay .so loading (and thus application startup) when the environment specifies LTTNG_UST_REGISTER_TIMEOUT=0. Currently, we work around this issue by giving the notification socket a minimum timeout of 100 ms whenever the LTTNG_UST_REGISTER_TIMEOUT value is below 100.</p>
<p>A cleaner fix could involve handling these notifications from a (possibly new) separate thread, using a semaphore-based scheme to handle the optional wait from the application.</p>
LTTng-tools - Bug #409 (Confirmed): Detection of pipe close with POLLHUP poll(3) event https://bugs.lttng.org/issues/409 2012-12-10T21:33:54Z Christian Babeux christian.babeux@efficios.com
<p>On Linux, the POLLHUP poll(3) event is used to signal that the other end<br />of a pipe has been disconnected. Due to poor wording in the Single UNIX<br />Specification, different UNIX implementations signal the EOF with<br />conflicting poll events [1].</p>
<p>The current implementation of pipe close detection in lttng-tools uses<br />the POLLHUP event. This could lead to infinite looping in threads on<br />platforms such as OpenBSD/Cygwin.</p>
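<p>A portable check can be sketched as follows. This is not the lttng-tools code and the function name is hypothetical. It treats either a bare POLLHUP or a readable descriptor whose read() returns 0 as "the writer is gone". Note that this sketch discards any data it reads:</p>

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper. Returns 1 if the other end of the pipe was closed,
 * 0 if data was read (and discarded) or the wait timed out, -1 on error.
 */
static int pipe_peer_closed(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    char buf[64];
    ssize_t n;
    int ret;

    do {
        ret = poll(&pfd, 1, timeout_ms);
    } while (ret < 0 && errno == EINTR);
    if (ret < 0) {
        return -1;
    }
    if (ret == 0) {
        return 0;  /* timeout: nothing happened */
    }

    if (pfd.revents & POLLIN) {
        /* Readable means data or EOF; read() returning 0 is EOF. */
        do {
            n = read(fd, buf, sizeof(buf));
        } while (n < 0 && errno == EINTR);
        if (n < 0) {
            return -1;
        }
        return n == 0;
    }
    if (pfd.revents & POLLHUP) {
        return 1;  /* hangup reported without POLLIN, as Linux does */
    }
    return -1;  /* POLLERR or POLLNVAL */
}
```

<p>The point of the design is that the caller never depends on which of the two events a given platform chooses to report at EOF.</p>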
<p>Possible workaround in <a class="changeset" title="Cygwin: Fix handling of wait pipe hangup by properly detecting EOF On Linux, the POLLHUP poll(3)..." href="https://bugs.lttng.org/projects/lttng-tools/repository/lttng-tools/revisions/060a32b279132ceeeef14b96a611077195a2ca46">060a32b279132ceeeef14b96a611077195a2ca46</a>.</p>
<p>Creating this issue so we don't forget this limitation if we eventually<br />want to support those platforms.</p>
<p>[1] - <a class="external" href="http://www.greenend.org.uk/rjk/tech/poll.html">http://www.greenend.org.uk/rjk/tech/poll.html</a></p>
LTTng-UST - Bug #292 (Confirmed): Generated header files should not conflict with ust or standard... https://bugs.lttng.org/issues/292 2012-07-03T18:27:18Z Matthew Khouzam
<p>In LTTng-UST 2.0, if a given tracepoint file (foo.tp) has tracepoint events with domains that are not "foo", the tracepoint will not compile. It would be good to have a warning/error for this; without one, it just causes errors in the compilation phase which are <em>very</em> difficult to understand.</p>
Common Trace Format - Bug #265 (New): Specify where exactly the event ID must be in the header https://bugs.lttng.org/issues/265 2012-06-15T15:43:02Z Philippe Proulx eeppeliteloop@gmail.com
<p>Here's how to read an event, having already parsed the metadata:</p>
<ol>
<li>read the event header (this is defined per-stream)</li>
<li><strong>find the ID</strong></li>
<li>find the event declaration for that ID</li>
<li>read the binary event according to its declaration</li>
</ol>
<p>Problem lies in step 2. There's no definition in the specs. regarding how to find the ID field within the event header. It cannot be as simple as finding the <code>id</code> field in the event header declaration since it can be elsewhere.</p>
<p>A good example is the LTTng event header declaration, which is often:</p>
<pre>
struct event_header_large {
enum : uint16_t { compact = 0 ... 65534, extended = 65535 } id;
variant <id> {
struct {
uint32_clock_monotonic_t timestamp;
} compact;
struct {
uint32_t id;
uint64_clock_monotonic_t timestamp;
} extended;
} v;
} align(8);
</pre>
<p>Here, <code>id</code> is most of the time the actual ID, but sometimes it's 65535 in order to extend the header using the variant and <code>v.extended.id</code> is the real ID. This is not specified in the specs.</p>
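<p>The convention just described can be sketched as a decoder. This hard-codes the LTTng layout shown above, which is exactly what the specification does not define. It assumes host byte order and byte-packed fields with no padding; the struct and function names are hypothetical:</p>

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Result of decoding one event header. */
struct decoded_header {
    uint32_t id;         /* the real event ID */
    uint64_t timestamp;  /* 32-bit value in the compact case */
    size_t size;         /* number of header bytes consumed */
};

/* Returns 0 on success, -1 if the buffer is too short. */
static int decode_event_header_large(const uint8_t *buf, size_t len,
        struct decoded_header *out)
{
    uint16_t selector;

    if (len < sizeof(selector)) {
        return -1;
    }
    memcpy(&selector, buf, sizeof(selector));

    if (selector != 65535) {
        /* Compact variant: the selector itself is the event ID. */
        uint32_t ts;

        if (len < 2 + sizeof(ts)) {
            return -1;
        }
        memcpy(&ts, buf + 2, sizeof(ts));
        out->id = selector;
        out->timestamp = ts;
        out->size = 2 + sizeof(ts);
    } else {
        /* Extended variant: the real ID is v.extended.id. */
        uint32_t id;
        uint64_t ts;

        if (len < 2 + sizeof(id) + sizeof(ts)) {
            return -1;
        }
        memcpy(&id, buf + 2, sizeof(id));
        memcpy(&ts, buf + 2 + sizeof(id), sizeof(ts));
        out->id = id;
        out->timestamp = ts;
        out->size = 2 + sizeof(id) + sizeof(ts);
    }
    return 0;
}
```

<p>A generic CTF reader cannot be written this way, because nothing in the metadata says that <code>v.extended.id</code> supersedes the outer <code>id</code>.</p>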
<p>We need a way to know (in the metadata) where the real ID is, and how to locate it once we have read the header (between steps 1 and 2).</p>
Common Trace Format - Bug #262 (New): Be clearer about fields of headers and contexts https://bugs.lttng.org/issues/262 2012-06-08T19:21:17Z Philippe Proulx eeppeliteloop@gmail.com
<p>I've read the CTF specs. for hours by now and I still think there's something wrong about it. It seems like there's a division between some structure fields and their descriptions. Instead of clearly describing fields of some structures, only examples are given. But how are the examples related to the descriptions?</p>
<p>I would really appreciate that all fields of the following structures be very well defined in the specs.:</p>
<ul>
<li><code>trace.packet.header</code></li>
<li><code>stream.packet.context</code></li>
<li><code>stream.event.header</code></li>
</ul>
<p>This is actually done, but we don't see the relation between the field names and their descriptions.</p>
<p>Here's an example. The list following <em>Event packet context (all fields are optional, specified by TSDL meta-data):</em> is okay, but look at <em>Event packet content size (in bits).</em>: we don't know anything about this field yet. It's only later in the text that we learn:</p>
<pre>
If the content
size field is missing, the packet is filled (no padding). The content
and packet sizes include all headers.
</pre>
<p>Still, it's only when looking at the <em>example</em> that we see its name:</p>
<pre>
struct event_packet_context {
uint64_t timestamp_begin;
uint64_t timestamp_end;
uint32_t checksum;
uint32_t stream_packet_count;
uint32_t events_discarded;
uint32_t cpu_id;
uint32_t/uint16_t content_size;
uint32_t/uint16_t packet_size;
uint8_t compression_scheme;
uint8_t encryption_scheme;
uint8_t checksum_scheme;
};
</pre>
<p>But an example isn't very formal, is it? So if I want to know something about this field, I have to look at 3 different places in the specs.</p>
<p>And what about the <code>cpu_id</code> field in there? Is this LTTng-related or standard within CTF? I guess <em>this one</em> is really an example, but it's confusing because we learn the content size field name at the same place.</p>
<p>A good and easy format to understand/read would be a table, for each aforementioned structure, with the following columns:</p>
<ul>
<li>optional/mandatory/conditional?</li>
<li>absence of field meaning: what exactly to expect if the field is absent?</li>
<li>conditions if field is conditional (depends on other parameters)</li>
<li>field name (e.g. <code>content_size</code>)</li>
<li>complete description</li>
</ul>
<p>Maybe a <em>since version</em> column would also be great, so we may have some backward compatibility.</p>
<p>Also: for each <strong>structure</strong>, is it allowed to add custom fields?</p>
Common Trace Format - Bug #254 (New): No specified charset for metadata packets payload https://bugs.lttng.org/issues/254 2012-06-05T21:19:18Z Philippe Proulx eeppeliteloop@gmail.com
<p>There is currently no specified charset for the metadata packets payload. This can be problematic because if a tracer writes it as Unicode and we read it thinking it's ASCII, the displayed text will be garbled and the reading could even break.</p>
Possible solutions are:
<ul>
<li>lock it in the specifications</li>
</ul>
<blockquote>
<ul>
<li>UTF-8: ASCII-compatible, so this shouldn't make a big difference for most cases but allows for i18n of event names and so on</li>
<li>ASCII: simple, but only English</li>
<li>avoid everything else IMO</li>
</ul>
</blockquote>
<ul>
<li>add a charset byte within the metadata packet header (e.g. 0 => ASCII, 1 => UTF-8, etc.), but this breaks binary compatibility</li>
</ul>