Project

General

Profile

Bug #937

Babeltrace failure in live mode

Added by Li Liguang about 5 years ago. Updated about 5 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Target version:
-
Start date:
09/10/2015
Due date:
% Done:

0%

Estimated time:

Description

When using LTTng in live mode, with the Babeltrace tool, the output of Babeltrace stalls after a while. The tracing continues, but the output is no longer seen from Babeltrace.The scenario seems to happen mostly under heavy loaded tracing.

lttng version is :
lttng version 2.7.0-pre - Gaia -v2.6.0-rc1-242-g60f7035.

When the failure occurs Babeltrace outputs the following message:

[error] Unexpected end of packet. Either the trace data stream is corrupted or metadata description does not match data layout.
[error] Reading event failed.
[warning] [Context] Cannot open_trace of format lttng-live at path net://localhost/host/localhost/dcsession.
[warning] [Context] cannot open trace "net://localhost/host/localhost/dcsession" for reading.
[error] opening trace "net://localhost/host/localhost/dcsession" for reading.

[error] none of the specified trace paths could be opened.

Sometimes Babeltrace outputs the following message:

[error] get_next_index: error
[error] get_next_index failed

The scenario can easily be reproduces as follows:
1. lttng-relayd d -o /tmp/lttng/live
2.lttng-sessiond --no-kernel --daemonize
3.lttng -n create session --live 1000000 -U net://localhost/
4.lttng enable-channel channel0 -u --buffers-uid -C 1024k -W 1
5.lttng enable-event --userspace --channel channel0 -a
6.lttng start
7../hello ---
>(demo application)
8.babeltrace --clock-date --no-delta -i lttng-live net://localhost/host/localhost/session

This is my demo application:

1 #include <stdio.h>
2 #include "hello-tp.h"
3
4 void test_bp(void)
5 {
6 printf("\r\ncome in bp function for testing\r\n");
7 return;
8 }
9
10 int main(int argc, char* argv[])
11 {
12 int x;
13
14 puts("Hello, World!\nPress Enter to continue...");
15
16 /* The following getchar() call is only placed here for the purpose
17 * of this demonstration, for pausing the application in order for
18 * you to have time to list its events. It's not needed otherwise.
19 /
20 getchar();
21
22 /
A tracepoint() call. Arguments, as defined in hello-tp.tp:
23 *
24 * 1st: provider name (always)
25 * 2nd: tracepoint name (always)
26 * 3rd: my_integer_arg (first user-defined argument)
27 * 4th: my_string_arg (second user-defined argument)
28 *
29 * Notice the provider and tracepoint names are NOT strings;
30 * they are in fact parts of variables created by macros in
31 * hello-tp.h.
32 */
33 for(;;)
34 {
35 // sleep(2);
36 tracepoint(hello_world, my_first_tracepoint, 23, "hi there!");
37
38 for (x = 0; x < argc; ++x) {
39 tracepoint(hello_world, my_first_tracepoint, x, argv[x]);
40 }
41 // test_bp();
42 }
43 puts("Quitting now!");
44
45 // test_bp();
46 tracepoint(hello_world, my_first_tracepoint, x * x, "x^2");
47
48
49
50 return 0;
51 }
#1

Updated by Li Liguang about 5 years ago

Li Liguang wrote:

When using LTTng in live mode, with the Babeltrace tool, the output of Babeltrace stalls after a while. The tracing continues, but the output is no longer seen from Babeltrace.The scenario seems to happen mostly under heavy loaded tracing.

lttng version is :
lttng version 2.7.0-pre - Gaia -v2.6.0-rc1-242-g60f7035.

When the failure occurs Babeltrace outputs the following message:

[error] Unexpected end of packet. Either the trace data stream is corrupted or metadata description does not match data layout.
[error] Reading event failed.
[warning] [Context] Cannot open_trace of format lttng-live at path net://localhost/host/localhost/dcsession.
[warning] [Context] cannot open trace "net://localhost/host/localhost/dcsession" for reading.
[error] opening trace "net://localhost/host/localhost/dcsession" for reading.

[error] none of the specified trace paths could be opened.

Sometimes Babeltrace outputs the following message:

[error] get_next_index: error
[error] get_next_index failed

The scenario can easily be reproduces as follows:
1. lttng-relayd -d -o /tmp/lttng/live
2. lttng-sessiond --no-kernel --daemonize
3. lttng -n create session --live 1000000 -U net://localhost/
4. lttng enable-channel channel0 -u --buffers-uid -C 1024k -W 1
5. lttng enable-event --userspace --channel channel0 -a
6. lttng start
7. ./hello (demo application)
8. babeltrace --clock-date --no-delta -i lttng-live net://localhost/host/localhost/session

This is my demo application:

1 #include <stdio.h>
2 #include "hello-tp.h"
3
4 void test_bp(void)
5 {
6 printf("\r\ncome in bp function for testing\r\n");
7 return;
8 }
9
10 int main(int argc, char* argv[])
11 {
12 int x;
13
14 puts("Hello, World!\nPress Enter to continue...");
15
16 /* The following getchar() call is only placed here for the purpose
17 * of this demonstration, for pausing the application in order for
18 * you to have time to list its events. It's not needed otherwise.
19 /
20 getchar();
21
22 /
A tracepoint() call. Arguments, as defined in hello-tp.tp:
23 *
24 * 1st: provider name (always)
25 * 2nd: tracepoint name (always)
26 * 3rd: my_integer_arg (first user-defined argument)
27 * 4th: my_string_arg (second user-defined argument)
28 *
29 * Notice the provider and tracepoint names are NOT strings;
30 * they are in fact parts of variables created by macros in
31 * hello-tp.h.
32 */
33 for(;;)
34 {
35 // sleep(2);
36 tracepoint(hello_world, my_first_tracepoint, 23, "hi there!");
37
38 for (x = 0; x < argc; ++x) {
39 tracepoint(hello_world, my_first_tracepoint, x, argv[x]);
40 }
41 // test_bp();
42 }
43 puts("Quitting now!");
44
45 // test_bp();
46 tracepoint(hello_world, my_first_tracepoint, x * x, "x^2");
47
48
49
50 return 0;
51 }

#2

Updated by Li Liguang about 5 years ago

Today i update the lttng-tools and lttng-ust to the latest version, try to reproduce this defect but another error is reported as following, hope this log is helpful.

[14:46:54.848671885] (+0.000000234) pek-fjia-d1 hello_world:my_first_tracepoint: { cpu_id = 1 }, { my_string_field = "./hello_bab0", my_integer_field = 0 }
[14:46:54.848672122] (+0.000000237) pek-fjia-d1 hello_world:my_first_tracepoint: { cpu_id = 1 }, { my_string_field = "hi there!", my_integer_field = 23 }
[error] Remote side has closed connection
[error] get_next_index failed
[error] Error sending get_next_index request: Broken pipe
[error] get_next_index failed
[error] Error sending cmd: Broken pipe
[error] get_next_index failed

Version
Lttng-tools: v2.7.0-rc1-197-gda412cd
Lttng-ust: v2.7.0-rc2.

Also available in: Atom PDF