Bug #1217

closed

regression/tools/live intermittent failure on Yocto with 2.11

Added by Jonathan Rajotte Julien over 4 years ago. Updated about 3 years ago.

Status: Resolved
Priority: Normal
Assignee: Francis Deslauriers
Target version:
Start date: 01/24/2020
Due date:
% Done: 0%
Estimated time:

Description

While upgrading to 2.11, Alexander Kanavin observed intermittent failures in the live test.

With those issues addressed (patch is coming), I am still having two
failures:

#     Failed test (../../../../../lttng-tools-2.11.0/tests/regression/tools/live/live_test.c:main() at line 707)
# Got first packet index with offset 0 and len 4096
not ok 6 - Get metadata, received 0 bytes
FAIL: tools/live/test_ust_tracefile_count 6 - Get metadata, received 0 bytes

# Got first packet index with offset 0 and len 4096
not ok 6 - Get metadata, received 0 bytes
FAIL: tools/live/test_ust 6 - Get metadata, received 0 bytes

What's weird is that sometimes they pass. Could there be a race or some
timing issue in the test?

Steps to reproduce:

git clone -b live-test-failure-poky https://github.com/PSRCode/poky-contrib

Then set up a default build (with qemu x86_64 as target), run 'bitbake
core-image-sato-ptest', then 'runqemu kvm nographic'.

Then log in as root (no password), change to /usr/lib/lttng-tools/ptest,
and issue ./run-ptest. Then you should be able to see the failure.

To exit qemu: ctrl-a x.
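
Since the failure is intermittent, a single run of ./run-ptest may well pass. A minimal sketch for repeating a command and counting the runs that report a failure follows. `count_failing_runs` is a hypothetical helper, not part of lttng-tools or the ptest runner, and it assumes a failing run prints at least one line starting with "FAIL:" as in the output quoted above.

```shell
#!/bin/sh
# Hypothetical helper (not part of lttng-tools or ptest):
# run a command N times and print how many runs reported a failure.
# Usage: count_failing_runs N command [args...]
count_failing_runs() {
    runs=$1
    shift
    failing=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # A run counts as failing if its output has a line starting with
        # "FAIL:". grep reads the whole output (no -q), so the command is
        # never cut short by a closed pipe.
        if "$@" 2>&1 | grep '^FAIL:' >/dev/null; then
            failing=$((failing + 1))
        fi
        i=$((i + 1))
    done
    echo "$failing"
}
```

For example, `count_failing_runs 20 ./run-ptest`, run from /usr/lib/lttng-tools/ptest, would give a rough idea of how often the failure shows up. The count of 20 is arbitrary.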

Actions #1

Updated by Alexander Kanavin about 3 years ago

Jonathan Rajotte Julien wrote:

While upgrading to 2.11, Alexander Kanavin observed intermittent failures in the live test.

[...]

Steps to reproduce:

[...]

I've had a closer look at this, and this patch is the outcome:

https://lists.openembedded.org/g/openembedded-core/message/155711

Or, specifically, this part, which inserts explicit sleeps where their absence results in sporadic failures. (The failures are not observed in test runs from the source tree, as those execute binaries via libtool wrappers, which implicitly adds delay.)

From 8d9daede0882d239b0a47b0f7a6db68ba4934a7d Mon Sep 17 00:00:00 2001
From: Alexander Kanavin <>
Date: Sat, 4 Sep 2021 13:57:39 +0200
Subject: [PATCH] tests: wait some more before analysing traces or starting
tracing

Otherwise, there are sporadic race failures where lttng tracing
is stopped before all expected events are collected or is started too soon, e.g.:

PASS: tools/tracker/test_event_tracker 205 - Traced application stopped.
PASS: tools/tracker/test_event_tracker 206 - Stop lttng tracing for session
PASS: tools/tracker/test_event_tracker 207 - Destroy session tracker
FAIL: tools/tracker/test_event_tracker 208 - Validate empty trace

PASS: ust/namespaces/test_ns_contexts_change 42 - Stop lttng tracing for session mnt_ns
PASS: ust/namespaces/test_ns_contexts_change 43 - Destroy session mnt_ns
PASS: ust/namespaces/test_ns_contexts_change 44 - Wait after kill session daemon
PASS: ust/namespaces/test_ns_contexts_change 45 - Validate trace for event mnt_ns = 4026531840, 1000 events
PASS: ust/namespaces/test_ns_contexts_change 46 - Read a total of 1000 events, expected 1000
PASS: ust/namespaces/test_ns_contexts_change 47 - Validate trace for event mnt_ns = 4026532303, 233 events
FAIL: ust/namespaces/test_ns_contexts_change 48 - Read a total of 233 events, expected 1000

Of course, patching in sleeps is not a real fix; the synchronization needs to be done explicitly and reliably, but I don't know how.
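
One common alternative to a fixed sleep is to poll for an explicit condition with an upper time bound. The sketch below only illustrates that general idea. `wait_until` is a hypothetical helper that is not taken from the lttng-tools tree, and the condition to wait for (for example a marker file created by the traced application) is an assumption.

```shell
#!/bin/sh
# Hypothetical helper (not from lttng-tools): poll a condition until it
# holds or a timeout expires, instead of sleeping for a fixed time.
# Usage: wait_until TIMEOUT_SECONDS command [args...]
# Returns 0 as soon as the command succeeds, 1 if the timeout is reached.
wait_until() {
    timeout_s=$1
    shift
    waited=0
    until "$@"; do
        if [ "$waited" -ge "$timeout_s" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

A test could then call something like `wait_until 10 test -e "$marker_file"` before stopping tracing, where `$marker_file` is a hypothetical path that the application creates once it has emitted its events. Unlike a bare sleep, this returns as soon as the condition holds and fails visibly when it never does.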

Actions #2

Updated by Jonathan Rajotte Julien about 3 years ago

  • Status changed from New to In Progress
  • Assignee changed from Jonathan Rajotte Julien to Francis Deslauriers

Francis should have some time this week to go over this, and hopefully we will have a more stable test suite.

Actions #3

Updated by Jonathan Rajotte Julien about 3 years ago

Hi,

We have been able to reproduce the problem seen in test_event_tracker and test_ns_contexts_change.

Both seem to be related to app sync and validation races.

Francis is on it.

Considering that the proposed change to OE seems to fix a lot of things that could have resulted in the initial problem in live, I guess we will have to evaluate the flakiness of the initial live test once OE is upgraded to 2.13.

Cheers

Actions #4

Updated by Francis Deslauriers about 3 years ago

We are currently testing fixes for both of these issues. I will keep this issue updated.

Actions #5

Updated by Francis Deslauriers about 3 years ago

The fixes for those 2 issues (test_event_tracker and test_ns_contexts_change) were backported to the 2.12 and 2.13 stable branches.
https://github.com/lttng/lttng-tools/commit/d15d1d730a972e053d306db7594973f9db7bb521
https://github.com/lttng/lttng-tools/commit/f99d029f8a43359a3cc31fc129cc6e3eb90afc2b

Actions #6

Updated by Francis Deslauriers about 3 years ago

  • Status changed from In Progress to Resolved

I'm closing this issue as resolved since the original issue was related to 2.11.
You now seem to be targeting the 2.13 branch. If you see flakiness again in the 2.13 lttng-live tests, please open a new bug on this tracker.
