Bug #401: consumerd crashed and coredumped when deactivating sessions tracing on same domain
Status: Closed
Description
The crash seems to happen during deactivation. The test case that reproduces the fault activates two sessions with identical input, a subdomain with no event:
com_ericsson_cba_trace_testapp_lowtraf,TRACE_DEBUG
The two sessions are created/activated one after the other, allowed to run for 10 seconds, and then both are deactivated.
The crash has happened in different nodes, but I currently don't have a good log with sessiond/consumerd in verbose mode. If you have trouble identifying the fault from the coredump, I can try to reproduce it again.
The coredumps are attached:
-rw------- 1 root root 37490688 Nov 21 22:04 lttng-consumerd.30064.PL-3.core
-rw------- 1 root root 38068224 Nov 21 22:04 lttng-consumerd.29456.SC-2.core
-rw------- 1 root root 37412864 Nov 21 22:04 lttng-consumerd.2142.SC-1.core
-rw------- 1 root root 36896768 Nov 21 23:36 lttng-consumerd.9066.PL-4.core
-rw------- 1 root root 36433920 Nov 21 23:36 lttng-consumerd.26780.SC-2.core
-rw------- 1 root root 36581376 Nov 22 09:45 lttng-consumerd.18468.PL-4.core
Updated by Jesus Garcia about 12 years ago
Sorry, forgot the build info:
CURRENT HEAD: foss/babeltrace 52105e8 (HEAD, origin/master, origin/HEAD) Adding a test which do a sequence of seek BEGIN, LAST, BEGIN, LAST
CURRENT HEAD: foss/lttng-tools 76f66f6 (HEAD, origin/master, origin/HEAD) Fix: Typo from a previous patch in an assert()
CURRENT HEAD: foss/lttng-ust d646ca6 (HEAD, origin/master, origin/HEAD) document that tracepoint names should ideally not be re-used
CURRENT HEAD: foss/userspace-rcu 852a17a (HEAD, origin/master, origin/HEAD) wfcqueue: Fix lock and unlock functions
Updated by David Goulet about 12 years ago
- Status changed from New to Feedback
- Assignee set to David Goulet
- Target version changed from 2.0.x stable to 2.1 stable
Same as #400. Please retest with rc8 (ideally git HEAD :)
Thanks!
Updated by Jesus Garcia about 12 years ago
- File lttng-consumerd.13937.PL-3.core.gz lttng-consumerd.13937.PL-3.core.gz added
- File lttng-consumerd.1489.SC-1.core.gz lttng-consumerd.1489.SC-1.core.gz added
- File lttng-consumerd.23611.SC-2.core.gz lttng-consumerd.23611.SC-2.core.gz added
- File lttng-consumerd.24086.PL-4.core.gz lttng-consumerd.24086.PL-4.core.gz added
Hi David,
This one still happens. I'm attaching more coredumps.
Regards,
Jesus
Updated by David Goulet about 12 years ago
- Status changed from Feedback to New
Hi Jesus,
Can you give me your full use case, meaning the commands you use to trigger this problem and the flow of operations?
Unfortunately, I can't load the core dumps on my arch here, so I can't really use them.
Thanks!
Updated by Jesus Garcia about 12 years ago
Hi David,
The scenario is as follows:
Activate a session with a domain (in this case it is a domain in our test app: com_ericsson_cba_trace_testapp_lowtraf). We specify a log level that covers all the tracepoints in the domain. We use LOGLEVEL_RANGE.
This domain contains four tracepoints:
com_ericsson_cba_trace_testapp_lowtraf:OnePerSecA
com_ericsson_cba_trace_testapp_lowtraf:OnePerSecB
com_ericsson_cba_trace_testapp_lowtraf:TenPerSecA
com_ericsson_cba_trace_testapp_lowtraf:TenPerSecB
Right after the session is active, a second session is activated with the exact same domain as above.
The sessions are allowed to run for 10 seconds and then they are deactivated (we wait until the first one is fully stopped before we deactivate the second one).
Please let me know if this is enough info.
Regards,
Jesus
Updated by David Goulet about 12 years ago
Hmmm... I might need more info or at least the logs from the sessiond/consumerd.
So the scenario is two sessions with the same inputs. Here is what I understand:
(I have an application running and hitting tracepoints in loop)
$ lttng create s1
$ lttng enable-event -s s1 -a -u
$ lttng start s1
$ lttng create s2
$ lttng enable-event -s s2 -a -u
$ lttng start s2
$ sleep 10
$ lttng stop s1
$ lttng stop s2
With this use case, I'm unable to reproduce the issue. Is this correct?
Thanks!
Updated by Jesus Garcia about 12 years ago
Hi David,
Unfortunately, I'm not able to reproduce the problem with verbose mode on (when it's off, I see it consistently), but to be more specific, this is what the enabled events look like for my session:
Events:
com_ericsson_cba_trace_testapp_lowtraf_* (loglevel: TRACE_DEBUG (14)) (type: tracepoint) [enabled]
com_ericsson_cba_trace_testapp_lowtraf:* (loglevel: TRACE_DEBUG (14)) (type: tracepoint) [enabled]
So don't enable just a single event, but rather the provider with both <provider_name>_* and <provider_name>:*.
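For reference, those two enabled events map to liblttng-ctl calls roughly as in the untested sketch below; the session name "s1" and channel name "channel0" are placeholders, not the names used in the real test.

#include <string.h>
#include <lttng/lttng.h>

/* Enable <provider>_* and <provider>:* with a TRACE_DEBUG loglevel range
 * on an already-created UST session. "s1" and "channel0" are placeholders. */
static int enable_lowtraf_events(void)
{
	struct lttng_domain dom;
	struct lttng_handle *handle;
	struct lttng_event ev;
	const char *patterns[] = {
		"com_ericsson_cba_trace_testapp_lowtraf_*",
		"com_ericsson_cba_trace_testapp_lowtraf:*",
	};
	int i, ret = 0;

	memset(&dom, 0, sizeof(dom));
	dom.type = LTTNG_DOMAIN_UST;

	handle = lttng_create_handle("s1", &dom);
	if (!handle) {
		return -1;
	}

	for (i = 0; i < 2; i++) {
		memset(&ev, 0, sizeof(ev));
		strncpy(ev.name, patterns[i], sizeof(ev.name) - 1);
		ev.type = LTTNG_EVENT_TRACEPOINT;
		ev.loglevel_type = LTTNG_EVENT_LOGLEVEL_RANGE;
		ev.loglevel = LTTNG_LOGLEVEL_DEBUG;	/* TRACE_DEBUG (14) */
		ret = lttng_enable_event(handle, &ev, "channel0");
		if (ret < 0) {
			break;
		}
	}

	lttng_destroy_handle(handle);
	return ret;
}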
I will disconnect for a couple of hours and continue from home.
Regards,
Jesus
Updated by David Goulet about 12 years ago
I've tried the exact same event but I'm really unable to reproduce anything....
Since I'm unable to load your coredumps (we have a different arch/OS), can you load them in gdb and pastebin me the output of:
gdb> bt full
The gdb backtrace is a start and will give me a place to begin with.
Big thanks!
Updated by Jesus Garcia about 12 years ago
Hi David,
Here is the bt:
(gdb) bt full
#0 0x00007fe85f3dfb55 in raise () from /lib64/libc.so.6
No symbol table info available.
#1 0x00007fe85f3e1131 in abort () from /lib64/libc.so.6
No symbol table info available.
#2 0x00007fe85f3d8a10 in __assert_fail () from /lib64/libc.so.6
No symbol table info available.
#3 0x0000000000407188 in consumer_thread_metadata_poll (data=0x620a50) at consumer.c:2105
ret = <optimized out>
i = 0
pollfd = <optimized out>
revents = 1
nb_fd = 5
stream = 0x7fe858092640
iter = {iter = {node = 0x0, next = 0x0}}
node = 0x0
events = {epfd = 17, nb_fd = 5, events_size = 16, events = 0x62eeb0}
len = <optimized out>
__func__ = "consumer_thread_metadata_poll"
__PRETTY_FUNCTION__ = "consumer_thread_metadata_poll"
#4 0x00007fe85f7287b6 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#5 0x00007fe85f484c6d in clone () from /lib64/libc.so.6
No symbol table info available.
#6 0x0000000000000000 in ?? ()
No symbol table info available.
Regards,
Jesus
Updated by Jesus Garcia about 12 years ago
Hi David,
Were you able to get any further with this info? Please let me know if you need any other information to nail down this bug.
Thanks,
Jesus
Updated by David Goulet about 12 years ago
- Status changed from New to Confirmed
Ooooook, this is NOT supposed to happen. An assert on the metadata hash table means that either the add or the delete was not completed at that stage.
This is an unsynchronized state between the poll set and the metadata hash table, which is BAD.
Once I have a patch, I'll add it to this issue for testing.
Thanks!
Updated by David Goulet about 12 years ago
- Status changed from Confirmed to Feedback
We need more information on this bug.
First, I really need the exact commands you used to reproduce that.
Does the crash happen on the "lttng stop" or the "lttng destroy"?
What is the state of the application when the crash occurs? Was the app stopped, killed, ...?
With verbose mode on, you don't trigger the issue, so it is probably a race condition, and those are VERY difficult to find :) so the more you can give me, the better.
Thanks!
Updated by Jesus Garcia about 12 years ago
Hi David,
Here is the order in which events are taking place (a sketch of this sequence follows below):
lttng_create_session
lttng_create_handle
lttng_channel_set_default_attr
lttng_enable_channel
lttng_enable_event_with_filter (twice)
lttng_start_tracing
<repeat commands above for the 2nd session>
<wait 10 sec>
lttng_stop_tracing (1st session)
lttng_destroy_session (1st session)
<wait 5 sec>
lttng_stop_tracing (2nd session)
*** crash happens ***
lttng_destroy_session (2nd session)
The crash happens after issuing lttng_stop_tracing for the 2nd session
The apps are never stopped during the test case.
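To make that flow concrete, here is a hedged, untested liblttng-ctl sketch of the same sequence. The session names reuse the ones from the saflog excerpt below; "channel0", the NULL output URL, and the omitted error handling are placeholders, and the two lttng_enable_event_with_filter() calls are only summarized in a comment (the event attributes are as in the enable sketch earlier in the thread).

#include <string.h>
#include <unistd.h>
#include <lttng/lttng.h>

/* One session, following the call order listed above.  The NULL output
 * URL and the channel name "channel0" are assumptions for this sketch. */
static int setup_and_start(const char *name)
{
	struct lttng_domain dom;
	struct lttng_handle *handle;
	struct lttng_channel chan;
	int ret;

	memset(&dom, 0, sizeof(dom));
	dom.type = LTTNG_DOMAIN_UST;

	ret = lttng_create_session(name, NULL);	/* NULL: default output (assumption) */
	if (ret < 0) {
		return ret;
	}
	handle = lttng_create_handle(name, &dom);
	if (!handle) {
		return -1;
	}

	memset(&chan, 0, sizeof(chan));
	strncpy(chan.name, "channel0", sizeof(chan.name) - 1);
	lttng_channel_set_default_attr(&dom, &chan.attr);
	ret = lttng_enable_channel(handle, &chan);

	/*
	 * The real test calls lttng_enable_event_with_filter() twice here,
	 * once per wildcard pattern; see the enable sketch earlier in the
	 * thread for the event attributes.
	 */

	ret = lttng_start_tracing(name);
	lttng_destroy_handle(handle);
	return ret;
}

/* Two-session flow with the delays described above. */
static void run_scenario(void)
{
	setup_and_start("TRACact0161_1");
	setup_and_start("TRACact0162_1");
	sleep(10);

	lttng_stop_tracing("TRACact0161_1");	/* waits for data availability */
	sleep(2);
	lttng_destroy_session("TRACact0161_1");
	sleep(5);
	lttng_stop_tracing("TRACact0162_1");	/* consumerd crash observed around here */
	lttng_destroy_session("TRACact0162_1");
}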
Here are select lines from our saflog to show the times at which the stop and destroy are invoked for both sessions:
17:10:23 12/03/2012 IN TraceP_SC-1 "DBG: LTTNG session stop tracing requested, session name TRACact0161_1
17:10:25 12/03/2012 IN TraceP_SC-1 "DBG: LTTNG session destroy requested, session name TRACact0161_1
17:10:38 12/03/2012 IN TraceP_SC-1 "DBG: LTTNG session stop tracing requested, session name TRACact0162_1
17:10:40 12/03/2012 IN TraceP_SC-1 "DBG: LTTNG session destroy requested, session name TRACact0162_1
Here you can see the time stamp on the core dumps:
-rw------- 1 root root 35082240 2012-12-03 17:10:38.000000000 -0500 lttng-consumerd.7730.SC-2.core
-rw------- 1 root root 35147776 2012-12-03 17:10:38.000000000 -0500 lttng-consumerd.4037.SC-1.core
-rw------- 1 root root 35209216 2012-12-03 17:10:38.000000000 -0500 lttng-consumerd.32713.PL-4.core
-rw------- 1 root root 35364864 2012-12-03 17:10:38.000000000 -0500 lttng-consumerd.24890.PL-3.core
Note: I was originally deactivating the 2nd session right after the 1st one.
I added a 2 second delay between stop and destroy plus a 5 second delay before deactivating the 2nd session to see which event the crash was closer to.
Please let me know if this helps narrow the fault down.
Regards,
Jesus
Updated by David Goulet about 12 years ago
That helps. Can't reproduce it but we can try a couple of things.
The crash happens during the stop command, which uses the data pending feature. I suspect you might be triggering a small window where something is out of sync between the metadata poll thread and the tracer.... One easy way to find out is to use the --no-wait option when stopping the session (lttng_stop_tracing_no_wait).
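For reference, a minimal untested sketch of that no-wait variant: stop without blocking, then poll lttng_data_pending() until the consumer has drained everything. The poll interval is an arbitrary choice for illustration.

#include <unistd.h>
#include <lttng/lttng.h>

/* Stop tracing without blocking, then wait for the data ourselves.
 * The 200 ms poll interval is a placeholder for this sketch. */
static int stop_and_drain(const char *session_name)
{
	int ret;

	ret = lttng_stop_tracing_no_wait(session_name);
	if (ret < 0) {
		return ret;
	}

	/* lttng_data_pending(): 1 = data still pending, 0 = drained, < 0 = error. */
	while ((ret = lttng_data_pending(session_name)) == 1) {
		usleep(200 * 1000);
	}
	return ret;
}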
Also, ideally, could you attach the application you are using to test, so I can get the exact same tools/ust conditions?
Updated by David Goulet about 12 years ago
And how long does it take before you can reproduce this?
Running X experiments, how many trigger the issue?
Updated by Jesus Garcia about 12 years ago
- File TestApp_type1_src.zip TestApp_type1_src.zip added
Hi David,
From the point of view of the logic in our code, we can't use the --no-wait since we need to make sure we get all the data for each session. However, we could try just for troubleshooting purposes and to help you find the fault. Please let me know.
Attached are the files for the test app I'm using in this particular scenario. I think they have already been included by Tan in bug #406.
I can reproduce the issue consistently. The scenario I have described previously is actually one of our oldest test cases. We have other test cases that trigger the fault as well, all involving more than one session.
Thanks,
Jesus
Updated by David Goulet about 12 years ago
- File 0002-Fix-don-t-steal-key-when-adding-a-metadata-stream.patch 0002-Fix-don-t-steal-key-when-adding-a-metadata-stream.patch added
- Status changed from Feedback to Confirmed
OK, Mathieu and I did an intense debugging session and we think we nailed the problem.
Here is a patch to apply on HEAD for this particular issue. This command should do the trick.
git am 0002-Fix-don-t-steal-key-when-adding-a-metadata-stream.patch
Updated by Jesus Garcia about 12 years ago
This patch seems to introduce other issues. I get a consumerd crash just by starting one session and then consumerd stays defunct.
I used the lttng cli, so it should be easy to reproduce on your side.
Here are the printouts:
SC-1:~ # date; ps eaf | grep lttng | grep -v grep
Wed Dec 5 12:32:21 EST 2012
root 2423 1 0 12:30 ? 00:00:00 lttng-sessiond -d
SC-1:~ # date;lttng create ses
Wed Dec 5 12:32:26 EST 2012
Session ses created.
Traces will be written in /root/lttng-traces/ses-20121205-123226
SC-1:~ # date;lttng list -u
Wed Dec 5 12:32:31 EST 2012
UST events:
------------
PID: 29756 - Name: /home/test_apps/TestApp_type1
com_ericsson_cba_trace_testapp_hightraf:TenKiloPerSec (loglevel: TRACE_DEBUG (14)) (type: tracepoint)
com_ericsson_cba_trace_testapp_hightraf:KiloPerSec (loglevel: TRACE_DEBUG (14)) (type: tracepoint)
com_ericsson_cba_trace_testapp_hightraf:HundredPerSec (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
com_ericsson_cba_trace_testapp_lowtraf:TenPerSecB (loglevel: TRACE_DEBUG_UNIT (11)) (type: tracepoint)
com_ericsson_cba_trace_testapp_lowtraf:TenPerSecA (loglevel: TRACE_DEBUG_MODULE (10)) (type: tracepoint)
com_ericsson_cba_trace_testapp_lowtraf:OnePerSecB (loglevel: TRACE_DEBUG_PROCESS (9)) (type: tracepoint)
com_ericsson_cba_trace_testapp_lowtraf:OnePerSecA (loglevel: TRACE_DEBUG_PROGRAM (8)) (type: tracepoint)
com_ericsson_cba:AfterAllLoops (loglevel: TRACE_ERR (3)) (type: tracepoint)
com_ericsson_cba:cba_domain_event (loglevel: TRACE_ERR (3)) (type: tracepoint)
com_ericsson_cba_trace:trace_domain_event (loglevel: TRACE_WARNING (4)) (type: tracepoint)
com_ericsson_cba_trace_testapp:pointer_to_structure_in_tracepoint (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
com_ericsson_cba_trace_testapp:testapp_domain_event (loglevel: TRACE_NOTICE (5)) (type: tracepoint)
PID: 29757 - Name: /home/test_apps/TestApp_Fork1
com_ericsson_cba:cba_domain_event (loglevel: TRACE_ERR (3)) (type: tracepoint)
com_ericsson_cba_trace:trace_domain_event (loglevel: TRACE_WARNING (4)) (type: tracepoint)
com_ericsson_cba_trace_testapp:testapp_domain_event (loglevel: TRACE_NOTICE (5)) (type: tracepoint)
com_ericsson_cba_testapp_fork_child_lowtraf:TenPerSecB (loglevel: TRACE_DEBUG_UNIT (11)) (type: tracepoint)
com_ericsson_cba_testapp_fork_child_lowtraf:TenPerSecA (loglevel: TRACE_DEBUG_MODULE (10)) (type: tracepoint)
com_ericsson_cba_testapp_fork_child_lowtraf:OnePerSecB (loglevel: TRACE_DEBUG_PROCESS (9)) (type: tracepoint)
com_ericsson_cba_testapp_fork_child_lowtraf:OnePerSecA (loglevel: TRACE_DEBUG_PROGRAM (8)) (type: tracepoint)
com_ericsson_cba_testapp_fork_child_hightraf:KiloPerSec (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
com_ericsson_cba_testapp_fork_child_hightraf:HundredPerSec (loglevel: TRACE_DEBUG (14)) (type: tracepoint)
com_ericsson_cba_testapp_fork_parent_lowtraf:TenPerSecB (loglevel: TRACE_DEBUG_UNIT (11)) (type: tracepoint)
com_ericsson_cba_testapp_fork_parent_lowtraf:TenPerSecA (loglevel: TRACE_DEBUG_MODULE (10)) (type: tracepoint)
com_ericsson_cba_testapp_fork_parent_lowtraf:OnePerSecB (loglevel: TRACE_DEBUG_PROCESS (9)) (type: tracepoint)
com_ericsson_cba_testapp_fork_parent_lowtraf:OnePerSecA (loglevel: TRACE_DEBUG_PROGRAM (8)) (type: tracepoint)
com_ericsson_cba_testapp_fork_parent_hightraf:KiloPerSec (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
com_ericsson_cba_testapp_fork_parent_hightraf:HundredPerSec (loglevel: TRACE_DEBUG (14)) (type: tracepoint)
PID: 29763 - Name: /home/test_apps/TestApp_Fork1
com_ericsson_cba:cba_domain_event (loglevel: TRACE_ERR (3)) (type: tracepoint)
com_ericsson_cba_trace:trace_domain_event (loglevel: TRACE_WARNING (4)) (type: tracepoint)
com_ericsson_cba_trace_testapp:testapp_domain_event (loglevel: TRACE_NOTICE (5)) (type: tracepoint)
com_ericsson_cba_testapp_fork_child_lowtraf:TenPerSecB (loglevel: TRACE_DEBUG_UNIT (11)) (type: tracepoint)
com_ericsson_cba_testapp_fork_child_lowtraf:TenPerSecA (loglevel: TRACE_DEBUG_MODULE (10)) (type: tracepoint)
com_ericsson_cba_testapp_fork_child_lowtraf:OnePerSecB (loglevel: TRACE_DEBUG_PROCESS (9)) (type: tracepoint)
com_ericsson_cba_testapp_fork_child_lowtraf:OnePerSecA (loglevel: TRACE_DEBUG_PROGRAM (8)) (type: tracepoint)
com_ericsson_cba_testapp_fork_child_hightraf:KiloPerSec (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
com_ericsson_cba_testapp_fork_child_hightraf:HundredPerSec (loglevel: TRACE_DEBUG (14)) (type: tracepoint)
com_ericsson_cba_testapp_fork_parent_lowtraf:TenPerSecB (loglevel: TRACE_DEBUG_UNIT (11)) (type: tracepoint)
com_ericsson_cba_testapp_fork_parent_lowtraf:TenPerSecA (loglevel: TRACE_DEBUG_MODULE (10)) (type: tracepoint)
com_ericsson_cba_testapp_fork_parent_lowtraf:OnePerSecB (loglevel: TRACE_DEBUG_PROCESS (9)) (type: tracepoint)
com_ericsson_cba_testapp_fork_parent_lowtraf:OnePerSecA (loglevel: TRACE_DEBUG_PROGRAM (8)) (type: tracepoint)
com_ericsson_cba_testapp_fork_parent_hightraf:KiloPerSec (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
com_ericsson_cba_testapp_fork_parent_hightraf:HundredPerSec (loglevel: TRACE_DEBUG (14)) (type: tracepoint)
PID: 29758 - Name: /home/test_apps/TestApp_Thread1
com_ericsson_cba:cba_domain_event (loglevel: TRACE_ERR (3)) (type: tracepoint)
com_ericsson_cba_trace:trace_domain_event (loglevel: TRACE_WARNING (4)) (type: tracepoint)
com_ericsson_cba_trace_testapp:testapp_domain_event (loglevel: TRACE_NOTICE (5)) (type: tracepoint)
com_ericsson_cba_trace_testapp_thread1_lowtraf:TenPerSecB (loglevel: TRACE_DEBUG_UNIT (11)) (type: tracepoint)
com_ericsson_cba_trace_testapp_thread1_lowtraf:TenPerSecA (loglevel: TRACE_DEBUG_MODULE (10)) (type: tracepoint)
com_ericsson_cba_trace_testapp_thread1_lowtraf:OnePerSecB (loglevel: TRACE_DEBUG_PROCESS (9)) (type: tracepoint)
com_ericsson_cba_trace_testapp_thread1_lowtraf:OnePerSecA (loglevel: TRACE_DEBUG_PROGRAM (8)) (type: tracepoint)
com_ericsson_cba_trace_testapp_thread1_hightraf:KiloPerSec (loglevel: TRACE_DEBUG (14)) (type: tracepoint)
com_ericsson_cba_trace_testapp_thread1_hightraf:HundredPerSec (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
com_ericsson_cba_trace_testapp_thread2_lowtraf:TenPerSecB (loglevel: TRACE_DEBUG_UNIT (11)) (type: tracepoint)
com_ericsson_cba_trace_testapp_thread2_lowtraf:TenPerSecA (loglevel: TRACE_DEBUG_MODULE (10)) (type: tracepoint)
com_ericsson_cba_trace_testapp_thread2_lowtraf:OnePerSecB (loglevel: TRACE_DEBUG_PROCESS (9)) (type: tracepoint)
com_ericsson_cba_trace_testapp_thread2_lowtraf:OnePerSecA (loglevel: TRACE_DEBUG_PROGRAM (8)) (type: tracepoint)
com_ericsson_cba_trace_testapp_thread2_hightraf:KiloPerSec (loglevel: TRACE_DEBUG (14)) (type: tracepoint)
com_ericsson_cba_trace_testapp_thread2_hightraf:HundredPerSec (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
com_ericsson_cba_trace_testapp_thread_main:OnePerSecA (loglevel: TRACE_ERR (3)) (type: tracepoint)
SC-1:~ # date;lttng enable-event com_ericsson_cba_trace_testapp_lowtraf:OnePerSecA -u
Wed Dec 5 12:32:36 EST 2012
UST event com_ericsson_cba_trace_testapp_lowtraf:OnePerSecA created in channel channel0
SC-1:~ #
SC-1:~ # date;lttng list
Wed Dec 5 12:32:45 EST 2012
Available tracing sessions:
1) ses (/root/lttng-traces/ses-20121205-123226) [inactive]
Use lttng list <session_name> for more details
SC-1:~ # date;lttng start
Wed Dec 5 12:33:08 EST 2012
Tracing started for session ses
SC-1:~ # date;lttng list
Wed Dec 5 12:33:18 EST 2012
Available tracing sessions:
1) ses (/root/lttng-traces/ses-20121205-123226) [active]
Use lttng list <session_name> for more details
SC-1:~ #
SC-1:~ #
SC-1:~ # date;lttng stop;date
Wed Dec 5 12:33:26 EST 2012
Waiting for data availability
Tracing stopped for session ses
Wed Dec 5 12:33:26 EST 2012
SC-1:~ #
SC-1:~ #
SC-1:~ # date; ps eaf | grep lttng | grep -v grep
Wed Dec 5 12:33:35 EST 2012
root 2423 1 0 12:30 ? 00:00:00 lttng-sessiond -d
root 2594 2423 0 12:32 ? 00:00:00 [lttng-consumerd] <defunct>
SC-1:~ #
SC-1:~ #
SC-1:~ # ls -ltr --full-time /cluster/dumps/ | grep "2012-12-05"
<omitted some lines since the related coredump is below>
-rw------- 1 root root 25706496 2012-12-05 12:33:08.000000000 -0500 lttng-consumerd.2594.SC-1.core
Updated by David Goulet about 12 years ago
Hmmm, so there were two critical patches yesterday, for this bug and #405
(reported by Amer). A defunct consumerd could be the deadlock problem we
faced in the other bug.
Can you apply both patches and check?
Big Thanks!
Updated by Jesus Garcia about 12 years ago
I applied both patches, but I get the same failure as before (consumerd crashes at "lttng start").
Could you please check on your side and let me know if it works?
Thanks,
Jesus
Updated by David Goulet about 12 years ago
- File 0001-Fix-don-t-steal-key-when-adding-a-metadata-stream.patch 0001-Fix-don-t-steal-key-when-adding-a-metadata-stream.patch added
Bad assert got in..... :(
Please note that, due to sleep deprivation yesterday, I ran make check on the bug405 branch in my dev tree, so this patch was never actually tested..... and even then I hit the 405 problem with the patch....
I'm really sorry. Now, with much more sleep, here is a patch that passes the lttng-tools make check :) and does NOT assert on a valid node ;)
Sorry Jesus for any inconvenience. I had this patch reviewed as well.
Lesson learned, no patch after dark :)
Updated by David Goulet about 12 years ago
- Status changed from Confirmed to Resolved