Ardour Session Close Hangs

I am having a problem with Ardour failing to exit. After clicking on Quit in the Session menu, I get the ‘Butler drops pool trash’ message on the terminal window where Ardour was started, but the GUI completely hangs. I think the same problem arises when switching between projects or closing the current project. I presume Ardour is waiting for something to happen, which it never does. [I could try running Ardour under gdb and see if I can see where the threads are at this time.] Presing Control-C in the terminal window then makes it exit.

I believe it only happens when I have Generic Midi control surface enabled, which is nearly always. The same midi control ports for the Generic Midi input and output also have MMC routed to and from them and MTC out. These settings are in my standard template and established with the Midi Connections editing window.

Moreover, on restart/reload of the project, I note that the 'Preferences-Control Surfaces->Generic Midi settings windw has the ‘Edit the selected protocol (it must me enabled first)’ button greyed out, even though it is selected with a tick in the Generic Midi window above. Also the bindings to the Alsa Midi port are not being automatically recreated on restart/reload (ie they are neither working nor showing up in the Midi Connections editing window). These need to be manually repatched in that window on every reload of a project, but other midi connections are reloaded properly.

I will upgrade to the latest Ardour and try again.

Ardour6.2.0 (built using 6.2 and GCC version 6.3.0 20170516)
5.5.17-200.fc31.x86_64 #1 SMP Mon Apr 13 15:29:42 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

I have the same problem in Mixbus (7.2, happened in 7.1 too). It doesn’t always happen so it’s difficult to replicate to pin down the cause. It seems to only happen if the session has been open for a good length of time, but I’m not sure if that’s always the case.

I too get the “butler drops pool trash” in the terminal window and after that one or more “munlock” lines which are the final messages when the hang is happening.

I’m using a Generic MIDI control surface too. It’s a fairly recent addition, as are the hangs, but I don’t know if the hangs definitely started after the surface was added. I also started using some new plugins in that time, e.g. Guitarix and convolvers.

I have compiled Ardour and, with my fresh build, get that problem about 50 percent of the time. I should get time to track down the problem next week and then be able to suggest a patch or work around. It is certainly really annoying.

That is normal. it is printed every time a session is closed and resources are released.

This is not normal. What backend are you using? JACK? Does Ardour also stall if you do not use JACK?

This is not normal. What backend are you using? JACK? Does Ardour also stall if you do not use JACK?

JACK, yes. I will try with ALSA but I don’t know how quickly I’ll be able to reach any conclusion. I was trying to make the problem happen the other day - to see if not having the MIDI controller connected made any difference - and, naturally, I couldn’t get the problem to happen at all.

I may have tracked down the session close hang to a mutex operation in the generic midi control surface destructor code. The backtrace is below. This is with alsa on ardour 6.9. I’ll try to find the bug and then raise it on the bug tracker since that may be the more appropriate place.

DJG

#0 0x00007ffff33a61bd in syscall () at /lib64/libc.so.6
#1 0x00007ffff52d21dc in g_mutex_lock_slowpath (mutex=0xec8bab8) at …/glib/gthread-posix.c:1340
#2 0x00000000007361b2 in Glib::Threads::Mutex::lock::Lock(Glib::Threads::Mutex&) (this=0x7fffffffa2f0, mutex=…)
at /usr/include/glibmm-2.4/glibmm/threads.h:687
#3 0x00007ffff5f59d98 in PBD::Signal2<void, MIDI::Parser&, MIDI::EventTwoBytes*, PBD::OptionalLastValue >::disconnect(boost::shared_ptrPBD::Connection) (this=0xec8bab0, c=…) at libs/pbd/pbd/signals_generated.h:1118
#4 0x00000000007367b3 in PBD::Connection::disconnect() (this=0xa0d51c0) at …/libs/pbd/pbd/signals.h:94
#5 0x0000000000736866 in PBD::ScopedConnection::disconnect() (this=0x9fe2af0) at …/libs/pbd/pbd/signals.h:153
#6 0x00007fffd3fb3c72 in MIDIControllable::midi_forget() (this=0x9fe29c0)
at …/libs/surfaces/generic_midi/midicontrollable.cc:109
#7 0x00007fffd3fb3cb2 in MIDIControllable::drop_external_control() (this=0x9fe29c0)
at …/libs/surfaces/generic_midi/midicontrollable.cc:117
#8 0x00007fffd3fb3b29 in MIDIControllable::~MIDIControllable() (this=0x9fe29c0, __in_chrg=)
at …/libs/surfaces/generic_midi/midicontrollable.cc:92
#9 0x00007fffd3fb3c0e in MIDIControllable::~MIDIControllable() (this=0x9fe29c0, __in_chrg=)
at …/libs/surfaces/generic_midi/midicontrollable.cc:93
#10 0x00007fffd3f6a966 in GenericMidiControlProtocol::drop_all() (this=0x55b4c70)
at …/libs/surfaces/generic_midi/generic_midi_control_protocol.cc:326
#11 0x00007fffd3f69e21 in GenericMidiControlProtocol::~GenericMidiControlProtocol()
(this=0x55b4c70, __in_chrg=)
at …/libs/surfaces/generic_midi/generic_midi_control_protocol.cc:231
#12 0x00007fffd3f6a032 in GenericMidiControlProtocol::~GenericMidiControlProtocol()
(this=0x55b4c70, __in_chrg=)
at …/libs/surfaces/generic_midi/generic_midi_control_protocol.cc:233
#13 0x00007ffff7090cbf in ARDOUR::ControlProtocolManager::drop_protocols() (this=0x1c3e210)
at …/libs/ardour/control_protocol_manager.cc:204
#14 0x00007ffff7698f20 in ARDOUR::Session::destroy() (this=0x5744240) at …/libs/ardour/session.cc:612
#15 0x00007ffff7697e3c in ARDOUR::Session::~Session() (this=0x5744240, __in_chrg=)
at …/libs/ardour/session.cc:505

Which version of Arodur is that? The line number of some of the functions don’t match either 6.9.0 nor current git.

PS. would you mind moving the issue to tracker.ardour.org – the forum is not really suitable for bug reports and keep track of things.

Please send instructions on how to move it. …

Line numbers may have changed owing to my own local modifications.

I’m thinking the error might be a double acquire of the same _mutex that is not recursive in two successive calls. But this would not deadlock randomly, but consistently. I can’t work out whether the mutex is shared or not since I don’t yet understand what the base class inheritance that defines _mutex is in the following definition at line 80 of signals.h (the boost class is new to me)

class LIBPBD_API Connection : public boost::enable_shared_from_this

#3 0x00007ffff5f59d98 in PBD::Signal2<void, MIDI::Parser&, MIDI::EventTwoBytes*, PBD::OptionalLastValue >::disconnect(boost::shared_ptrPBD::Connection) (this=0xec8bab0, c=…) at l
ibs/pbd/pbd/signals_generated.h:1118 // Second lock on the same mutex?

#4 0x00000000007367b3 in PBD::Connection::disconnect() (this=0xa0d51c0) at …/libs/pbd/pbd/signals.h:94 locks on _lm first

“move to tracker.ardour.org” means “file a bug report there”

Also, useful backtraces in multithreaded apps nearly always require: thread apply all bt rather than just a trace from one thread, so if you could add the output of that, it would be helpful.

OK - will do sometime soon.

As you say, I need to report which thread is holding the mutex (it’s not two acquires by the same thread which thought earlier).

Thanks

The session exit deadlock appears to be the classic chaotic taking out of locks deadlock situation, as in many textbooks and my own lecture notes Department of Computer Science and Technology – Course pages 2021–22: Concurrent and Distributed Systems

Thread 1 takes the Connection lock and then the Signal2 lock
Thread 72 takes the Signal2 lock and then the Connection lock.

I can’t seem to attach a full backtrace listing to this posting so I’ve put it here: http://koo.corpus.cam.ac.uk/privote1/trc2.txt

Thread 1 (Thread 0x7ffff038ae80 (LWP 13568)):
#0 0x00007ffff33a61bd in syscall () at /lib64/libc.so.6
#1 0x00007ffff52d21dc in g_mutex_lock_slowpath (mutex=0x9479ae0) at …/glib/gthread-posix.c:1340
#2 0x00000000007361b2 in Glib::Threads::Mutex::lock::Lock(Glib::Threads::Mutex&) (this=0x7fffffff9940, mutex=…) at /usr/include/glibmm-2.4/glibmm/threads.h:687
#3 0x00007ffff5f59d98 in PBD::Signal2<void, MIDI::Parser&, MIDI::EventTwoBytes*, PBD::OptionalLastValue >::disconnect(boost::shared_ptrPBD::Connection) (this=0x9479ad8, c=…) at libs/pbd/pbd/signals_generated.h:1118
#4 0x00000000007367b3 in PBD::Connection::disconnect() (this=0x48c0150) at …/libs/pbd/pbd/signals.h:96
#5 0x0000000000736866 in PBD::ScopedConnection::disconnect() (this=0x48bfff0) at …/libs/pbd/pbd/signals.h:155
#6 0x00007fffd3fb3e32 in MIDIControllable::midi_forget() (this=0x48bfec0) at …/libs/surfaces/generic_midi/midicontrollable.cc:109

Thread 72 (Thread 0x7fffa2cdd700 (LWP 13651)):
#0 0x00007ffff33a61bd in syscall () at /lib64/libc.so.6
#1 0x00007ffff52d21dc in g_mutex_lock_slowpath (mutex=0x48c0160) at …/glib/gthread-posix.c:1340
#2 0x00000000007361b2 in Glib::Threads::Mutex::lock::Lock(Glib::Threads::Mutex&) (this=0x7fffa2cdc8c0, mutex=…) at /usr/include/glibmm-2.4/glibmm/threads.h:687
#3 0x0000000000769521 in PBD::Connection::signal_going_away() (this=0x48c0150) at …/libs/pbd/pbd/signals.h:110
#4 0x00007ffff5f4aef0 in PBD::Signal2<void, MIDI::Parser&, MIDI::EventTwoBytes*, PBD::OptionalLastValue >::~Signal2() (this=0x9479ad8, __in_chrg=) at libs/pbd/pbd/signals_generated.h:966

Processing: trc2.txt…

2 Likes

indeed. Nice detective work!

One way out of this is: Fix race condition when ~Signal and ~ScopedConnection run concurrently · Ardour/ardour@ceffc83 · GitHub

Although this specific issue is likely better addressed via Stop auto-connect thread before terminating ctrl surfaces · Ardour/ardour@02f2d76 · GitHub

Thanks for looking in to it.

I can see stopping the auto-connect mechanism in advance may help avoid the deadlock being entered via the route I was hitting.

The signal.h change moves the mutex acquire into the conditon, so surely the deadlock can still happen if that condition holds? Perhaps you have an argument as to why it won’t. Also, you’ve replaced a mutex acquire with a loop that performs try-acquire until it succeeds, but with a 100 us between attempts. This has no semantic difference if I read it correctly, but I’m not familiar with these particular C++ primitives. Your change could benefit a co-routine-based concurrency implementation where the mutex acquire does not enter the threads package and simply spins, but most futex implementations will already implement what you’ve coded by spinning a few times before yielding to the scheduller, which is exactly what was indicated in the backtraces I had, where the thread was blocked in the syscall.

Anyway, I’ve applied both patches to my build and so far its not hung over multiple session closings this morning…

Thanks a lot.

Indeed the solution was half-baked. It took me a few iterations to get things right. It is now merged into ardour git master branch.

Thank you very much for the feedback and tracking the issue down in the first place!