I am having a problem with Ardour failing to exit. After clicking on Quit in the Session menu, I get the ‘Butler drops pool trash’ message on the terminal window where Ardour was started, but the GUI completely hangs. I think the same problem arises when switching between projects or closing the current project. I presume Ardour is waiting for something to happen, which it never does. [I could try running Ardour under gdb and see if I can see where the threads are at this time.] Presing Control-C in the terminal window then makes it exit.
I believe it only happens when I have Generic Midi control surface enabled, which is nearly always. The same midi control ports for the Generic Midi input and output also have MMC routed to and from them and MTC out. These settings are in my standard template and established with the Midi Connections editing window.
Moreover, on restart/reload of the project, I note that the 'Preferences-Control Surfaces->Generic Midi settings windw has the ‘Edit the selected protocol (it must me enabled first)’ button greyed out, even though it is selected with a tick in the Generic Midi window above. Also the bindings to the Alsa Midi port are not being automatically recreated on restart/reload (ie they are neither working nor showing up in the Midi Connections editing window). These need to be manually repatched in that window on every reload of a project, but other midi connections are reloaded properly.
I will upgrade to the latest Ardour and try again.
Ardour6.2.0 (built using 6.2 and GCC version 6.3.0 20170516)
5.5.17-200.fc31.x86_64 #1 SMP Mon Apr 13 15:29:42 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
I have the same problem in Mixbus (7.2, happened in 7.1 too). It doesn’t always happen so it’s difficult to replicate to pin down the cause. It seems to only happen if the session has been open for a good length of time, but I’m not sure if that’s always the case.
I too get the “butler drops pool trash” in the terminal window and after that one or more “munlock” lines which are the final messages when the hang is happening.
I’m using a Generic MIDI control surface too. It’s a fairly recent addition, as are the hangs, but I don’t know if the hangs definitely started after the surface was added. I also started using some new plugins in that time, e.g. Guitarix and convolvers.
I have compiled Ardour and, with my fresh build, get that problem about 50 percent of the time. I should get time to track down the problem next week and then be able to suggest a patch or work around. It is certainly really annoying.
This is not normal. What backend are you using? JACK? Does Ardour also stall if you do not use JACK?
JACK, yes. I will try with ALSA but I don’t know how quickly I’ll be able to reach any conclusion. I was trying to make the problem happen the other day - to see if not having the MIDI controller connected made any difference - and, naturally, I couldn’t get the problem to happen at all.
I may have tracked down the session close hang to a mutex operation in the generic midi control surface destructor code. The backtrace is below. This is with alsa on ardour 6.9. I’ll try to find the bug and then raise it on the bug tracker since that may be the more appropriate place.
DJG
#0 0x00007ffff33a61bd in syscall () at /lib64/libc.so.6 #1 0x00007ffff52d21dc in g_mutex_lock_slowpath (mutex=0xec8bab8) at …/glib/gthread-posix.c:1340 #2 0x00000000007361b2 in Glib::Threads::Mutex::Lock(Glib::Threads::Mutex&) (this=0x7fffffffa2f0, mutex=…)
at /usr/include/glibmm-2.4/glibmm/threads.h:687 #3 0x00007ffff5f59d98 in PBD::Signal2<void, MIDI::Parser&, MIDI::EventTwoBytes*, PBD::OptionalLastValue >::disconnect(boost::shared_ptrPBD::Connection) (this=0xec8bab0, c=…) at libs/pbd/pbd/signals_generated.h:1118 #4 0x00000000007367b3 in PBD::Connection::disconnect() (this=0xa0d51c0) at …/libs/pbd/pbd/signals.h:94 #5 0x0000000000736866 in PBD::ScopedConnection::disconnect() (this=0x9fe2af0) at …/libs/pbd/pbd/signals.h:153 #6 0x00007fffd3fb3c72 in MIDIControllable::midi_forget() (this=0x9fe29c0)
at …/libs/surfaces/generic_midi/midicontrollable.cc:109 #7 0x00007fffd3fb3cb2 in MIDIControllable::drop_external_control() (this=0x9fe29c0)
at …/libs/surfaces/generic_midi/midicontrollable.cc:117 #8 0x00007fffd3fb3b29 in MIDIControllable::~MIDIControllable() (this=0x9fe29c0, __in_chrg=)
at …/libs/surfaces/generic_midi/midicontrollable.cc:92 #9 0x00007fffd3fb3c0e in MIDIControllable::~MIDIControllable() (this=0x9fe29c0, __in_chrg=)
at …/libs/surfaces/generic_midi/midicontrollable.cc:93 #10 0x00007fffd3f6a966 in GenericMidiControlProtocol::drop_all() (this=0x55b4c70)
at …/libs/surfaces/generic_midi/generic_midi_control_protocol.cc:326 #11 0x00007fffd3f69e21 in GenericMidiControlProtocol::~GenericMidiControlProtocol()
(this=0x55b4c70, __in_chrg=)
at …/libs/surfaces/generic_midi/generic_midi_control_protocol.cc:231 #12 0x00007fffd3f6a032 in GenericMidiControlProtocol::~GenericMidiControlProtocol()
(this=0x55b4c70, __in_chrg=)
at …/libs/surfaces/generic_midi/generic_midi_control_protocol.cc:233 #13 0x00007ffff7090cbf in ARDOUR::ControlProtocolManager::drop_protocols() (this=0x1c3e210)
at …/libs/ardour/control_protocol_manager.cc:204 #14 0x00007ffff7698f20 in ARDOUR::Session::destroy() (this=0x5744240) at …/libs/ardour/session.cc:612 #15 0x00007ffff7697e3c in ARDOUR::Session::~Session() (this=0x5744240, __in_chrg=)
at …/libs/ardour/session.cc:505
Line numbers may have changed owing to my own local modifications.
I’m thinking the error might be a double acquire of the same _mutex that is not recursive in two successive calls. But this would not deadlock randomly, but consistently. I can’t work out whether the mutex is shared or not since I don’t yet understand what the base class inheritance that defines _mutex is in the following definition at line 80 of signals.h (the boost class is new to me)
class LIBPBD_API Connection : public boost::enable_shared_from_this
#3 0x00007ffff5f59d98 in PBD::Signal2<void, MIDI::Parser&, MIDI::EventTwoBytes*, PBD::OptionalLastValue >::disconnect(boost::shared_ptrPBD::Connection) (this=0xec8bab0, c=…) at l
ibs/pbd/pbd/signals_generated.h:1118 // Second lock on the same mutex?
#4 0x00000000007367b3 in PBD::Connection::disconnect() (this=0xa0d51c0) at …/libs/pbd/pbd/signals.h:94 locks on _lm first
Also, useful backtraces in multithreaded apps nearly always require: thread apply all bt rather than just a trace from one thread, so if you could add the output of that, it would be helpful.
Thread 1 (Thread 0x7ffff038ae80 (LWP 13568)): #0 0x00007ffff33a61bd in syscall () at /lib64/libc.so.6 #1 0x00007ffff52d21dc in g_mutex_lock_slowpath (mutex=0x9479ae0) at …/glib/gthread-posix.c:1340 #2 0x00000000007361b2 in Glib::Threads::Mutex::Lock(Glib::Threads::Mutex&) (this=0x7fffffff9940, mutex=…) at /usr/include/glibmm-2.4/glibmm/threads.h:687 #3 0x00007ffff5f59d98 in PBD::Signal2<void, MIDI::Parser&, MIDI::EventTwoBytes*, PBD::OptionalLastValue >::disconnect(boost::shared_ptrPBD::Connection) (this=0x9479ad8, c=…) at libs/pbd/pbd/signals_generated.h:1118 #4 0x00000000007367b3 in PBD::Connection::disconnect() (this=0x48c0150) at …/libs/pbd/pbd/signals.h:96 #5 0x0000000000736866 in PBD::ScopedConnection::disconnect() (this=0x48bfff0) at …/libs/pbd/pbd/signals.h:155 #6 0x00007fffd3fb3e32 in MIDIControllable::midi_forget() (this=0x48bfec0) at …/libs/surfaces/generic_midi/midicontrollable.cc:109
Thread 72 (Thread 0x7fffa2cdd700 (LWP 13651)): #0 0x00007ffff33a61bd in syscall () at /lib64/libc.so.6 #1 0x00007ffff52d21dc in g_mutex_lock_slowpath (mutex=0x48c0160) at …/glib/gthread-posix.c:1340 #2 0x00000000007361b2 in Glib::Threads::Mutex::Lock(Glib::Threads::Mutex&) (this=0x7fffa2cdc8c0, mutex=…) at /usr/include/glibmm-2.4/glibmm/threads.h:687 #3 0x0000000000769521 in PBD::Connection::signal_going_away() (this=0x48c0150) at …/libs/pbd/pbd/signals.h:110 #4 0x00007ffff5f4aef0 in PBD::Signal2<void, MIDI::Parser&, MIDI::EventTwoBytes*, PBD::OptionalLastValue >::~Signal2() (this=0x9479ad8, __in_chrg=) at libs/pbd/pbd/signals_generated.h:966
I can see stopping the auto-connect mechanism in advance may help avoid the deadlock being entered via the route I was hitting.
The signal.h change moves the mutex acquire into the conditon, so surely the deadlock can still happen if that condition holds? Perhaps you have an argument as to why it won’t. Also, you’ve replaced a mutex acquire with a loop that performs try-acquire until it succeeds, but with a 100 us between attempts. This has no semantic difference if I read it correctly, but I’m not familiar with these particular C++ primitives. Your change could benefit a co-routine-based concurrency implementation where the mutex acquire does not enter the threads package and simply spins, but most futex implementations will already implement what you’ve coded by spinning a few times before yielding to the scheduller, which is exactly what was indicated in the backtraces I had, where the thread was blocked in the syscall.
Anyway, I’ve applied both patches to my build and so far its not hung over multiple session closings this morning…