…when all tracks/busses have been processed and data for the master is available, the master bus runs by itself on a single core.
Is there a performance gain by summing outside Ardour? This would entail having NO master bus and multiple groups and tracks assigned to multiple outputs. Would every group act as a master of sorts and consume its own single core?
Nope. Same problem there: You still need to wait until all tracks/busses are processed before you can sum them.
The key takeaway is to avoid DSP-expensive plugins on the master bus.
–
To elaborate: say you have 3 plugins on the master bus:
EQ → Multiband compressor → Limiter
The compressor needs data from the EQ, so it has to wait until the EQ has finished processing and produced its output. Likewise, the limiter can only run once the compressor has compressed the signal.
These plugins can only run in sequence, not in parallel, so only a single CPU core is used.
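A rough Python sketch of that dependency structure, in case it helps. This is not how Ardour is implemented; `process_track`, `eq`, `compressor`, `limiter` and `render` are made-up stand-ins with toy math, and Python threads will not actually spread this work over several cores. The point is only the shape: tracks are independent of each other, the sum has to wait for all of them, and each master plugin consumes the previous plugin's output.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def process_track(samples):
    """Hypothetical per-track processing; no track depends on another."""
    return [s * 1.0 for s in samples]

def eq(buf):
    """Toy 'EQ': a plain gain change."""
    return [s * 0.5 for s in buf]

def compressor(buf, threshold=0.5, ratio=2.0):
    """Toy compressor: reduce whatever exceeds the threshold by the ratio."""
    out = []
    for s in buf:
        mag = abs(s)
        if mag > threshold:
            mag = threshold + (mag - threshold) / ratio
        out.append(math.copysign(mag, s))
    return out

def limiter(buf, ceiling=1.0):
    """Toy limiter: hard clamp to the ceiling."""
    return [max(-ceiling, min(ceiling, s)) for s in buf]

def render(tracks):
    # Stage 1: tracks are independent, so they can be handed out to workers.
    with ThreadPoolExecutor() as pool:
        processed = list(pool.map(process_track, tracks))

    # Stage 2: summing needs every track's output, so it waits for all of them.
    master = [sum(frame) for frame in zip(*processed)]

    # Stage 3: each master plugin needs the previous plugin's output,
    # so this loop is strictly sequential.
    for plugin in (eq, compressor, limiter):
        master = plugin(master)
    return master

if __name__ == "__main__":
    print(render([[0.5, 1.0], [0.5, 1.0], [1.0, 2.0]]))
```

Adding more workers can only shorten stage 1. Stage 3 takes as long as the three plugins take one after another, which is why heavy plugins on the master bus hurt the most.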