... speed up exports

When I am exporting projects or doing stem exports, my CPU load remains fairly low. With the ondemand governor on Linux, the CPU cores even throttle down. Hard disk I/O is also never above a couple of MB/s. So the hardware seems to have quite a bit of headroom to speed the whole process up. I have 12 GB of RAM. When I upgraded from 4 GB two years ago I noticed no big difference in export time.

Is there anything I can do to make exporting faster? Recently I recorded a three-hour podcast episode and it took almost an hour to export. And that was only exporting the regions using stem export, without any effect plugins.

It’s at “all but one cpu”. So that would mean that three of my four cores should be at 100%.

It’s got to be either the HD or the CPU…
Does running “top” in a terminal window during the export tell you anything useful?
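
If the output of top is hard to interpret while the export runs, per-core load can also be sampled directly. Here is a minimal Python sketch, assuming a Linux system where /proc/stat is available; the function names are my own and purely illustrative. It takes two snapshots and reports the share of non-idle time per core in between.

```python
import time


def parse_proc_stat(text):
    """Return {core_name: (busy_jiffies, total_jiffies)} for each per-core 'cpuN' line."""
    cores = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip non-CPU lines and the aggregate "cpu" line; keep "cpu0", "cpu1", ...
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        values = [int(v) for v in fields[1:]]
        # Columns: user nice system idle iowait irq softirq steal [guest guest_nice]
        idle = values[3] + (values[4] if len(values) > 4 else 0)
        total = sum(values[:8])  # guest time is already included in user/nice
        cores[fields[0]] = (total - idle, total)
    return cores


def utilisation(before, after):
    """Percentage of non-idle time per core between two snapshots."""
    result = {}
    for name, (busy1, total1) in after.items():
        busy0, total0 = before.get(name, (0, 0))
        elapsed = total1 - total0
        result[name] = 100.0 * (busy1 - busy0) / elapsed if elapsed > 0 else 0.0
    return result


def sample(interval=1.0, path="/proc/stat"):
    """Read /proc/stat twice, `interval` seconds apart, and return per-core load."""
    with open(path) as f:
        first = parse_proc_stat(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = parse_proc_stat(f.read())
    return utilisation(first, second)
```

Calling `sample()` a few times during an export should show whether any single core is pinned near 100% while the others idle, which the aggregate figure in top can hide.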

For most exports your CPU should be running flat out. Ardour lets you select how many cores of a multi-core CPU are used for DSP; have you checked that setting?

I will post the output of “top” and “iotop” as soon as I try again.

Some more information: I experience the same behaviour on two different machines.

  1. i5-CPU, 12G RAM, two spinning hard disks joined into one volume using LVM. Filesystem ext4 on Ubuntu. No disk encryption.

  2. i5-CPU, 4G-RAM laptop, one SSD. Filesystem ext4 on Ubuntu. No disk encryption.

More or less the same behaviour.

Have you tried the performance governor? It’s generally recommended not to use ondemand for realtime work because it can cause issues, though I’ve never heard of it preventing the CPU from reaching its full potential.

But it’s worth a try.
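
To see which governor each core is actually using before and after switching, the cpufreq files under sysfs can be read directly. This is a rough sketch, assuming the usual Linux layout under /sys/devices/system/cpu; the function name `read_governors` is made up for illustration.

```python
import glob
import os


def read_governors(cpu_root="/sys/devices/system/cpu"):
    """Map each core directory name (e.g. 'cpu0') to its current cpufreq governor."""
    governors = {}
    pattern = os.path.join(cpu_root, "cpu[0-9]*", "cpufreq", "scaling_governor")
    for path in glob.glob(pattern):
        # Path looks like <cpu_root>/cpu0/cpufreq/scaling_governor
        core = path.split(os.sep)[-3]
        with open(path) as f:
            governors[core] = f.read().strip()
    return governors
```

Running `print(read_governors())` should list something like ondemand or performance per core. Changing the governor normally needs root; on many distributions `cpupower frequency-set -g performance` does it.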

It may be a limitation of Ardour’s code. I have heard that modern processors never really get maxed out; if you run a benchmark, I think the maximum achieved across all cores is about 70-80% CPU load.

What’s the DSP load during export?

Modern processors absolutely get maxed out. Just try building Ardour from source :)

One new observation …

When I start jackd with the dummy driver, the export is much faster. The processors still throttle down and are nowhere near maxed out, but it’s much faster than before.

Usually I use a Focusrite Saffire Pro 24 FireWire audio interface with the FFADO drivers. With that setup the export is something like ten times slower.

In both cases the DSP-load indicated by jack is 100%.

I am using jack 1.9.10 BTW.

I will try a USB audio interface tomorrow.

I’m surprised that jack has anything to do with exporting, since it’s not a real-time operation. Maybe there’s a way of running it asynchronously. Still surprised that it should have anything to do with which sound hardware it’s attached to. You might be on to something there…

@anahata: I’m not sure how it all works now that there are new back-end options for Ardour, but as I understood it, JACK was previously involved even when exporting. In that case JACK ‘freewheels’, which in theory means that audio just gets processed at the fastest speed possible. In the past, though, this could lead to the utterly incomprehensible notion of ‘xruns’ when exporting / processing ‘offline’, in particular if a poorly designed application or plug-in (‘jamin’ springs to mind) was processing audio in other threads without proper synchronisation or without avoiding locks in the audio callback. (It seems like export reliability has always been, er… ‘interesting’ with Ardour, for about as long as I can remember.)

  • I would also assume that JACK has to be involved when exporting if, for example, you have anything in your session (JACK inserts, say) that connects to other JACK applications.

We don’t allow hardware inserts to be active for faster-than-realtime exporting. We do allow inserts that connect normal JACK clients.

The unreliability of export with Ardour has ALWAYS involved Jack2, not Jack1. Most or all of the issues in Jack2’s handling of freewheeling have been resolved in the last year or so. Jack1 never had these problems.

FWIW, I’m seeing the same thing as johmue on at least two computers. Both are running Fedora/CCRMA, which comes with JACK2 (F20:, F21:1.9.10), and both have i7 CPUs – an M620 from about 2010, and a recent 4790 – with 2 and 4 cores (with HT), and 8 and 16 GiB of RAM, respectively. The F20 machine is running a stock kernel; the F21 machine is running an “RT” kernel.

I did an experiment in which I opened the same project on both computers using the same version of Ardour (3.5.403) and exported the same 1020-second range from a stereo bus (not “master” in this case) to a 16-bit FLAC file, with a mix from 3 recorded tracks. There was one plugin in use - LinuxDSP’s MBC2B, pre-fader on the bus being exported. I did 3 exports on each machine: with the plugin enabled; with it bypassed via its own master bypass button; and with it bypassed using Ardour’s plugin bypass (the little green LED). Here’s a quick summary, showing elapsed time (manual stopwatch) for the export and a calculated speed-up relative to real time:

    plugin enabled
        computer1:  141 s ( 7.2x)
        computer2:  101 s (10.1x)
    plugin bypassed (by plugin)
        computer1:  126 s ( 8.1x)
        computer2:   93 s (11.0x)
    plugin bypassed (by Ardour)
        computer1:   98 s (10.4x)
        computer2:   69 s (14.8x)

In all these cases, computer2 is about 1.4 times as fast as computer1; I expected the ratio to be larger. (It’s probably coincidence, but that isn’t far from the ratio of their clock speeds (4000/2667), which I wouldn’t expect to be relevant.)
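
For anyone who wants to redo the arithmetic with their own timings, the figures above can be reproduced with a few lines of Python. This is only a sketch of the calculation; the dictionary layout and function names are arbitrary, and the numbers are the stopwatch times from the table.

```python
RANGE_SECONDS = 1020  # length of the exported range

# Elapsed wall-clock time in seconds for each export, per machine
elapsed = {
    "plugin enabled": {"computer1": 141, "computer2": 101},
    "plugin bypassed (by plugin)": {"computer1": 126, "computer2": 93},
    "plugin bypassed (by Ardour)": {"computer1": 98, "computer2": 69},
}


def speedup(elapsed_s, range_s=RANGE_SECONDS):
    """How many times faster than real time the export ran."""
    return range_s / elapsed_s


def machine_ratio(times):
    """How many times faster computer2 finished compared with computer1."""
    return times["computer1"] / times["computer2"]


for case, times in elapsed.items():
    print(
        f"{case}: "
        f"computer1 {speedup(times['computer1']):.1f}x, "
        f"computer2 {speedup(times['computer2']):.1f}x, "
        f"ratio {machine_ratio(times):.2f}"
    )

print(f"clock speed ratio: {4000 / 2667:.2f}")
```

The per-case ratio comes out between roughly 1.36 and 1.42, against a clock speed ratio of about 1.50.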

During each of the exports, the aggregate CPU load (according to top) hovered around 30-35% on computer1, and 20-25% on computer2, with occasional +/- 5-point excursions in each case. (I didn’t try very hard to eliminate other loads.) I also had gkrellm running, and from what I could see on its graphs, the load appeared to be fairly evenly distributed across all cores/threads. It never came anywhere near maxing out – on any core, let alone in total.

On computer1 (JACK, M-Audio FastTrack Pro - USB), qjackctl and Ardour showed 100% DSP load (in red) while exporting. On computer2 (JACK 1.9.10, M-Audio Delta 66 - PCI), they didn’t change much from their idle values – ie, about 7 - 9%.

I can try running with a locally-built JACK1 when I get some time here, to see if that makes a difference.

I seem to be having the same issue.

While exporting an hour-long session with about 12 tracks, 4 Calf 8-band EQs, 3 Calf mono compressors, 2 Calf multi-band compressors, 1 reverb & 1 Calf multi-band limiter, the DSP shows 100%, but my CPU usage never goes beyond 35-40%. It takes about 20 minutes to export into a single stereo WAV file.

It is not the disk, because, one, it is an SSD and, two, when I turn off the Calf plugins, the same session exports in 7-8 minutes.

My system is a 6-core AMD FX6100 with 8GB RAM, running Ubuntu 16.04 (also checked on Ubuntu 14.04).

The things I have tried are:
- using ‘all but 1 processor’ & ‘using all processors’
- running as root
- running on a different distro (Mint)
- using jack+dummy driver
- using Ardour 4 & 5

Well I haven’t yet compiled from source, as I don’t feel too confident.

I suspect this has something to do with the way processes are shared between multiple cores. It’s all very well having a 6 core CPU, but there may be a lot of single-threaded work going on (perhaps in those Calf compressors), which will keep 1 or 2 cores busy but may be unable to share the load equally between them all.

I’m not an expert on multicore software design, but I’m pretty sure applications have to be designed and coded specifically to get the best performance out of multiple CPU cores. I would guess the Calf software has not been coded in this way.

I checked. All my cores are used almost equally during an export. Their graphs hover steadily between 30% and 40%.

I think a single threaded process can hop randomly from one core to another when there are task switches, so the load would still be distributed across the cores over a period of time. On that basis, 30-40% each with 6 cores suggests to me most of the work is being done by two threads, though for all I know it may not be as simple as that.
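
The back-of-the-envelope estimate above can be written out as follows. It is only a rough sketch: it assumes the per-core figures are averages over time and ignores any other load on the machine, and the function name is made up.

```python
def busy_core_equivalents(per_core_percent, cores):
    """Total load expressed as a number of fully busy cores."""
    return cores * per_core_percent / 100.0


# 6 cores, each averaging between 30% and 40%
low = busy_core_equivalents(30, 6)
high = busy_core_equivalents(40, 6)
print(f"between {low:.1f} and {high:.1f} cores' worth of work")
```

That gives roughly 1.8 to 2.4 fully busy cores, which is where the guess of about two threads doing most of the work comes from.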