This is just a pipe dream post, wondering about possibilities for AI features in Ardour and difficulties in implementation.
AI stem separation is becoming a more common part of producer workflows, via third-party APIs like lalal.ai/LANDR (there are dozens of such web services) or even natively within DAWs like FL Studio and Logic.
Is there currently any discussion for implementing such features in Ardour?
I guess the most obvious difficulties are pricing and hardware limitations:
Pricing: With third-party APIs, costs would scale per request, so Ardour would have to limit calls or offer pricing tiers/subscriptions, as Matt Tytel did for Vital’s text-to-wavetable feature.
Hardware: Using open-source source separation models (like Spleeter or Demucs), there would be hardware (RAM/GPU) and cross-OS compatibility issues. (I guess platform-specific DAWs like Logic don’t worry about this.)
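To make the pricing point concrete, here is a minimal sketch of a per-user request cap, assuming a flat price per API call. The class name, the limit, and the price are all hypothetical; no real service's pricing is implied.

```python
from dataclasses import dataclass


@dataclass
class SeparationQuota:
    """Tracks paid separation requests against a monthly cap (hypothetical policy)."""
    monthly_limit: int        # assumed number of requests included in the tier
    cost_per_request: float   # assumed flat price charged by the third-party API
    used: int = 0

    def try_consume(self) -> bool:
        """Reserve one request; return False once the cap is reached."""
        if self.used >= self.monthly_limit:
            return False
        self.used += 1
        return True

    @property
    def spend(self) -> float:
        """Total cost so far, which grows linearly with the request count."""
        return self.used * self.cost_per_request
```

A caller would check `try_consume()` before each upload and refuse (or offer an upgrade) when it returns `False`.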
Bundling Python with Ardour cross-platform is where this project halted.
UVR and Mixxx devs implement Python AI libs by converting them to ONNX format… Would that route be plausible for Ardour development too? Or does it involve too much external dependency mess?
Not exactly. ONNX is an interchange format for models; you still need a runtime to execute inference using the model weights. What the Mixxx project did was to extract the model parameters from Demucs, which were apparently intertwined with the code, into a separate model file which could be imported into a different runtime.
You still need to find an appropriate runtime inference engine to incorporate into the project.
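The split between a model file and a runtime can be illustrated with a toy sketch. This is not ONNX: a JSON file stands in for the interchange format and a single linear layer stands in for the model, and every name here is made up for illustration. A real ONNX file also encodes the computation graph, and a real runtime such as ONNX Runtime or OpenVINO is far more involved.

```python
import json
import os
import tempfile


def export_model(path, weights, bias):
    """Write only the parameters to disk: the 'interchange file' half."""
    with open(path, "w") as f:
        json.dump({"weights": weights, "bias": bias}, f)


class TinyRuntime:
    """The 'runtime' half: code that loads a parameter file and executes it."""

    def __init__(self, path):
        with open(path) as f:
            params = json.load(f)
        self.weights = params["weights"]
        self.bias = params["bias"]

    def run(self, inputs):
        # A single linear layer: y = w . x + b
        return sum(w * x for w, x in zip(self.weights, inputs)) + self.bias


def demo():
    """Export parameters, then run them through an independent runtime."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "model.json")
        export_model(path, [0.5, -1.0], 2.0)
        return TinyRuntime(path).run([4.0, 1.0])
```

The file alone does nothing; only the code in `TinyRuntime` turns it into output, which is the part Ardour would still have to ship.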
I built OpenVINO for Linux/Audacity last summer and it worked okay. It did pretty well when extracting two stems (music and vocals), but not so well at extracting four stems, which contained tons of artifacts. I would expect that training my own models might have improved on this, but training models is extremely difficult.
Speaking of Audacity, no time to research this right now, but leaving this as a mental note here.
IIRC, there was a stem separation Vamp plugin. Gotta rush now, maybe someone else can shed light in the meantime.
There’s Demucs GUI (FOSS) available, and it does a very good job of separating tracks into 4 basic stems: drums, bass, vocals, other.
Where things get really interesting is at that “RipXDaw” (nonfree) level, where you can pretty clearly separate individual instruments.
The Demucs developer had the idea of doing a more in-depth version of his software (there’s a clip on YouTube of him speaking about it), but he hasn’t made/released it yet.
That’s the man (I think) who should be contacted about it.
Having a new, advanced version of Demucs inside Ardour would be awesome. (Then I could probably remix my first electronic music album that I did when I was still a high school teenager.)
I don’t think this would be any better than Spleeter, which Robin has explored.
Just like Spleeter, Demucs is Python based and relies on a lot of Python libraries, which is extremely problematic to include in a binary distribution like Ardour.
But, unlike Spleeter, Demucs, which Demucs-GUI is dependent on, is abandonware.
Native stem separation in Ardour would be a cool feature to have. In the meantime, I have a project that uses Lua scripting and Demucs for four-stem separation, if you are interested:
I have purchased RipXDAW on my Windows partition. My intention was to use it mostly to help remix audio from old concert footage, and the results are heavily mixed, just like people report from the other solutions. Sure, on modern, already well-mixed material these tools can be impressive, but in my experience, with live footage with audience applause, or bands with both guitars and keyboards, or a string and horn section, things often end up worse than they started, with all kinds of new phase-y artifacts. I would hazard a guess that nobody is training the models on such old and difficult material, so even through several updates of the software the results have not really improved all that much. Suffice it to say I’m not getting Peter Jackson results…
That has been my experience with it so far, but I agree it’s going to pretty much be a required feature in the DAW future…
I’m not too familiar with AI stem separation outside of using Demucs [specifically, v4 ft (fine-tuned)] and many other models in Ultimate Vocal Remover v5.6. Personally, I’d benefit as a user if it were implemented somehow in Ardour, but I assume finding the right set of processing methods and their relevant models would be tricky. And then UVR has advanced menus for each processing method and more. I wonder how an implementation in Ardour would appear…
While I also assume that almost always it’s used for remixing/sampling, so far I’ve personally used it for the purposes of
Trying to get a clearer copy of the vocals to recognize the words of a song (if I can’t find the lyrics),
Learning some drumming patterns
Trying to hear out and recognize chord patterns (or note patterns/intervals, if the sequence is faster than what I can understand without the efforts of separating the stem(s))
[For the latter point, I’ve also used NeuralNote sometimes on the separated stem, to try and help myself recognize some patterns, though I’m not fully accustomed to a MIDI pianoroll.]
Yeah, “phasey artifacts”, I kinda knew it… that’s exactly what I was afraid of. I was on the verge of purchasing RipX just to save my first work from oblivion (funny, I made an entire album back then, and I wasn’t even aware that’s called “music production”), but when I thought about it, my dilemmas just multiplied for various reasons. Phasey artifacts are something I definitely don’t want to introduce.
Lately I’m thinking of doing a proper “Redux” from scratch, but I freeze when I think about how much actual work is in it (MIDI programming, different instrument layering, envelope shaping etc.), because, you know, it’s not a “Master Of Puppets” or something, it’s just a self-made album by a boy influenced by Moby, Enigma, Prodigy and d’n’b.
And I was stupid enough to keep the projects on floppies. Some I can’t even find, some contain something else entirely now, and some are just unreadable. All that is left safe is, by some miracle, a still-functional CD. How dumb can a young man be?
Interesting. I’ve used some of the online services, but this is the first I’ve heard of Spleeter and Demucs. I took a look at Spleeter; definitely worth a try.
@x42 mentioned the complexity of python, but if you’re running the models locally, you’ve also got GPU libraries that might need to be installed. On the other hand, if you’re using models in the cloud, you might not need anything more than some kind of REST client.
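For the cloud case, a rough sketch of what "some kind of REST client" could look like with nothing but the Python standard library. The endpoint URL, the `stems` query parameter, the bearer-token auth, and the JSON reply are all assumptions for illustration; every real service defines its own API.

```python
import json
import urllib.request

# Hypothetical endpoint; real services define their own URLs, auth, and payloads.
API_URL = "https://api.example.com/v1/separate"


def build_separation_request(audio_bytes, api_key, stems=4):
    """Build (but do not send) a POST request carrying raw audio."""
    return urllib.request.Request(
        API_URL + "?stems=" + str(stems),
        data=audio_bytes,
        method="POST",
        headers={
            "Authorization": "Bearer " + api_key,
            "Content-Type": "audio/wav",
        },
    )


def separate_in_cloud(audio_bytes, api_key):
    """Send the request and decode an assumed JSON reply (e.g. stem download URLs)."""
    request = build_separation_request(audio_bytes, api_key)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

No GPU libraries or model files are needed on the client side in this setup, at the cost of uploading the audio and paying per request.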
I agree with @Locynaeh that a standalone app seems relevant, especially given the complexity of running on a local GPU.
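One way to picture the standalone-app route: the host calls a separately installed separator as an external process and never bundles its Python stack. A minimal sketch, assuming a `demucs` executable is on the PATH; the `-n` and `-o` flags and the output folder layout reflect my understanding of the Demucs command line and should be checked against its own documentation. The helper names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path

STEMS = ("drums", "bass", "other", "vocals")


def build_demucs_command(audio_path, out_dir, model="htdemucs"):
    """Command line for an externally installed Demucs (flags are an assumption)."""
    return ["demucs", "-n", model, "-o", str(out_dir), str(audio_path)]


def expected_stem_paths(audio_path, out_dir, model="htdemucs"):
    """Assumed output layout: <out_dir>/<model>/<track name>/<stem>.wav"""
    track_dir = Path(out_dir) / model / Path(audio_path).stem
    return {stem: track_dir / (stem + ".wav") for stem in STEMS}


def separate(audio_path, out_dir, model="htdemucs"):
    """Run Demucs as a separate process if it is on PATH; otherwise return None."""
    if shutil.which("demucs") is None:
        return None
    subprocess.run(build_demucs_command(audio_path, out_dir, model), check=True)
    return expected_stem_paths(audio_path, out_dir, model)
```

The host only needs to export a region to a file, wait for the process, and import the resulting stems, so all GPU and Python dependencies stay on the user's side.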
Surely we don’t want Ardour to be dependent on something like TensorFlow, do we? I think that’s what it would be with Spleeter, basically.
Of course, TensorFlow integration would probably give Ardour the capability of running some of its effects processing on a GPU, which might be useful. Still…
There are a lot of AI models available. Which stem splitter to use should probably be up to the user, like picking samples or plugins.
What would the GUI look like? You select a region and click on an option to run it through a stem splitter?
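Behind such a GUI, the "user picks the splitter" idea could be a small registry of interchangeable backends that is applied to the selected region. A rough sketch only: the names, the sample-list representation, and the placeholder backend are all hypothetical and say nothing about Ardour's actual internals.

```python
from typing import Callable, Dict, List

Samples = List[float]
Separator = Callable[[Samples, int], Dict[str, Samples]]

# User-selectable backends, keyed by a display name (all names are hypothetical).
SEPARATORS: Dict[str, Separator] = {}


def register(name: str, separator: Separator) -> None:
    """Make a backend available under the given display name."""
    SEPARATORS[name] = separator


def passthrough(samples: Samples, sample_rate: int) -> Dict[str, Samples]:
    """Placeholder backend: returns the input unchanged as a single 'other' stem."""
    return {"other": list(samples)}


def split_region(track: Samples, start: int, end: int,
                 sample_rate: int, backend: str) -> Dict[str, Samples]:
    """Run the chosen backend on the selected sample range only."""
    return SEPARATORS[backend](track[start:end], sample_rate)


register("passthrough", passthrough)
```

A menu entry would then just list the keys of `SEPARATORS`, and each returned stem would become a new region or track.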