AI stem/source separation in Ardour

I built OpenVINO for Linux/Audacity last summer and it worked okay. It did pretty well when extracting two stems (music and vocals), but not so well when extracting four stems, which contained tons of artifacts. I would expect that training my own models might have improved on this, but training models is extremely difficult.

Speaking of Audacity, no time to research this right now, but leaving this as a mental note here.
IIRC, there was a stem separation Vamp plugin. Gotta rush now, maybe someone else can shed light in the meantime.

There’s Demucs GUI (FOSS) available, and it does a very good job of separating tracks into 4 basic stems: drums, bass, vocals, other.
Where things get really interesting is at that “RipXDAW” (nonfree) level, where you can pretty clearly separate individual instruments.
The Demucs developer had the idea of doing a more in-depth version of his software (there’s a clip on YouTube of him speaking about it), but he hasn’t made/released it yet.
That’s the man (I think) who should be contacted about it.
Having a new, advanced version of Demucs inside Ardour would be awesome. (Then I could probably remix my first electronic music album that I did when I was still a high school teenager :slight_smile: ).

I don’t think this would be any better than Spleeter, which Robin has explored.

Just like Spleeter, Demucs is Python based and relies on a lot of Python libraries, which is extremely problematic to include in a binary distribution like Ardour.

But, unlike Spleeter, Demucs, which Demucs-GUI is dependent on, is abandonware.

Cheers,

Keith

I see… So rewriting it all in C (if it’s even possible) would be a coding marathon, like another never-ending goodwill black hole :slight_smile:

Native stem separation in Ardour would be a cool feature to have. In the meantime, I have a project that uses Lua scripting and Demucs for four-stem separation, if you are interested:

FWIW,

I have a purchased copy of RipXDAW on my Windows partition. My intention was to use it mostly to help remix audio from old concert footage, and the results are heavily mixed, just like people report from the other solutions. Sure, on modern, already well-mixed material these tools can be impressive, but in my experience with live footage with audience applause, or bands with both guitars and keyboards, or a string and horn section, things often end up worse than they started, with all kinds of new phase-y artifacts. I would hazard a guess that nobody is training the models on such old and difficult material, so even through several updates of the software the results have not really improved all that much. Suffice it to say I’m not getting Peter Jackson results…

That has been my experience with it so far, but I agree it’s going to pretty much be a required feature in the DAW future…

I’m not too familiar with AI stem separation outside of using Demucs [specifically, v4 ft (fine tuned)] and many other models in Ultimate Vocal Remover v5.6. Personally, I’d benefit as a user if it were implemented somehow in Ardour, but I assume finding the right set of processing methods and their relevant models would be tricky. And then UVR has advanced menus for each processing method and more. I wonder how an implementation in Ardour would appear…

While I assume it’s almost always used for remixing/sampling, so far I’ve personally used it for:

  • Trying to get a clearer copy of the vocals to recognize the words of a song (if I can’t find the lyrics)
  • Learning some drumming patterns
  • Trying to hear out and recognize chord patterns (or note patterns/intervals, if the sequence is faster than what I can understand without separating the stems)

[For the last point, I’ve also sometimes used NeuralNote on the separated stem, to try and help myself recognize some patterns, though I’m not fully accustomed to a MIDI piano roll.]

Yeah, “phasey artifacts”, I kinda knew it… that’s exactly what I was afraid of. I was on the edge of purchasing RipX just to save my first work from oblivion (funny, I made an entire album back then, and I wasn’t even aware that’s called “music production” :slight_smile: ), but when I thought about it, my dilemmas just multiplied for various reasons. Phasey artifacts are something I definitely don’t want to introduce.
Lately I’m thinking of doing a proper “Redux” from scratch, but I freeze when I think about how much actual work is in it (MIDI programming, different instrument layering, envelope shaping etc.), ’cause, you know, it’s not a “Master Of Puppets” or something, it’s just a self-made album by a boy influenced by Moby, Enigma, Prodigy and d’n’b.
And I was stupid enough to keep the projects on floppies. Some I can’t even find, some contain something else entirely now, and some are just unreadable. All that is left is, by some miracle, a still-functional CD. How dumb can a young man be?

Could something like this be adapted?

Cheers,

Keith

I stumbled upon C++ ports of:

Maybe it can be of some use to @x42 ?

Though I wonder if a standalone app would not be more relevant.

Interesting. I’ve used some of the online services, but this is the first I’ve heard of Spleeter and Demucs. I took a look at Spleeter: definitely worth a try.

@x42 mentioned the complexity of Python, but if you’re running the models locally, you’ve also got GPU libraries that might need to be installed. On the other hand, if you’re using models in the cloud, you might not need anything more than some kind of REST client.

I agree with @Locynaeh that a standalone app seems relevant, especially given the complexity of running on a local GPU.
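To make the standalone idea a bit more concrete, here is a minimal sketch of how a host or helper script could hand a file to an external separator instead of embedding Python and its libraries. It assumes the `demucs` command-line tool is installed separately and is on the PATH; the function names, the `stems` output folder and the returned folder layout are my own illustrative choices, not anything Ardour or the projects mentioned in this thread actually do.

```python
import shutil
import subprocess
from pathlib import Path


def build_demucs_command(audio_file, out_dir="stems", model="htdemucs", two_stems=None):
    """Build the argument list for an external `demucs` run (nothing is executed here).

    `out_dir` and the default model name are illustrative defaults.
    """
    cmd = ["demucs", "-n", model, "-o", str(out_dir)]
    if two_stems is not None:
        # e.g. two_stems="vocals" asks for vocals plus everything else
        cmd.append(f"--two-stems={two_stems}")
    cmd.append(str(audio_file))
    return cmd


def separate(audio_file, out_dir="stems", model="htdemucs", two_stems=None):
    """Run demucs as a separate process and return the folder expected to hold the stems."""
    if shutil.which("demucs") is None:
        raise RuntimeError("demucs executable not found on PATH")
    cmd = build_demucs_command(audio_file, out_dir, model, two_stems)
    subprocess.run(cmd, check=True)  # raises CalledProcessError if demucs fails
    # Assumption: demucs writes to <out_dir>/<model>/<track name>/; verify on your install.
    return Path(out_dir) / model / Path(audio_file).stem
```

Something like `separate("song.wav", two_stems="vocals")` would then leave the heavy dependencies in the separate install, and the host only needs to import the resulting audio files.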

Surely we don’t want Ardour to be dependent on something like TensorFlow, do we? I think that’s what it would be with Spleeter, basically.

Of course, TensorFlow integration would probably give Ardour the capability of running some of its effects processing on a GPU, which might be useful. Still…

There are a lot of AI models available. Which stem splitter to use should probably be up to the user, like picking samples or plugins.

What would the GUI look like? You select a region and click on an option to run it through a stem splitter?

There’s another Stem Separation Tool with a “Complete AppImage”

You don’t usually need a GPU for that. Spleeter runs fine and amazingly fast on the CPU.

This project is Python-based, and has an alarming number of signs pointing to it being vibe-coded. It mentions multiple AppImages yet only one exists; it has only one commit for all of the code; the code of conduct is, word for word, “This project adheres to a code of conduct. By participating, you are expected to uphold this code. Please report unacceptable behavior to the project maintainer”; the README doesn’t make any sense in some areas; and not only will the AppImage not run on my machine, the build instructions and build scripts don’t work and sometimes point to files that don’t exist.

AI stem separation is a good use of the tech, but this just looks like vibe-coded nonsense.

If the project wasn’t “vibe coded”, would that make you feel any better about it not working for you? You haven’t reported any bugs on the GitHub. It works fine for me, so this is an issue for you rather than the program per se.

AFAIK this is the only open source GUI on Linux available for using stem separation technology so far. If there is a problem other than it not personally working for you, do say, but just deriding something as “vibe coded nonsense” is not helpful to anyone.

Anjok07/ultimatevocalremovergui on GitHub (GUI for a Vocal Remover that uses Deep Neural Networks) exists, has Linux support, and has instructions that work on the distributions mentioned. The one mentioned above is not the only GUI available.

If it wasn’t vibe coded, I would’ve in fact reported the issues I’ve seen to the issue tracker, like references to links and files that don’t exist. But I suppose I’ll take the time to open the issues on the project.

I find this part of the StemWeaver README interesting:

Aside from the fact that the “Originality Statement” link is 404, the fact that it’s so adamant about not being derived from UVR (aka ultimatevocalremover) is interesting when you look at some of the change history like this:

I think most of us around here appreciate open source. Personally, I’ve found 3D prints of open designs I’ve published on Thingiverse for sale on eBay. I find that obnoxious. Don’t think I’ll be buying the StemWeaver “developer” a coffee any time soon.

My 2¢

As far as I remember, AI stem separation was unavailable in the 1970s, and the greatest records were created without this gimmicky feature… :slight_smile:

Then again, DAWs were not available in the 70s either … :slight_smile:
