Requirements for DAWs for a patent-unencumbered stem file format

Hi all,

I’ve been working on a format for storing stems alongside the final mix of a track (to make the file backwards compatible with existing media players), with the goal of having a patent-unencumbered alternative to the Native Instruments Stems format.

My background is in mixing with stems while DJing, however, and not in music production and DAWs (though I do use Ardour frequently for multi-track recording and for dubbing vinyl and cassettes; it’s the best thing I’ve found for ripping vinyl and doing click reduction!).

I wanted to join here to ask a few questions of the Ardour community who likely have a lot more experience with DAWs than I do. My questions are:

  1. Would such a format even be useful for use in a DAW outside of being an export format meant to be played back by DJ applications?
  2. If it is useful, what requirements for the format would Ardour have? For example, a DJ application likely only requires the ability to store 4 stems, but a DAW (I presume) would want to store many more stems in the final file.

If you have any ideas, I’d be very grateful for your feedback. If you’re technically minded and are curious, I have initial drafts for the format (using both Matroska and Ogg, though I’m gravitating towards using Matroska at the moment) here:

Interesting topic! When I perform with Ardour I use a set of 10 stems, all going through the same mastering chain. I play sets that run up to 4 hours, so preparing the stems is a long process. I generally try to mix so that I always end up at the same loudness, using the same clipper, limiter, and compressor settings. It would be amazing to have some way to save mastering settings into the metadata of the stems and then interpolate the mastering settings between the stems that are playing. This of course only makes sense if the mastering doesn’t change too much between projects.
I think this would be cool, but it’s also a really niche use case.

Nice idea, and kudos to you for submitting an IETF Draft.

I am not familiar with the Native Instruments Stems format, but your proposal is very reminiscent of the Dolby Atmos file format:

The first N channels are the bed mix. That can be anything: mono, stereo, 5.1, 7.1.2. The remaining channels (up to 128 total) are objects: mono sources with panning metadata. The bed mix is commonly used for ambience only, and if the vocals are an object, you get a karaoke mix for free just by muting it :)
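To make the object idea concrete, here is a toy sketch (hypothetical names, gains, and panning; this is not Dolby’s actual renderer) of how a bed plus panned mono objects might be summed, and how muting the vocal object gives that free karaoke mix:

```python
# Toy sketch of the bed + objects idea described above. The numbers,
# names, and panning gains are made up for illustration; this is not
# Dolby's renderer. Each object is a mono source mixed into the
# output channels by per-channel gains.

def render(bed_frame, objects, muted=()):
    """Mix one frame: bed channel samples plus panned object samples."""
    out = list(bed_frame)
    for name, (sample, gains) in objects.items():
        if name in muted:
            continue
        for ch, gain in enumerate(gains):
            out[ch] += sample * gain
    return out

# A stereo ambience bed plus two hypothetical objects.
bed = [0.1, 0.1]
objects = {
    "vocal": (0.5, [0.7, 0.7]),   # panned to center
    "guitar": (0.2, [1.0, 0.2]),  # panned left
}

full_mix = render(bed, objects)
karaoke_mix = render(bed, objects, muted={"vocal"})
```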

But mixing it all requires Dolby’s secret sauce… Your drafts also specify some basic DSP (compressor, limiter, filters). However, unless those are rigorously specified (and with test data to perform a null check), I expect it will be pretty much useless.

You further specify

The stem tracks SHOULD NOT have any gain normalization applied.

Could you explain the reasoning behind this? When the audio data uses fixed point, it is preferable to normalize before encoding.

Have you considered using WavPack as the container?

Do you have a specific use-case in mind? Are you aiming at archiving sessions, or should the file be the end result for the consumer (like Atmos)?

I’m vaguely familiar with the DJ scene, having been given a tour and a play on a modern DJ deck last year.

By “stem format” I assume you are looking at something that is targeted at that sort of DJ deck similar to Serato or Rekordbox?

With these, a tool is used to analyse and separate the music files and to tag them with tempo and beat data to support easy beat-matching on the deck. They also do some tweaking to the stems to make them sound better in isolation, and some other stuff like phase correlation. I also believe they create caches, such as visual waveforms that are used on the deck to help the DJ align tracks.

I’m not sure if any of these are in your plans.

From the point of view of Ardour, some of these would map to stuff that’s in Ardour (and most other DAWs) but, clearly, Ardour doesn’t actually provide the analysis tools at this point.

Cheers,

Keith

That’s really interesting to me; I didn’t realize that Ardour was something anyone would use to perform with. I’ll have to look into that, and I’d love to know more about how you use it!

This is actually already supported (with a very naive setup that I suspect will have problems).

This is definitely a weak point in my knowledge. My understanding is that the Native Instruments format has metadata for a specific DSP (which you can license from them in a free-as-in-beer manner). That isn’t practical for an open format, so I tried having some generic metadata, but, as you said, maybe that will just be too generic and will sound different under whatever random DSP people pipe it through. I’m open to suggestions; maybe having mastering info just isn’t practical.

This is probably just poorly worded and needs to be rephrased. The idea was that if you play all the stems at unity gain, you should get basically the same thing as the final mix (minus mastering steps). So if you have some quiet synths and some loud drums, you don’t want to bring the individual stem tracks up to the same perceived loudness before exporting them, as you might do if you were exporting multiple tracks for an album or something.
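A tiny sketch of what I mean (the sample values are made up; the point is just that the reconstruction only works if nobody has touched the per-stem gains):

```python
# Sketch of the unity-gain idea: with no per-stem gain normalization,
# summing all stems at gain 1.0 reconstructs the pre-master mix
# sample for sample. Sample values here are hypothetical.

def mix_at_unity(stems):
    """Sum equal-length stem tracks sample by sample at unity gain."""
    return [sum(frame) for frame in zip(*stems)]

quiet_synths = [0.01, -0.02, 0.015, 0.0]
loud_drums = [0.5, -0.6, 0.55, -0.4]

# This equals the pre-master mix; normalizing quiet_synths up to the
# drums' perceived loudness before export would break the identity.
premaster_mix = mix_at_unity([quiet_synths, loud_drums])
```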

I am only vaguely familiar with WavPack, but I didn’t think it was as widely supported as other formats. Is there some benefit to it I should be looking for? I’m a bit hesitant to use one guy’s personal project, even if he obviously knows what he’s doing and it’s a pretty neat technology!

I was specifically doing this for use in DJ applications such as Mixxx or Traktor (so for the end consumer, though for performers rather than music listeners), and I’m curious whether there’s a use for DAWs too.

Yes and no. This is what I had intended it to be used for (like the Native Instruments format supported by Traktor controllers), but the software doing the splitting wasn’t really what I had in mind (though it could of course generate one of these files as a cache or similar). I had in mind a file that a producer could export. So, e.g., if you were producing music and creating the extended DJ version of a track, you might export in this format so that DJs are more likely to buy your track, knowing that when they use stems for it the stems won’t be auto-generated and won’t sound like crap the way the auto-generated stuff does.

Thanks for the replies, all! This has been very helpful already!

The drafts do not contain a list of which codecs must be supported, and which codecs should be supported. That is needed to ensure compatibility between applications which use the stem files.

I keep going back and forth on this. I had thought about mandating FLAC and Opus, but I’m not sure the different vendors will all want to support the same formats, and it may not be strictly necessary: if one vendor wants to support Vorbis and another wants to support Opus but doesn’t want to maintain support for Vorbis, it’s still valuable for them both to have a known stems format. A mandate could also make the spec a bit less forward compatible if one day FLAC and Opus are old formats that no one uses anymore because new ones have taken over.

I’ll keep thinking about it; maybe this can just be a recommendation rather than a mandate, or maybe it does make sense to just say “support FLAC and Opus” and update the spec as necessary in the future.

Thanks again all! I have posted a new version of the Matroska version of the spec that incorporates some of your feedback (more still to come):

Just for fun: something I keep thinking about for the mastering side of things is including LV2 or VST3 (or whatnot) plugins directly in the Matroska file (with some basic limits on how many inputs and outputs they should have). This would let producers build their exact effects, mastering, etc. directly into the file and bypass all the messy metadata. It would also be a security and compatibility nightmare, and it’s not something I’m actually considering doing at the moment, but it does provide some fun possibilities to think about!

Note that FLAC only supports up to 128 channels, and only supports fixed-point data (no 32-bit float; hence my comment above about normalization).

Then again, you can include an arbitrary number of mono FLAC files in a container.

That gets into sticky licensing issues (how do you extend an offer to provide source from a Matroska file? Extra licensing text metadata/comments?), as well as compatibility issues (only one CPU ISA supported? Mac and Windows computers do not use the same CPU architecture, and Linux supports even more).

Indeed, and those are the least of the problems (an audio file that can execute code would be an unexpected security nightmare for most users)! Still, it’s been a fun thing to think through!

I’m curious what the requirements for DAWs would be here. Just having 32-bit samples and unlimited channels? I expect stems to be individual mono tracks for the most part, or maybe stereo, but not much more. Would DAWs need more than 128 channels for individual tracks for some reason?

It’s not immediately clear to me how this applies to the normalization you mentioned earlier, either; I think we may be using “normalization” to mean two different things here?

FLAC does not support 32-bit samples, and no floating point. The best you can do is 24-bit fixed point, which, when normalized, is more than enough to get down to the thermal noise floor.
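To put a rough number on that (a back-of-the-envelope figure, not from the FLAC spec): the theoretical dynamic range of n-bit fixed point works out like this:

```python
import math

# Rule of thumb: each bit buys ~6.02 dB, so n-bit fixed point spans
# about 20*log10(2**(n-1)) dB between full scale and the quantization
# floor (dither and noise shaping ignored).
def dynamic_range_db(bits):
    return 20 * math.log10(2 ** (bits - 1))

print(round(dynamic_range_db(16), 1))  # 90.3
print(round(dynamic_range_db(24), 1))  # 138.5
```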

Right; I’m asking whether you bring it up because that’s something DAWs frequently need. I’m trying to determine whether this is a problem, in which case I should mandate some format other than FLAC for implementations of this spec.

Thinking about this more, I think I’ll rewrite portions of the I-D to have a target audience that includes both live performance (DJ applications, mostly) and direct import into DAWs (for producers who want to release versions of tracks specifically for non-live remixing; e.g. artists who release stem versions of their albums, such as Fleet Foxes’ “Shore (stems edition)”).

I may include a section that recommends specific formats (with “weak” RFC 2119 language; at the end of the day, this is a place where the software vendors are going to do what they want to do, and there’s not much the spec can or should do about it), broken down based on the distinction above.

For playback, this would probably be FLAC and Opus. I’m not sure what format would be most useful for folks importing a track and all its stems into a DAW, however. Does Ardour have a native format (other than just raw PCM) that it uses to store its own tracks in project files, and that I should consider for something like this?

Ardour uses a variety of mostly standard-specified audio file formats, with the choice being the user’s. The only two wrinkles are that (a) all audio we record goes to mono files (regardless of whether the track it was recorded for has 2 or more channels), and (b) our default is to store 32-bit floating-point data in those files, which for a few persnickety bits of software is not within the standard for e.g. WAV.
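The wrinkle in (b) comes down to a single header field: a float WAV carries format code 3 (IEEE float) in the fmt chunk where most software expects 1 (integer PCM). A minimal sketch of such a header (illustrative only; spec-conforming writers also emit a fact chunk and an 18-byte fmt chunk for non-PCM data):

```python
import struct

# Minimal sketch of a 32-bit float WAV header. The key detail: byte
# offset 20 holds the format code, 3 = WAVE_FORMAT_IEEE_FLOAT rather
# than 1 = integer PCM, and that single field is what some persnickety
# readers reject. Real writers also add a "fact" chunk per the spec.
def float_wav_header(n_samples, channels=1, rate=48000):
    bits = 32
    block_align = channels * bits // 8
    data_bytes = n_samples * block_align
    return b"".join([
        b"RIFF", struct.pack("<I", 36 + data_bytes), b"WAVE",
        b"fmt ", struct.pack(
            "<IHHIIHH",
            16,                   # fmt chunk body size
            3,                    # WAVE_FORMAT_IEEE_FLOAT
            channels,
            rate,
            rate * block_align,   # average byte rate
            block_align,
            bits,
        ),
        b"data", struct.pack("<I", data_bytes),
    ])
```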

Makes sense; I’ll look around and try to find a standardized codec that supports 32-bit IEEE float for importing into DAWs. Thanks!

It’s not a codec. It’s just regular PCM sample values. Almost everything can handle it these days.

Sure; what I meant is that for a file that’s meant to be distributed and imported, we’d probably want to recommend a compressed format such as FLAC (except something with 32-bit float support).