Open network audio protocol

Companies like Midas, Aviom, Mackie, and others use proprietary protocols for audio networking. Imagine a world where all audio devices, from all manufacturers, could work together in the digital realm.

The Linux audio community needs to recognize this opportunity and put a lot of funding into netjack and related programs. Then, it needs to submit this protocol to a recognized standards organization for adoption. Once this is achieved, the community can expect manufacturer adoption, and therefore, funding.

Nice idea. Why don’t you try this? When you do, you might want to start by considering why none of the existing protocols have been adopted as a standard.

i’d love to see this too. i have been told a few times that it is very difficult, but i don’t have a good enough grasp of the lower-level concepts to understand why. i had an idea a little while ago to make little ‘satellite’ units that used netjack to synchronise to a master box over cat5 to make a simple distributed audio network, but i don’t know whether or not that would work. i might try setting it up on some laptops or something to test.

porl

I have been hoping for the same thing. I wondered if it was a chicken-and-egg problem. I wonder if one of the vendors is going to see the advantages and do it themselves.

Paul, can you clarify what you mean when you write “you might want to start by considering why none of the existing protocols have been adopted as a standard”?

I really like the idea of having a low-latency digital snake that works over gigabit ethernet and carries 64+ channels at 96 kHz. This could come straight into the computer without ADAT or whatever!

Very difficult? I am going out on a limb and saying that, compared to other open standards (like ODF), creating and implementing this one appears trivial.

What follows are some very crude ideas about creating an AEP (audio exchange protocol). I’ll call these ideas the basis for the LNORBO AEP v.00002. (lol, I still need my 15ns of fame :wink: )

Background:

  • The initial network would be based on Gigabit Ethernet.
  • It would use UDP as the transport, since its overhead is minimal. (see Wikipedia for UDP)
  • It would be somewhat parameter driven, allowing the number of channels, sample frequency etc. to vary by need and as hardware improves.
  • There would be “compatibility versions” to allow newer devices to “fall back” and interact with older ones.
  • Devices would be allowed to request configuration information from others. (see the sketch after this list)
  • Error handling could be specified.
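
To make the “parameter driven”, “compatibility version”, and “request configuration” points above a little more concrete, here is one purely hypothetical shape for a capability exchange; every field name, version number, and the negotiation rule itself are invented for this sketch:

```python
# Hypothetical capability/configuration exchange for the proposed AEP.
# All field names and version numbers are invented for illustration.
from dataclasses import dataclass

@dataclass
class DeviceConfig:
    compat_version: int     # protocol "compatibility version"
    channels: int           # channels the device can send/receive
    sample_rate: int        # samples per second
    bits_per_sample: int

def negotiate(a: DeviceConfig, b: DeviceConfig) -> DeviceConfig:
    """Fall back to a configuration both devices can handle."""
    return DeviceConfig(
        compat_version=min(a.compat_version, b.compat_version),
        channels=min(a.channels, b.channels),
        sample_rate=min(a.sample_rate, b.sample_rate),
        bits_per_sample=min(a.bits_per_sample, b.bits_per_sample),
    )

new_box = DeviceConfig(compat_version=3, channels=64, sample_rate=96_000, bits_per_sample=32)
old_box = DeviceConfig(compat_version=1, channels=16, sample_rate=48_000, bits_per_sample=24)
print(negotiate(new_box, old_box))   # newer device falls back to v1, 16 ch, 48 kHz, 24-bit
```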

Let's say we have a gigabit ethernet connection. It would be dedicated to this task.
Assume we can only get 500 Mb/s of usable throughput out of this connection.
Assume a sample size of 32 bits (for ease and future-proofing).
Assume a rate of 96,000 samples/second.

500,000,000 / 32 / 96,000 ≈ 162.76 channels

If the person did not need 162 channels, they might opt for an error-handling method where all data is broadcast twice, giving 81 channels. (We are open to different types of error correction codes (ECC) in this spec.)
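
The arithmetic above is trivial, but since it drives the rest of the proposal, here it is as a tiny script; the 500 Mb/s usable-throughput figure and the 2x “broadcast twice” redundancy are the assumptions from this post, not measured values:

```python
# Back-of-the-envelope channel count for the proposed AEP link.
def max_channels(usable_bps, bits_per_sample, sample_rate, redundancy=1):
    """How many audio channels fit in the given usable throughput."""
    return usable_bps / (bits_per_sample * sample_rate * redundancy)

print(max_channels(500_000_000, 32, 96_000))               # ~162.76 channels
print(max_channels(500_000_000, 32, 96_000, redundancy=2)) # ~81.38 with everything sent twice
```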

Theoretical case

  • A network where latency < 1 ms. (sound travels approx 1 ft in 1 ms in air)
  • Fewer than 81 channels are needed.
  • Dedicated network (as always).
  • 96,000 samples/sec.
  • In this case we CAN transmit everything twice because we are loading the network lightly.
  • Because it is more efficient to transmit larger packets, we will combine the samples from each round of A/D conversions from all 81 channels. (I’ll call it “chunking”?)

81 channels * 32 bits = 2592 bits.
To make this even more efficient(?), I will combine two sets of samples and transmit them together: 5184 bits.
For redundancy, I will retransmit the same data 0.5 ms later. This will allow the receiving end to replace lost packets within our 1 ms latency goal.
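
A minimal sketch of this chunk-and-retransmit idea, assuming the figures above; the 4-byte sequence-number header, the port number, and the packing order are all invented for illustration and are not part of any real protocol:

```python
# Sketch of the "chunking" idea: two rounds of A/D samples from all 81
# channels packed into one UDP datagram, with the same datagram re-sent
# 0.5 ms later for redundancy.
import socket
import struct
import time

CHANNELS = 81
ROUNDS_PER_PACKET = 2          # combine two sample rounds per datagram
DEST = ("127.0.0.1", 9000)     # hypothetical receiver address

def build_packet(seq, sample_rounds):
    """Prefix a sequence number so the receiver can drop the duplicate copy."""
    payload = b"".join(struct.pack(f"<{CHANNELS}i", *rnd) for rnd in sample_rounds)
    return struct.pack("<I", seq) + payload   # 4-byte header + 2 * 81 * 32 bits = 5184 payload bits

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rounds = [[0] * CHANNELS for _ in range(ROUNDS_PER_PACKET)]  # silence, standing in for A/D data
packet = build_packet(0, rounds)

sock.sendto(packet, DEST)      # first copy
time.sleep(0.0005)             # roughly 0.5 ms later...
sock.sendto(packet, DEST)      # ...redundant copy; receiver keeps whichever arrives first
```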

There would have to be a messaging protocol set up to allow a client to change the head-amp settings of a remote device (turning on phantom power, setting gain, etc.).
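
Purely as an illustration of what such a control message might look like (the opcodes, field sizes, and dB scaling are all made up here):

```python
# Hypothetical head-amp control message for the proposed AEP.
import struct

SET_GAIN = 0x01      # value = gain in dB * 10
SET_PHANTOM = 0x02   # value = 0 (off) or 1 (on)

def control_message(channel, opcode, value):
    """Pack a 4-byte command: channel, opcode, 16-bit signed value."""
    return struct.pack("<BBh", channel, opcode, value)

print(control_message(3, SET_PHANTOM, 1).hex())   # phantom power on for input 4
print(control_message(3, SET_GAIN, 245).hex())    # gain set to 24.5 dB
```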

  • All of this would have to be tried in real life; I’m sure the specs would change as we found opportunities and road blocks.
  • For me, making the remote head amps would be the tricky part. Not my area.

I’m sure some of what I wrote is not clear. Just ask; I may even be able to explain myself. lol


that looks all well and good, but i believe the difficult problem is sample-accurate synchronisation between devices (to avoid clock skew etc.)

I haven’t tried any implementations of this yet, but isn’t RTP something that could eliminate these sample-accurate synchronisation problems?

Might be interesting for you all:

http://www.ethersound.com/index.php

In my opinion this is the standard for transporting audio over a network protocol.

I have been using it for several years now (as a live sound engineer in touring, musicals, RnR) and cannot remember ANY trouble. Even though the very flexible patch, control & monitoring software solutions (Auvitran, Digigram) only run on Windows (!), it works very reliably and safely.
“Aviom” and “rock-net” products also work very well, but are IMHO not as modular and/or not controllable (and thus not recallable) from software…

Can I point out that, whilst it is very far from a standard of any kind, we already have “netjack”, which I’ve been using for a few years to transport 8 channels of audio each way with very low latency and a high degree of reliability.

The complexity comes not so much from the network transport side as from the need to deal with the fact that the A/D and D/A clocks at each node are not synchronised (and thus they all have slightly different ideas about what 48 kHz means).

Netjack sidesteps the issue rather neatly by only having an audio interface at the master, with all the slaves essentially doing batch processing based on data from the master node.

Ethersound/Cobranet/LiveWire et al. solve the problem by using hardware to lock a PLL at the receiver to the incoming packet rate. Doing this in software with a resampler is possible (and there are JACK clients that do it), but doing it in software with low latency is a hairy control problem.
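
To make the shape of that control problem concrete, here is a deliberately naive sketch of adaptive resampling driven by jitter-buffer fill; the smoothing constant and target fill level are arbitrary illustration values, and the real adaptive-resampling JACK clients mentioned above use far more careful control loops than this:

```python
# Naive drift compensation: if the jitter buffer keeps growing, the sender's
# clock is fast relative to ours, so consume slightly faster, and vice versa.
class DriftEstimator:
    def __init__(self, target_fill_frames, smoothing=0.001):
        self.target = target_fill_frames
        self.smoothing = smoothing
        self.ratio = 1.0            # estimate of sender_rate / receiver_rate

    def update(self, current_fill_frames):
        """Nudge the resampling ratio toward whatever keeps the buffer at its target."""
        error = (current_fill_frames - self.target) / self.target
        self.ratio += self.smoothing * error
        return self.ratio

# Each process cycle the ratio would be handed to a resampler, e.g.:
#   out_block = resampler.process(in_block, ratio=est.update(buffer_fill))
est = DriftEstimator(target_fill_frames=512)
print(est.update(520))   # a slightly overfull buffer nudges the ratio above 1.0
```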

The clock sync problem is also, I suspect, patent-infested waters.

Regards, Dan.

netjack2 has come a long way since i last checked. there is still the obvious issue of clock synchronisation, but does anyone know how much of an effect this has in the real world?

for example, say i used it in a live setup with a small netjack ‘node’ on ‘stage 1’ (using a 4-in/2-out audio device for stage mixes), a second node on ‘stage 2’ (same setup), and a master node at the desk (4 out, for f.o.h. and a headphone mix), with both stages running simultaneously and feeding through to the f.o.h. speakers. would the resampling between devices be noticeable to listeners? if i recorded the event for future mixing, would the quality be good enough, or would there be noticeable glitches (clicks etc.) making it unsuitable for professional work?

obviously this is ignoring hardware quality; i would assume some hardware is better ‘calibrated’ than others.

also, is there any way using netjack (in theory, not necessarily implemented yet) that you could send a clock-rate signal reliably through the network and lock the audio device to it properly (avoiding the resampling altogether)?

porl

I am not sure yet how this could work, and I find it hard to believe that it’s already possible for all audio devices to work together in the digital realm. I think internet protocols will have to undergo major changes for this to be possible, and that of course takes time. I have some experience with the Diameter stack, which is a complex piece of work; modifications like these are a real challenge for internet experts.