They’re not spoiled. They have a whole 'nother set of problems.
There are countless ways to do the same thing, and economy of scale is HUGE for chip manufacturers. So it seems that every chip can do everything, and the developers have to wade through all of that.
Every standard has its tradeoffs:
- One has near-zero latency but requires a TON of separate wires that all have to be exactly the same length, hence the squiggles that you might have seen on a complex circuit board. (Yes, the speed of light matters, even across that short distance.) Another eliminates that problem by only using a couple of wires, but it also creates an entire sample time of latency, just for that one communication link.
- One allows a bunch of channels to be carried across the same set of wires, which pushes the signals farther up into the radio frequency spectrum with corresponding EMI issues. Another only allows two channels, which allows a cheaper design for that link.
- One allows some additional information to be sent across the same link. Another only allows the samples themselves.
- Etc.
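To put rough numbers on the channel-count tradeoff: the bit clock on a serial audio link is just the sample rate times the bits per slot times the number of slots. A quick sketch, assuming 48kHz and 32-bit slots (my own illustrative numbers, not tied to any particular chip), with a made-up helper name:

```python
def bit_clock_hz(sample_rate_hz: int, bits_per_slot: int, slots: int) -> int:
    """Bit clock needed to move every slot of every frame across one data wire."""
    return sample_rate_hz * bits_per_slot * slots

# Assumed example values: 48 kHz, 32-bit slots.
stereo_hz = bit_clock_hz(48_000, 32, 2)  # two channels on the wire
tdm8_hz = bit_clock_hz(48_000, 32, 8)    # eight channels on the same wire

print(stereo_hz, tdm8_hz)
```

Four times the channels means four times the clock, and that is the "farther up into the radio frequency spectrum" part.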
It’s not as simple as saying, “I want this named standard,” even though the chip manufacturers try to make it that way with their software libraries. The hardware configuration itself knows nothing about any standard, and is simply told the raw details instead, in a format that is itself designed to be easy for a set of logic gates to interpret. Different combinations of details result in different standards being followed, and it’s just as easy to mis-configure and violate every standard. This could be developer error, OR a bug in the chip manufacturer’s library.
And there may even be cases where you’d want to modify a standard for some reason. Technically in violation of the official spec, but it works better for what you’re specifically doing right here and now. Specifying details to the hardware, instead of a name, allows that.
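Here is a toy illustration of "details, not names". The `LinkConfig` fields and the `nearest_named_format` helper are hypothetical, and the rules are a heavily simplified version of how the common two-channel formats differ (a real peripheral has many more knobs: clock polarity, slot width, bit order, and so on). The point is only that the name falls out of the details, and plenty of detail combinations have no name at all:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class LinkConfig:
    """Hypothetical raw settings, roughly what a serial audio port is told."""
    slots_per_frame: int      # channels carried per frame
    data_delay_bits: int      # bit clocks between the frame-sync edge and the MSB
    left_when_sync_low: bool  # which frame-sync level marks the left channel

def nearest_named_format(cfg: LinkConfig) -> Optional[str]:
    """Simplified lookup: returns a format name, or None if nothing matches."""
    if cfg.slots_per_frame == 2:
        if cfg.data_delay_bits == 1 and cfg.left_when_sync_low:
            return "I2S"
        if cfg.data_delay_bits == 0 and not cfg.left_when_sync_low:
            return "left-justified"
        return None  # two channels, but no named format lines up
    if cfg.slots_per_frame > 2:
        return "TDM-style"  # a loose family, not one single spec
    return None

print(nearest_named_format(LinkConfig(2, 1, True)))  # matches the I2S rule
print(nearest_named_format(LinkConfig(2, 3, True)))  # matches nothing
```

The second config is the "mis-configured, or deliberately modified" case: the hardware will happily run it, it just isn't any standard.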
Even when the hardware is “right”, it often doesn’t cover the entire standard. There’s still software involved to finish that standard. Again, manufacturer’s library vs. user code, with the possibility for bugs either way, and for specific-application tweaks.
All of that is only for communication between chips: [ADC] → [DSP] → [DAC]. Inside the DSP box, there could be a single chip that does everything, or multiple chips that have their own communication between them, which creates another round of the above with possibly different decisions. And of course, there’s still the tradeoff between using a licensed DSP library that may or may not be cost prohibitive, and rolling your own, which needs no outside license but may or may not have some performance issues.
And it’s not quite as simple as just [ADC] → [DSP] → [DAC] either. Some ADCs and DACs are just the converter and that’s it, while others include their own small DSP. Do you want to use that? Or do you want to keep everything inside of the main DSP chip(s)? Different projects with different goals have different answers to that question.
And of course, there’s the choice of DSP chip itself. How much do you want done for you, with possibly non-ideal decisions locked in? And how much do you want to do yourself, with the amount of direct understanding required to do that?
So yes, once you work through all of THAT, it’s easy to have just 3 samples of DSP latency (1 in, 1 for processing, 1 out), plus the 10 or so that each converter adds by itself. But you do have to work through all of that!
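For a sense of scale, here is that budget as arithmetic. The 48kHz rate and the flat 10 samples per converter are assumptions for the example (the real converter figure is "10 or so" and depends on the part):

```python
def samples_to_ms(samples: int, sample_rate_hz: int) -> float:
    """Convert a latency in samples to milliseconds."""
    return 1000.0 * samples / sample_rate_hz

SAMPLE_RATE_HZ = 48_000  # assumed
DSP_SAMPLES = 3          # 1 in, 1 for processing, 1 out
CONVERTER_SAMPLES = 10   # assumed "10 or so", per converter

# ADC + DSP + DAC
total_samples = CONVERTER_SAMPLES + DSP_SAMPLES + CONVERTER_SAMPLES
total_ms = samples_to_ms(total_samples, SAMPLE_RATE_HZ)

print(total_samples, round(total_ms, 3))
```

That works out to 23 samples, a bit under half a millisecond, and 20 of those 23 belong to the converters.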
In case you’re wondering, the 10 or so samples of latency that each converter adds come from the high-order digital lowpass that is a necessary part of how the conversion works in the first place.
There is no direct conversion from analog to 24-bit digital at 48kHz, or vice-versa. The technology to do that is nowhere close to existing, and it would require a stupidly expensive analog lowpass anyway, to keep it from “aliasing”, or “accordion folding” every higher frequency, all the way up to infinity, back down into the audible range, after which it cannot be removed.
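The "accordion folding" is easy to compute for a pure tone. A minimal sketch, with a helper name of my own choosing, assuming an ideal sampler with no filter in front of it:

```python
def alias_frequency_hz(f_hz: float, sample_rate_hz: float) -> float:
    """Where a tone at f_hz lands after ideal sampling with no anti-alias filter."""
    folded = f_hz % sample_rate_hz
    # Everything above half the sample rate mirrors back down below it.
    return min(folded, sample_rate_hz - folded)

print(alias_frequency_hz(30_000, 48_000))   # 30 kHz shows up at 18 kHz
print(alias_frequency_hz(100_000, 48_000))  # 100 kHz shows up at 4 kHz
```

Once the 100kHz tone is sitting at 4kHz, it is indistinguishable from a real 4kHz tone, which is why the filtering has to happen before the sampling.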
Instead, it physically samples in the mid-MHz range, which allows a much cheaper analog filter, with fewer bits because that is possible, plus some intentional high-frequency wiggle added to the analog signal to guarantee that the reading is always changing. That insanely-fast low-resolution signal is then digitally lowpassed to do three things simultaneously:
- Anti-alias to the final sample rate
- Remove the high-frequency wiggle
- Fill in the lower bits, as a sort of average of that wiggle, biased by how close the real value was to a transition between physical readings
A side-effect of that digital lowpass is to add the conversion latency (no free lunch), and that is where most of a commercial console’s latency spec comes from. It’s not from the DSP.
Then once a mid-MHz stream of full-resolution samples exists, the ADC simply picks some out at the final rate to send on, and throws away the rest. Of course there are optimizations, like only running the lowpass when its output will actually be used, with a block of inputs that was buffered up in the meantime, but you get the idea.
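The "fill in the lower bits" part is the least intuitive, so here is a deliberately crude toy of it. Assumptions, all mine: a 64x oversampling ratio, a quantizer that only reads whole numbers, a clean sawtooth for the wiggle, and a plain block average standing in for the real high-order lowpass. A real converter is far more sophisticated than this, but the principle is the same:

```python
import math

OVERSAMPLE = 64  # assumed ratio of physical sample rate to final sample rate

def coarse_reading(x: float) -> int:
    """Low-resolution physical reading: nearest whole step."""
    return math.floor(x + 0.5)

def convert_block(analog_block: list[float]) -> float:
    """One final-rate sample from one buffered block of fast inputs.

    Adds a sawtooth wiggle spanning exactly one quantizer step, takes a
    coarse reading of each input, then averages the block (crude lowpass).
    """
    n = len(analog_block)
    total = 0
    for k, x in enumerate(analog_block):
        wiggle = (k + 0.5) / n - 0.5  # sweeps from just above -0.5 to just below +0.5
        total += coarse_reading(x + wiggle)
    return total / n

true_value = 0.3  # sits between two coarse steps
without_wiggle = coarse_reading(true_value)             # the fraction is lost
with_wiggle = convert_block([true_value] * OVERSAMPLE)  # the fraction comes back

print(without_wiggle, with_wiggle)
```

With the wiggle, 19 of the 64 coarse readings tip over to the next step, so the average is 19/64, close to the real 0.3. Note that it also took a whole block of inputs to produce that one output, which is the latency cost in miniature.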
All of this is done in the converter chip, with dedicated single-function hardware, not in the DSP. It also sheds a different light on the concept of sample rates in general, and why using a higher sample rate like 96kHz or 192kHz doesn’t make as much sense as people claim.
When you do that, you’re actually changing modes in the converter chip itself. The mid-MHz physical sample rate does not change. Likewise, the analog lowpass does not change. What changes is the digital lowpass. But it’s not the corner frequency that changes. It’s the rolloff rate. It’s still attenuating just above audible! Just not by as much, and that less-aggressive rolloff produces fewer samples of latency in addition to the time per sample being less.
That - latency - is the reason to use a higher sample rate, not ultrasonics. For a recording that is intended for human ears with no speed changes, a high sample rate makes no difference whatsoever.
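As a rough check on the latency argument, here is only the "time per sample" half of it. I'm holding the converter at an assumed 10 samples at both rates, which is the pessimistic case, since the gentler rolloff should also reduce the sample count at 96kHz:

```python
def converter_latency_ms(filter_samples: int, sample_rate_hz: int) -> float:
    """Latency of one converter's digital lowpass, in milliseconds."""
    return 1000.0 * filter_samples / sample_rate_hz

ASSUMED_FILTER_SAMPLES = 10  # illustrative, held constant on purpose

at_48k_ms = converter_latency_ms(ASSUMED_FILTER_SAMPLES, 48_000)
at_96k_ms = converter_latency_ms(ASSUMED_FILTER_SAMPLES, 96_000)

print(round(at_48k_ms, 3), round(at_96k_ms, 3))
```

Even with the sample count held constant, doubling the rate halves the converter latency. Any reduction in the sample count comes on top of that.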
If you’re doing some scientific analysis, like bat calls or whatever, you ideally need a specialized converter that doesn’t insist on attenuating all non-human frequencies no matter what you tell it. Or at the very least, understand how your human-audio converter actually responds up there, and EQ it back to somewhat flat.