I do the same thing…almost…and it’s a bit more convoluted. OBS is involved, but so is a separate online meeting. Everything feeds Ardour directly, and everything is fed from Ardour directly - nothing else has any audio processing whatsoever, except for what I can’t disable or set to do nothing, and nothing has any direct connection to anything else. It’s all in Ardour.
I could stream from OBS to YouTube, just by clicking that button in OBS, but I’ve never seen the need to on that rig. I do record in OBS and upload afterwards though.
Anyway, the key here is latency, on a system that is doing lots of other things too.
For a dedicated system that does nothing else - like a digital console - single-sample latency is possible through the DSP: get one sample from every input, process that one sample all the way through, deliver one sample to every output, and repeat. Then you’re left with only the converters’ group delay, which can be in the range of 1ms or so, analog to analog, at 48kHz, and that’s because of the FIR filters in the converters themselves. (Tech note 1)
Higher sample rates can use shorter FIR filters with fewer samples of delay (Tech note 2), in addition to less time per sample. If you really care about that, this reduced latency is the only real benefit that I see to higher sample rates, live.
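That single-sample loop is simple enough to sketch. The I/O and processing callables here are hypothetical stand-ins, just to show the shape of it:

```python
# Minimal sketch of a dedicated console's single-sample DSP loop:
# grab one sample from every input, run it through the whole chain,
# deliver one sample to every output, repeat. The callables are
# hypothetical stand-ins for the hardware and the processing chain.

def run_dsp(read_inputs, process, write_outputs, n_frames):
    for _ in range(n_frames):
        in_frame = read_inputs()        # one sample per input channel
        out_frame = process(in_frame)   # entire processing chain, one sample deep
        write_outputs(out_frame)        # one sample per output channel
```

The loop itself only ever holds one sample per channel at a time, so its contribution to latency is a sample or so; the converters dominate.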
Tech note 1:
(A cheap, gentle-slope analog lowpass for anti-aliasing sits outside the ADC chip. The chip then actually samples at low resolution in the mid-MHz range, with a small amount of intentional ultrasonic noise added to force the LSb of that to wiggle. A steep-slope digital FIR lowpass then runs at that mid-MHz sample rate, with its cutoff at Nyquist for the desired output rate, which also converts the out-of-band noise into more resolution. Finally it just picks samples to send out at that desired rate and throws the rest away - all inside the ADC chip itself.)
(similar for the DAC, but in reverse)
Tech note 2:
(This is most of what happens when you tell the converter to use a higher sample rate. Its actual analog rate doesn’t change at all, nor does the analog filter that precedes it. You might think of it like an engine and transmission: the engine drives the analog sampler directly, and it has enough variability to cover 44.1kHz to 48kHz output with a little bit extra on either side, but very different rates like 96kHz need a different “gear” for the same engine speed - the same given clock, and thus the same actual analog rate and FIR rate. That different “gear” uses a different FIR with the same cutoff but a relaxed slope, and it’s that relaxed slope that produces less delay.)
For a non-dedicated system that has to manage a filesystem, a fancy GUI, live video processing, and whatever else you’re doing, all at the same time on the same hardware, you need a buffer that can fill up on the input side while it’s doing other things, and play out on the output side while it’s doing other things. Then it processes the entire buffer at once whenever it gets around to it. Record X samples from each input, process all of those samples at once, deliver them all at once to each output to play out, and repeat only when the input buffer is full again. That’s in addition to the converters’ delay, which is unchanged from above.
And it’s usually double-buffered, so that it has an entire buffer’s worth of time to get around to grabbing the input, or to delivering the output, and doesn’t have to get there exactly between the right pair of samples, which practically never happens. That double buffer also doubles the latency at that step, again in exchange for smoothness on a system that’s doing everything else too, on shared hardware.
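To put numbers on the buffering, here’s a back-of-the-envelope sketch - my own arithmetic, not any driver’s official formula; `periods=2` is the double buffer:

```python
# Rough round-trip latency estimate for a buffered, non-dedicated system.
# converter_delay_ms is the ADC+DAC group delay (~1 ms at 48 kHz, as above);
# periods=2 models the double buffering described above.

def roundtrip_latency_ms(buffer_samples, sample_rate_hz, periods=2,
                         converter_delay_ms=1.0):
    buffer_ms = 1000.0 * buffer_samples * periods / sample_rate_hz
    return buffer_ms + converter_delay_ms

# e.g. a 256-sample buffer, double-buffered, at 48 kHz:
# 256 * 2 / 48000 s of buffer, plus ~1 ms of converters
```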
The speed of sound in air is about 1125 feet per second, or (very roughly) 1 ft/ms. So a 48kHz dedicated system with 1ms total worth of converter delay (ADC+DAC) - plus about 3 more samples (negligible) for communicating in, processing, and communicating out - typically sounds like the speakers are about 1 foot behind where they actually are, in terms of timing. Very few people are going to notice that.
Now add the buffer that you need for your non-dedicated, PC-based system and convert that to distance. If you (or your listeners) can stand the speakers being that much farther away, you’re good! If not, you need to do something different.
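Turning a latency into “how far back the speakers sound” is just the 1 ft/ms figure again. A quick sketch:

```python
# Convert a latency in milliseconds into the equivalent extra distance
# to the speakers, using ~1125 ft/s for the speed of sound in air.

SPEED_OF_SOUND_FT_PER_S = 1125.0

def latency_as_distance_ft(latency_ms):
    return SPEED_OF_SOUND_FT_PER_S * latency_ms / 1000.0

# ~1 ms of converter delay sounds like the speakers moved about a foot;
# 12 ms of buffering sounds like they moved 13.5 ft.
```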
Of course, if you’re running all of this through even a half-decent video production thing like OBS, then that’ll have a way to adjust its buffers to make things line up again. Video processing generally has more latency than audio processing, mostly because of the MUCH slower “sample rate” (called “frame rate” over there), so you probably need to delay the audio anyway to line back up with it. So if Ardour takes some of that delay but not all of it, then there’s no change at all in the viewers’ experience! And all you have to do is reduce OBS’s Sync Delay for the sources that come from Ardour, by the amount that Ardour takes.
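That last adjustment is plain subtraction. A sketch, with parameter names that are mine rather than anything in OBS:

```python
# If the video path needs some total audio delay to line up, and
# Ardour's processing already contributes part of it, the delay dialed
# into OBS for the Ardour sources drops by that amount (never below zero).

def obs_sync_delay_ms(video_delay_ms, ardour_latency_ms):
    return max(0.0, video_delay_ms - ardour_latency_ms)

# e.g. the video needs 100 ms total and Ardour takes 12 ms of it:
# set the OBS delay for those sources to 88 ms.
```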