So in theory, as long as every frame is enough to reproduce sound, any Rio could join/leave a channel and be exactly in sync.

If only it were that easy... Once the player app pushes the audio to the kernel, it enters a buffer in the kernel (capable of storing about 0.8 seconds of audio). So what happens is that a Rio that is already joined to a channel receives a packet and pushes it to the audio driver. This causes it to be placed at the back of the buffer, potentially 0.8 seconds away from being played. Another receiver has just joined the group, and this is the first packet it receives. It pushes it to its audio device, which has an empty buffer, and therefore it gets played almost immediately.
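To put a number on that desync, here's a quick sketch using the figures from this thread (a ~0.8 second kernel buffer and 44.1 kHz / 16-bit / stereo PCM); the constant names are just for illustration:

```python
# Worst-case desync between a long-joined Rio and a freshly-joined one,
# assuming the ~0.8 s kernel audio buffer described above.
SAMPLE_RATE = 44100      # samples per second
BYTES_PER_FRAME = 2 * 2  # 16-bit stereo = 4 bytes per sample frame

buffer_seconds = 0.8
buffer_bytes = int(buffer_seconds * SAMPLE_RATE * BYTES_PER_FRAME)

# The long-joined Rio plays a packet up to buffer_seconds late;
# the new Rio's empty buffer plays it almost immediately.
worst_case_desync_ms = buffer_seconds * 1000

print(buffer_bytes)            # 141120 bytes of queued audio
print(worst_case_desync_ms)    # 800.0 ms apart
```

So the two receivers can end up nearly a full second apart, which is why joining mid-stream can't be exactly in sync without some extra mechanism.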

Now, this method does waste bandwidth; in theory we could do some minimal compression... but that may actually add some latency (think of the context switch to deal with the remote robbing a cycle or two from the decode)... If compression is added, we would most likely need the metronome.

If we add compression at this stage, why not just send the previously compressed MP3/FLAC/whatever in the first place? As soon as you add compression back into the mix, you have to deal with another buffer to hold the decompressed audio before it's sent to the audio device.

The method I'm currently looking at involves broadcasting the MP3 data from one Rio to the others and then (on another multicast channel) broadcasting a signal of the current frame that the "master" is playing. So essentially the slave Rios would hold onto decoded frames whose frame numbers haven't been sent to the multicast address yet, and throw away any frames whose frame numbers are less than the last received frame number.
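The slave-side bookkeeping described above could look something like this minimal sketch; the class and method names are hypothetical, not anything from the actual Rio code:

```python
# Sketch of the slave-side logic: hold decoded frames until the master
# announces their frame number, drop anything older than the announcement.
from collections import deque

class SlavePlayer:
    def __init__(self):
        self.pending = deque()    # (frame_number, pcm_bytes), in decode order
        self.last_announced = -1  # highest frame number heard from the master

    def on_decoded_frame(self, frame_number, pcm):
        # Hold frames the master hasn't announced yet.
        if frame_number > self.last_announced:
            self.pending.append((frame_number, pcm))

    def on_master_announce(self, frame_number):
        # The master is now playing this frame: discard anything older,
        # and release the matching frame toward the audio device.
        self.last_announced = frame_number
        released = []
        while self.pending and self.pending[0][0] <= frame_number:
            num, pcm = self.pending.popleft()
            if num == frame_number:
                released.append(pcm)  # would be written to the audio device
        return released
```

For example, if a slave has decoded frames 1-3 and the master announces frame 2, frame 1 is silently dropped, frame 2 is released for playback, and frame 3 stays queued.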

I doubt there's a reason that your idea of decompressing at the server and then multicasting the decoded audio wouldn't work; however, it seems inefficient to be pumping all that audio data onto the network. I'd be concerned that opening up an FTP session on the local network could interrupt the broadcast. And of course, if you're not providing any real buffering at the client side other than the kernel buffer, even a small drop in network performance can cause an audio dropout.

As for the Rio... the proposal would be to transmit 44.1 kHz * 16-bit * 2-channel (stereo) audio... which breaks down to 1.41 Mbps, so we should be able to fit 6 separate stereo streams onto a 10 Mbps network easily... a 7th might fit.
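Checking that arithmetic (raw PCM only, before any protocol overhead):

```python
# Raw bitrate of uncompressed 44.1 kHz, 16-bit, stereo PCM,
# and how many such streams fit in a nominal 10 Mbps pipe.
bitrate_bps = 44100 * 16 * 2       # 1,411,200 bits per second
bitrate_mbps = bitrate_bps / 1e6   # ~1.41 Mbps per stream

print(round(bitrate_mbps, 2))      # 1.41
print(round(6 * bitrate_mbps, 2))  # 8.47 Mbps for six streams
print(round(7 * bitrate_mbps, 2))  # 9.88 Mbps for seven streams
```

So six streams already sit close to the practical ceiling of a 10 Mbps segment, and a seventh leaves almost nothing for headers, ACKs, or any other traffic.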

7 * 1.4 = 9.8 Mbps, and that's just for the data, not counting protocol overhead. That will never happen. Even 6 streams (8.4 Mbps) is unlikely, though it may work on an isolated network without much other activity. Remember, 10 Mbps Ethernet will never actually reach 10 Mbps due to collisions, protocol overhead, ACKs, etc. I don't think I've seen a 10 Mbps segment do much better than 8.