I tried this myself. I set up a multicast group and then when the Rio was about to send the audio to the sound card it would just write that same data out on a multicast address.

Have you tried having the server do the decode, and basically treating every Rio as a slave?

My "guess" is the CPU may be the limiting factor... though again, I've done ZERO analysis of this on the Rio yet. But think about the number of context switches alone (and again, maybe there is a software optimization I don't know about...)

1) Receive packet (kernel)
2) push packet to decoder (kernel->userspace) + some delta of CPU time
3) push decode to hardware (audio) (user->kernel)
4) push decode to hardware (ether) (user->kernel)

Also take into account that in your scenario you would actually have to insert delay into the playback of the decode on the master, since it knows what to play before any other receiver ever gets the decoded packet. That obviously complicates sync.
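
To make the "insert delay on the master" point concrete, here is a minimal sketch of a fixed delay line the master could run its own playback through. The class name DelayLine and the idea of counting the delay in whole frames are my own assumptions for illustration; the right delay value would have to be measured on the real network.

```python
from collections import deque


class DelayLine:
    """Holds decoded frames back by a fixed number of frames before local playback."""

    def __init__(self, delay_frames, silence=b""):
        # Pre-fill with silence so the first real frame only comes out
        # after delay_frames further pushes.
        self.buf = deque([silence] * delay_frames)

    def push(self, frame):
        """Queue a freshly decoded frame and return the frame due to play now."""
        self.buf.append(frame)
        return self.buf.popleft()
```

The master would send each decoded frame to the network right away and hand push(frame) to its own sound card, so local playback lags by roughly the time the other units need to receive the frame.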

Based on my limited (no hands-on) knowledge of CobraNet, you can actually calculate the latency between encoding audio and transmitting it to all hosts. Assuming you have a hub-based net for just the Rios, the latency should be identical for all units, so they should inherently keep sync (assuming they have to do no decode).

So we are looking at...
1) Server decodes MP3/OGG/FLAC/etc to raw audio
2) transmit raw audio to multicast/broadcast
All latency in decode/buffer has already occurred now
3) Rio receives packet (kernel)
4) push to userspace to push to hardware (can this be skipped... i.e. DMA or some other kernel-space move direct to audio hardware?)
5) push to hardware (audio)

So in theory, as long as every frame carries enough to reproduce sound on its own, any Rio could join/leave a channel and be exactly in sync.
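
The server/receiver split above can be sketched roughly like this. It is only an illustration under assumptions: the group address, port, frame size and the 4-byte sequence-number header are all made up by me, and nothing here has been tried on a Rio.

```python
import socket
import struct

GROUP = "239.1.2.3"   # assumed multicast group, pick your own
PORT = 5004           # assumed port
FRAME_BYTES = 1408    # 352 stereo 16-bit sample pairs, fits in one Ethernet frame

HEADER = struct.Struct("!I")  # hypothetical header: 32-bit sequence number


def packetize(pcm, frame_bytes=FRAME_BYTES):
    """Split raw PCM into numbered packets that each stand on their own."""
    packets = []
    for seq, off in enumerate(range(0, len(pcm), frame_bytes)):
        packets.append(HEADER.pack(seq) + pcm[off:off + frame_bytes])
    return packets


def parse(packet):
    """Return (sequence number, raw PCM payload) for one packet."""
    (seq,) = HEADER.unpack_from(packet)
    return seq, packet[HEADER.size:]


def make_sender(ttl=1):
    """UDP socket for the server; TTL 1 keeps the traffic on the local net."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    return sock


def make_receiver(group=GROUP, port=PORT):
    """UDP socket for a player; joining the group is all it takes to tune in."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    mreq = struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return sock
```

The server would loop over packetize(decoded_audio) calling sendto(packet, (GROUP, PORT)) at the playback rate, and each player would recvfrom() on its receiver socket and push the payload from parse() straight at the audio device.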

I don't think I've missed anything, but if worst comes to worst, we might need a "metronome" multicast heartbeat.
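
For what it's worth, a "metronome" could be as small as this. The packet layout (a 16-bit stream id plus a 64-bit sample position) and the function names are purely hypothetical, just to show how little such a heartbeat would need to carry.

```python
import struct

# Hypothetical heartbeat layout: stream id (16-bit) + server sample position (64-bit),
# network byte order, 10 bytes total.
HEARTBEAT = struct.Struct("!HQ")


def make_heartbeat(stream_id, sample_pos):
    """Server side: announce which sample of the stream should be playing now."""
    return HEARTBEAT.pack(stream_id, sample_pos)


def drift_samples(heartbeat, local_sample_pos):
    """Receiver side: positive means we are ahead of the server, negative means behind."""
    _stream_id, server_pos = HEARTBEAT.unpack(heartbeat)
    return local_sample_pos - server_pos
```

A player that finds itself a few samples off could drop or repeat that many samples to get back in step; how often to send the heartbeat is something that would need experimenting.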

Now, this method does waste bandwidth. In theory we could do some minimal compression, but that may actually add some latency (think of a context switch to deal with the remote robbing a cycle or two from the decode). If compression is added, we would most likely need the metronome.

Unless my math is wrong, 48000 samples/second * 20 bits/sample * 2 channels (I assume these are stereo streams) * 64 streams = 122Mbps.

Nope, each channel is just that... a single channel... so it's only 61Mbps for 64 channels. But remember you can go full duplex if you go switched, and in a switched environment you'd also be able to optimise which channels go where on the net.

As for the Rio... the proposal would be to transmit 44.1kHz * 16 bit * 2 channel (stereo) audio, which breaks down to 1.41Mbps per stream, so we should be able to fit 6 separate stereo streams onto a 10Mbps network easily... a 7th might fit.
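
The arithmetic is easy to check with a few lines. This only counts raw audio payload; the function names are mine, and packet headers and other network overhead are deliberately left out.

```python
def pcm_bps(sample_rate, bits_per_sample, channels):
    """Raw PCM bit rate in bits per second, no packet overhead."""
    return sample_rate * bits_per_sample * channels


def streams_that_fit(link_bps, stream_bps):
    """How many whole streams fit on a link if nothing else used it."""
    return link_bps // stream_bps


rio_stream = pcm_bps(44100, 16, 2)       # one stereo stream for the Rio
cobranet_total = pcm_bps(48000, 20, 64)  # 64 mono channels at 48kHz / 20 bit

print(rio_stream)                                # 1411200
print(cobranet_total)                            # 61440000
print(streams_that_fit(10_000_000, rio_stream))  # 7
```

Seven streams come to about 9.88Mbps of payload alone, so once headers and collisions on a shared 10Mbps segment are counted the 7th is marginal at best, and 6 is the safer number.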

Is the Rio ethernet 10Mbps or 100Mbps?

I'm assuming the best you can do on the phoneline net is 10Mbps?