Is two decoders a requirement?

I mean: if the output bit-stream is buffered for, let's say, 4 seconds, then we have more or less 2 seconds of the currently playing track and 2 seconds queued for playing. Some other process should find the silent moment and do the cross-over.
OK, the above is not written very clearly, but I think I have seen this idea earlier in another discussion thread.
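To make the idea a bit more concrete, here is a minimal sketch of that cross-over process. It assumes the buffer holds mono float PCM samples in [-1, 1] and that "silent" means a block whose RMS level is below some threshold. The function names, block size, threshold and fade length are all hypothetical, not taken from any existing player:

```python
import math
from typing import List, Optional


def rms(samples: List[float]) -> float:
    """Root-mean-square level of a block of samples (0.0 for an empty block)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def find_silent_point(samples: List[float], window: int, threshold: float) -> Optional[int]:
    """Return the start index of the first non-overlapping block of `window`
    samples whose RMS is below `threshold`, or None if no such block exists."""
    for start in range(0, len(samples) - window + 1, window):
        if rms(samples[start:start + window]) < threshold:
            return start
    return None


def crossover(current: List[float], nxt: List[float],
              window: int, threshold: float, fade: int) -> List[float]:
    """Join the buffered tail of the current track to the start of the next one.

    The current track is cut at the first silent block (or at its end if no
    silence is found); the last `fade` samples before the cut are linearly
    crossfaded with the first `fade` samples of the next track.
    """
    cut = find_silent_point(current, window, threshold)
    if cut is None:
        cut = len(current)
    head = current[:cut]
    fade = min(fade, len(head), len(nxt))
    out = head[:len(head) - fade]
    for i in range(fade):
        g = (i + 1) / (fade + 1)  # gain of the next track ramps up towards 1
        out.append(head[len(head) - fade + i] * (1 - g) + nxt[i] * g)
    out.extend(nxt[fade:])
    return out
```

With a 4 second buffer at 44.1 kHz you would be scanning roughly 176400 samples per channel, so a real implementation would probably work on the decoder's output blocks as they arrive rather than on whole Python lists.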

EiSl