Ah, I see your point.

Maybe DTivos can't do it then.
But an SA Tivo should be able to grab each 'digitised' frame of the incoming picture (before it's even encoded/recorded), decimate it [scale it down by keeping only every n-th pixel/line and throwing away the rest] to make a smaller image, then overlay this onto the live 'output frame buffer', which is then sent to the Video Out/Tuner Out on the SA Tivo.
Everything but the decimation and overlay is being done now by the SA Tivo; the only additional work is to decimate each frame and then copy (i.e. overlay) it into the output frame buffer holding the video stream being played.

This would give a simple PiP implementation but only because the SA hardware does the initial 'grab' of the live TV signal automatically on the fly now.
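
To make the decimate-then-overlay step concrete, here is a rough sketch in Python. It is only an illustration of the idea, not how the Tivo hardware or software actually does it: frames are plain lists of pixel rows, and the names decimate and overlay, the 8x8 frame size and the corner position are all made up for the example.

```python
def decimate(frame, n):
    """Keep every n-th pixel of every n-th line (no filtering at all)."""
    return [row[::n] for row in frame[::n]]


def overlay(output, small, top, left):
    """Copy the small frame into the output frame buffer, in place."""
    for y, row in enumerate(small):
        output[top + y][left:left + len(row)] = row
    return output


# Toy 8x8 "frames"; the pixel values are just integers (hypothetical data).
live = [[1] * 8 for _ in range(8)]      # freshly digitised input frame
playback = [[0] * 8 for _ in range(8)]  # decoded frame about to be output

pip = decimate(live, 4)                 # 8x8 shrinks to a 2x2 thumbnail
overlay(playback, pip, top=0, left=6)   # drop it into the top-right corner
```

A real implementation would probably low-pass filter before throwing pixels away to avoid aliasing, but plain decimation is the cheapest option and is what I described above.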

This would have the neat effect of letting you see the 'live' TV input only about 2 video frames (either 2/60 or 2/50 of a second, depending on TV standard) after the live signal arrives from the Tuner or Video In socket, while the normal 'encode/decode' process on Live-TV takes about 2 seconds [i.e. normal 'live TV' is really about 2 seconds behind reality].

Therefore you could see the instant live pictures overlaid onto the delayed live TV signal. (Obviously, if you paused Live-TV for longer, you would see the PiP window with live TV pictures while the playback signal is delayed from reality by however many seconds/minutes are held in the live TV buffer on hard disk.)
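
The timing relationship can be sketched the same way. This is a toy model under my own assumptions: frames are just numbers, the hard disk buffer is a simple queue, the function name simulate is invented, and I use 60 frames as a stand-in for roughly 2 seconds at about 30 frames per second. It also ignores the couple of frames the grab itself takes.

```python
from collections import deque


def simulate(num_frames, delay_frames):
    """Return (main, pip) frame-number pairs for each output frame.

    main is the frame played back from the buffer, delay_frames behind;
    pip is the frame that has only just been digitised.
    """
    buffer = deque()  # stands in for the live TV buffer on disk
    shown = []
    for live in range(num_frames):
        buffer.append(live)
        if len(buffer) > delay_frames:
            main = buffer.popleft()     # oldest buffered frame goes full screen
            shown.append((main, live))  # newest frame goes in the PiP window
    return shown


pairs = simulate(num_frames=65, delay_frames=60)
```

Each pair shows the PiP window running a fixed 60 frames ahead of the main picture; pausing Live-TV would just make delay_frames grow.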

Like I said - a neat idea for live TV with lots of action you want to replay, while not missing the 'big picture' of what's happening now.