If I understand the posts correctly, the main issue is the time it takes to compute the checksums that tell us if the files are the same.

Would a stepwise check cut that time?
- Check the file size first. If the sizes differ, we know we need to reload the file.
- If the sizes are the same, check the tags. If the tags differ, we need to reload the file.
- If the tags are also the same, compute the CRC. If the CRCs differ, we need to reload the file.

If the normal operational scenario is that most of the new files one adds are NOT duplicates, this might speed things up: you only have to compute CRCs on files whose size and tags both match an existing file, since the cheaper tests already settle every other case.
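
Here is a rough Python sketch of the idea, just to make the ordering concrete. The function names (`needs_reload`, `file_crc32`) are made up for illustration, and I am assuming the tags have already been read into plain dicts by whatever tag reader the program uses. It also compares two files on disk; a real version would probably compare against a size, tags and CRC stored for the existing file instead.

```python
import os
import zlib


def file_crc32(path, chunk_size=1 << 20):
    """Return the CRC-32 of a file, read in chunks to bound memory use."""
    crc = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            crc = zlib.crc32(chunk, crc)
    return crc


def needs_reload(new_path, old_path, new_tags, old_tags):
    """Return (reload, reason), running the cheapest checks first."""
    # Step 1: a size mismatch settles it without reading any file contents.
    if os.path.getsize(new_path) != os.path.getsize(old_path):
        return True, "size"
    # Step 2: tags are assumed to be already-parsed dicts (hypothetical).
    if new_tags != old_tags:
        return True, "tags"
    # Step 3: only now pay for a full read of both files.
    if file_crc32(new_path) != file_crc32(old_path):
        return True, "crc"
    return False, "same"
```

The CRC is only reached when both cheaper checks find no difference, so the expensive full read is skipped for any file that already differs in size or tags.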

-ted