Is this an existing feature?

Posted by: MHC

Is this an existing feature? - 24/07/2002 15:21

Is there a feature in either emplode or jemplode where you can select a directory (or directories) on your computer and have it figure out the difference between what the empeg has on it and what's on your computer?

-Christian
Posted by: tfabris

Re: Is this an existing feature? - 24/07/2002 15:29

Not completely, but that is one thing people would like to have very much.

Click here for a more detailed explanation.
Posted by: mschrag

Re: Is this an existing feature? - 24/07/2002 16:27

I've been trying to decide how I want to do this ... It's a hard problem to optimize. Basically you have two sets of (potentially) thousands of files and you want to find the differences ... I suspect I will MD5sum (or something similar -- I don't know if MD5 will be fast enough) the files on their way onto the player, and then go through the file system computing MD5sums of the tunes you have there ... Maybe I can pre-qualify with the time stamp first or something.
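A minimal sketch of the hash-and-compare idea in Python, for illustration only. The function names are hypothetical, and `player_hashes` is assumed to be a set of digests already recorded for the tracks on the player; how those digests get onto or off the player is not covered here.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_tree(root):
    """Map digest -> path for every file under root.

    Files with identical content share a digest, so only one path
    per digest is kept.
    """
    hashes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            hashes[md5_of_file(path)] = path
    return hashes


def missing_from_player(pc_hashes, player_hashes):
    """Return the sorted PC paths whose digest the player does not have."""
    return sorted(path for digest, path in pc_hashes.items()
                  if digest not in player_hashes)
```

Comparing digests rather than names means a file that was renamed or moved on the PC still matches its copy on the player.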

I'm still working on this one ...

ms
Posted by: tman

Re: Is this an existing feature? - 24/07/2002 17:13

On a 1.3GHz Celeron it takes around 20 seconds to generate an MD5 hash for 226MB of files. Assuming a worst case of a full 120GB player, it would take around 3 hours to compute the hashes for that. It's definitely going to have to be an optional feature, so that people who don't have the time or processing power can turn it off.
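The 3-hour figure follows from scaling the measured throughput up to a full drive. A small sketch of that arithmetic, assuming 1 GB = 1024 MB and that hashing speed stays constant (the function name is made up for illustration):

```python
def estimated_hash_hours(sample_mb, sample_seconds, total_gb):
    """Scale a measured hashing rate up to a full drive, in hours."""
    throughput_mb_per_s = sample_mb / sample_seconds  # 226 / 20 = 11.3 MB/s
    total_mb = total_gb * 1024                        # assumes 1 GB = 1024 MB
    return total_mb / throughput_mb_per_s / 3600


hours = estimated_hash_hours(226, 20, 120)
print(f"{hours:.2f} hours")  # prints "3.02 hours"
```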

Maybe just doing a quick CRC on the first few k of the file and combining that with the file size would be sufficient?

On my player I've only got about 20 duplicates for files with the same size out of about 3000 tracks.

It's riskier doing this, since the chance of a collision is significantly greater than with an MD5 hash of the whole file.
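A sketch of the quick fingerprint described above: file size combined with a CRC of the first few KB. The 8 KB prefix length and the function name are assumptions picked for illustration, not anything emplode or jemplode is known to use.

```python
import os
import zlib


def quick_fingerprint(path, prefix_bytes=8192):
    """Return (file size, CRC-32 of the first prefix_bytes bytes)."""
    size = os.path.getsize(path)
    with open(path, "rb") as handle:
        prefix = handle.read(prefix_bytes)
    # Mask to get the same unsigned 32-bit value on every Python version.
    return size, zlib.crc32(prefix) & 0xFFFFFFFF
```

Two files that agree in size and in their first 8 KB get the same fingerprint even if they differ later on, which is exactly the collision risk mentioned above.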

- Trevor
Posted by: mschrag

Re: Is this an existing feature? - 24/07/2002 17:28

Thanks a lot for running the tests on this .. I always wondered but was too lazy to actually do it.

I was thinking too that you could keep a cache index on the PC that would store filename/modification date/hash so you could quickly check moddate to know if you need to recompute the hash, otherwise just use the one from the index file. On the Empeg itself it's pretty easy since you can write the hash into the tags and you know that it's read-only.
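A rough sketch of such a PC-side cache index, keyed by path and storing the modification time next to the hash. The JSON file format and every name here are assumptions made for the example.

```python
import hashlib
import json
import os


def file_md5(path):
    """Return the hex MD5 digest of a file, read in 1 MB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_index(index_path):
    """Load the cache index, or return an empty one if none exists yet."""
    if not os.path.exists(index_path):
        return {}
    with open(index_path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def save_index(index_path, index):
    """Write the cache index back to disk as JSON."""
    with open(index_path, "w", encoding="utf-8") as handle:
        json.dump(index, handle)


def cached_hash(path, index):
    """Return the file's hash, rehashing only if its mod date changed."""
    mtime = os.path.getmtime(path)
    entry = index.get(path)
    if entry is not None and entry["mtime"] == mtime:
        return entry["hash"]  # unchanged since last run: reuse cached hash
    digest = file_md5(path)
    index[path] = {"mtime": mtime, "hash": digest}
    return digest
```

With this layout only new or modified files pay the hashing cost on later runs. Because the key is the path, a file that has been moved or renamed is hashed again.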

ms
Posted by: MHC

Re: Is this an existing feature? - 24/07/2002 17:34

I'd be happy with even a more generic solution. You can pretty much assume that all the songs on your player came from your computer. If the filename of an MP3 on your computer is different than what's on the Empeg, it's flagged as "you don't have this one". You don't have to worry about misspellings, because even if the filename is technically "misspelled", it's wrong on both the Empeg and the computer - so they match up.

And I'm surprised Tony didn't rake me over the coals for not checking the FAQ.

-Christian
Posted by: tman

Re: Is this an existing feature? - 24/07/2002 17:45

The way the player works at the moment means that you lose the filename once you upload it onto the empeg. Everything is renamed to a hexadecimal number.

We'd need to store the original filename in the tag for each file.

Does anybody ever move their music files around? If so then we need some other way of identifying files.

- Trevor
Posted by: tman

Re: Is this an existing feature? - 24/07/2002 17:51

If we have a cache file to store the precomputed hashes, then the amount of time taken for each individual hash isn't so important. Also, I don't know of many people who upload that much music in one go.

It's only the initial hashing that is the time consuming part. Once that's done then it should in theory be reasonably quick.

One question though, how are you going to implement this for players that already have music on? You'd have to download the file, hash it, delete the downloaded copy and finally upload the new tag back to the player.

- Trevor
Posted by: tfabris

Re: Is this an existing feature? - 24/07/2002 18:19

And I'm surprised Tony didn't rake me over the coals for not checking the FAQ.

Gee, I thought I did.
Posted by: mschrag

Re: Is this an existing feature? - 24/07/2002 18:56

Yep ... That's how it had to work to upgrade tags from 1.0 to 2.0 players. It's terribly slow.
Posted by: wfaulk

Re: Is this an existing feature? - 24/07/2002 19:43

The fastest checksum algorithm I can find is Adler32 (defined in RFC 1950), which is far from cryptographically secure but should suffice for this task, if you decide to implement it.

I must add that it might be a good idea to only hash/checksum the MP3 portion of the file, as changes in ID3 tags might not necessitate reuploading the whole file. Maybe have a toggle or ask or have a way to just update the tags. Whatever.
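A sketch of checksumming only the audio portion with Adler-32, which Python exposes as `zlib.adler32`. It assumes the tags are a leading ID3v2 block (a 10-byte header starting with "ID3" and carrying a synchsafe size) and a trailing 128-byte ID3v1 block starting with "TAG"; other tag formats are not handled, and the function names are hypothetical.

```python
import zlib


def audio_span(data):
    """Return (start, end) offsets of the bytes between the ID3 tags."""
    start, end = 0, len(data)
    if len(data) >= 10 and data[:3] == b"ID3":
        # ID3v2 size: four bytes, 7 significant bits each ("synchsafe").
        size = ((data[6] & 0x7F) << 21 | (data[7] & 0x7F) << 14
                | (data[8] & 0x7F) << 7 | (data[9] & 0x7F))
        start = 10 + size
        if data[5] & 0x10:  # ID3v2.4 footer flag: 10 more bytes
            start += 10
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128  # drop the ID3v1 tag at the end of the file
    return min(start, end), end


def audio_adler32(path):
    """Adler-32 of the file with leading and trailing ID3 tags skipped."""
    with open(path, "rb") as handle:
        data = handle.read()  # whole file in memory; fine for a sketch
    start, end = audio_span(data)
    return zlib.adler32(data[start:end]) & 0xFFFFFFFF
```

With this, editing a tag leaves the checksum unchanged, so a tag-only change could be told apart from a change to the audio itself.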
Posted by: tman

Re: Is this an existing feature? - 25/07/2002 02:44

The only checksum that seems to be faster than Adler32 would be to just sum everything into one byte. But I'd say that wasn't a particularly good idea...

Good idea about only hashing the actual sound data instead of the entire file. If we really wanted to be flashy, we could implement something like the MoodLogic fingerprint technology and do our own version of CDDB/FreeDB.

- Trevor
Posted by: image

Re: Is this an existing feature? - 28/07/2002 09:27

Also, is it possible to ignore all ID3 tags in whatever checksum is implemented?