Unofficial empeg BBS

#107151 - 24/07/2002 15:21 Is this an existing feature?
MHC
journeyman

Registered: 14/02/2002
Posts: 66
Is there a feature in either emplode or jEmplode where you can select a directory (or directories) on your computer and have it figure out the difference between what's on the empeg and what's on your computer?

-Christian

Top
#107152 - 24/07/2002 15:29 Re: Is this an existing feature? [Re: MHC]
tfabris
carpal tunnel

Registered: 20/12/1999
Posts: 31594
Loc: Seattle, WA
Not completely, but that is one thing people would like to have very much.

Click here for a more detailed explanation.
_________________________
Tony Fabris

Top
#107153 - 24/07/2002 16:27 Re: Is this an existing feature? [Re: tfabris]
mschrag
pooh-bah

Registered: 09/09/2000
Posts: 2303
Loc: Richmond, VA
I've been trying to decide how I want to do this ... It's a hard problem to optimize. Basically you have two sets of (potentially) thousands of files and you want to find the differences ... I suspect I will MD5sum the files on the way onto the player (or something similar; I don't know if MD5 will be fast enough) and then go through the file system MD5summing the tunes you have there ... Maybe I can pre-qualify by timestamp first or something.
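
A minimal sketch of the whole-file approach described above, assuming both collections can be read as local directories (on a real player the files would first have to be fetched over the wire). The names `md5_of_file`, `hashes_under` and `missing_from_player` are made up for illustration and are not part of emplode or jEmplode.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hashes_under(root):
    """Map digest -> path for every file below root (hypothetical helper)."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            result[md5_of_file(path)] = path
    return result


def missing_from_player(pc_root, player_root):
    """Return sorted PC paths whose content has no identical copy under player_root."""
    pc = hashes_under(pc_root)
    player = hashes_under(player_root)
    return sorted(path for digest, path in pc.items() if digest not in player)
```

Comparing by digest rather than by name means renamed or moved files still match, at the cost of reading every byte at least once.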

I'm still working on this one ...

ms

Top
#107154 - 24/07/2002 17:13 Re: Is this an existing feature? [Re: mschrag]
tman
carpal tunnel

Registered: 24/12/2001
Posts: 5528
On a 1.3GHz Celeron it takes around 20 seconds to generate an MD5 hash for 226MB of files. Assuming a worst case of a full 120GB player, it will take around 3 hours to compute the hashes for that. It's definitely going to have to be an optional feature, so that people who don't have the time or the processing power can turn it off.
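
For reference, the 3-hour figure follows directly from the numbers quoted above. This is only the arithmetic, assuming the hashing rate stays constant and treating 1GB as 1024MB; the variable names are illustrative.

```python
# Figures quoted in the post above.
sample_mb = 226        # size of the test set, in MB
sample_seconds = 20    # time taken to MD5 the test set
player_gb = 120        # worst-case (full) player capacity

throughput_mb_per_s = sample_mb / sample_seconds   # 11.3 MB/s
player_mb = player_gb * 1024                       # assumes 1 GB = 1024 MB
estimated_hours = player_mb / throughput_mb_per_s / 3600

print(f"{throughput_mb_per_s:.1f} MB/s, about {estimated_hours:.1f} hours")
```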

Maybe just doing a quick CRC on the first few k of the file and combining that with the file size would be sufficient?

On my player I've only got about 20 duplicates for files with the same size out of about 3000 tracks.

It's riskier doing it this way, though, since the chance of a collision is significantly greater than with an MD5 hash of the whole file.
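
A rough sketch of that quick check, assuming CRC-32 from Python's `zlib` module and a 4KB prefix; both the prefix length and the function name `quick_fingerprint` are arbitrary choices for illustration.

```python
import os
import zlib


def quick_fingerprint(path, prefix_bytes=4096):
    """Return (file size, CRC-32 of the first prefix_bytes bytes of the file)."""
    size = os.path.getsize(path)
    with open(path, "rb") as handle:
        prefix = handle.read(prefix_bytes)
    return size, zlib.crc32(prefix) & 0xFFFFFFFF
```

Two files are treated as the same tune only if both parts of the tuple match. Since only the first few KB are read, files that differ later on but share a size and an opening would be mistaken for each other, which is the collision risk mentioned above.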

- Trevor

Top
#107155 - 24/07/2002 17:28 Re: Is this an existing feature? [Re: tman]
mschrag
pooh-bah

Registered: 09/09/2000
Posts: 2303
Loc: Richmond, VA
Thanks a lot for running the tests on this .. I always wondered but was too lazy to actually do it.

I was thinking too that you could keep a cache index on the PC that would store filename/modification date/hash so you could quickly check moddate to know if you need to recompute the hash, otherwise just use the one from the index file. On the Empeg itself it's pretty easy since you can write the hash into the tags and you know that it's read-only.
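
A minimal sketch of such a cache index on the PC side, assuming a JSON file keyed by path and using the modification time as the only staleness check. The file format and the names `load_index`, `cached_hash` and `save_index` are assumptions for illustration, not anything jEmplode actually does.

```python
import hashlib
import json
import os


def load_index(index_path):
    """Load the cache index, or start with an empty one if it does not exist yet."""
    if os.path.exists(index_path):
        with open(index_path, "r", encoding="utf-8") as handle:
            return json.load(handle)
    return {}


def cached_hash(path, index):
    """Return the file's MD5, recomputing it only if the modification time changed."""
    mtime = os.stat(path).st_mtime_ns
    entry = index.get(path)
    if entry is not None and entry["mtime"] == mtime:
        return entry["md5"]  # cache hit: the file is not read at all
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    index[path] = {"mtime": mtime, "md5": digest.hexdigest()}
    return index[path]["md5"]


def save_index(index_path, index):
    """Write the cache index back to disk."""
    with open(index_path, "w", encoding="utf-8") as handle:
        json.dump(index, handle)
```

Because the index is keyed by path, a file that has been moved or renamed is simply hashed again under its new name.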

ms

Top
#107156 - 24/07/2002 17:34 Re: Is this an existing feature? [Re: tman]
MHC
journeyman

Registered: 14/02/2002
Posts: 66
I'd be happy with even a more generic solution. You can pretty much assume that all the songs on your player came from your computer. If the filename of an MP3 on your computer is different than what's on the Empeg, it's flagged as "you don't have this one". You don't have to worry about misspellings, because even if the filename is technically "misspelled", it's wrong on both the Empeg and the computer - so they match up.

And I'm surprised Tony didn't rake me over the coals for not checking the FAQ.

-Christian

Top
#107157 - 24/07/2002 17:45 Re: Is this an existing feature? [Re: MHC]
tman
carpal tunnel

Registered: 24/12/2001
Posts: 5528
The way the player works at the moment means that you lose the filename once you upload a file onto the empeg. Everything is renamed to a hexadecimal number.

We'd need to store the original filename in the tag for each file.

Does anybody ever move their music files around? If so then we need some other way of identifying files.

- Trevor

Top
#107158 - 24/07/2002 17:51 Re: Is this an existing feature? [Re: mschrag]
tman
carpal tunnel

Registered: 24/12/2001
Posts: 5528
If we have a cache file to store the precomputed hashes, then the time taken for each individual hash isn't so important. Also, I don't know of many people who upload that much music in one go.

It's only the initial hashing that is time-consuming. Once that's done, it should in theory be reasonably quick.

One question though: how are you going to implement this for players that already have music on them? You'd have to download each file, hash it, delete the downloaded copy and finally upload the new tag back to the player.

- Trevor

Top
#107159 - 24/07/2002 18:19 Re: Is this an existing feature? [Re: MHC]
tfabris
carpal tunnel

Registered: 20/12/1999
Posts: 31594
Loc: Seattle, WA
"And I'm surprised Tony didn't rake me over the coals for not checking the FAQ."

Gee, I thought I did.
_________________________
Tony Fabris

Top
#107160 - 24/07/2002 18:56 Re: Is this an existing feature? [Re: tman]
mschrag
pooh-bah

Registered: 09/09/2000
Posts: 2303
Loc: Richmond, VA
Yep ... That's how it had to work to upgrade tags from 1.0 to 2.0 players. It's terribly slow.

Top
#107161 - 24/07/2002 19:43 Re: Is this an existing feature? [Re: tman]
wfaulk
carpal tunnel

Registered: 25/12/2000
Posts: 16706
Loc: Raleigh, NC US
The fastest checksum algorithm I can find is Adler-32 (defined in RFC 1950), which is far from cryptographically secure, but should suffice for this task, if you were to decide to implement it.
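
As a rough illustration, Python's standard `zlib` module exposes Adler-32 and accepts a running value, so a file can be checksummed in chunks. The function name `adler32_of_file` and the chunk size are arbitrary choices for this sketch.

```python
import zlib


def adler32_of_file(path, chunk_size=1 << 20):
    """Return the Adler-32 checksum of a file as an unsigned 32-bit integer."""
    checksum = 1  # Adler-32 starts from 1, not 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            checksum = zlib.adler32(chunk, checksum)
    return checksum & 0xFFFFFFFF
```

With only 32 bits of output, it would probably want to be combined with the file size before two files are declared identical.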

I must add that it might be a good idea to only hash/checksum the MP3 portion of the file, as changes in ID3 tags might not necessitate reuploading the whole file. Maybe have a toggle or ask or have a way to just update the tags. Whatever.
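
A hedged sketch of hashing only the audio portion, assuming the usual layout of an ID3v2 tag at the very start of the file and a 128-byte ID3v1 tag at the very end. It ignores less common cases (appended ID3v2 tags, APE tags, Lyrics3 blocks), reads the whole file into memory for simplicity, and the names `audio_bytes` and `audio_md5` are made up for illustration.

```python
import hashlib


def audio_bytes(data):
    """Return data with a leading ID3v2 tag and a trailing ID3v1 tag removed."""
    start, end = 0, len(data)
    # ID3v2 header: "ID3", 2 version bytes, 1 flags byte, 4-byte synchsafe size.
    if len(data) >= 10 and data[:3] == b"ID3":
        s = data[6:10]
        tag_size = ((s[0] & 0x7F) << 21) | ((s[1] & 0x7F) << 14) \
            | ((s[2] & 0x7F) << 7) | (s[3] & 0x7F)
        start = 10 + tag_size
        if data[5] & 0x10:  # ID3v2.4 "footer present" flag adds 10 more bytes
            start += 10
    # ID3v1: fixed 128-byte block beginning with "TAG" at the end of the file.
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128
    return data[start:end]


def audio_md5(path):
    """Return the hex MD5 of the file's contents with ID3 tags stripped."""
    with open(path, "rb") as handle:
        return hashlib.md5(audio_bytes(handle.read())).hexdigest()
```

With this, retagging a track on the PC leaves its hash unchanged, so only the tags would need to be sent to the player again.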
_________________________
Bitt Faulk

Top
#107162 - 25/07/2002 02:44 Re: Is this an existing feature? [Re: wfaulk]
tman
carpal tunnel

Registered: 24/12/2001
Posts: 5528
The only checksum that seems to be faster than Adler32 would be to just sum everything into one byte. But I'd say that wasn't a particularly good idea...

Good idea about only hashing the actual sound data instead of the entire file. If we really wanted to be flashy, we could implement something like the MoodLogic fingerprint technology and do our own version of CDDB/FreeDB.

- Trevor

Top
#107163 - 28/07/2002 09:27 Re: Is this an existing feature? [Re: tman]
image
old hand

Registered: 28/04/2002
Posts: 770
Loc: Los Angeles, CA
Also, is it possible to exclude all ID3 tags from whatever checksum gets implemented?

Top