Unofficial empeg BBS

#322325 - 18/05/2009 00:42 Finding duplicate files
drakino
carpal tunnel

Registered: 08/06/1999
Posts: 7868
I've been doing some cleanup on my ReadyNAS due to it running low on space. And for now, I have it back to a manageable size, but I know more can be done. Over the course of the years with various backup schemes, I know I've probably duplicated some files. So now I need an easy way of finding them. Anyone know of a good way to do this on an OS X box?

Ideally, I'd want to tell whatever program/script what paths to search, and I could have it look locally as well as on the NAS and inside disk images by having them all mounted prior to the check. It would be nice if the program compared file sizes first, then computed CRCs only for files of the same size to weed out false dupes.
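For reference, here is a minimal sketch of that size-first, CRC-second approach using only POSIX find, wc, awk and cksum (cksum computes a CRC, so it matches the request). The function name find_dups is hypothetical, and it assumes no file names contain newlines.

```shell
# find_dups DIR... : report groups of identical files (hypothetical helper).
# Pass 1 keeps only files whose size is shared; pass 2 CRCs just those.
# Assumes POSIX find/wc/awk/cksum and no newlines in file names.
find_dups() {
  [ "$#" -gt 0 ] || set -- .

  # Step 1: "size path" for every regular file; keep paths whose size repeats.
  find "$@" -type f -exec wc -c {} + |
    awk '{
           path = $0
           sub(/^ *[0-9]+ /, "", path)     # strip the size column
           if (path == "total") next       # skip wc summary lines
           size[NR] = $1; name[NR] = path; count[$1]++
         }
         END {
           for (i = 1; i <= NR; i++)
             if ((i in name) && count[size[i]] > 1) print name[i]
         }' |
    # Step 2: CRC (plus byte count) for the same-size candidates only.
    while IFS= read -r f; do cksum "$f"; done |
    # Step 3: group on "CRC size" and print every group with 2+ members.
    awk '{
           key = $1 " " $2
           path = $0
           sub(/^[0-9]+ [0-9]+ /, "", path)
           files[key] = (key in files) ? (files[key] "\n  " path) : path
           n[key]++
         }
         END { for (k in n) if (n[k] > 1) print "dup group:\n  " files[k] }'
}
```

Several roots can be passed at once, e.g. `find_dups ~/Backups /Volumes/NAS` (placeholder paths), so local folders, the NAS share and mounted .dmg images are checked in one run. Only same-size files are ever read, which keeps the CRC pass cheap; a CRC match makes identical content very likely but not certain, so running `cmp` on a reported pair before deleting anything is cheap insurance.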

This is one of those programs that are hard to find, because Google results are full of hundreds of obscure Windows utilities, even when you try to specify Unix or OS X only.

#322326 - 18/05/2009 01:16 Re: Finding duplicate files [Re: drakino]
Attack
addict

Registered: 01/03/2002
Posts: 598
Loc: Florida
A quick Google search turned up two programs that find duplicate files: Tidy Up and DupeCheck. I would also recommend searching http://osx.iusethis.com/ to see if any other programs are available.

_________________________
Chad

#322339 - 18/05/2009 11:56 Re: Finding duplicate files [Re: Attack]
Dignan
carpal tunnel

Registered: 08/03/2000
Posts: 12320
Loc: Sterling, VA
I also found Duplicate Files Searcher, which is about as straightforward as you could name a program.

Under Windows there's Doublekiller.

Also under Windows, if you're interested in weeding out duplicate photos, I've been using VisiPics, which is really cool. It compares the images themselves, so it finds resized copies and, I believe, crops as well.
_________________________
Matt

#322346 - 18/05/2009 13:37 Re: Finding duplicate files [Re: Dignan]
drakino
carpal tunnel

Registered: 08/06/1999
Posts: 7868
Duplicate Files Searcher looks like it might work, I'll try it out later. DupeCheck is Spotlight-based so it won't work on the NAS, and Tidy Up is $30. I'll try free solutions first, but thanks for the pointer.

Windows is flat out for this, as I need to search inside several .dmg backups and also on my OS X desktop.

Also, thanks for the link to the iusethis.com site, never seen that before. Looks pretty handy for seeing what programs are popular.

#322350 - 18/05/2009 14:22 Re: Finding duplicate files [Re: drakino]
Dignan
carpal tunnel

Registered: 08/03/2000
Posts: 12320
Loc: Sterling, VA
Yeah, I figured that Windows apps weren't very helpful to you, but if anyone else needs this I thought I'd include it just in case. And again, that VisiPics app is great, even if you can't use it on a Mac.


Edited by Dignan (18/05/2009 14:22)
_________________________
Matt

#322352 - 18/05/2009 14:55 Re: Finding duplicate files [Re: drakino]
Attack
addict

Registered: 01/03/2002
Posts: 598
Loc: Florida
Originally Posted By: drakino
Duplicate Files Searcher looks like it might work, I'll try it out later. DupeCheck is Spotlight-based so it won't work on the NAS, and Tidy Up is $30. I'll try free solutions first, but thanks for the pointer.

Windows is flat out for this, as I need to search inside several .dmg backups and also on my OS X desktop.

Also, thanks for the link to the iusethis.com site, never seen that before. Looks pretty handy for seeing what programs are popular.


Use this tip to enable Spotlight indexing on external volumes.
_________________________
Chad

#322372 - 19/05/2009 00:20 Re: Finding duplicate files [Re: Attack]
drakino
carpal tunnel

Registered: 08/06/1999
Posts: 7868
Enabling Spotlight via the terminal hasn't done anything; lsof still shows no activity after waiting a bit. Leopard may have disabled the workaround people found in the tip.

Also, I'm wary of Duplicate Files Searcher now, after it opened a web page when I ran it.

The search continues...

#322388 - 19/05/2009 12:09 Re: Finding duplicate files [Re: drakino]
mlord
carpal tunnel

Registered: 29/08/2000
Posts: 14486
Loc: Canada
Perhaps write a small bash/awk script to do the job?

#322390 - 19/05/2009 13:02 Re: Finding duplicate files [Re: mlord]
mlord
carpal tunnel

Registered: 29/08/2000
Posts: 14486
Loc: Canada
Originally Posted By: mlord
Perhaps write a small bash/awk script to do the job?


Like this, perhaps?
Code:
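#!/bin/bash
#
# Find all duplicate files within the current directory and subdirectories.
# Requires /bin/bash, find, sort, cmp, and gawk.
#
# "find -ls" puts the size in field 7, so sorting on that field brings
# same-size files together; only those are compared byte for byte.
#

find . -type f -ls | sort -n -k 7 | gawk '

# Single-quote a name for the shell so that spaces, double quotes, $ and
# backticks in file names survive the system() call (\047 is a quote).
function shquote(s) {
        gsub(/\047/, "\047\\\\\047\047", s)
        return "\047" s "\047"
}

# Empty files are trivially identical; otherwise cmp -s exits 0 on a match.
function is_identical(fname1,fname2,fsize) {
        if (fsize == 0)
                return 1
        return ! system("cmp -s " shquote(fname1) " " shquote(fname2) " >/dev/null 2>&1")
}

function finddups(line,this_fsize) {
        # Drop any padding before the inode number, then the ten "find -ls"
        # fields (inode ... time) that precede the pathname.
        sub(/^ +/,"",line)
        for (i = 0; i < 10; ++i)
                gsub("^[^ ]*  *","",line)
        this_fname = line

        # Size changed: compare every pair in the finished same-size group.
        if (fsize != this_fsize) {
                if (fcount > 1) {
                        for (f1 = 1; f1 < fcount; ++f1) {
                                for (f2 = f1 + 1; f2 <= fcount; ++f2) {
                                        fname1 = fnames[f1]
                                        fname2 = fnames[f2]
                                        if (is_identical(fname1,fname2,fsize))
                                                printf "%s == %s\n", fname1, fname2
                                }
                        }
                }
                fcount = 0;
        }
        fsize = this_fsize
        fnames[++fcount] = this_fname
}

# Flush the last group with a size that cannot occur.
END {
        finddups("",-1)
}

# Each input line is one "find -ls" record; field 7 is its size.
{
        finddups($0,$7)
}'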


#322428 - 20/05/2009 02:34 Re: Finding duplicate files [Re: mlord]
mlord
carpal tunnel

Registered: 29/08/2000
Posts: 14486
Loc: Canada
??

#322437 - 20/05/2009 11:14 Re: Finding duplicate files [Re: mlord]
drakino
carpal tunnel

Registered: 08/06/1999
Posts: 7868
I let the script run overnight, and woke up to a 1.2GB text file and the script still running. Lots of incorrect dupes, so I'm going to need to track down why that happened.

My experience with awk is pretty limited, so this is a chance to learn it a bit more.

#322438 - 20/05/2009 11:40 Re: Finding duplicate files [Re: drakino]
peter
carpal tunnel

Registered: 13/07/2000
Posts: 4174
Loc: Cambridge, England
Code:
# xargs -d '\n' (GNU) passes each path whole, even with spaces in it
find . -type f -printf '%20s %p\n' | sort -n | uniq -w20 -D | cut -c 22- \
 | xargs -d '\n' md5sum | sort | uniq -w32 -D

although you might need GNU uniq to get the -D option.
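For a uniq that lacks GNU's -w and -D options, a small awk filter can stand in for those two steps. This is a sketch that assumes the input is already sorted, as it is in the pipeline above; the function name uniq_w_D is hypothetical.

```shell
# uniq_w_D WIDTH : hypothetical stand-in for GNU "uniq -w WIDTH -D".
# Prints every line whose first WIDTH characters match those of an
# adjacent line, i.e. all members of each duplicate run in sorted input.
uniq_w_D() {
  awk -v w="$1" '
    { k = substr($0, 1, w) }                  # comparison key
    NR > 1 && k == prevk {                    # same run as the previous line
      if (!printed) print prev                # first duplicate: emit run start
      print
      printed = 1
      next
    }
    { printed = 0; prev = $0; prevk = k }     # a new run begins here
  '
}
```

In the pipeline above, `uniq -w20 -D` would become `uniq_w_D 20` and `uniq -w32 -D` would become `uniq_w_D 32`. The `-printf` option to find is also a GNU extension, and md5sum comes from GNU coreutils, so both would need their own substitutes where GNU tools are absent.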

Peter

#322440 - 20/05/2009 12:17 Re: Finding duplicate files [Re: peter]
mlord
carpal tunnel

Registered: 29/08/2000
Posts: 14486
Loc: Canada
Ahh.. much more sensible. Thanks, Peter!

#322444 - 20/05/2009 13:42 Re: Finding duplicate files [Re: peter]
drakino
carpal tunnel

Registered: 08/06/1999
Posts: 7868
Originally Posted By: peter
although you might need GNU uniq to get the -D option.

And the -w option as well.

Side tangent, anyone else finding Google more annoying to search with lately? Trying to find the man page for GNU uniq required me to use "uniq", because if I left it unquoted, Google just automatically assumed I wanted unique.

#322454 - 20/05/2009 18:46 Re: Finding duplicate files [Re: drakino]
wfaulk
carpal tunnel

Registered: 25/12/2000
Posts: 16706
Loc: Raleigh, NC US
Hm. Works without quotes for me.
_________________________
Bitt Faulk
