Unofficial empeg BBS

#341304 - 18/01/2011 19:26 directory with zillions of files
DWallach
carpal tunnel

Registered: 30/04/2000
Posts: 3716
Here's an oddball one.

I just got a giant Bluray in the mail with a large number of scanned ballots (TIFF files). So far as I can tell, they're all in one single directory, hundreds of thousands of them, and the total disk had 21GB of data on it. Tools like 'ls' and whatnot just don't work because they take too long to run. Just to make things more fun, there appear to be some hard errors at the end of the Bluray disk.

After much pain, 'dd conv=noerror' appeared to let me cleanly extract an ISO from the Bluray, so now I've got that on my hard drive and I'm now dealing with a local ISO image. Clearly, I need to extract files into a bunch of subdirectories.
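Roughly, the rescue invocation was along these lines (the device path below is a stand-in, and adding sync to conv is an extra I'd suggest: it pads unreadable blocks with zeros so offsets in the image stay aligned, which matters for a filesystem image):

```shell
#!/bin/sh
# A real rescue reads the optical device, e.g.:
#   dd if=/dev/disk2 of=ballots.iso bs=2048 conv=noerror,sync
# (device path hypothetical). Demo of the same flags on a plain file:
printf 'ABCDEFGHIJKL' > src.bin                        # 12 bytes = 3 full blocks
dd if=src.bin of=copy.bin bs=4 conv=noerror,sync 2>/dev/null
cmp src.bin copy.bin
```

With a cleanly readable source whose size is a multiple of the block size, the copy is byte-identical; on a failing disc, noerror keeps dd going past read errors instead of aborting.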

The problem is getting to the point where I even know the damn file names. I've mounted the ISO, I go into the directory, and 'ls' again just sits and spins, completely consuming one core of my CPU, in kernel space. Surprisingly, even 'find . -print' is unhelpful. All it prints is "." before the kernel again gets slammed. After ten minutes, it hasn't printed a single file name.

So... any advice on how I can extract these files from the ISO image?

#341305 - 18/01/2011 19:32 Re: directory with zillions of files [Re: DWallach]
peter
carpal tunnel

Registered: 13/07/2000
Posts: 4146
Loc: Cambridge, England
Recent libcdio comes with iso-info and iso-read tools that might help. Even if you end up needing to debug your image, at least using them means you'd be debugging in userland.

Peter

#341306 - 18/01/2011 19:41 Re: directory with zillions of files [Re: peter]
DWallach
carpal tunnel

Registered: 30/04/2000
Posts: 3716
iso-read seems to insist that you know the filename you're trying to get out. It doesn't have a 'dump everything' option. Grumble. I really didn't want to be hacking libcdio just to get at my damn files.

#341307 - 18/01/2011 19:41 Re: directory with zillions of files [Re: DWallach]
tonyc
carpal tunnel

Registered: 27/06/1999
Posts: 7058
Loc: Pittsburgh, PA
ls on many platforms sorts all entries by default, which means reading the whole directory before printing anything. Try ls -U, maybe?
_________________________
- Tony C
my empeg stuff

#341308 - 18/01/2011 19:43 Re: directory with zillions of files [Re: peter]
tman
carpal tunnel

Registered: 24/12/2001
Posts: 5528
What OS are you doing this on?

I assume the incredibly long wait for nothing is caused by the errors.

The UDF tools package is unmaintained now, and there's no fsck.udf tool either. There was a UDF verifier tool from Philips Research, but that seems to have vanished, and it doesn't fix anything anyway.

#341309 - 18/01/2011 19:44 Re: directory with zillions of files [Re: DWallach]
mlord
carpal tunnel

Registered: 29/08/2000
Posts: 13874
Loc: Canada
Instead of ls, try this: echo *

-ml

#341310 - 18/01/2011 19:51 Re: directory with zillions of files [Re: mlord]
DWallach
carpal tunnel

Registered: 30/04/2000
Posts: 3716
This is on my iMac (Core 2 Duo, Mac OS X 10.6.5). I mounted the file with the default options (i.e., I just ran "open" on the ISO file). I'm trying "echo *" right now and it's just sitting there, again with one core slammed to 100% in the kernel, the other core idle.

Any thoughts on tooling to extract the files within?

#341311 - 18/01/2011 19:52 Re: directory with zillions of files [Re: mlord]
Taym
pooh-bah

Registered: 18/06/2001
Posts: 2413
Loc: Roma, Italy
... or you could use the RAR command line, on any OS, without needing to mount the ISO: just have RAR operate on the .ISO file itself in the file system.

Edit:
I was assuming a *NIX OS. I don't know if RAR is available on a Mac. If not, you could just copy the .ISO to any Windows box you have around.


Edited by taym (18/01/2011 19:55)
_________________________
= Taym =
MK2a #040103216 * 100Gb *All/Colors* Radio * 3.0a11 * Hijack = taympeg

#341312 - 18/01/2011 19:57 Re: directory with zillions of files [Re: DWallach]
mlord
carpal tunnel

Registered: 29/08/2000
Posts: 13874
Loc: Canada
MMmm.. yes, I suppose "echo *" is really no improvement, since the shell has to read the entire directory to expand the glob before printing anything. Duh. smile

Best bet is a simple C program to break them up, using readdir() to walk the directory.

I can give you a basic readdir program that you can hack away at, if you like.

#341313 - 18/01/2011 20:17 Re: directory with zillions of files [Re: mlord]
mlord
carpal tunnel

Registered: 29/08/2000
Posts: 13874
Loc: Canada
Originally Posted By: mlord
I can give you a basic readdir program that you can hack away at, if you like.

This could be done more robustly, using execve() rather than system(), but it's probably good enough:

Code:
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <dirent.h>
#include <string.h>
#include <errno.h>

static void do_something (const char *dpath, const char *this_name)
{
        char tmp[16384];  /* BIG dumb buffer, rather than trying to be clever */
        /*
         * Insert code here to do whatever with "this_name".
         * Here is an example of how to do something:
         */
        char destination[] = "/tmp";
        snprintf(tmp, sizeof(tmp), "/bin/cp \"%s/%s\" \"%s\"", dpath, this_name, destination);
        printf("===> %s\n", tmp);
        fflush(stdout);
        if (-1 == system(tmp))
                perror(tmp);
}

int main (int argc, char *argv[])
{
        const char *dpath;
        struct dirent *de;
        DIR *dp;

        if (argc != 2) {
                fprintf(stderr, "Usage: %s <dir>\n", argv[0]);
                return 1;
        }
        dpath = argv[1];
        dp = opendir(dpath);
        if (dp == NULL) {
                perror(dpath);
                return 1;
        }
        errno = 0;
        while ((de = readdir(dp)) != NULL) {
                const char *this_name = de->d_name;
                if (strcmp(this_name, ".") && strcmp(this_name, "..")) {
                        printf("%s\n", this_name);
                        fflush(stdout);
                        do_something(dpath, this_name);
                }
                errno = 0;  /* for next readdir() call */
        }
        if (errno) {  /* readdir() itself failed */
                perror(dpath);
                closedir(dp);
                return 1;
        }
        closedir(dp);
        return 0;  /* all done */
}


Edited by mlord (18/01/2011 20:22)

#341314 - 18/01/2011 20:18 Re: directory with zillions of files [Re: mlord]
mlord
carpal tunnel

Registered: 29/08/2000
Posts: 13874
Loc: Canada
Actually, that's more or less how the find command does things...
I wonder if a plain old "find ." works?

#341315 - 18/01/2011 20:20 Re: directory with zillions of files [Re: mlord]
DWallach
carpal tunnel

Registered: 30/04/2000
Posts: 3716
Based on your idea, I wrote my own C program to do the same basic thing. Now I've got all 223K filenames. The next task is to figure out what to do with them.
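The splitting step could be sketched something like this (bucket size, names, and the sample files are invented for illustration; the real thing would drive the loop from the readdir program's output rather than a glob, which would choke on a huge directory):

```shell
#!/bin/sh
# Sketch: move files from a flat directory into subdirectories of
# at most N files each. N=3 and the .tif names are made up.
set -e
N=3
mkdir -p flat
for f in a b c d e f g; do : > "flat/$f.tif"; done   # 7 sample files

i=0; bucket=0
mkdir -p "bucket$bucket"
for path in flat/*; do
    mv "$path" "bucket$bucket/"
    i=$((i + 1))
    if [ "$i" -ge "$N" ]; then          # bucket full: start the next one
        i=0; bucket=$((bucket + 1)); mkdir -p "bucket$bucket"
    fi
done
```

With 7 files and N=3 this yields bucket0 and bucket1 with three files each, and bucket2 with one.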

As I said up top, "find . -print" failed. I have no idea why.

#341316 - 18/01/2011 20:23 Re: directory with zillions of files [Re: DWallach]
peter
carpal tunnel

Registered: 13/07/2000
Posts: 4146
Loc: Cambridge, England
Originally Posted By: DWallach
iso-read seems to insist that you know the filename you're trying to get out. It doesn't have a 'dump everything' option. Grumble. I really didn't want to be hacking libcdio just to get at my damn files.

iso-info with -f or -l to get the filenames, then iso-read to get them out.

Peter

#341317 - 18/01/2011 20:27 Re: directory with zillions of files [Re: DWallach]
mlord
carpal tunnel

Registered: 29/08/2000
Posts: 13874
Loc: Canada
Cool.

#341318 - 18/01/2011 20:33 Re: directory with zillions of files [Re: mlord]
DWallach
carpal tunnel

Registered: 30/04/2000
Posts: 3716
The extraction is running now. I'm modestly curious why "find" didn't work, but that's a question for another day.

#341323 - 18/01/2011 21:39 Re: directory with zillions of files [Re: DWallach]
wfaulk
carpal tunnel

Registered: 25/12/2000
Posts: 16706
Loc: Raleigh, NC US
"dmesg" might have provided some reasonably useful output. Also the other syslog-type logs. An "strace" or equivalent ("dtruss" on Mac OS X 10.6) might have also been helpful.
_________________________
Bitt Faulk

#341324 - 18/01/2011 22:25 Re: directory with zillions of files [Re: wfaulk]
tman
carpal tunnel

Registered: 24/12/2001
Posts: 5528
Who on earth made this disc anyway? Did they use some sort of packet-writing software and just keep dumping files onto it?

#341325 - 19/01/2011 01:02 Re: directory with zillions of files [Re: tman]
gbeer
carpal tunnel

Registered: 17/12/2000
Posts: 2659
Loc: Manteca, California
Originally Posted By: tman
Who on earth made this disc anyway? Did they use some sort of packet-writing software and just keep dumping files onto it?


The first post said "ballots" and "tiff", so it's likely the output from a scanner-type electronic voting machine.
_________________________
Glenn

#341337 - 19/01/2011 14:39 Re: directory with zillions of files [Re: gbeer]
DWallach
carpal tunnel

Registered: 30/04/2000
Posts: 3716
Syslogs were not very helpful. They indicated I/O errors where there were unreadable bits of the disk. That's about it.

Who made this? A municipality that was providing me with copies of its ballots. I'm grateful for the data (now nicely segregated into hundreds of subdirectories). But what a pain to get at it.

For what it's worth, there's absolutely no standards document of any kind that indicates how somebody might give you digital scans of a million ballots.

#341341 - 19/01/2011 18:15 Re: directory with zillions of files [Re: gbeer]
tfabris
carpal tunnel

Registered: 20/12/1999
Posts: 31160
Loc: Seattle, WA
Originally Posted By: gbeer
The first post said "ballots" and "tiff", so it's likely the output from a scanner-type electronic voting machine.


Ah, I see that the security at Diebold is up to its usual high standards of quality.
_________________________
Tony Fabris

#341342 - 19/01/2011 18:29 Re: directory with zillions of files [Re: tfabris]
DWallach
carpal tunnel

Registered: 30/04/2000
Posts: 3716
I'm honestly not sure what vendor this particular data set came from, but I'm pretty sure it's not Diebold. What's amazing is that the scans are one-bit (black and white) and fairly low resolution (you can't read most of the text printed on the ballot). This makes it difficult if you want to write a post-facto ballot scanner.

#341356 - 20/01/2011 02:18 Re: directory with zillions of files [Re: DWallach]
gbeer
carpal tunnel

Registered: 17/12/2000
Posts: 2659
Loc: Manteca, California
Sounds like what you have been given is designed to meet the letter of the law.

That could make it tough if they were handing out randomized ballots, like when Schwarzenegger was first elected, and there were something like 50 names on the ballot for Governor.

Are you auditing these for your own entertainment, or for someone else?
_________________________
Glenn

#341374 - 20/01/2011 16:39 Re: directory with zillions of files [Re: gbeer]
DWallach
carpal tunnel

Registered: 30/04/2000
Posts: 3716
I'm working with some attorneys who are in the midst of a public interest lawsuit. I can't really get into the details right now. Suffice to say that I may need to crunch my way through a whole lot of these ballots.

#341380 - 20/01/2011 21:57 Re: directory with zillions of files [Re: DWallach]
canuckInOR
carpal tunnel

Registered: 13/02/2002
Posts: 3153
Loc: Portland, OR
Originally Posted By: DWallach
scans are one-bit (black and white) and fairly low resolution (you can't read most of the text printed on the ballot).

Can you post one?


Edited by canuckInOR (20/01/2011 21:58)
Edit Reason: fixed tag.

#341398 - 21/01/2011 03:50 Re: directory with zillions of files [Re: canuckInOR]
DWallach
carpal tunnel

Registered: 30/04/2000
Posts: 3716
Not right now. Maybe later.

#341416 - 21/01/2011 16:05 Re: directory with zillions of files [Re: DWallach]
canuckInOR
carpal tunnel

Registered: 13/02/2002
Posts: 3153
Loc: Portland, OR
No prob. I figured you might be under an NDA or something, but was curious...
