Unofficial empeg BBS

#166157 - 16/06/2003 18:23 Looking for a collate algorithm
cushman
veteran

Registered: 21/01/2002
Posts: 1380
Loc: Erie, CO
I've been searching since last night and couldn't come up with anything solid, but maybe someone here (who has had CompSci classes) could point me in the right direction. What I am trying to do is this:

In the Palantir PDB Creator, I import all the tracks from the .csv file generated from Emplode/Jemplode. I then sort these tracks by the following values: Artist, Year, Source, Track, Title. I sort the text fields (currently) using a Collator object in Java, which uses the user's locale to determine accented characters' sort order. Then I generate the .pdb file which stores the Artist names in order along with all the tracks.

On the Palm device with Palantir, I am trying to implement a binary search through all the Artist records to scroll the Artist view to the right artist as a user enters characters in a search field. This was going well until I realized that the StrCaselessCompare function of the Palm will not compare strings with special characters in the same way as my java.text.Collator function.

I realized that I would have to write my own search function for the Palm. My question is: what's the best way to do this?

In Java, I can specify the sort order by creating a java.text.RuleBasedCollator, which takes as an argument a string representing a sort order for characters. This string would basically look something like this: a,A < b,B < c,C < d,D < e,é,E (with the middle e being an accented e). I could pass the RuleBasedCollator the sort order of the Palm device, but the Palm is annoying because it sorts spaces after other text characters. It would sort:

Alessandro Safini
Al Green
A Flock Of Seagulls

and I want it to sort:

A Flock Of Seagulls
Al Green
Alessandro Safini

I was thinking of making a hash table and weighting every character, but I'm not sure whether an algorithm for sorting by a set of rules already exists, and I don't know what to search for. If there is one, it would be easy for me to implement in both Java and Palm gcc.
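A minimal sketch of the weight-table idea, in Java. The ORDER string, the fold helper, the class name and the short accent list are all made up for illustration; a real table would need to cover the full character set in use. Each character is folded to a base letter and compared by its position in ORDER, with space placed first.

```java
import java.util.Arrays;
import java.util.Comparator;

class WeightTableSort {
    // Hypothetical collation order: a character's index is its weight.
    // Space is first, so "A Flock..." sorts ahead of "Al Green".
    static final String ORDER = " 0123456789abcdefghijklmnopqrstuvwxyz";

    // Small sample of accented letters and their base letters (same positions).
    static final String ACCENTED =
        "\u00e0\u00e1\u00e2\u00e4\u00e8\u00e9\u00ea\u00eb\u00f1\u00f6\u00fc";
    static final String BASE = "aaaaeeeenou";

    // Lowercase the character and strip the accent if it is in the sample list.
    static char fold(char c) {
        c = Character.toLowerCase(c);
        int i = ACCENTED.indexOf(c);
        return i >= 0 ? BASE.charAt(i) : c;
    }

    // Characters missing from ORDER sort after everything in it, by code value.
    static int weight(char c) {
        char f = fold(c);
        int w = ORDER.indexOf(f);
        return w >= 0 ? w : ORDER.length() + f;
    }

    // Compare character by character; a shorter prefix sorts first.
    static int compare(String a, String b) {
        int n = Math.min(a.length(), b.length());
        for (int i = 0; i < n; i++) {
            int d = weight(a.charAt(i)) - weight(b.charAt(i));
            if (d != 0) return d;
        }
        return a.length() - b.length();
    }

    static String[] sorted(String... names) {
        String[] copy = names.clone();
        Arrays.sort(copy, (Comparator<String>) WeightTableSort::compare);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(
            sorted("Alessandro Safini", "Al Green", "A Flock Of Seagulls")));
    }
}
```

This comparison treats case and accents as exact ties, so a binary search built on it would match "e" and "é" alike. Since it only needs a lookup table and a loop, the same logic should carry over to C on the Palm side, though that port is untested here.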
_________________________
Mark Cushman

#166158 - 16/06/2003 19:07 Re: Looking for a collate algorithm [Re: cushman]
Yang
addict

Registered: 14/01/2002
Posts: 443
Loc: Raleigh, NC
I don't program in Java so this would be a shot in the dark, but if spaces are a problem, what's preventing you from just removing them when you go to sort? Not sure how collators are used and if you have the chance to mangle the text before comparison w/o actually changing the data.

#166159 - 16/06/2003 19:48 Re: Looking for a collate algorithm [Re: Yang]
cushman
veteran

Registered: 21/01/2002
Posts: 1380
Loc: Erie, CO
I do have a chance to mangle the text before comparison, but I don't want to remove spaces because I want them sorted before all the rest of the entries. Desired sort example:

3 Doors Down
30 Seconds To Mars
311

If I ignored spaces, then it would sort as:

30 Seconds To Mars
311
3 Doors Down

I guess I could replace the space character with a control character or such that had an index value lower than the rest of the visible characters (Palm StrCaselessCompare compares primarily by index), but that would only handle the space characters. The accent characters all have indexes above the normal alphabet, making them sort incorrectly on the Palm. On the computer with my Java Collator object, it will sort accented characters like this:

Vengaboys
Véronique Sanson
Vertical Horizon

Whereas on the Palm, the order would be:

Vengaboys
Vertical Horizon
Véronique Sanson

Wait a minute though: basically the sort in Java ignores accents, so maybe I could do the same. I could replace all commonly accented characters with their unaccented equivalents. Let me do some testing! Thanks for getting my brain going again; this may be simpler than I thought.
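One way to try the accent-replacement idea on the Java side is to build a sort key per name and sort by plain code-point order. The sketch below uses java.text.Normalizer, which only exists in Java 6 and later, so it is an assumption about the toolchain; the class and method names are hypothetical. Because a space has a lower code than any digit or letter, it sorts first without special handling.

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Locale;

class SortKeyDemo {
    // Decompose accented letters (NFD), drop the combining marks, lowercase.
    static String sortKey(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "").toLowerCase(Locale.ROOT);
    }

    // Sort a copy of the names by their keys, using ordinary String ordering.
    static String[] sortByKey(String... names) {
        String[] copy = names.clone();
        Arrays.sort(copy, Comparator.comparing(SortKeyDemo::sortKey));
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(
            sortByKey("Vertical Horizon", "V\u00e9ronique Sanson", "Vengaboys")));
        System.out.println(Arrays.toString(
            sortByKey("311", "30 Seconds To Mars", "3 Doors Down")));
    }
}
```

For the binary search to agree, the Palm side would have to compare the same kind of key, for example by folding the typed characters the same way before comparing. Punctuation and letters that do not decompose (such as ø) are not handled by this sketch.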
_________________________
Mark Cushman

#166160 - 19/06/2003 11:20 Re: Looking for a collate algorithm [Re: cushman]
tms13
old hand

Registered: 30/07/2001
Posts: 1115
Loc: Lochcarron and Edinburgh
The Java sort doesn't ignore accents - it just considers them to be less significant than all the letters. Similarly, it doesn't ignore letter-case; that is compared only if everything else is the same.
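The difference between "ignored" and "less significant" can be seen by changing the collator's strength. This is a small sketch using the standard java.text.Collator API with an English locale (the locale choice and the helper name compareAt are assumptions for illustration): at PRIMARY strength accents and case are treated as equal, while at TERTIARY they break ties but still never outrank a difference in base letters.

```java
import java.text.Collator;
import java.util.Locale;

class CollatorStrengthDemo {
    // Compare two strings with an English collator at the given strength.
    static int compareAt(int strength, String a, String b) {
        Collator c = Collator.getInstance(Locale.ENGLISH);
        c.setStrength(strength);
        return c.compare(a, b);
    }

    public static void main(String[] args) {
        // PRIMARY: only base letters count, so these compare as equal.
        System.out.println(compareAt(Collator.PRIMARY, "e", "\u00e9"));
        System.out.println(compareAt(Collator.PRIMARY, "a", "A"));

        // TERTIARY: accent and case differences now break the tie.
        System.out.println(compareAt(Collator.TERTIARY, "e", "\u00e9") != 0);
        System.out.println(compareAt(Collator.TERTIARY, "a", "A") != 0);

        // A base-letter difference (o vs t) wins over the accent on the e.
        System.out.println(
            compareAt(Collator.TERTIARY, "V\u00e9ronique", "Vertical") < 0);
    }
}
```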

It would be nice if the standard collators considered spaces equivalent - that's one thing that always annoys me in JEmplode. (Though it sometimes helps spot duplicates where one is mis-spaced, such as "Nightrider" appearing as "Night Rider" in some CDDB entry).
_________________________
Toby Speight
030103016 (80GB Mk2a, blue)
030102806 (0GB Mk2a, blue)
