Unofficial empeg BBS

#164069 - 04/06/2003 11:39 Anyone here any good with regular expressions?
AndrewT
old hand

Registered: 16/02/2002
Posts: 867
Loc: Oxford, UK
I'm struggling to get this expression to work the way I want. It works ok apart from the last bit (highlighted) where I'm trying to get it to match on 2 ' characters. Instead of matching on 2 's it matches on just 1.

^ [ a - z A - Z 0 - 9 \ s , / \ - _ @ # : ( ' ) { 2 } ] + $

(I've added a space between each character so it displays ok here, on my display the hash overlaid the colon making it look like a period).

Does anyone have any suggestions please?

#164070 - 04/06/2003 11:45 Re: Anyone here any good with regular expressions? [Re: AndrewT]
mcomb
pooh-bah

Registered: 31/08/1999
Posts: 1649
Loc: San Carlos, CA
Why not just use two single quotes?

^ [ a - z A - Z 0 - 9 \ s , / \ - _ @ # : ( ' ' ) ] + $

EDIT: Never mind, I just paid a little attention and realized that was a character class. What are you trying to do anyway, match non-printable characters and single quotes? That regex excludes almost everything else. There may be an easier way.

-Mike


Edited by mcomb (04/06/2003 11:50)
_________________________
EmpMenuX - ext3 filesystem - Empeg iTunes integration

#164071 - 04/06/2003 11:46 Re: Anyone here any good with regular expressions? [Re: AndrewT]
wfaulk
carpal tunnel

Registered: 25/12/2000
Posts: 16706
Loc: Raleigh, NC US
Depending on what's interpreting this (probably), you won't be able to use {} within a [].

You probably want to do this instead:
^([\w\s,/\-@#]|'')+$
but then I'm not sure how it'll deal with more than 2 's in a row, nor what you would want it to do.

Notice I've replaced the a-zA-Z0-9_ with \w.

More notes: I'm pretty sure parentheses don't work as parentheses inside []; they just match those characters. Therefore, you might see if your current regex will only match two close parens in a row.
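
To make the point about [] concrete, here is a minimal sketch in Python (Python is an assumption on my part; the thread doesn't say which regex engine is being used, and other engines may differ). Inside a character class the characters ( ' ) { 2 } are just ordinary members, so the original pattern happily accepts a lone quote, while the alternation version does not. The sample strings are made up for illustration.

```python
import re

# AndrewT's original pattern with the display spaces removed.
# Inside [...] the characters ( ' ) { 2 } are all literal class members.
original = re.compile(r"^[a-zA-Z0-9\s,/\-_@#:('){2}]+$")

# The alternation form: one allowed character OR a pair of quotes, repeated.
# re.ASCII keeps \w equal to [a-zA-Z0-9_], as described in the post.
paired = re.compile(r"^([\w\s,/\-@#]|'')+$", re.ASCII)

for text in ["it's", "a{2}b)", "say ''hi''"]:
    # Print whether each pattern accepts the sample string.
    print(repr(text), bool(original.match(text)), bool(paired.match(text)))
```

The second sample shows that braces and a close paren are accepted by the original as plain characters, which is the behaviour guessed at above.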


Edited by wfaulk (04/06/2003 11:48)
_________________________
Bitt Faulk

#164072 - 04/06/2003 13:25 Re: Anyone here any good with regular expressions? [Re: wfaulk]
AndrewT
old hand

Registered: 16/02/2002
Posts: 867
Loc: Oxford, UK
Bitt: Thanks, your suggestion works exactly as I want. I have added a few other characters I need matching too so now it looks like this: ^([\w\s,/\-\.\+:@#]|'')+$

nor what you would want it to do
All I am trying to do is clean up some data before it's imported into a SQL database. I have spent (wasted, to be truthful) most of the afternoon skimming regex stuff on the web trying to get this done because I was in a hurry. "Less haste, more speed" is a phrase that comes to mind.

The data I am handling uses a pair of ' characters in place of a double quote character, I needed to allow a pair through but not an odd number of them.

Mike: As per your edit that didn't work for me either. Thanks too for your help though.
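
For anyone wanting to check the "pairs only" behaviour, here is a small Python sketch (assuming Python's re module, which may not be the engine used in this thread). `is_clean` is a hypothetical helper name and the sample values are invented. It also answers the earlier question about more than two quotes in a row: runs of even length pass, runs of odd length fail.

```python
import re

# The final pattern from this post; re.ASCII keeps \w to [a-zA-Z0-9_].
ALLOWED = re.compile(r"^([\w\s,/\-\.\+:@#]|'')+$", re.ASCII)

def is_clean(value: str) -> bool:
    """Return True if value has only allowed characters and paired quotes."""
    return ALLOWED.match(value) is not None

# Try runs of 1 to 4 quotes after some ordinary text.
for n in range(1, 5):
    sample = "size 5" + "'" * n
    print(n, is_clean(sample))
```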

#164073 - 06/06/2003 06:22 Regular Expressions Part 2 [Re: AndrewT]
AndrewT
old hand

Registered: 16/02/2002
Posts: 867
Loc: Oxford, UK
I'm still struggling with these damn things!

What I am trying to do with regex is clean up strings by deleting unwanted characters (replace with "<nothing>"). The previous expressions were good at specifying what IS allowed but I now essentially need to do the opposite and express what IS NOT allowed in order to match and replace those characters.

I'm now stuck again with replacing single occurrences of ' while preserving double occurrences. The following (simplified) expression correctly matches % and & but also matches every ' which is unwanted:

([%&]|('){1})

Any suggestions please?

To be truthful, I'd far sooner express everything that IS allowed and then somehow flip the expression's meaning.

#164074 - 06/06/2003 10:11 Re: Regular Expressions Part 2 [Re: AndrewT]
wfaulk
carpal tunnel

Registered: 25/12/2000
Posts: 16706
Loc: Raleigh, NC US
Within a character class (the [] stuff), if you put a caret (^) as the first character, it inverts its meaning. But then you have to deal with your doublequotes. It might be best to deal with them separately somehow.
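
A minimal Python sketch of the inverted class (Python is an assumption here, and the sample string is invented). It takes the allowed characters from the earlier pattern, negates them with a leading ^, and replaces every match with nothing. Note that the quote pairs get stripped along with everything else, which is why they need separate handling.

```python
import re

# Allowed characters from the earlier pattern, negated with a leading ^.
# Anything NOT listed in the class matches and is replaced with nothing.
DISALLOWED = re.compile(r"[^\w\s,/\-\.\+:@#]", re.ASCII)

raw = "50% off & ''free'' p&p, it's true"
cleaned = DISALLOWED.sub("", raw)
print(cleaned)
```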

Here's the last one you said worked correctly, which does the opposite job:
^([\w\s,/\-\.\+:@#]|'')+$
Let's see. You want to allow letters, numbers, underscores, whitespace, commas, slashes, hyphens, periods, plusses, colons, at-signs, hashes, and two-character doublequotes, but nix everything else.

How about
s/"//g

s/''/"/g
s/[^\w\s,\/\-\.\+:@#"]//g
s/"/''/g
That'll convert all of your quote pairs ('') to doublequote characters ("), kill everything you don't want except doublequote characters (lone single quotes included), then convert the doublequotes back into quote pairs. The first line gets rid of any doublequote characters already in the data so that they don't get mistakenly translated into quote pairs.

There are probably other ways to do it, but that seems the most readable.
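
The same four steps can be sketched in Python, assuming that is an acceptable stand-in for whatever tool actually runs the substitutions. `clean` is a hypothetical helper name and the sample string is made up.

```python
import re

def clean(value: str) -> str:
    """Strip disallowed characters but keep '' pairs (hypothetical helper)."""
    value = value.replace('"', "")      # 1. drop any real " characters
    value = value.replace("''", '"')    # 2. park each '' pair as a single "
    # 3. delete everything outside the allowed set; " is allowed here,
    #    a lone ' is not, so stray single quotes disappear
    value = re.sub(r'[^\w\s,/\-\.\+:@#"]', "", value, flags=re.ASCII)
    return value.replace('"', "''")     # 4. restore the '' pairs

print(clean("50% off ''free'' p&p, it's \"true\""))
```

With a run of three quotes, the first two are treated as the pair and the third is dropped as a stray, since the replacement scans left to right.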
_________________________
Bitt Faulk

#164075 - 06/06/2003 22:13 Re: Regular Expressions Part 2 [Re: AndrewT]
canuckInOR
carpal tunnel

Registered: 13/02/2002
Posts: 3212
Loc: Portland, OR
First, it sounds like your strings are in the form of

<single><single>pretty much anything<single><single>.

So first of all, why not strip off the leading and trailing <single><single>, do your string munging, then put them back? Do you have embedded/escaped <single><single> to worry about, as well?

Second, are you doing this in Perl? If so, you could probably make use of Perl's extended regular expressions. Third, I'd say it's easier to do two passes, rather than trying to combine everything into one go.

My solution (using perl) would look like this:


$s =~ s/^''(.*)''$/$1/;
$s =~ s/[%&]//g;
$s =~ s/(?<!')'(?!')//g;
$s = qq(''$s'');


The (?<!') is a negative look behind for a single quote, and the (?!') is a negative look ahead for a single quote, so it matches single quotes that are not led by, nor followed by, a single quote. However, if the single quote is at the beginning of your string, following the two initial single quotes, then you're stuck (which is part of the reason I say strip the leading/trailing quotes off).
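
Here is a rough Python equivalent of those four Perl lines, as a sketch only (Python's re supports the same lookaround syntax; `strip_lone_quotes` is a hypothetical helper name and the sample string is invented). It also shows the "stuck" case: a stray quote sitting right after the opening pair survives unless the wrapper is peeled off first.

```python
import re

# A single quote with no quote immediately before or after it.
LONE_QUOTE = re.compile(r"(?<!')'(?!')")

def strip_lone_quotes(value: str) -> str:
    """Hypothetical helper mirroring the four Perl lines above."""
    wrapped = re.fullmatch(r"''(.*)''", value)
    inner = wrapped.group(1) if wrapped else value  # peel the outer ''...''
    inner = re.sub(r"[%&]", "", inner)              # drop % and &
    inner = LONE_QUOTE.sub("", inner)               # drop unpaired quotes
    return f"''{inner}''"                           # put the wrapper back

stuck = "'''tis 50% off''"
print(LONE_QUOTE.sub("", stuck))   # wrapper left on: the stray ' survives
print(strip_lone_quotes(stuck))    # wrapper peeled first: the stray ' goes
```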

If you feed a string like so:


$s = q(''This is ' a string with it's single and double''quotes'');
$s =~ s/(?<!')'(?!')//g;
print qq($s\n);


You get the following output:

''This is  a string with its single and double''quotes''


Perhaps we could give a better solution if you gave us some examples of strings you're trying to filter, and the result that they should look like.

Hope that helps.
