Yesterday I managed to more than double the responsiveness of my GeoCaching WAP portal by making a very simple change to how it uses regular expressions to parse the cache pages.

Previously, several AWK statements resembled this:
gsub(".*images/WptTypes/","",text)
On my 600MHz server, with a 10KByte text value, this could sometimes take 15-25 seconds to execute under gawk. The simple change was to anchor the regex pattern by putting a caret at the beginning:
gsub("^.*images/WptTypes/","",text)
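A quick sanity check that both forms leave the same text behind, sketched as a small shell script around awk. The function names and the sample HTML snippet are made up for illustration; they are not taken from the portal code.

```shell
#!/bin/sh
# Hypothetical helpers: each strips everything up to and including the
# last "images/WptTypes/" on every input line.

# Anchored form (the fast one).
strip_anchored() {
    awk '{ gsub("^.*images/WptTypes/", "", $0); print }'
}

# Original unanchored form, kept only for comparison.
strip_unanchored() {
    awk '{ gsub(".*images/WptTypes/", "", $0); print }'
}

# Invented sample line with two occurrences of the marker text.
sample='<img src="../images/WptTypes/2.gif"> x <img src="images/WptTypes/8.gif">'

anchored=$(printf '%s\n' "$sample" | strip_anchored)
unanchored=$(printf '%s\n' "$sample" | strip_unanchored)

echo "$anchored"      # prints: 8.gif">
echo "$unanchored"    # prints: 8.gif">
```

Both forms remove everything through the last occurrence, because the leading .* is greedy.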
This change makes execution nearly instantaneous, the way it should be, without any change whatsoever to the meaning of the original regular expression, since a leading .* always matches from the start of the text anyway. I wonder if the gawk folks might be willing to add this to their regex parser:
if (substr(pattern, 1, 2) == ".*") { pattern = "^" pattern }
Or perhaps even this:
sub("^[.][*]","^&",pattern)
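To see what that second rewrite does to a few patterns, here is a minimal sketch, again as a shell wrapper around awk. The name anchor_pattern is hypothetical, and this only illustrates the string rewrite itself, not anything inside gawk's regex code. It assumes that . matches every character including newline, as it does in gawk.

```shell
#!/bin/sh
# Hypothetical helper: prefix a caret to a pattern that starts with ".*".
# The pattern is passed through the environment so awk does not apply
# escape-sequence processing to it (as it would with -v).
anchor_pattern() {
    PATTERN="$1" awk 'BEGIN {
        pattern = ENVIRON["PATTERN"]
        # "^[.][*]" matches a literal ".*" at the start of the string;
        # "&" in the replacement stands for the matched text.
        sub("^[.][*]", "^&", pattern)
        print pattern
    }'
}

anchor_pattern '.*images/WptTypes/'   # prints: ^.*images/WptTypes/
anchor_pattern 'images/.*'            # prints: images/.*   (unchanged)
anchor_pattern '^.*images/'           # prints: ^.*images/  (already anchored)
```

Patterns that do not begin with .* pass through untouched, so the rewrite only ever adds a single caret.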


Cheers


Edited by mlord (15/06/2004 10:44)