C# - Is there anything faster than regex for matching a whole word?
Edit:
My original question was asking whether there is anything faster than regex for matching a whole word. I have now added code and run several tests; details below.
My sample matching string (from The Old Man and the Sea):
The old man fished alone in a skiff in the Gulf Stream, and had gone eighty-four days without taking a fish. In the first forty days a boy had been with him. But after forty days without a fish the boy’s parents had told him that the old man was salao, the worst form of unlucky, and the boy had gone at their orders in another boat which caught 3 fish the first week
Here's the regex:
@"(\b(cod|tuna|mackerel|plaice|haddock|salmon|prawns|shrimp|fishcake|halibut|sole|eel|anchovy|anchovies|sardine|herring|bonito|whiting|seabass|carp|crab|flounder|pollock|mullet|ray|ray wings|clam|mussel|scallop)(s?)\b)"
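For reference, here is one way the pattern above might be used from C#. This is a sketch, not the code behind the timings below: the class name `FishRegexDemo` and the method `MatchByRegex` are hypothetical. The pattern is written as a verbatim `@"..."` string so that `\b` means a word boundary rather than a backspace character.

```csharp
using System.Text.RegularExpressions;

public static class FishRegexDemo
{
    // Verbatim (@) string: \b reaches the regex engine as a word boundary.
    // In an ordinary "..." C# literal, \b is the backspace character (U+0008),
    // and the pattern would not match ordinary text.
    private const string Pattern =
        @"(\b(cod|tuna|mackerel|plaice|haddock|salmon|prawns|shrimp|fishcake|halibut|sole|eel|anchovy|anchovies|sardine|herring|bonito|whiting|seabass|carp|crab|flounder|pollock|mullet|ray|ray wings|clam|mussel|scallop)(s?)\b)";

    // Parse the pattern once and reuse the same instance for every call.
    private static readonly Regex FishRegex = new Regex(Pattern);

    // True if any listed fish (optionally followed by "s") appears as a whole word.
    public static bool MatchByRegex(string sentence) => FishRegex.IsMatch(sentence);
}
```

Keeping a single `static readonly` instance avoids re-parsing the pattern on each call, which would otherwise skew a 5000-iteration comparison.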
Here's my first matching attempt without regex:

public static string words = "cod|tuna|mackerel|plaice|haddock|salmon|prawns|shrimp|fishcake|halibut|sole|eel|anchovy|anchovies|sardine|herring|bonito|whiting|seabass|carp|crab|flounder|pollock|mullet|ray|ray wings|clam|mussel|scallop";

public static bool matchbysplitting(string sentence)
{
    // Break the sentence into tokens on common punctuation and spaces.
    string[] sentence_words = sentence.Split(',', '.', ' ', ';', '-');
    // Re-split on every call; "ray wings" can never equal a single token.
    string[] match_words = words.Split('|');
    foreach (string w in sentence_words)
    {
        foreach (string m in match_words)
        {
            if (m == w) return true;
        }
    }
    return false;
}
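A possible refinement of `matchbysplitting` (my own variant, not part of the timings here): split the word list once into a `HashSet<string>` so each token costs one hash lookup instead of up to 29 string comparisons. `SplitMatcher` and `MatchBySplittingHashed` are hypothetical names. Like the original, it is case-sensitive, does not handle plurals the way `(s?)` does, and can never match the two-word entry `ray wings`.

```csharp
using System;
using System.Collections.Generic;

public static class SplitMatcher
{
    // The word list is split once, at type initialization, instead of on every call.
    private static readonly HashSet<string> MatchWords = new HashSet<string>(
        "cod|tuna|mackerel|plaice|haddock|salmon|prawns|shrimp|fishcake|halibut|sole|eel|anchovy|anchovies|sardine|herring|bonito|whiting|seabass|carp|crab|flounder|pollock|mullet|ray|ray wings|clam|mussel|scallop".Split('|'),
        StringComparer.Ordinal);

    // Same separators as matchbysplitting.
    private static readonly char[] Separators = { ',', '.', ' ', ';', '-' };

    public static bool MatchBySplittingHashed(string sentence)
    {
        // RemoveEmptyEntries drops the "" tokens produced by runs such as ", ".
        foreach (string w in sentence.Split(Separators, StringSplitOptions.RemoveEmptyEntries))
        {
            // One hash lookup per token instead of comparing against all 29 words.
            if (MatchWords.Contains(w)) return true;
        }
        return false;
    }
}
```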
Running 5000 iterations of each:
- Regex matching: 250-300 ms
- matchbysplitting: 250-350 ms, comparable to the regex.
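The question doesn't show how the 5000 iterations were timed. A minimal harness along these lines, assuming `System.Diagnostics.Stopwatch` and a single warm-up call, keeps the comparison fairer by excluding JIT compilation from the measured loop. `MatchBenchmark.Time` is a hypothetical helper; for publishable numbers, a dedicated tool such as BenchmarkDotNet is more reliable.

```csharp
using System;
using System.Diagnostics;

public static class MatchBenchmark
{
    // Runs `matcher` on `sentence` `iterations` times and reports the elapsed
    // wall-clock milliseconds plus how many calls returned true.
    public static (long ElapsedMs, int Hits) Time(
        Func<string, bool> matcher, string sentence, int iterations = 5000)
    {
        // Warm-up call so JIT compilation and static initializers (e.g. building
        // a static Regex) are not included in the timed loop.
        matcher(sentence);

        int hits = 0;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            if (matcher(sentence)) hits++;
        }
        sw.Stop();
        return (sw.ElapsedMilliseconds, hits);
    }
}
```

Assuming `matchbysplitting` is in scope, `MatchBenchmark.Time(matchbysplitting, sentence)` would time it with the default 5000 iterations.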
However, if I shorten the matching string to just the first line, the results change:
The old man fished alone in a skiff in the Gulf Stream, and had gone eighty-four days without taking a fish.
The regex stays the same, but matchbysplitting speeds up a lot:
- Regex matching: 220-260 ms
- matchbysplitting: 50-150 ms, faster than regex.
If I start messing with the classics and insert a word that will match:
The old man fished alone in a skiff in the Gulf Stream, and had gone eighty-four days without taking a fish. On the eighty fifth day, he caught a tuna. The end.
- Regex matching: 170-300 ms
- matchbysplitting: 100-200 ms, faster than regex.
I think I've answered my own question here. The custom matching method seems to be equal to or faster than regex in these tests.
However, I haven't covered word boundaries in my code (!?), so it may slow down a little if I add them in.
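On the word-boundary point: one way to respect boundaries without splitting (and without allocating a string per token) is to search for each word with `IndexOf` and check the characters on either side. This is a sketch; `WordScanner` is a hypothetical name, and `IsWordChar` only approximates .NET's `\w` (letters, digits and underscore), so edge cases involving combining marks may differ from `\b`.

```csharp
using System;

public static class WordScanner
{
    // Approximation of the regex \w class: letters, digits and underscore.
    private static bool IsWordChar(char c) => char.IsLetterOrDigit(c) || c == '_';

    // True if `word` occurs in `text` with a word boundary on both sides.
    public static bool ContainsWholeWord(string text, string word)
    {
        if (word.Length == 0) return false;
        int start = 0;
        while (start <= text.Length - word.Length)
        {
            int i = text.IndexOf(word, start, StringComparison.Ordinal);
            if (i < 0) return false;
            bool leftOk = i == 0 || !IsWordChar(text[i - 1]);
            int end = i + word.Length;
            bool rightOk = end == text.Length || !IsWordChar(text[end]);
            if (leftOk && rightOk) return true;
            start = i + 1; // partial hit such as "tuna" in "tunafish"; keep looking
        }
        return false;
    }

    public static bool ContainsAnyWholeWord(string text, string[] words)
    {
        foreach (string w in words)
        {
            if (ContainsWholeWord(text, w)) return true;
        }
        return false;
    }
}
```

Because it searches for the literal word, a multi-word entry such as `ray wings` can match here, and hyphenated text like `eighty-four` is handled by the boundary check rather than by a separator list. Plurals would still need their own entries.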
Try making a compiled regex, like this:

using System.Text.RegularExpressions;

static readonly Regex cornregex = new Regex(@"\b(corn)\b", RegexOptions.Compiled);
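This will generate and compile a method containing the instructions needed to match the regex (RegexOptions.Compiled emits IL, which the JIT then turns into native code). It should be fast, comparable to writing your own custom function that loops over individual characters. The trade-off is a higher one-time cost when the Regex is constructed, so it pays off when the same instance is reused many times.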
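Applied to the fish list from the question, the compiled approach might look like the following. It is a sketch under a couple of assumptions: the pattern is built from an array with `Regex.Escape` and `string.Join`, and it uses a non-capturing group `(?:...)` since only a yes/no answer is needed. `CompiledFishRegex` is a hypothetical name.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class CompiledFishRegex
{
    // Declared before FishRegex so it is initialized first (textual order).
    private static readonly string[] Fish =
    {
        "cod", "tuna", "mackerel", "plaice", "haddock", "salmon", "prawns", "shrimp",
        "fishcake", "halibut", "sole", "eel", "anchovy", "anchovies", "sardine",
        "herring", "bonito", "whiting", "seabass", "carp", "crab", "flounder",
        "pollock", "mullet", "ray", "ray wings", "clam", "mussel", "scallop"
    };

    // Escape each entry (a no-op for these plain words apart from the space in
    // "ray wings") and join them into one non-capturing alternation. The optional
    // "s" and the trailing \b mirror the question's (s?)\b.
    private static readonly Regex FishRegex = new Regex(
        @"\b(?:" + string.Join("|", Fish.Select(f => Regex.Escape(f))) + @")s?\b",
        RegexOptions.Compiled);

    public static bool IsMatch(string sentence) => FishRegex.IsMatch(sentence);
}
```

Whether `RegexOptions.Compiled` actually beats the interpreted regex or the splitting method for sentences this short is something to measure rather than assume; the construction cost is paid only once per static instance.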