c# - Is there anything faster than regex for matching a whole word?


Edit:

My original question asked whether there is anything faster than a regex for matching a whole word. I have since added code and run several tests. Details below.

My sample string to match against (from The Old Man and the Sea):

The old man fished alone in a skiff in the Gulf Stream and had gone eighty-four days without taking a fish. In the first forty days a boy had been with him. After forty days without a fish the boy’s parents had told him that the old man was salao, the worst form of unlucky, and the boy had gone at their orders in another boat which caught three fish the first week

Here's the regex:

"(\b(cod|tuna|mackerel|plaice|haddock|salmon|prawns|shrimp|fishcake|halibut|sole|eel|anchovy|anchovies|sardine|herring|bonito|whiting|seabass|carp|crab|flounder|pollock|mullet|ray|ray wings|clam|mussel|scallop)(s?)\b)" 

Here's my first attempt at matching without a regex:

public static string words = "cod|tuna|mackerel|plaice|haddock|salmon|prawns|shrimp|fishcake|halibut|sole|eel|anchovy|anchovies|sardine|herring|bonito|whiting|seabass|carp|crab|flounder|pollock|mullet|ray|ray wings|clam|mussel|scallop";

public static bool MatchBySplitting(string sentence)
{
    // Break the sentence into tokens on a fixed set of separators.
    string[] sentence_words = sentence.Split(',', '.', ' ', ';', '-');
    string[] match_words = words.Split('|');

    // Note: a multi-word entry such as "ray wings" can never equal a single token.
    foreach (string w in sentence_words)
    {
        foreach (string m in match_words)
        {
            if (m == w)
                return true;
        }
    }
    return false;
}

Running 5000 iterations of each:

  • regex matching: 250-300 ms
  • MatchBySplitting: 250-350 ms, comparable to the regex.
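The timing code itself is not shown, so here is a minimal sketch of how such a comparison might be run with Stopwatch. The class name MatchBenchmark, the helper TimeIt, and the shortened word list are assumptions for illustration. The sketch also builds the regex once up front, which may or may not be how the numbers above were obtained.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class MatchBenchmark
{
    // Shortened word list for illustration; the question uses a longer one.
    const string Words = "cod|tuna|mackerel|haddock|salmon";

    // Built once and reused, so construction cost is not part of each iteration.
    static readonly Regex WordRegex = new Regex(@"\b(" + Words + @")s?\b");
    static readonly string[] MatchWords = Words.Split('|');

    public static bool MatchByRegex(string sentence)
    {
        return WordRegex.IsMatch(sentence);
    }

    public static bool MatchBySplitting(string sentence)
    {
        foreach (string w in sentence.Split(',', '.', ' ', ';', '-'))
            foreach (string m in MatchWords)
                if (m == w)
                    return true;
        return false;
    }

    // Runs `match` against `sentence` `iterations` times; returns elapsed milliseconds.
    public static long TimeIt(Func<string, bool> match, string sentence, int iterations)
    {
        match(sentence); // warm-up call so JIT compilation is not timed
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            match(sentence);
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    // Prints one timing per approach for the given sentence.
    public static void Report(string sentence, int iterations)
    {
        Console.WriteLine("regex:     " + TimeIt(MatchByRegex, sentence, iterations) + " ms");
        Console.WriteLine("splitting: " + TimeIt(MatchBySplitting, sentence, iterations) + " ms");
    }
}
```

Calling MatchBenchmark.Report(sentence, 5000) prints both timings. Absolute numbers depend heavily on the machine and on whether the regex is constructed inside or outside the loop, so only the relative trend is worth comparing.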

However, if I shorten the string to just its first line, the results change:

The old man fished alone in a skiff in the Gulf Stream and had gone eighty-four days without taking a fish.

The regex stays about the same, but MatchBySplitting speeds up a lot:

  • regex matching: 220-260 ms
  • MatchBySplitting: 50-150 ms, faster than the regex.

If I start messing with the classics and insert a word that will match:

The old man fished alone in a skiff in the Gulf Stream and had gone eighty-four days without taking a fish. On the eighty fifth day, he caught a tuna. The end

  • regex matching: 170-300 ms
  • MatchBySplitting: 100-200 ms, faster than the regex.

I think I've answered my own question here. The custom matching method seems to be as fast as or faster than the regex in these cases.

However, I haven't covered all word boundaries in my code (!?), which may slow it down a little if I add them in.
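One way the missing word-boundary handling might be added is to scan the characters once and treat any non-letter, non-digit character as a boundary, looking each word up in a HashSet instead of looping over the list. This is only a sketch under assumptions: the class name WholeWordMatcher and the shortened word list are made up for illustration, the boundary rule only approximates the regex \b (which also counts the underscore as a word character), and multi-word entries such as "ray wings" are not handled.

```csharp
using System;
using System.Collections.Generic;

public static class WholeWordMatcher
{
    // Shortened, illustrative word list (single words only).
    static readonly HashSet<string> Words = new HashSet<string>(
        new[] { "cod", "tuna", "mackerel", "haddock", "salmon" },
        StringComparer.OrdinalIgnoreCase);

    public static bool ContainsAnyWord(string sentence)
    {
        int start = -1; // index where the current word began, or -1 when between words

        // Iterate one position past the end so the final word is also flushed.
        for (int i = 0; i <= sentence.Length; i++)
        {
            bool inWord = i < sentence.Length && char.IsLetterOrDigit(sentence[i]);
            if (inWord)
            {
                if (start < 0)
                    start = i;
            }
            else if (start >= 0)
            {
                string word = sentence.Substring(start, i - start);
                if (IsListed(word))
                    return true;
                start = -1;
            }
        }
        return false;
    }

    // Accepts the word itself or, mirroring the regex's optional "s", its singular form.
    static bool IsListed(string word)
    {
        if (Words.Contains(word))
            return true;
        return word.EndsWith("s", StringComparison.OrdinalIgnoreCase)
            && Words.Contains(word.Substring(0, word.Length - 1));
    }
}
```

With this sketch, ContainsAnyWord("he caught a tuna!") is true while ContainsAnyWord("a tunafish sandwich") is false, because "tunafish" is read as one whole word. Whether it beats the regex would still need to be measured on the real input.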

Try making a compiled regex, like this:

static readonly Regex CornRegex = new Regex(@"\b(corn)\b", RegexOptions.Compiled);

This will generate and compile a method that contains the instructions needed to match the regex. It should be fast, comparable to writing your own custom function that loops over the individual characters.
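Applied to the fish list from the question, the same idea might look like the sketch below. The class name CompiledFishRegex and the shortened pattern are illustrative assumptions, and RegexOptions.IgnoreCase is an addition that the question's regex does not use.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CompiledFishRegex
{
    // Verbatim string (@"...") so that \b reaches the regex engine as a word boundary
    // instead of being read by the C# compiler as a backspace escape.
    // The static readonly field means compilation happens once, not on every call.
    static readonly Regex FishRegex = new Regex(
        @"\b(cod|tuna|mackerel|haddock|salmon)s?\b",
        RegexOptions.Compiled | RegexOptions.IgnoreCase);

    public static bool ContainsFish(string sentence)
    {
        return FishRegex.IsMatch(sentence);
    }
}
```

For example, CompiledFishRegex.ContainsFish("He caught a tuna.") returns true, and ContainsFish("He went without taking a fish.") returns false. Compiling costs extra time at construction, so it tends to pay off only when the same regex is reused many times, as in the 5000-iteration test.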

