I would like to implement "word boundary" matches within a DFA based regular expression matcher. Can someone tell me how this is done?
To give some background, I'm currently using the "dk.brics.automaton" library, but it does not support assertions (e.g. \b, word boundary). I need to use a DFA-based engine because my main goal is actually determining equivalence of regular expressions, not doing the actual matching.
Additionally, the answer to the question "DFA based regular expression matching - how to get all matches?" seems to indicate that this is possible, saying:
"Again, we manage this by adding an epsilon transition with special instructions to the simulator. If the assertion passes, then the state pointer continues, otherwise it is discarded."
I can't quite figure out what this means, however. Is it suggesting that this can only be done with a special type of epsilon transition that inspects the characters on either side of the current position and can only be traversed if they satisfy the assertion, or can it be done with "normal" epsilon transitions configured in some way? If I need this "special" type of epsilon transition, how can an automaton containing them be determinized (i.e. converted to a standard DFA)?
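To make the question concrete, here is a minimal sketch of what I imagine the "special" epsilon transitions might look like, and one possible way to determinize them. It is written in plain Python and does not use dk.brics.automaton. Everything in it is my own assumption rather than anything from the quoted answer: the alphabet is collapsed to two symbols (`w` for a word character, `o` for anything else), the names `closure`, `determinize`, `run`, `eps` and `bnd` are hypothetical, and each DFA state is a pair of an NFA state set and the class of the previous character, so a \b edge can be resolved once the next character (or end of input) is known.

```python
from collections import deque

WORD, OTHER = "w", "o"  # assumed two-symbol alphabet: word char / non-word char

def is_boundary(prev_cls, next_cls):
    # The assertion holds when exactly one side is a word character.
    # None stands for start or end of input and counts as non-word.
    return (prev_cls == WORD) != (next_cls == WORD)

def closure(states, eps, bnd, prev_cls, next_cls):
    # Follow plain epsilon edges always, and word-boundary edges only
    # if the assertion holds between prev_cls and next_cls.
    seen, stack = set(states), list(states)
    boundary = is_boundary(prev_cls, next_cls)
    while stack:
        s = stack.pop()
        targets = set(eps.get(s, ()))
        if boundary:
            targets |= set(bnd.get(s, ()))
        for t in targets - seen:
            seen.add(t)
            stack.append(t)
    return seen

def determinize(start, accept, delta, eps, bnd):
    # Subset construction over pairs (NFA state set, previous char class).
    d0 = (frozenset([start]), None)
    trans, accepting = {}, set()
    seen, queue = {d0}, deque([d0])
    while queue:
        d = queue.popleft()
        states, prev_cls = d
        # At end of input, pending assertions see next_cls=None.
        if accept in closure(states, eps, bnd, prev_cls, None):
            accepting.add(d)
        for c in (WORD, OTHER):
            closed = closure(states, eps, bnd, prev_cls, c)
            nxt = frozenset(t for s in closed for t in delta.get((s, c), ()))
            d2 = (nxt, c)
            trans[(d, c)] = d2
            if d2 not in seen:
                seen.add(d2)
                queue.append(d2)
    return d0, trans, accepting

def run(dfa, classes):
    # Ordinary DFA simulation: no assertions are left at this point.
    state, trans, accepting = dfa
    for c in classes:
        state = trans[(state, c)]
    return state in accepting

# Hypothetical NFA for ".*\bw\b.*" (input contains a one-character word):
# state 0 loops on anything, 0 --\b--> 1, 1 --w--> 2, 2 --\b--> 3, 3 loops.
delta = {(0, WORD): {0}, (0, OTHER): {0}, (1, WORD): {2},
         (3, WORD): {3}, (3, OTHER): {3}}
bnd = {0: {1}, 2: {3}}
dfa = determinize(0, 3, delta, {}, bnd)
```

If this reading is right, the result is an ordinary DFA over the class alphabet, so the usual equivalence check would apply to it unchanged. What I can't tell is whether this is what the quoted answer means, or how it would carry over to a full character alphabet.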
Pointers to any descriptions of how to actually implement this are greatly appreciated.