I'm looking for an efficient algorithm that finds all patterns matching a given string. The pattern set can be very large (more than 100,000 patterns) and dynamic (patterns can be added or removed at any time). The patterns are not necessarily standard regexps; they can be a subset of regexp or something similar to shell patterns (e.g. file-*.txt). A solution for a subset of regexp is preferred (as explained below).
FYI: I'm not interested in brute-force approaches that test the string against a list of RegExp objects one by one.
By simple regexp, I mean a regular expression that supports ?, *, +, character classes such as [a-z], and possibly the logical operator |.
To clarify my need: I want to find all patterns that match the URL:
http://site1.com/12345/topic/news/index.html
Given the pattern set below, the result should be these patterns:
http://*.site1.com/*/topic/*
http://*.site1.com/*
http://*
Pattern set:
http://*.site1.com/*/topic/*
http://*.site1.com/*/article/*
http://*.site1.com/*
http://*.site2.com/topic/*
http://*.site2.com/article/*
http://*.site2.com/*
http://*
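
To make the requirement concrete, here is a minimal Python sketch of one possible non-brute-force structure: a trie built over the pattern text and walked like an NFA, so patterns sharing a prefix are advanced together instead of being tested one by one. It is only an illustration under stated assumptions, not a complete answer: the names PatternTrie and _Node are hypothetical, only the shell-style wildcards * (any run of characters, possibly empty) and ? (exactly one character) are handled, and +, character classes and | are left out. Under this strict reading, http://*.site1.com/* needs a dot before site1, so the usage example assumes the host www.site1.com rather than the bare site1.com.

```python
class _Node:
    """One trie node; `pattern` is set when a stored pattern ends here."""
    __slots__ = ("children", "is_star", "pattern")

    def __init__(self, is_star=False):
        self.children = {}      # pattern character -> child node
        self.is_star = is_star  # True if this node was reached via a '*' edge
        self.pattern = None


class PatternTrie:
    """Hypothetical dynamic pattern set supporting '*' and '?' only."""

    def __init__(self):
        self._root = _Node()

    def add(self, pattern):
        node = self._root
        for ch in pattern:
            child = node.children.get(ch)
            if child is None:
                child = _Node(is_star=(ch == "*"))
                node.children[ch] = child
            node = child
        node.pattern = pattern

    def remove(self, pattern):
        path, node = [], self._root
        for ch in pattern:
            child = node.children.get(ch)
            if child is None:
                return False
            path.append((node, ch))
            node = child
        if node.pattern is None:
            return False
        node.pattern = None
        # Prune nodes that no longer lead to any stored pattern.
        for parent, ch in reversed(path):
            child = parent.children[ch]
            if child.children or child.pattern is not None:
                break
            del parent.children[ch]
        return True

    @staticmethod
    def _closure(nodes):
        # '*' may match the empty string, so a '*' child is reachable for free.
        result = set(nodes)
        stack = list(result)
        while stack:
            star = stack.pop().children.get("*")
            if star is not None and star not in result:
                result.add(star)
                stack.append(star)
        return result

    def match(self, text):
        active = self._closure([self._root])
        for ch in text:
            advanced = set()
            for node in active:
                if node.is_star:            # '*' absorbs this character
                    advanced.add(node)
                for key in (ch, "?"):       # literal edge, then single-char wildcard
                    child = node.children.get(key)
                    if child is not None:
                        advanced.add(child)
            if not advanced:
                return []
            active = self._closure(advanced)
        return sorted(n.pattern for n in active if n.pattern is not None)


trie = PatternTrie()
for p in [
    "http://*.site1.com/*/topic/*",
    "http://*.site1.com/*/article/*",
    "http://*.site1.com/*",
    "http://*.site2.com/topic/*",
    "http://*.site2.com/article/*",
    "http://*.site2.com/*",
    "http://*",
]:
    trie.add(p)

print(trie.match("http://www.site1.com/12345/topic/news/index.html"))
# ['http://*', 'http://*.site1.com/*', 'http://*.site1.com/*/topic/*']
```

Adding or removing a pattern only touches the nodes along that pattern's own path, which is the dynamic behaviour asked for. The cost of a lookup depends on how many trie nodes are active at once, so this sketch makes no claim about worst-case efficiency for 100,000 patterns.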