To start with a simple case, take arrays of length 5 (in practice the array length might be 20).
I have a predefined set of patterns, such as AAAAB, AAABA, BAABC, BCAAA, .... Each pattern has the same length as the input array. I need a function that takes any integer array as input and returns all the patterns it matches (an array may match several patterns), as fast as possible.
"A" means that in the pattern all numbers at the positions of A are equal. E.g. AAAAA simply means all numbers are equal, {1, 1, 1, 1, 1} matches AAAAA.
"B" means the numbers at the B positions are not equal to the number at the A positions (i.e. a wildcard for any number other than A's). Numbers represented by B don't have to be equal to each other. E.g. ABBAA means the 1st, 4th and 5th numbers are all equal to some value, say x, and the 2nd and 3rd are not equal to x. {2, 3, 4, 2, 2} matches ABBAA.
"C" means this position can be any number (i.e. a wildcard for a number). {1, 2, 3, 5, 1} matches ACBBA, {1, 1, 3, 5, 1} also matches ACBBA
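To make the three rules above concrete, here is a minimal Python sketch of a single-pattern check. The function name `matches` is just for illustration, and it assumes every pattern contains at least one "A" (the rules above don't say what a pattern without "A" would mean).

```python
def matches(pattern, arr):
    """Return True if arr matches pattern under the A/B/C rules."""
    if len(pattern) != len(arr):
        return False
    # Assumption: every pattern has at least one 'A'; its value is the anchor.
    a_value = arr[pattern.index("A")]
    for symbol, value in zip(pattern, arr):
        if symbol == "A" and value != a_value:
            return False  # an 'A' position differs from the anchor value
        if symbol == "B" and value == a_value:
            return False  # a 'B' position must not equal the anchor value
        # 'C' accepts any value, so nothing to check
    return True
```

With this, `matches("ABBAA", [2, 3, 4, 2, 2])` and `matches("ACBBA", [1, 1, 3, 5, 1])` both return `True`.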
I am looking for an efficient algorithm (in terms of the number of comparisons). It doesn't have to be optimal, but it shouldn't be too far from optimal. I feel it is somewhat like a decision tree...
A very straightforward but inefficient way is the following:
Try to match each pattern against the input, say AABCA against {a, b, c, d, e}. It checks whether
(a == b && b == e && a != c)
If the number of patterns is n and the length of the pattern/array is m, then the complexity is about O(n*m).
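The straightforward approach could look roughly like the following Python sketch. The name `match_all` is hypothetical, and it again assumes each pattern contains at least one "A". It tests every pattern position by position, which is where the O(n*m) comes from.

```python
def match_all(patterns, arr):
    """Naive O(n*m) matching: test every pattern against arr."""
    result = []
    for p in patterns:
        # Assumption: each pattern has at least one 'A' to anchor on.
        x = arr[p.index("A")]
        # 'C' always passes; 'A' needs v == x; 'B' needs v != x.
        if all(s == "C" or ((v == x) == (s == "A")) for s, v in zip(p, arr)):
            result.append(p)
    return result
```

For example, `match_all(["AAAAB", "AAABA", "BAABC", "BCAAA", "AABCA"], [1, 1, 2, 7, 1])` returns `["AABCA"]`.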
Update:
Please feel free to suggest better wording for the question, as I don't know how to make it simple to understand without causing confusion.
An ideal algorithm would need some kind of preprocessing, such as transforming the set of patterns into a decision tree, so that the complexity after preprocessing could be reduced to something like O(log n * log m) for some special pattern sets (just a guess).
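As one illustration of what "preprocessing" could mean here (this is not the decision tree, and not claimed to be optimal), each pattern could be compiled once into two bitmasks, and each input array reduced to one equality bitmask per distinct anchor position. All names below (`compile_patterns`, `match_compiled`) are hypothetical, and the sketch assumes every pattern contains at least one "A".

```python
def compile_patterns(patterns):
    """Precompute (pattern, anchor index, A-mask, B-mask) for each pattern."""
    compiled = []
    for p in patterns:
        anchor = p.index("A")  # assumption: at least one 'A' per pattern
        a_mask = sum(1 << i for i, s in enumerate(p) if s == "A")
        b_mask = sum(1 << i for i, s in enumerate(p) if s == "B")
        compiled.append((p, anchor, a_mask, b_mask))
    return compiled


def match_compiled(compiled, arr):
    """Return the patterns that arr matches, using the precomputed masks."""
    eq_cache = {}  # anchor index -> bitmask of positions equal to arr[anchor]
    out = []
    for p, anchor, a_mask, b_mask in compiled:
        eq = eq_cache.get(anchor)
        if eq is None:
            x = arr[anchor]
            eq = sum(1 << i for i, v in enumerate(arr) if v == x)
            eq_cache[anchor] = eq
        # All 'A' positions must equal the anchor; no 'B' position may.
        if (eq & a_mask) == a_mask and (eq & b_mask) == 0:
            out.append(p)
    return out
```

Element comparisons are then done once per distinct anchor position rather than once per pattern, and "C" positions cost nothing because they appear in neither mask. Each pattern still needs one mask test, so this does not reach the sub-linear behaviour guessed at above.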
Some figures that may be helpful: the predefined pattern set has roughly 30 patterns. The number of input arrays to match against is about 10 million.
Say AAAAA and AAAAC are both in the predefined pattern set. Then if AAAAA matches, AAAAC matches as well. I am looking for an algorithm that can recognize that.
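The kind of relationship I mean can be sketched as a simple position-wise test. The function name `implies` is made up for illustration. The test is a sufficient condition only: it may miss some implications (for example, AB and BA accept exactly the same arrays, but this check does not detect that).

```python
def implies(p, q):
    """Sufficient check that every array matching p also matches q.

    Holds when q contains an 'A' and every non-'C' symbol of q equals the
    symbol of p at the same position (q only relaxes p by adding 'C's).
    """
    return (
        len(p) == len(q)
        and "A" in q  # assumption: patterns need an 'A' to anchor on
        and all(qs == "C" or qs == ps for ps, qs in zip(p, q))
    )
```

Here `implies("AAAAA", "AAAAC")` is `True`, while `implies("AAAAC", "AAAAA")` is `False`.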
Update 2:
@Gareth Rees's answer gives an O(n) solution, but under the assumption that there are not many "C"s (otherwise the storage is huge and there are many unnecessary comparisons).
I would also welcome any ideas on how to deal with situations where there are many "C"s, say, for input arrays of length 20, at least 10 "C"s in each predefined pattern.