Find text that matches no regex pattern in Python

Question

I have a bunch of moderately complicated patterns to match against a multi-GB log. I think that, between them, all of the text matches at least one of the patterns, but I am curious whether I'm missing something in the log.

How can I efficiently find text that didn't match any of the patterns I have?

I am working in Python with re, but I can use whatever package works.

Example:

pattern1 =  r'Error at get_next_question with payload: (\{.*\}) and message \'([0-9]*) not in the rule set\'. Model info: \(\'(.*)\', \'(.*)\', \'(.*)\'\)'

pattern2 =  r'Error at get_next_question with payload: (\{.*\}) and message All propositions evaluated to false, next question could not be selected.\nQuestion is: ([0-9]+)\nPropositions evaluated are: (\[\[.*\]\])\nAnswers available: (\[.*\]). Model info: \(\'(.*)\', \'(.*)\', \'(.*)\'\)'

How do I find text that is not part of either pattern1 or pattern2?

We are not guaranteed that all errors come from get_next_question, or that they contain the other elements used in these patterns.

Edit: further clarification. I am using the patterns in a re.findall(pattern, log) approach, not re.match.

So the workflow is:

list_of_type1_errors = re.findall(pattern1,log)
list_of_type2_errors = re.findall(pattern2,log)

I want to find if there is a type 3 error that I don't know about.

Is there a more efficient way to do the whole thing?

Can you share a sample of your code, so we can get the picture of how you're doing your matches, and give you an informed answer that'll work with your use case? — Green Cloak Guy, Aug 26 '20 at 01:54
Can you not compare the number of words in the file with the number of matches? — Abhijit Sarkar, Aug 26 '20 at 01:56
Abhijit Sarkar, yes, that is something I can definitely do. Green Cloak Guy, example added. — Ilya, Aug 26 '20 at 01:57

score 0 · Answer 1 · answered Aug 26 '20 at 02:15

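You can simply try all the patterns on each message and check whether none of them matched:

import re

patterns = [pattern1, pattern2]  # add any further patterns here
# message is a single log entry; re.match only matches at its start
if not any(re.match(pattern, message) for pattern in patterns):
    pass  # no pattern matched: handle the unknown entry here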

If you've already run the patterns beforehand, then you can cache whatever their results are, and use any() on that instead. That way, you're not running any given pattern more than once. But, of course, there's no way to know whether all the patterns fail unless you actually run all of them.

any() will short-circuit (stop executing) as soon as it finds a pattern that does match, however.
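
Since the question runs re.findall over the whole log rather than over individual messages, one possible adaptation is to record the span of every match and then look at whatever text is left uncovered. The following is only a minimal sketch under some assumptions: the helper name unmatched_spans, the demo patterns, and the demo log are all made up for illustration, the log is assumed to fit in memory as one string, and whitespace-only leftovers are assumed to be uninteresting.

```python
import re


def unmatched_spans(patterns, text):
    """Return (start, end, snippet) for stretches of text covered by no match.

    Hypothetical helper, not part of the re module.
    """
    # Collect the (start, end) span of every non-empty match of every pattern.
    spans = []
    for pattern in patterns:
        for m in re.finditer(pattern, text):
            if m.end() > m.start():
                spans.append(m.span())
    spans.sort()

    # Walk the sorted spans and record the gaps between them.
    gaps = []
    cursor = 0
    for start, end in spans:
        if start > cursor:
            gaps.append((cursor, start, text[cursor:start]))
        cursor = max(cursor, end)
    if cursor < len(text):
        gaps.append((cursor, len(text), text[cursor:]))

    # Assumption: gaps made only of whitespace (e.g. newlines) are not errors.
    return [gap for gap in gaps if gap[2].strip()]


# Made-up stand-ins for pattern1/pattern2 and for the real log.
demo_patterns = [r"Error A: \d+", r"Error B: \w+"]
demo_log = "Error A: 12\nError B: foo\nError C: surprise\nError A: 7\n"

gaps = unmatched_spans(demo_patterns, demo_log)
for start, end, snippet in gaps:
    print(start, end, repr(snippet))
```

Each pattern still runs exactly once over the log, as in the findall workflow, so the extra cost is mostly sorting the spans. For a log too large to hold in memory, the same idea would have to be applied chunk by chunk, with care taken around matches that straddle chunk boundaries.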

I added a clarification. I am doing a re.findall(pattern, log). The log is very big and all of the patterns appear multiple times in the log. There are, however, parts of the log that might not match any pattern. — Ilya, Aug 26 '20 at 21:57
