I have a bunch of moderately complicated patterns to match against a multi-GB log. I think that, between them, all of the text matches at least one of the patterns, but I am curious whether I'm missing something in the log.
How can I efficiently find text that didn't match any of the patterns I have?
I am working in Python with re, but I can use any package.
Example:
pattern1 = r'Error at get_next_question with payload: (\{.*\}) and message \'([0-9]*) not in the rule set\'. Model info: \(\'(.*)\', \'(.*)\', \'(.*)\'\)'
pattern2 = r'Error at get_next_question with payload: (\{.*\}) and message All propositions evaluated to false, next question could not be selected.\nQuestion is: ([0-9]+)\nPropositions evaluated are: (\[\[.*\]\])\nAnswers available: (\[.*\]). Model info: \(\'(.*)\', \'(.*)\', \'(.*)\'\)'
How do I find text that is not part of either pattern1 or pattern2?
We are not guaranteed that all errors come from get_next_question, or that they contain the other elements used in these patterns.
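To make "text that didn't match" concrete, this is roughly the idea I have in mind (just a sketch, assuming the whole log is in a string log; combining the patterns into one alternation, and the names combined and leftover, are my own guesses at an approach):

import re

# Strip out everything the known patterns account for; whatever survives
# is text that none of the patterns explain.
combined = '|'.join(f'(?:{p})' for p in (pattern1, pattern2))
leftover = re.sub(combined, '', log)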
Edit: further clarification. I am using the patterns with re.findall(pattern, log), not re.match.
So the workflow is:
list_of_type1_errors = re.findall(pattern1, log)
list_of_type2_errors = re.findall(pattern2, log)
I want to find out whether there is a type 3 error that I don't know about.
Is there a more efficient way to do the whole thing?
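For what it's worth, the single pass I'm imagining looks something like this (again just a sketch; the group names type1/type2 and the gap-collecting logic are assumptions on my part, and it collects the whole matched text rather than the tuples findall returns):

import re

# One scan over the log: tag each known pattern with a named group,
# collect matches per type, and keep any text between matches as "unknown".
combined = re.compile(f'(?P<type1>{pattern1})|(?P<type2>{pattern2})')

list_of_type1_errors, list_of_type2_errors, unknown_chunks = [], [], []
pos = 0
for m in combined.finditer(log):
    if m.start() > pos:
        unknown_chunks.append(log[pos:m.start()])  # text no known pattern matched
    if m.group('type1') is not None:
        list_of_type1_errors.append(m.group('type1'))
    else:
        list_of_type2_errors.append(m.group('type2'))
    pos = m.end()
if pos < len(log):
    unknown_chunks.append(log[pos:])  # trailing text after the last match

I went with named groups here so I can tell which pattern matched without running each one over the log separately, but I don't know if that is actually the right direction.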