
I have a number of moderately complicated patterns to match against a multi-GB log. I think that, between them, the patterns cover all of the text, but I want to check whether I'm missing something in the log.

How can I efficiently find text that didn't match any of the patterns I have?

I am working in Python with re, but I can use any package.

Example:

pattern1 =  r'Error at get_next_question with payload: (\{.*\}) and message \'([0-9]*) not in the rule set\'. Model info: \(\'(.*)\', \'(.*)\', \'(.*)\'\)'

pattern2 =  r'Error at get_next_question with payload: (\{.*\}) and message All propositions evaluated to false, next question could not be selected.\nQuestion is: ([0-9]+)\nPropositions evaluated are: (\[\[.*\]\])\nAnswers available: (\[.*\]). Model info: \(\'(.*)\', \'(.*)\', \'(.*)\'\)'

How do I find text that is matched by neither pattern1 nor pattern2?

There is no guarantee that every error comes from get_next_question, or that it contains any of the other elements used in these patterns.

Edit: further clarification. I am using the patterns with re.findall(pattern, log), not re.match.

So the workflow is:

list_of_type1_errors = re.findall(pattern1,log)
list_of_type2_errors = re.findall(pattern2,log)

I want to find if there is a type 3 error that I don't know about.

Is there a more efficient way to do the whole thing?

Ilya

1 Answer

You can simply try all the patterns and check whether none of them matched:

import re

patterns = [pattern1, pattern2, ...]
if not any(re.match(pattern, message) for pattern in patterns):
    # message matched none of the known patterns
    pass  # do thing

If you've already run the patterns beforehand, then you can cache whatever their results are, and use any() on that instead. That way, you're not running any given pattern more than once. But, of course, there's no way to know whether all the patterns fail unless you actually run all of them.
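A minimal sketch of the caching idea, assuming the log has already been split into individual message strings. The names `messages` and `unmatched` and the two short patterns are hypothetical stand-ins, not the patterns from the question:

```python
import re

# Hypothetical stand-in patterns and messages, for illustration only.
patterns = [re.compile(r"Error A: (\d+)"), re.compile(r"Error B: (\w+)")]
messages = ["Error A: 42", "Error B: foo", "Something unexpected"]

unmatched = []
for message in messages:
    # Run every pattern exactly once and keep the results for later reuse.
    results = [pattern.match(message) for pattern in patterns]
    # A failed match is None (falsy), so any() is False only if all failed.
    if not any(results):
        unmatched.append(message)

print(unmatched)  # ['Something unexpected']
```

Because the results are stored in a list first, every pattern runs on every message; the short-circuiting only applies when any() is given a generator instead.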

any() will short-circuit (stop executing) as soon as it finds a pattern that does match, however.
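For the re.findall(pattern, log) workflow from the question, where the log is one large string rather than separate messages, one possible variation is to record where each match starts and ends and then look at the gaps in between. This is only a sketch under assumptions: the helper name `unmatched_spans` and the sample log and patterns are made up for illustration, it swaps re.findall for re.finditer to get match positions, and like the findall approach it assumes the log fits in memory.

```python
import re

def unmatched_spans(patterns, text):
    """Return the non-blank pieces of text not covered by any pattern match."""
    # Collect the (start, end) position of every match of every pattern.
    spans = sorted(m.span() for p in patterns for m in re.finditer(p, text))
    gaps = []
    pos = 0  # end of the region covered so far
    for start, end in spans:
        if start > pos:
            gaps.append(text[pos:start])  # text between two matches
        pos = max(pos, end)  # max() handles overlapping matches
    if pos < len(text):
        gaps.append(text[pos:])  # trailing text after the last match
    # Drop gaps that are only whitespace, such as newlines between entries.
    return [g.strip() for g in gaps if g.strip()]

# Hypothetical sample data, not taken from the real log.
log = "Error A: 42\nweird line\nError B: foo\nError A: 7\n"
patterns = [r"Error A: (\d+)", r"Error B: (\w+)"]
print(unmatched_spans(patterns, log))  # ['weird line']
```

Anything returned here is a candidate for an unknown error type, since none of the known patterns matched it.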

Green Cloak Guy
  • I added a clarification. I am doing re.findall(pattern, log). The log is very big and all of the patterns appear multiple times in it. There are, however, parts of the log that might not match any pattern. – Ilya Aug 26 '20 at 21:57