
I have a regular expression that I match against a single string to extract certain information. When it matches, only the captured group is returned. My function removes any empty strings from the returned result and gives just the captured string as output. This works well in unit tests for true positives.

Now I want to test for false positives with the same expression, but I can't figure out how to demonstrate that in a unit test. I have a few test strings in a file that the regex should not match, and it does not, so the code works. But when I try to show that in a test case, that is, check that an empty string is returned, I can't.

Essentially, if no match is found, the function should return an empty string. This is my code:

match = re.findall(combined, narration)
result = list(filter(None, match[0]))
if match:
    return result[0]
else:
    result[0] = ""
    return result[0]

The first branch works for matched strings and returns a single string. In the second branch, I want to return an empty string so that the test case can check with .assertEqual that the string did not match. Instead, the function raises a "list index out of range" error.

Can anybody tell me if there's a better way to check for an unmatched string with Regex and Unit Tests?

Edit 1: Adding Expected Input and Output as requested

Input 1 - BRN CLG-CI IQ PAID ROHIT SINGH

Output 1 - ROHIT SINGH

Input 2 - BRN-TO CASH SELF

Output 2 - '' (empty string)

rick458
  • When `re.findall` does not find anything, it returns an empty list. – Wiktor Stribiżew Jun 03 '21 at 15:46
  • Check [How to return a string if a re.findall finds no match](https://stackoverflow.com/questions/56855558/how-to-return-a-string-if-a-re-findall-finds-no-match) – Wiktor Stribiżew Jun 03 '21 at 15:48
  • Yes I did see that post and I added the else clause after that itself. Even if I add the brackets for "" string, it shows the same error. – rick458 Jun 03 '21 at 15:50
  • Ok, I see you just need a single match from a string. You need no `re.findall`. Use `re.search`: `match = re.search(combined, narration)` then `if match: print(match.group(1)) else: print('No match')`, see https://ideone.com/l0VMcN – Wiktor Stribiżew Jun 03 '21 at 15:56
  • Yes I am matching but for multiple patterns, and so I'm extracting more than one word from a sentence too. So for my use case, I can't use the `re.search `. Can I implement the same technique with the `re.findall` method? – rick458 Jun 03 '21 at 16:01
  • It is the same, https://ideone.com/PHLUKj, just use `re.findall`. – Wiktor Stribiżew Jun 03 '21 at 16:03
  • Okay so I just tried your code. If Regex finds a match (true positive), it now returns both the extracted string and none. But if it does not find a match (true negative), it still returns the list index out of range error for some reason. – rick458 Jun 03 '21 at 16:13
  • No, [it does not return `None`](https://ideone.com/9dhskO). If there is no match, [it returns 'No match found'](https://ideone.com/PHLUKj). – Wiktor Stribiżew Jun 03 '21 at 16:15
  • Okay so I just got it working. I shifted the `result = list(filter(None, match[0])` inside the if clause and am then printing it if a match is found. If not, it returns 'none' or any string I want it to. So the final code looks like this ``` match = re.findall(combined, narration) if match: result = list(filter(None, match[0])) return result[0] else: return '' ```. Thanks a lot! – rick458 Jun 03 '21 at 16:20
  • But this only prints the first found value if there are more than one. – Wiktor Stribiżew Jun 03 '21 at 16:36
  • Please update the question with some sample input and expected output. – Wiktor Stribiżew Jun 03 '21 at 16:40
  • The function only returns at least one word if a match is found in a string. So it works for a sentence perfectly fine. Sure, I'll update the question with the expected outputs. – rick458 Jun 03 '21 at 16:50

1 Answer

You can keep re.findall: check its output and, if there is a match, filter out the empty matches and print the first one. Otherwise, print an empty string.

See this Python demo:

import re

# One capturing group: findall returns a list of strings, or [] when nothing matches
combined = r'^BRN.*?(?:paid|to)\s(?![A-Za-z\s]*\bself\b)([A-Za-z\s]+)'
narrations = ['BRN CLG-CI IQ PAID ROHIT SINGH', 'BRN-TO CASH SELF']

for narration in narrations:
    print('-------', narration, sep='\n')
    match = re.findall(combined, narration, flags=re.I)
    if match:
        # Only index into the results after confirming the list is non-empty
        result = list(filter(None, match))
        print(result[0])
    else:
        print('')

yielding

-------
BRN CLG-CI IQ PAID ROHIT SINGH
ROHIT SINGH
-------
BRN-TO CASH SELF
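
To show the unmatched case in a unit test, the same logic could be wrapped in a function and checked with assertEqual. This is only a minimal sketch: the function name extract_name and the test class are hypothetical, and the pattern is the one from the demo above rather than your real combined expression.

```python
import re
import unittest

# Pattern copied from the demo above; swap in your own combined expression.
COMBINED = re.compile(
    r'^BRN.*?(?:paid|to)\s(?![A-Za-z\s]*\bself\b)([A-Za-z\s]+)',
    flags=re.I,
)


def extract_name(narration):
    """Return the first non-empty captured string, or '' when nothing matches.

    Hypothetical helper, named here for illustration only.
    """
    matches = COMBINED.findall(narration)
    # findall returns [] when there is no match, so never index it unchecked.
    non_empty = [m for m in matches if m]
    return non_empty[0] if non_empty else ''


class ExtractNameTest(unittest.TestCase):
    def test_match_returns_captured_group(self):
        self.assertEqual(
            extract_name('BRN CLG-CI IQ PAID ROHIT SINGH'), 'ROHIT SINGH')

    def test_no_match_returns_empty_string(self):
        self.assertEqual(extract_name('BRN-TO CASH SELF'), '')
```

Saved in a test module, this can be run with python -m unittest. The original IndexError came from evaluating match[0] before the if match check, so the key point is that indexing happens only after the list is known to be non-empty.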

Wiktor Stribiżew
  • Yes, thank you! That definitely solves the problem, and it'd work if I need to return None in the else case too! – rick458 Jun 04 '21 at 08:22