The two files are identical: I'm applying a regex to the contents of two copies of the same file, but I'm not sure how to compare the resulting groups. Here is my code:

for res1 in result1:
            for res2 in result2:
                res2 = res2.group()
            if res1.group() != res2:
                print(res1.group())

When I compare them like this, only the last group matches, so I believe it has something to do with indentation. result1 is the regex result for the first file and result2 is the regex result for the second file. Both are iterators (the callable_iterator objects returned by re.finditer), so I'm looping through them and trying to compare the matches. There should be no difference. Would anyone have any tips? I am thinking about using isdiff instead, although I have never used it. Below is a bit more of the code.

else:
    read1 = open(fle + '.txt')
    result1 = re.finditer(regex, read1.read())

    
    with open(infile, "r") as x:
        result2 = re.finditer(regex, x.read())
        
        for res1 in result1:
            for res2 in result2:
                res2 = res2.group()
            if res1.group() != res2:

Everything else in the script seems to be working as intended. At this point I am stuck comparing the regex groups, which in the current situation should be the same. I've recopied the file a few times and retested.

hfak

1 Answer

As you suspected, the problem lies in the indentation of the for loop:

for res1 in result1:
  for res2 in result2:
    res2 = res2.group()
  if res1.group() != res2:
    print(res1.group())

What happens is that the inner loop keeps overwriting the res2 variable until the loop finishes. At that point, res2 holds the group of the last regex match in result2. As a result, the comparison checks each res1 match against only that last res2 value.

You can easily see this behavior in the loop below:

>>> for number in [1, 2, 3]:
...   number *= 2
...
>>> print(number)
6

In order for the comparison to check all pairs of results, move it inside the inner loop:

for res1 in result1:
  for res2 in result2:
    res2 = res2.group()
    if res1.group() != res2:
      print(res1.group())
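
One caveat with the nested loop above: re.finditer returns a one-shot iterator, so result2 is exhausted after the first pass of the outer loop, and an all-pairs comparison also reports every pair of matches that merely differ from each other. If the goal is to confirm that the n-th match in one file equals the n-th match in the other, a position-by-position comparison may be closer to what is needed. The following is only a sketch under that assumption; the function name diff_matches and the sample strings are hypothetical and not part of the original script.

```python
import re
from itertools import zip_longest


def diff_matches(regex, text1, text2):
    """Return (index, match1, match2) for each position where the matches differ."""
    # Collect the matched strings up front, because re.finditer
    # returns an iterator that can only be consumed once.
    groups1 = [m.group() for m in re.finditer(regex, text1)]
    groups2 = [m.group() for m in re.finditer(regex, text2)]
    # zip_longest pads the shorter list with None, so a surplus
    # match in either text is reported as a difference as well.
    return [
        (i, g1, g2)
        for i, (g1, g2) in enumerate(zip_longest(groups1, groups2))
        if g1 != g2
    ]


# Hypothetical sample strings standing in for the two file contents.
same = diff_matches(r"\d+", "a1 b22 c333", "a1 b22 c333")
changed = diff_matches(r"\d+", "a1 b22 c333", "a1 b99")
print(same)     # []
print(changed)  # [(1, '22', '99'), (2, '333', None)]
```

With two identical files the returned list should be empty; any entry points at the position of the first text's match that has no equal counterpart in the second.
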
Maciej Gol