SequenceMatcher: Recording no match just once?

Question

I am using SequenceMatcher to find a set of words within a group of texts. The problem is that I need to record when it does not find a match, but only once per text. If I use an if statement, I get a "not found" result every time the comparison against a word fails.

from difflib import SequenceMatcher

names = ['JOHN', 'LARRY', 'PETER', 'MARY']
files = [path or link]

for file in files:
    for name in names:
        if SequenceMatcher(None, name, file).ratio() > .9:
            do something
        else:
            print(name + ' not found')

I have also tried re.match and re.find and I encounter the same problem. The code above is a simple version of what I am doing. I'm new to Python too. Thank you very much!

Can you clarify your question a bit? What should the output be if a word is found more than once? And if only once? And if it is not found at all? — mac, Nov 21 '11 at 23:30
Yes. If a name is found, the output is some information about that person that comes right after the name. Every person is mentioned only once in a text, but not every person is in every text. If a person is not in a given text, I want to keep a record of that. This matters because I am creating a `csv` file in which each column is a name. Does this help? Thanks! – Connie, Nov 22 '11 at 00:07

Dave · Answer 1 · 2011-11-21T23:39:57.980

The simple way would be to keep track of the names already reported as not found and not print them again:

seen = set()  # names already reported as not found
for file in files:
    for name in names:
        if SequenceMatcher(None, name, file).ratio() > .9:
            do something
        elif name not in seen:
            seen.add(name)
            print(name + ' not found')

edited Nov 21 '11 at 23:39

answered Nov 21 '11 at 23:30

This worked! Thank you. Though I placed `seen=[]` between the first `for` and the second `for` so that it resets for each file. – Connie Nov 22 '11 at 00:25
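
For reference, the per-file reset described in this comment can also be sketched without a `seen` container, by asking once per name whether any word in the text matches. This is only a rough sketch under assumptions not stated in the thread: each text is already loaded as a string, names are compared against whitespace-separated words, and the helpers `is_close` and `missing_names` are hypothetical names made up for illustration.

```python
from difflib import SequenceMatcher


def is_close(name, word, threshold=0.9):
    """Return True when name and word are near-identical by SequenceMatcher ratio."""
    return SequenceMatcher(None, name, word).ratio() > threshold


def missing_names(text, names):
    """Return the names that have no close match among the words of one text."""
    words = text.split()  # assumption: words are separated by whitespace
    return [name for name in names
            if not any(is_close(name, word) for word in words)]


names = ['JOHN', 'LARRY', 'PETER', 'MARY']
texts = ['JOHN met PETER', 'MARY met LARRY and JOHN']  # made-up sample texts

for text in texts:
    # Each missing name is reported exactly once per text.
    for name in missing_names(text, names):
        print(name + ' not found')
```

Because the list is rebuilt for every text, nothing carries over from one file to the next, which is the same effect as resetting `seen` inside the outer loop.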

score 0 · Accepted Answer · answered Nov 22 '11 at 00:27

If I interpret your comment to the question correctly (but I am not 100% sure!), this might illustrate the general mechanism you can follow:

>>> text = 'If JOHN would be married to PETER, then MARY would probably be unhappy'
>>> names = ['JOHN', 'LARRY', 'PETER', 'MARY']
>>> [text.find(name) for name in names]
[3, -1, 28, 40]  # This list is always as long as the names list

What I mean by "mechanism you can follow" is that SequenceMatcher (for which I substituted the built-in string method find) should not only work as a True/False test but should directly produce the information you want to store, with a placeholder such as -1 when a name is absent.
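
Applied to the csv file mentioned in the comments, the same idea might look roughly like the sketch below: one row per text, one column per name, and a "not found" cell wherever a name is absent. The helper `cell_for`, the second sample text, and the use of the match position as a stand-in for "the information after the name" are all assumptions for illustration; an in-memory buffer is used in place of a real file.

```python
import csv
import io

names = ['JOHN', 'LARRY', 'PETER', 'MARY']
texts = [
    'If JOHN would be married to PETER, then MARY would probably be unhappy',
    'LARRY has not met MARY',  # made-up second text
]


def cell_for(name, text):
    """Return the position of name in text as a string, or 'not found'."""
    pos = text.find(name)
    return 'not found' if pos == -1 else str(pos)


# One row per text; every row has exactly one cell per name.
rows = [[cell_for(name, text) for name in names] for text in texts]

buffer = io.StringIO()  # stand-in for an open csv file
writer = csv.writer(buffer)
writer.writerow(names)   # header: one column per name
writer.writerows(rows)
print(buffer.getvalue())
```

Since every name yields exactly one cell per text, each "not found" is recorded once per text and the columns stay aligned.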

HTH!
