
I am trying to compare the similarity between two texts using a score. This is my code:

risk_list1_txt = []
scoreList = []
similarityDict = {}
theScore = 0
for text1 in risk_list1:
    similarityDict['FileName'] = text1
    theText1 = open(path1 + "\\" + text1).read().lower()
    for text2 in range(len(risk_list2)):
        theText2 = open(path2 + "\\" + risk_list2[text2]).read().lower()
        theScore = fuzz.token_set_ratio(theText1,theText2)
        similarityDict[risk_list2[text2]] = theScore
    outFile= open(fileDestDir,'w')
    outFile.write(str(theScore))
outFile.close()

The problem is that my output file only contains the score for the last comparison, even though I have 3 different text files in each of risk_list1 and risk_list2. I cannot get this loop to work correctly.

Adam Smith
mehrblue
  • Could you edit your post and make use of http://stackoverflow.com/editing-help to format your code with appropriate whitespace? – Brad Beattie May 22 '14 at 20:48
  • @BradBeattie though his code certainly doesn't follow PEP8 spacing, I formatted it to correctly be a code block. – Adam Smith May 22 '14 at 20:52
  • You don't use `text2` except as an index into `risk_list2`, so just iterate over `risk_list2` like you do `risk_list1`. Also, use `os.path.join` to join path components instead of manually concatenating them with `\\`. – chepner May 22 '14 at 21:10

2 Answers


You are opening the file in write mode and not append mode. Replace

outFile= open(fileDestDir,'w')

with

outFile= open(fileDestDir,'a')

Write mode truncates the file every time it is opened, so each pass through the loop discards what was written before. Append mode adds to the existing content instead. The documentation for `open` has more on file modes.
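To make the difference concrete, here is a small sketch that reopens a file once per score, the way the question's loop does, first with 'w' and then with 'a'. The helper `write_scores`, the file names, and the sample scores are made up for illustration and are not from the question.

```python
import os
import tempfile


def write_scores(path, scores, mode):
    """Reopen `path` once per score, mimicking the loop in the question."""
    for score in scores:
        with open(path, mode) as out_file:
            out_file.write(str(score) + "\n")


# Hypothetical scratch files in a fresh temporary directory.
tmp_dir = tempfile.mkdtemp()
w_path = os.path.join(tmp_dir, "w.txt")
a_path = os.path.join(tmp_dir, "a.txt")

write_scores(w_path, [10, 20, 30], "w")  # each open truncates the file
write_scores(a_path, [10, 20, 30], "a")  # each open keeps earlier lines

with open(w_path) as f:
    w_content = f.read()
with open(a_path) as f:
    a_content = f.read()

print(repr(w_content))  # '30\n'
print(repr(a_content))  # '10\n20\n30\n'
```

Note that with 'a' the file also keeps growing across separate runs of the script, so you may want to delete or truncate it once before the loop starts.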

shaktimaan
  • Is it not a problem that they are repeatedly opening the same file in the outer loop? Wouldn't it make more sense to open it once outside the loop? Or am I reading this wrong? – SethMMorton May 22 '14 at 21:14
  • When working with files, you should use [the `with` statement](https://www.youtube.com/watch?v=lRaKmobSXF4&list=UUAuqj5Bs5mTTl1mIVDmuAlw0). – Gareth Latty May 22 '14 at 21:34

Looks like it may be an indentation issue.

for text1 in risk_list1:
    # iterates through each text1
    # ...

    for text2 in range(len(risk_list2)):
        # iterates through each text2
        theScore = fuzz.token_set_ratio(theText1,theText2)
        # theScore gets set

    # we've iterated all the way through the text2's

    outFile= open(fileDestDir,'w')
    outFile.write(str(theScore))
    # open and write!

Also, as shaktimaan pointed out in his answer, any time you open a file with the 'w' flag it blanks the file. Use 'a' to append to a file instead.
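Putting both points together, one possible shape for the loop is sketched below: the output file is opened once, before the loops, and the write happens inside the inner loop so every pair gets a line. This is only a sketch under some assumptions. The function name `write_similarity_scores`, its parameters, and the tab-separated output format are invented here, and `stand_in_score` is a `difflib`-based placeholder so the example runs without extra packages; you would pass `fuzz.token_set_ratio` as `score` to keep your original scoring.

```python
import os
from difflib import SequenceMatcher


def stand_in_score(a, b):
    """Placeholder scorer (NOT fuzz.token_set_ratio); returns 0 to 100."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def write_similarity_scores(path1, names1, path2, names2, out_path,
                            score=stand_in_score):
    """Write one 'name1<TAB>name2<TAB>score' line per pair of files."""
    # Open the output once; 'w' is fine here because it is not reopened.
    with open(out_path, "w") as out_file:
        for name1 in names1:
            with open(os.path.join(path1, name1)) as f:
                text1 = f.read().lower()
            for name2 in names2:
                with open(os.path.join(path2, name2)) as f:
                    text2 = f.read().lower()
                # The write sits inside the inner loop, so no score is lost.
                out_file.write("{}\t{}\t{}\n".format(
                    name1, name2, score(text1, text2)))
```

With the names from the question, a call might look like `write_similarity_scores(path1, risk_list1, path2, risk_list2, fileDestDir, score=fuzz.token_set_ratio)`. Using `with` also closes each input file, which the original loop never does.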

Adam Smith