I recently recovered a ton of pictures from a friend's dead hard drive, and I decided to write a program in Python to:

Go through all the files

Check their MD5Sum

Check to see if the MD5Sum exists in a text file

If it does, let me know with "DUPLICATE HAS BEEN FOUND"

If it doesn't, add the MD5Sum to the text file.

The ultimate goal is to delete all duplicates. However, when I run this code, I get the following:

Traceback (most recent call last):
  File "C:\Users\godofgrunts\Documents\hasher.py", line 16, in <module>
    for line in myfile:
io.UnsupportedOperation: not readable

Am I doing this completely wrong or am I just misunderstanding something?

import hashlib
import os
import re

rootDir = 'H:\\recovered'
hasher = hashlib.md5()


with open('md5sums.txt', 'w') as myfile:
        for dirName, subdirList, fileList in os.walk(rootDir):            
                for fname in fileList:
                        with open((os.path.join(dirName, fname)), 'rb') as pic:
                                buf = pic.read()
                                hasher.update(buf)
                        md5 = str(hasher.hexdigest())
                        for line in myfile:
                                if re.search("\b{0}\b".format(md5),line):
                                        print("DUPLICATE HAS BEEN FOUND")
                                else:
                                        myfile.write(md5 +'\n')
godofgrunts
  • About your indentation, four spaces is preferable to 8. Would be much easier for us all to read as well. See [PEP8](http://www.python.org/dev/peps/pep-0008/#indentation). – RyPeck Sep 26 '13 at 02:12

2 Answers

You have opened your file in write-only mode ('w') in your with statement, so it cannot be read. To open it for both writing and reading, do:

with open('md5sums.txt', 'w+') as myfile:
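
As a rough illustration of how the relevant modes behave, here is a minimal sketch. The function name `demo_modes` and the sample line `abc123` are made up for this example and are not part of the original code. It seeds a file with one line and records what each mode can read back. Note that 'w+' does allow reading, but it truncates the file on open, so there is nothing left to iterate over.

```python
import io
import os
import tempfile


def demo_modes(path):
    """Seed `path` with one line, then record what each open mode reads back.

    `demo_modes` is a hypothetical helper used only for illustration.
    """
    results = {}

    # Start with a file that already holds one (fake) hash line.
    with open(path, 'w') as f:
        f.write('abc123\n')

    # 'r+': read and write, file must already exist, content is kept.
    with open(path, 'r+') as f:
        results['r+'] = f.readlines()

    # 'a+': read and append, file is created if missing, content is kept.
    # The position starts at the end, so seek back before reading.
    with open(path, 'a+') as f:
        f.seek(0)
        results['a+'] = f.readlines()

    # 'w+': read and write, but the file is truncated when opened.
    with open(path, 'w+') as f:
        results['w+'] = f.readlines()

    # 'w': write-only, so any attempt to read raises UnsupportedOperation.
    with open(path, 'w') as f:
        try:
            f.readlines()
            results['w'] = 'readable'
        except io.UnsupportedOperation:
            results['w'] = 'not readable'

    return results


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        print(demo_modes(os.path.join(tmp, 'md5sums.txt')))
```

With an empty (freshly truncated) file, a `for line in myfile:` loop runs zero times, so a `write` placed inside that loop is never reached.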
TerryA
  • So I tried that and it still doesn't write. I added some print statements in my code and it goes down the part where I declare md5 = str(hasher.hexdigest()) and starts the os.walk for loop again. – godofgrunts Sep 26 '13 at 01:17
  • @godofgrunts Double check your indentation. It could be doing things you don't want to do – TerryA Sep 26 '13 at 01:19
  • I'm not going to say I'm 100% sure my indentation is right, but I'm pretty sure it is. – godofgrunts Sep 26 '13 at 01:26
  • Oh, I thought it's `rw` instead of `w+`? – justhalf Sep 26 '13 at 01:28
  • 'rw' complains with: Traceback (most recent call last): File "C:\Users\godofgrunts\Documents\hasher.py", line 9, in <module> with open('md5sums.txt', 'rw') as myfile: ValueError: must have exactly one of create/read/write/append mode – godofgrunts Sep 26 '13 at 01:28

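The correct mode is "r+", not "w+". "w+" truncates the file when it is opened, so there is nothing left to read, whereas "r+" opens an existing file for reading and writing without truncating it.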

http://docs.python.org/3.3/tutorial/inputoutput.html#reading-and-writing-files
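
Beyond the file mode, the code in the question has a few other problems that would likely get in the way: the single `hasher` is shared across all files, so each digest accumulates every file read so far; `"\b"` in a non-raw string is a backspace character rather than a regex word boundary; and the `write` sits inside the loop over existing lines. The following is one possible sketch of the stated goal, not the asker's method. The names `md5_of_file` and `find_duplicates` are hypothetical, and it uses 'a+' instead of 'r+' because 'r+' raises `FileNotFoundError` when `md5sums.txt` does not exist yet. It also assumes the sums file lives outside the directory being scanned.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=65536):
    """Return the MD5 hex digest of one file, using a fresh hasher each time."""
    hasher = hashlib.md5()
    with open(path, 'rb') as f:
        # Read in chunks so large pictures are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicates(root_dir, sums_path):
    """Walk root_dir, record new digests in sums_path, return duplicate paths.

    Assumes sums_path is NOT inside root_dir (otherwise it would be hashed too).
    """
    duplicates = []
    # 'a+' creates the file if needed, keeps existing lines, and appends writes.
    with open(sums_path, 'a+') as sums_file:
        sums_file.seek(0)  # append mode starts at the end; rewind to read
        seen = {line.strip() for line in sums_file if line.strip()}

        for dir_name, _subdirs, file_names in os.walk(root_dir):
            for fname in file_names:
                path = os.path.join(dir_name, fname)
                digest = md5_of_file(path)
                if digest in seen:
                    print("DUPLICATE HAS BEEN FOUND")
                    duplicates.append(path)
                else:
                    seen.add(digest)
                    sums_file.write(digest + '\n')
    return duplicates
```

Keeping the known digests in a set avoids rescanning the text file for every picture, and returning the duplicate paths leaves the decision about deleting them to the caller.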

aenda