If I am copying a file and then comparing it back:
import shutil, filecmp
# dummy file names, they're not important
InFile = "d:\\Some\\Path\\File.ext"
CopyFile = "d:\\Some\\other\\Path\\File_Copy.ext"
# copy the file
shutil.copyfile(InFile,CopyFile)
# compare the two files
if not filecmp.cmp(InFile,CopyFile,shallow=False):
    print("File not copied correctly")
Why? It seems kind of pointless, doesn't it? After all, I've just copied the file, so it has to be identical, doesn't it? Wrong! Hard drives have an acceptable error rate that is very small but still present. The only way to be sure is to re-read the file, but since it has just been in memory, how can I be sure that the system (Windows 7) has actually read the file from the media and not just returned the pages from standby memory?
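To make the re-read step concrete, here is a minimal sketch of what a byte-for-byte comparison amounts to. It is not filecmp's actual implementation; `deep_compare` and `CHUNK_SIZE` are hypothetical names chosen for illustration. Note that it only uses ordinary file reads, so nothing in it can tell whether the bytes came from the media or from memory.

```python
import os

CHUNK_SIZE = 8 * 1024  # assumed buffer size, chosen for illustration only


def deep_compare(path_a, path_b, chunk_size=CHUNK_SIZE):
    """Return True if both files hold identical bytes (hypothetical helper)."""
    # Files of different sizes can never be identical, so skip the read.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        while True:
            block_a = file_a.read(chunk_size)
            block_b = file_b.read(chunk_size)
            if block_a != block_b:
                return False
            if not block_a:  # both files exhausted without a mismatch
                return True
```

Calling `deep_compare(InFile, CopyFile)` would stand in for the `filecmp.cmp(InFile, CopyFile, shallow=False)` call above.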
Let's assume that I have to write 16 TB of data to removable hard disc drives and I have to be sure that none of the files on the discs are corrupt, or at least no more corrupt than the live files. In 16 TB of disc space there are likely to be a few files that are not identical. I am currently using WinDiff to check the files byte-for-byte. That utility is slow, but at least I can be reasonably sure that it is actually reading the copied file from the disc, as the pages should be long gone from memory by then.
Can anybody offer an expert opinion, based on certainties, on which is likely to happen: is the file read from the disc, or remembered from memory?
It is suspicious that when I copy less than the installed memory, the verification is far quicker than the copy. It should be somewhat quicker, since reading is quicker than writing, but not that much quicker. If I copy 3 GB of files (I have 32 GB of installed memory) and it takes a minute, then verification should take 50 seconds or so and show 100% disc use in Resource Monitor. It doesn't: the verification takes less than 10 seconds and Resource Monitor doesn't budge. If I copy more than the installed memory, then verification takes almost as long as the copy and Resource Monitor shows 100%, which is what I'd expect. So what's happening here?
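To put numbers on that, here is a small sketch that turns the timings above into average transfer rates. `throughput_mb_per_s` is a hypothetical helper, the timings are the approximate figures quoted above (not measurements), and only the 3 GB of copied data is counted, even though a comparison reads both the source and the copy.

```python
def throughput_mb_per_s(byte_count, seconds):
    """Average transfer rate in MB/s, taking 1 MB as 1048576 bytes."""
    return byte_count / 1048576.0 / seconds


copied = 3 * 1073741824  # the 3 GB example from the paragraph above

copy_rate = throughput_mb_per_s(copied, 60)    # copy took about a minute
verify_rate = throughput_mb_per_s(copied, 10)  # verify took under 10 seconds

print("copy:   %.1f MB/s" % copy_rate)
print("verify: %.1f MB/s" % verify_rate)
```

That works out to roughly 51 MB/s for the copy and more than 307 MB/s for the verification, about six times faster, which is the gap that looks suspicious.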
For reference, the real code with error checking removed:
import shutil, filecmp, os, sys
FromFolder = sys.argv[1]
ToFolder = sys.argv[2]
VerifyList = list()
VerifyToList = list()
BytesCopied = 0
if not os.path.exists(ToFolder):
    os.mkdir(ToFolder)
# first pass: copy every file, mirroring the folder structure
for (path, dirs, files) in os.walk(FromFolder):
    RelPath = path[len(FromFolder):]
    OutPath = ToFolder + RelPath
    if not os.path.exists(OutPath):
        os.mkdir(OutPath)
    for thisFile in files:
        InFile = os.path.join(path, thisFile)
        CopyFile = os.path.join(OutPath, thisFile)
        ByteSize = os.path.getsize(InFile)
        # divide by floats so the fractional part is not truncated
        if ByteSize < 1024:
            RepSize = "%d bytes" % ByteSize
        elif ByteSize < 1048576:
            RepSize = "%.1f KB" % (ByteSize / 1024.0)
        elif ByteSize < 1073741824:
            RepSize = "%.1f MB" % (ByteSize / 1048576.0)
        else:
            RepSize = "%.1f GB" % (ByteSize / 1073741824.0)
        print("copy %s > %s " % (RepSize, thisFile))
        VerifyList.append(InFile)
        VerifyToList.append(CopyFile)
        shutil.copyfile(InFile, CopyFile)
# finished copying, now verify
FileIndex = range(len(VerifyList))
reVerifyList = list()
reVerifyToList = list()
for thisIndex in FileIndex:
    InFile = VerifyList[thisIndex]
    CopyFile = VerifyToList[thisIndex]
    thisFile = os.path.basename(InFile)
    ByteSize = os.path.getsize(InFile)
    if ByteSize < 1024:
        RepSize = "%d bytes" % ByteSize
    elif ByteSize < 1048576:
        RepSize = "%.1f KB" % (ByteSize / 1024.0)
    elif ByteSize < 1073741824:
        RepSize = "%.1f MB" % (ByteSize / 1048576.0)
    else:
        RepSize = "%.1f GB" % (ByteSize / 1073741824.0)
    print("Verify %s > %s" % (RepSize, thisFile))
    if not filecmp.cmp(InFile, CopyFile, shallow=False):
        print("File not copied correctly " + thisFile)
        # copy, second chance
        reVerifyList.append(InFile)
        reVerifyToList.append(CopyFile)
        shutil.copyfile(InFile, CopyFile)
del VerifyList
del VerifyToList
# verify the files that were copied a second time
if len(reVerifyList) > 0:
    FileIndex = range(len(reVerifyList))
    for thisIndex in FileIndex:
        InFile = reVerifyList[thisIndex]
        CopyFile = reVerifyToList[thisIndex]
        if not filecmp.cmp(InFile, CopyFile, shallow=False):
            thisFile = os.path.basename(InFile)
            print("File failed 2nd chance " + thisFile)
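As an aside, the size-reporting branches appear twice in the listing above. A sketch of how they might be factored into one function follows; `format_size` is a hypothetical name, not part of the real script. It divides by floats so the fractional part survives under Python 2 as well.

```python
def format_size(byte_size):
    """Return a human-readable size string (hypothetical helper)."""
    if byte_size < 1024:
        return "%d bytes" % byte_size
    elif byte_size < 1048576:
        return "%.1f KB" % (byte_size / 1024.0)
    elif byte_size < 1073741824:
        return "%.1f MB" % (byte_size / 1048576.0)
    return "%.1f GB" % (byte_size / 1073741824.0)
```

Both loops could then build `RepSize` with `format_size(os.path.getsize(InFile))`.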