I'm stuck. I want to take a Windows directory that the user specifies and list every file in that directory in a table with path, file name, file size, last modified time, and MD5 hash. For the life of me I can't figure out how to break it up into individual files; it only reports on the path as a whole. I understand the path variable needs to be turned into the various files within the directory, but I don't know how to do that.

How can I create the table accordingly and add the MD5 hash column? The last modified time should be in a human-readable format, not a UNIX timestamp.

#import libraries
import os
import time
import datetime
import logging
import hashlib
from prettytable import PrettyTable
import glob

#user input
path = input ("Please enter directory: ")
verbose = input ("Please enter yes/no for verbose: ")
print ("===============================================")

#processing input
if os.path.exists(path):
    print("Processing directory: ", (path))
else:
    print("Invalid directory.")
    exit()

if (verbose) == ("yes"):
    print("Verbose selected")
elif (verbose) == ("no"):
    print("Verbose not selected")
else:
    print("Invalid input")
print ("===============================================")

#process directory
directory = glob.glob(path)
filename = os.path.basename(path)
size = os.path.getsize(path)
modified = os.path.getmtime(path)

#output in to table
report = PrettyTable()

column_names = ['Path', 'File Name', 'File Size', 'Last Modified Time', 'MD5 Hash']
report.add_column(column_names[0], [directory])
report.add_column(column_names[1], [filename])
report.add_column(column_names[2], [size])   
report.add_column(column_names[3], [modified])
report.sortby = 'File Size'

print (report)
CDJB
Matt R

1 Answer

Does this solution match your requirements? It uses the built-in pathlib module:

from pathlib import Path
from datetime import datetime
import hashlib

#...Your code getting path here...

directory = Path(path)
paths = []
filename = []
size = []
hashes = []
modified = []
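# Every regular file under the directory, recursively; is_file() skips folders
files = [p for p in directory.glob('**/*') if p.is_file()]

for file in files:
    paths.append(file.parent)
    filename.append(file.name)
    size.append(file.stat().st_size)
    modified.append(datetime.fromtimestamp(file.stat().st_mtime))
    # Hash the raw bytes; reading in text mode fails on binary files such as images
    hashes.append(hashlib.md5(file.read_bytes()).hexdigest())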

#output in to table
report = PrettyTable()

column_names = ['Path', 'File Name', 'File Size', 'Last Modified Time', 'MD5 Hash']
report.add_column(column_names[0], paths)
report.add_column(column_names[1], filename)
report.add_column(column_names[2], size)   
report.add_column(column_names[3], modified)
report.add_column(column_names[4], hashes)
report.sortby = 'File Size'

print(report)

Output:

+-------------------+------------------+-----------+----------------------------+----------------------------------+
|        Path       |    File Name     | File Size |     Last Modified Time     |             MD5 Hash             |
+-------------------+------------------+-----------+----------------------------+----------------------------------+
| C:\...\New folder | 1 - Copy (2).txt |     0     | 2019-12-05 15:35:31.562420 | d41d8cd98f00b204e9800998ecf8427e |
| C:\...\New folder | 1 - Copy (3).txt |     0     | 2019-12-05 15:35:31.562420 | d41d8cd98f00b204e9800998ecf8427e |
| C:\...\New folder |   1 - Copy.txt   |     0     | 2019-12-05 15:35:31.562420 | d41d8cd98f00b204e9800998ecf8427e |
| C:\...\New folder |      1.txt       |     0     | 2019-12-05 15:35:31.562420 | d41d8cd98f00b204e9800998ecf8427e |
+-------------------+------------------+-----------+----------------------------+----------------------------------+
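
The Last Modified Time column above shows the default `datetime` string, microseconds included. If a format such as MM/DD/YYYY HH:MM:SS is preferred, `strftime` can be applied before the value is appended. A minimal sketch; `format_mtime` is a hypothetical helper name, not part of the code above:

```python
from datetime import datetime

def format_mtime(epoch_seconds: float) -> str:
    """Format an epoch timestamp (e.g. st_mtime) in local time as MM/DD/YYYY HH:MM:SS."""
    return datetime.fromtimestamp(epoch_seconds).strftime("%m/%d/%Y %H:%M:%S")
```

Inside the loop this would probably be used as `modified.append(format_mtime(file.stat().st_mtime))`.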
CDJB
  • Yes! Thank you. So my tweaks need to be converting the file time from epoch to a standard MM/DD/YYYY TT:TT:TT, and adding the MD5 hash. Since I don't have a direct variable to process, what's the easiest way to do that? – Matt R Dec 05 '19 at 15:40
  • Yes, I did miss the brackets that were removed; sorry, I was too eager to get it going. – Matt R Dec 05 '19 at 15:41
  • I'm getting a NameError for hashes not being defined. – Matt R Dec 05 '19 at 16:11
  • Ah, I forgot to add `hashes=[]` at the start. I'll update my answer. – CDJB Dec 05 '19 at 16:12
  • It opens another file, cp1252.py, and throws a "builtins.UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 118: character maps to <undefined>" error. – Matt R Dec 05 '19 at 16:59
  • This is related to something in your data which cannot be hashed by hashlib. You'd be better off asking this as a separate question. – CDJB Dec 05 '19 at 17:01
  • Odd. So it seems to work fine on text files, but throws that error with a couple of jpg's. Thank you again for the help! – Matt R Dec 05 '19 at 17:13
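
The `UnicodeDecodeError` in the comments is what text-mode reading produces on binary files such as JPEGs: `open(file)` without a mode decodes the bytes with the platform default codec (cp1252 in this case) before anything reaches hashlib. One way around it is to open the file in binary mode and feed the raw bytes to the hash. A sketch, where `md5_of_file` and the chunk size are illustrative choices rather than anything from the thread:

```python
import hashlib

def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading its raw bytes in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:  # binary mode: no text decoding is attempted
        # Read fixed-size chunks until read() returns an empty bytes object
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

In the loop, `hashes.append(md5_of_file(file))` would then take the place of a text-mode read. Reading in chunks keeps memory use flat for large files; for small files, hashing `file.read_bytes()` in one call should give the same digest.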