I have a script that I built following the blueprint in the book Learning Python for Forensics. The script will go through a directory specified by the user and collect the metadata for each file in the directory. The results are saved to a sqlite database and also written to a CSV or HTML file.
The script was originally written in Python 2.7.15. I am trying to update the code for Python 3.7. However, there is one line in the ingest directory function that is giving me issues.
The ingestDirectory
function looks like this:
def ingestDirectory(cur, source, custodian_id):
count = 0
for root, folders, files in os.walk(source):
for file_name in files:
meta_data = dict()
try:
meta_data['file_name'] = file_name
meta_data['file_path'] = os.path.join(root, file_name)
meta_data['extension'] = os.path.splitext(file_name)[-1]
file_stats = os.stat(meta_data['file_path'])
meta_data['mode'] = oct(file_stats.st_mode)
meta_data['inode'] = int(file_stats.st_ino)
meta_data['file_size'] = int(file_stats.st_size)
meta_data['atime'] = formatTimestamp(file_stats.st_atime)
meta_data['mtime'] = formatTimestamp(file_stats.st_mtime)
meta_data['ctime'] = formatTimestamp(file_stats.st_ctime)
except Exception as e:
logging.error('Could not gather data for file: ' + meta_data['file_path'] + e.__str__())
meta_data['custodian'] = custodian_id
columns = '","'.join(meta_data.keys())
values = '","'.join(str(x).encode('string_escape') for x in meta_data.values())
sql = 'INSERT INTO Files ("' + columns + '") VALUES ("' + values + '")'
cur.execute(sql)
count += 1
The line that is giving me errors is this:
values = '","'.join(str(x).encode('string_escape') for x in meta_data.values())
This line is meant to handle any string escape characters found in metadata.values
before writing the data to the database.
When I tried to run this code in Python 3 I got an error about an unrecognized codec. I did some research hear on Stack Overflow and found that string_escape
has been replaced with unicode-escape
in Python 3.
I am fairly new to Python 3 and Unicode. My question is this:
How do I update the line above so that it uses unicode-escape
instead of string_escape
and produces the same result as the Python 2.7 code?
Any help would be appreciated! I have been working on this for several days now and every solution I try just results in more error codes or corrupted output files.