
I have a very simple task: I have a list of image and video files, and I'd like to tabulate the creation date of each one using the available EXIF data. I'm using pyexiftool for the actual data extraction.

I can pull the data out without a problem, but the resulting JSON output has a very strange shape. Each record describes one file, but the fields it contains vary: a record may hold one, two, or more CreateDate values under different tag names.

For example, some image files contain XMP:CreateDate and EXIF:CreateDate, whereas MOV files contain 'QuickTime:CreateDate' (I don't know what the fields would be for other file formats).

[{'SourceFile': '/Users/Documents/Projects/ExifData/temp/IMG_20200422_085514.JPG', 'EXIF:CreateDate': '2020:04:22 08:55:14', 'XMP:CreateDate': '2020:04:22 08:55:14'}, {'SourceFile': '/Users/Documents/Projects/ExifData/temp/IMG_20200423_091856.JPG', 'EXIF:CreateDate': '2020:04:23 09:18:57'}, {'SourceFile': '/Users/Documents/Projects/ExifData/temp/IMG_20200423_091859.JPG', 'EXIF:CreateDate': '2020:04:23 09:19:00', 'XMP:CreateDate': '2020:04:23 09:19:00'}, {'SourceFile': '/Users/Documents/Projects/ExifData/temp/MOV_0004.mp4', 'QuickTime:CreateDate': '2017:03:11 13:05:59'}, {'SourceFile': '/Users/Documents/Projects/ExifData/temp/MOV_0005.mp4', 'QuickTime:CreateDate': '2017:03:11 13:08:26'}, {'SourceFile': '/Users/Documents/Projects/ExifData/temp/MOV_0006.mp4', 'QuickTime:CreateDate': '2017:03:11 13:09:17'}, {'SourceFile': '/Users/Documents/Projects/ExifData/temp/MOV_0035.mp4', 'QuickTime:CreateDate': '2017:03:12 14:08:55'}]
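Since the tag names differ by file format, one way to find out which ones occur in a given batch is to count every key ending in `:CreateDate` across the records. A minimal sketch, assuming the records are already a Python list of dicts (three records are copied from the sample above):

```python
from collections import Counter

# Three records copied from the sample output above
records = [
    {'SourceFile': '/Users/Documents/Projects/ExifData/temp/IMG_20200422_085514.JPG',
     'EXIF:CreateDate': '2020:04:22 08:55:14', 'XMP:CreateDate': '2020:04:22 08:55:14'},
    {'SourceFile': '/Users/Documents/Projects/ExifData/temp/IMG_20200423_091856.JPG',
     'EXIF:CreateDate': '2020:04:23 09:18:57'},
    {'SourceFile': '/Users/Documents/Projects/ExifData/temp/MOV_0004.mp4',
     'QuickTime:CreateDate': '2017:03:11 13:05:59'},
]

# Count how many records carry each '<Group>:CreateDate' tag
tag_counts = Counter(
    key for item in records for key in item if key.endswith(':CreateDate')
)

for tag, count in sorted(tag_counts.items()):
    print(tag, count)
# EXIF:CreateDate 2
# QuickTime:CreateDate 1
# XMP:CreateDate 1
```

Running this over the full batch shows which groups a later lookup needs to handle.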

I am quite lost on how to parse this file, and I can't loop through it as I would a regular JSON file. I only want to extract a filename and creation datetime. I'd appreciate any advice.

Thanks.

EDIT: The code that produces that 'JSON' output is this:

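import json

import exiftool  # pyexiftool


def old_main():
    dir_name = '/Users/Documents/Projects/ExifData/temp/'
    # Tags to request from exiftool for every file
    tags = ["File Name", "CreateDate"]
    log_file = 'py_log.txt'
    # getListOfFiles (not shown) returns the paths of the files in dir_name
    file_names = getListOfFiles(dir_name)
    with exiftool.ExifTool() as e:
        # One dict of tag values per file
        metadata = e.get_tags_batch(tags, file_names)
    with open(log_file, "w") as outfile:
        json.dump(metadata, outfile)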

So what I've pasted is the direct output of the json.dump method. The get_tags_batch method is documented here.

Unless I've misunderstood the documentation for this package, it looks like the output is not JSON at all but rather just a string?
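For comparison, a minimal standard-library sketch of what `json.dump` does with a list of dicts like the one `get_tags_batch` returns: it always writes double-quoted strings, so the file it produces is valid JSON, and `json.load` turns it back into a list of dicts that can be looped over. Single-quoted text like the sample above is Python's printed representation of such a list, not something `json.dump` writes. An in-memory `io.StringIO` stands in for `py_log.txt` here, which is an assumption made only to keep the example self-contained.

```python
import io
import json

# Two records copied from the question's sample output
metadata = [
    {'SourceFile': '/Users/Documents/Projects/ExifData/temp/IMG_20200423_091856.JPG',
     'EXIF:CreateDate': '2020:04:23 09:18:57'},
    {'SourceFile': '/Users/Documents/Projects/ExifData/temp/MOV_0004.mp4',
     'QuickTime:CreateDate': '2017:03:11 13:05:59'},
]

# io.StringIO stands in for the file opened with open(log_file, "w")
outfile = io.StringIO()
json.dump(metadata, outfile)
text = outfile.getvalue()
print(text)  # every string in text is double-quoted

# Reading it back gives the original list of dicts, ready to loop over
infile = io.StringIO(text)
loaded = json.load(infile)
for record in loaded:
    print(record['SourceFile'])
```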

Appreciate the pointers and comments.

insomniac
  • This is not `JSON`. Note the single quotes, which would never be used to define a string literal in JSON. – Booboo Apr 26 '20 at 17:46

2 Answers


From looking at the snippet you posted, it is a list of dicts. If the format is more complicated than that, please post a more complete example.

This is a simple way of iterating over each item and setting the date based on the first date field found.

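# json_list is the list of dicts shown in the question
results = []

for item in json_list:
    d = {'SourceFile': item['SourceFile']}
    # Collect every key that holds a creation date, e.g. 'EXIF:CreateDate'
    date_keys = [k for k in item if 'CreateDate' in k]
    # Use the first date found, or None if the file has none
    d['Date'] = item[date_keys[0]] if date_keys else None
    results.append(d)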
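A slightly extended sketch of the same idea, which also reduces `SourceFile` to a bare file name and parses the EXIF-style `YYYY:MM:DD HH:MM:SS` string into a `datetime`. The tag priority order, the helper name `extract_create_date`, and the choice to return `None` for unparseable dates are illustrative assumptions, not part of pyexiftool.

```python
import os
from datetime import datetime

# Tags seen in the question, in order of preference (an assumption; any other
# '<Group>:CreateDate' key is tried afterwards as a fallback)
PREFERRED_TAGS = ['EXIF:CreateDate', 'XMP:CreateDate', 'QuickTime:CreateDate']
EXIF_DATE_FORMAT = '%Y:%m:%d %H:%M:%S'


def extract_create_date(item):
    """Return (file name, datetime or None) for one exiftool record."""
    file_name = os.path.basename(item['SourceFile'])
    extra = sorted(k for k in item
                   if k.endswith(':CreateDate') and k not in PREFERRED_TAGS)
    for key in PREFERRED_TAGS + extra:
        raw = item.get(key)
        if not isinstance(raw, str):
            continue  # tag missing (or not a string) for this file
        try:
            return file_name, datetime.strptime(raw, EXIF_DATE_FORMAT)
        except ValueError:
            # e.g. '0000:00:00 00:00:00' or a value with extra suffixes
            continue
    return file_name, None


records = [
    {'SourceFile': '/Users/Documents/Projects/ExifData/temp/IMG_20200422_085514.JPG',
     'EXIF:CreateDate': '2020:04:22 08:55:14', 'XMP:CreateDate': '2020:04:22 08:55:14'},
    {'SourceFile': '/Users/Documents/Projects/ExifData/temp/MOV_0004.mp4',
     'QuickTime:CreateDate': '2017:03:11 13:05:59'},
]
table = [extract_create_date(item) for item in records]
for name, created in table:
    print(name, created)
# IMG_20200422_085514.JPG 2020-04-22 08:55:14
# MOV_0004.mp4 2017-03-11 13:05:59
```

With this order a JPEG's EXIF date wins over its XMP copy; reorder `PREFERRED_TAGS` if another group is more trustworthy for your files.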
Eric Truett
  • If the OP is accurate in his description, the input is coming from a file, so it is a string not a dict. But it is *not* correct `JSON` format. – Booboo Apr 26 '20 at 17:45
  • Yes, I added some sample code on how I was generating the file. But I managed to figure out where I was going wrong. – insomniac Apr 26 '20 at 22:45

The reason you are having trouble parsing this "JSON" is that it is not JSON (note the use of single rather than double quotes), so it cannot be parsed unmodified with a JSON parser.
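A quick check with the standard `json` module illustrates the point: the same record is rejected when single-quoted and accepted when double-quoted. The sample strings are shortened from the question's data.

```python
import json

# The same record, once with Python-style single quotes and once as real JSON
single_quoted = "[{'SourceFile': 'IMG_20200423_091856.JPG'}]"
double_quoted = '[{"SourceFile": "IMG_20200423_091856.JPG"}]'

try:
    json.loads(single_quoted)
    single_ok = True
except json.JSONDecodeError as err:
    # JSON only allows double-quoted strings and property names
    single_ok = False
    print('rejected:', err.msg)

parsed = json.loads(double_quoted)
print(parsed)  # [{'SourceFile': 'IMG_20200423_091856.JPG'}]
```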

Instead use:

from ast import literal_eval

t = """[{'SourceFile': '/Users/Documents/Projects/ExifData/temp/IMG_20200422_085514.JPG', 'EXIF:CreateDate': '2020:04:22 08:55:14', 'XMP:CreateDate': '2020:04:22 08:55:14'}, {'SourceFile': '/Users/Documents/Projects/ExifData/temp/IMG_20200423_091856.JPG', 'EXIF:CreateDate': '2020:04:23 09:18:57'}, {'SourceFile': '/Users/Documents/Projects/ExifData/temp/IMG_20200423_091859.JPG', 'EXIF:CreateDate': '2020:04:23 09:19:00', 'XMP:CreateDate': '2020:04:23 09:19:00'}, {'SourceFile': '/Users/Documents/Projects/ExifData/temp/MOV_0004.mp4', 'QuickTime:CreateDate': '2017:03:11 13:05:59'}, {'SourceFile': '/Users/Documents/Projects/ExifData/temp/MOV_0005.mp4', 'QuickTime:CreateDate': '2017:03:11 13:08:26'}, {'SourceFile': '/Users/Documents/Projects/ExifData/temp/MOV_0006.mp4', 'QuickTime:CreateDate': '2017:03:11 13:09:17'}, {'SourceFile': '/Users/Documents/Projects/ExifData/temp/MOV_0035.mp4', 'QuickTime:CreateDate': '2017:03:12 14:08:55'}]"""
# Safely parse the Python-literal text into a list of dicts
o = literal_eval(t)
print(o)

Prints:

[{'SourceFile': '/Users/Documents/Projects/ExifData/temp/IMG_20200422_085514.JPG', 'EXIF:CreateDate': '2020:04:22 08:55:14', 'XMP:CreateDate': '2020:04:22 08:55:14'}, {'SourceFile': '/Users/Documents/Projects/ExifData/temp/IMG_20200423_091856.JPG', 'EXIF:CreateDate': '2020:04:23 09:18:57'}, {'SourceFile': '/Users/Documents/Projects/ExifData/temp/IMG_20200423_091859.JPG', 'EXIF:CreateDate': '2020:04:23 09:19:00', 'XMP:CreateDate': '2020:04:23 09:19:00'}, {'SourceFile': '/Users/Documents/Projects/ExifData/temp/MOV_0004.mp4', 'QuickTime:CreateDate': '2017:03:11 13:05:59'}, {'SourceFile': '/Users/Documents/Projects/ExifData/temp/MOV_0005.mp4', 'QuickTime:CreateDate': '2017:03:11 13:08:26'}, {'SourceFile': '/Users/Documents/Projects/ExifData/temp/MOV_0006.mp4', 'QuickTime:CreateDate': '2017:03:11 13:09:17'}, {'SourceFile': '/Users/Documents/Projects/ExifData/temp/MOV_0035.mp4', 'QuickTime:CreateDate': '2017:03:12 14:08:55'}]

According to the manual, literal_eval:

Safely evaluate an expression node or a string containing a Python literal or container display. The string or node provided may only consist of the following Python literal structures: strings, bytes, numbers, tuples, lists, dicts, sets, booleans, and None
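This is why `literal_eval` is a better fit here than `eval`: a minimal sketch showing that it accepts a record written as a Python literal but raises `ValueError` on an expression that would execute code.

```python
from ast import literal_eval

# A single record in the question's Python-literal format is accepted
record = literal_eval(
    "{'SourceFile': 'MOV_0004.mp4', 'QuickTime:CreateDate': '2017:03:11 13:05:59'}"
)
print(record['QuickTime:CreateDate'])  # 2017:03:11 13:05:59

# An expression that would run code is refused instead of executed
try:
    literal_eval("__import__('os').getcwd()")
    call_rejected = False
except ValueError:
    call_rejected = True
print('function call rejected:', call_rejected)  # function call rejected: True
```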

Booboo