I struggled with this yesterday afternoon and think I have come up with a clever solution, but I am looking for feedback on how to improve it.
The scenario: I am running ffprobe on media files, taking the JSON dictionary it returns, and storing it in a MongoDB collection linked to the Mongo document for the file.
The problem: Some media file types give back key names in the JSON that are incompatible with the BSON documents in Mongo. For example, the following perfectly valid JSON cannot be stored in Mongo as is, because of the dots in the key names of the tags dictionary:
"format": {
    "filename": "ToS-4k-1920_CMO_freezeframe6308.mov",
    "nb_streams": 2,
    "nb_programs": 0,
    "format_name": "mov,mp4,m4a,3gp,3g2,mj2",
    "format_long_name": "QuickTime / MOV",
    "start_time": "0.000000",
    "duration": "738.941667",
    "size": "14542021084",
    "bit_rate": "157436200",
    "probe_score": 100,
    "tags": {
        "major_brand": "qt ",
        "minor_version": "537199360",
        "compatible_brands": "qt ",
        "creation_time": "2018-01-15T18:07:07.000000Z",
        "com.apple.quicktime.player.movie.audio.gain": "1.000000",
        "com.apple.quicktime.player.movie.audio.treble": "0.000000",
        "com.apple.quicktime.player.movie.audio.bass": "0.000000",
        "com.apple.quicktime.player.movie.audio.balance": "0.000000",
        "com.apple.quicktime.player.movie.audio.pitchshift": "0.000000",
        "com.apple.quicktime.player.movie.audio.mute": "",
        "com.apple.quicktime.player.movie.visual.brightness": "0.000000",
        "com.apple.quicktime.player.movie.visual.color": "1.000000",
        "com.apple.quicktime.player.movie.visual.tint": "0.000000",
        "com.apple.quicktime.player.movie.visual.contrast": "1.000000",
        "com.apple.quicktime.player.version": "7.6.6 (7.6.6)",
        "com.apple.quicktime.version": "7.7.3 (2943.14) 0x7738000 (Mac OS X, 10.11.6, 15G18013)"
    }
}
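To see which keys would need attention before inserting, a small walker can list every key that contains a dot, along with where it sits in the structure. This is only a sketch: `find_dotted_keys` is a made-up helper name, the sample is a trimmed stand-in for the output above, and it assumes the data came from `json.loads`, so keys are always strings and containers are only dicts and lists.

```python
def find_dotted_keys(node, path=()):
    """Yield (parent_path, key) for every dict key that contains a dot.

    Hypothetical helper for illustration; assumes JSON-style data
    (dicts, lists, and scalars only).
    """
    if isinstance(node, dict):
        for key, value in node.items():
            if "." in key:
                yield path, key
            # Descend into the value, recording the key we came through.
            yield from find_dotted_keys(value, path + (key,))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_dotted_keys(item, path + (index,))


# Trimmed sample shaped like the ffprobe output (values shortened).
sample = {
    "format": {
        "probe_score": 100,
        "tags": {
            "major_brand": "qt ",
            "com.apple.quicktime.version": "7.7.3",
        },
    }
}

for parent_path, key in find_dotted_keys(sample):
    print(parent_path, key)
```

Running a check like this first also shows whether the dots are confined to the tags dictionary or appear elsewhere in the document.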
The solution? I wrote a recursive function that walks the dictionary and renames the keys. Because it is bad mojo to modify a dictionary while iterating over it, I take a list of all the keys up front and iterate through that instead, so the renaming happens outside the dictionary's own iteration. Here is my function and how I called it. Feedback?
def key_string_replace(dictionary, findit, replaceit):
    # iterate over a snapshot of the keys so the dict can be modified safely
    for k in list(dictionary.keys()):
        if findit in k:
            newkey = k.replace(findit, replaceit)
            dictionary[newkey] = dictionary.pop(k)
            k = newkey
        else:
            pass
        # recurse into nested dicts, and into dicts held inside lists
        if isinstance(dictionary[k], dict):
            key_string_replace(dictionary[k], findit, replaceit)
        elif isinstance(dictionary[k], list):
            for l in dictionary[k]:
                if isinstance(l, dict):
                    key_string_replace(l, findit, replaceit)

import json
import shlex
from subprocess import Popen, PIPE

cmd = "ffprobe -v quiet -print_format json -show_streams -show_format"
args = shlex.split(cmd)
args.append(pathToInputVideo)
# run the ffprobe process, decode stdout as UTF-8 and parse the JSON
p = Popen(args, stdout=PIPE, stderr=PIPE)
output, error = p.communicate()
if p.returncode == 0:
    ffprobeOutput = output.decode('utf-8')
    ffprobeOutput = json.loads(ffprobeOutput)
    # fix any bad keys in ffprobe json
    key_string_replace(ffprobeOutput, '.', '_')
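One possible alternative, offered as a sketch rather than a definitive answer: build a new structure instead of renaming keys in place, which sidesteps the iterate-while-mutating concern entirely. The name `replace_in_keys` is made up for this example, and it assumes JSON-style data (dicts, lists, scalars). It also recurses into lists nested inside lists, and raises if two keys would collapse into the same name (for example `a.b` and `a_b`), a case where the in-place version would overwrite one value with the other.

```python
def replace_in_keys(node, find, replace):
    """Return a copy of node with `find` replaced by `replace` in every dict key.

    Hypothetical helper for illustration; the input is left unmodified.
    """
    if isinstance(node, dict):
        result = {}
        for key, value in node.items():
            new_key = key.replace(find, replace)
            if new_key in result:
                # Two different original keys map to the same new key.
                raise ValueError(f"key collision after renaming: {new_key!r}")
            result[new_key] = replace_in_keys(value, find, replace)
        return result
    if isinstance(node, list):
        return [replace_in_keys(item, find, replace) for item in node]
    return node  # scalars are returned unchanged


# Small made-up input shaped like ffprobe output.
original = {
    "tags": {"com.apple.quicktime.version": "7.7.3"},
    "streams": [{"a.b": 1}],
}
cleaned = replace_in_keys(original, ".", "_")
print(cleaned)
```

Returning a new dictionary keeps the raw ffprobe output available, which may be useful if the original key names ever need to be recovered.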