I'm using Microsoft Azure's Video Indexer API to process MP4 videos. Some of the videos are very similar (same shots, but perhaps a different voice-over). Ideally, I'd like to group these similar videos together in my output, which is a CSV file.
I'm using Python to concatenate the Video Indexer JSON output and convert it to CSV. Is there a way to use Python to measure how similar the JSON output for two files is?
Two sample JSON responses are below. Note that the second is missing the "Football" keyword; apart from the video name, everything else is the same as the first.
I'd like a way to quantify how similar these two sets of keywords are, as a value between 0.0 and 1.0. If they were exactly the same, the similarity value would be 1.0. If they had no keywords in common, the similarity value would be 0.0.
{
    "accountId": "00000000000",
    "id": "abc3454321",
    "name": "Video A",
    "description": "Test",
    "userName": "Some name",
    "created": "2018/2/2 18:00:00.000",
    "privacyMode": "Private",
    "state": "Processed",
    "isOwned": true,
    "isEditable": false,
    "isBase": false,
    "durationInSeconds": 120,
    "summarizedInsights": {
        "keywords": [{
            "id": 1,
            "name": "4k"
        }, {
            "id": 2,
            "name": "Television"
        }, {
            "id": 3,
            "name": "Football"
        }]
    }
}
A second video would have slightly different summarizedInsights:
{
    "accountId": "00000000000",
    "id": "abc3454321",
    "name": "Video B",
    "description": "Test",
    "userName": "Some name",
    "created": "2018/2/2 18:00:00.000",
    "privacyMode": "Private",
    "state": "Processed",
    "isOwned": true,
    "isEditable": false,
    "isBase": false,
    "durationInSeconds": 120,
    "summarizedInsights": {
        "keywords": [{
            "id": 1,
            "name": "4k"
        }, {
            "id": 2,
            "name": "Television"
        }]
    }
}
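To make the kind of score I'm after concrete, here is a minimal sketch using Jaccard similarity (size of the intersection divided by size of the union) on the keyword names. This is only one possible measure, not something Video Indexer provides. The helper names `keyword_names` and `jaccard_similarity` are my own, the dicts are trimmed versions of the two responses above, and treating two empty keyword sets as identical (1.0) is an assumption on my part.

```python
def keyword_names(video):
    """Collect the keyword names from one parsed Video Indexer response.

    Names are lower-cased so "Football" and "football" count as the same
    keyword (an assumption; drop .lower() if case matters).
    """
    keywords = video.get("summarizedInsights", {}).get("keywords", [])
    return {kw["name"].lower() for kw in keywords}


def jaccard_similarity(a, b):
    """Return |a & b| / |a | b| for two sets, as a float in [0.0, 1.0]."""
    if not a and not b:
        # Assumption: two videos with no keywords at all are treated as identical.
        return 1.0
    return len(a & b) / len(a | b)


# Trimmed versions of the two sample responses (only the fields used here).
video_a = {
    "name": "Video A",
    "summarizedInsights": {
        "keywords": [
            {"id": 1, "name": "4k"},
            {"id": 2, "name": "Television"},
            {"id": 3, "name": "Football"},
        ]
    },
}
video_b = {
    "name": "Video B",
    "summarizedInsights": {
        "keywords": [
            {"id": 1, "name": "4k"},
            {"id": 2, "name": "Television"},
        ]
    },
}

score = jaccard_similarity(keyword_names(video_a), keyword_names(video_b))
print(round(score, 2))  # 0.67 (2 shared keywords out of 3 distinct ones)
```

With a score like this per pair of videos, I could presumably group videos whose score is above some threshold, but I don't know whether this is the best measure or how to choose the threshold.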