
I have a list of paths to JSON files.

files = ['/Users/sbm/Downloads/ds214mb/sub-EESS001/sub-EESS001_task-Cyberball_bold.json',
 '/Users/sbm/Downloads/ds214mb/sub-EESS002/func/sub-EESS002_task-Cyberball_bold.json',
 '/Users/sbm/Downloads/ds214mb/sub-EESS003/sub-EESS003_task-Cyberball_bold.json',
 '/Users/sbm/Downloads/ds214mb/sub-EESS004/func/sub-EESS004_task-Cyberball_bold.json',
 '/Users/sbm/Downloads/ds214mb/sub-EESS005/sub-EESS005_task-Cyberball_bold.json',
 '/Users/sbm/Downloads/ds214mb/sub-EESS006/sub-EESS006_task-Cyberball_bold.json',
 '/Users/sbm/Downloads/ds214mb/sub-EESS007/func/sub-EESS007_task-Cyberball_bold.json',
 '/Users/sbm/Downloads/ds214mb/sub-EESS008/func/sub-EESS008_task-Cyberball_bold.json']

Now I intend to read all these files into dictionaries, named after the filename or something different, and then iterate through those dictionaries to find the common key: value pairs.

I did the following to read all the JSON files into different dictionaries. Now, what would be an efficient way to compare all these dictionaries to find the common key: value pairs?

import json

# Read each file into a separate dictionary named json0, json1, ...
for i, file in enumerate(files):
    with open(file) as fp:
        globals()['json%s' % i] = json.load(fp)

A sample JSON file looks like:

{
  "Manufacturer": "Siemens",
  "ManufacturerModelName": "Magnetom Verio",
  "RepetitionTime": 1.56,
  "SliceTiming": [0.0,
    0.78,
    0.06,
    0.84,
    0.12],
  "TaskName": "Cyberball"
}
  • If you could organize the dicts into a list, check this other answer out: http://stackoverflow.com/questions/9906944/python-find-only-common-key-value-pairs-of-several-dicts-dict-intersection – stackunderflow Mar 02 '17 at 20:58
  • Look here, maybe: http://stackoverflow.com/questions/25851183/how-to-compare-two-json-objects-with-the-same-elements-in-a-different-order-equa – oshaiken Mar 02 '17 at 21:31
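
For reference, here is a minimal sketch of the intersection approach from the first linked question, applied to the files list above. The repr() call is my own workaround, not part of the linked answer: list values such as SliceTiming are not hashable, so they cannot go into a set directly.

import json
from functools import reduce

# Load every file into a list of dicts instead of numbered globals
dicts = []
for path in files:
    with open(path) as fp:
        dicts.append(json.load(fp))

# Build a set of (key, repr(value)) pairs per dict and intersect them;
# repr() makes unhashable values such as the SliceTiming list comparable
item_sets = [{(k, repr(v)) for k, v in d.items()} for d in dicts]
common = reduce(set.intersection, item_sets)
print(common)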

2 Answers


Interesting question...

I start by piping a list of JSON files...

find <dir> | grep json$ 

That pipe gets sent to a Python program...

So this now looks like:

find <dir> | grep json$ | python t.py

The Python code does the following:

  1. Opens the file
  2. Reads the file
  3. Parses the JSON into a Python dictionary
  4. Prints each key: value pair

So this looks like this (Python 3 code):

import json, sys

for file in sys.stdin:
    file = file.strip('\n')
    with open(file, "rt") as ifp:
        b = ifp.read()
    # Normalize newlines and Python-style single quotes, in case the
    # files are not strictly valid JSON
    b = b.replace('\n', '').replace("'", '"')
    c = json.loads(b)
    for k, v in c.items():
        print('{}:{}'.format(k, v))

We now sort and count the output using bash, which generically looks like this:

sort | uniq -c | sort -n  

So, putting it all together, we get the following (I am assuming all the JSON files are in the same directory I am in at the moment):

ls *.json | python t.py | sort | uniq -c | sort -n

If you want the top 5, reverse the numeric sort so the highest counts come first, and it becomes

ls *.json | python t.py | sort | uniq -c | sort -rn | head -n 5
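
Since a pair common to all files appears once per file, its count equals the number of files. As a sketch, here is a small Python filter (a hypothetical common.py, not part of the pipeline above) that could replace the sort/uniq stage and print only those pairs:

import sys
from collections import Counter

# Count each "key:value" line that t.py prints
counts = Counter(line.rstrip('\n') for line in sys.stdin)

# A pair common to every file appears once per file; the question
# has 8 files, so keep pairs seen 8 times
N_FILES = 8
for pair, count in counts.items():
    if count == N_FILES:
        print(pair)

You would run it as `ls *.json | python t.py | python common.py`.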
Tim Seed

Only in Python, no Linux:

import json

files = ['data1.json', 'data2.json', 'data3.json']
master_key_plus_value = {}

for file in files:
    with open(file, "rt") as ifp:
        b = ifp.read()
    # Normalize newlines and Python-style single quotes, in case the
    # files are not strictly valid JSON
    b = b.replace('\n', '').replace("'", '"')
    c = json.loads(b)
    for k, v in c.items():
        # Combine key and value into a single string and count it
        pair = str(k) + ': ' + str(v)
        if pair in master_key_plus_value:
            master_key_plus_value[pair] += 1
        else:
            master_key_plus_value[pair] = 1

# Now we have read all the key + value pairs into a single dictionary.
# Sort by the value (occurrence count):

sorted_dictionary = sorted(master_key_plus_value.items(), key=lambda x: -x[1])

print("Most Common Key-Value is  {} Occurance {} ".format(sorted_dictionary[0][0],sorted_dictionary[0][1]))

The same principles apply for each file: read the JSON file as text, reformat it, and make a JSON object, which gives a Python dictionary. Combine each key + value and compare it to a master dictionary: add 1 to its count if it is there, else store it with a count of 1. Finally, sort on the count, descending, and print the top element ([0]); since it is a tuple, the key is [0][0] and the count is [0][1].
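
If the goal is specifically the pairs shared by every file, rather than the single most common one, the same dictionary can be filtered on the file count. A minimal sketch, which works because JSON keys are unique within a file, so no pair is counted twice for one file:

# Pairs common to all files appear exactly len(files) times
common_pairs = [pair for pair, count in master_key_plus_value.items()
                if count == len(files)]
print(common_pairs)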

Tim Seed