I have the following JSON-like object (a Python list of dicts):

[{
    'firstname': 'Jimmie',
    'lastname': 'Barninger',
    'zip_code': 12345,
    'colors': ['2014-01-01', '2015-01-01'],
    'ids': {
        '44': 'OK',
        '51': 'OK'
    },
    'address': {
        'state': 'MI',
        'town': 'Dearborn'
    },
    'other': {
        'ids': {
            '1': 'OK',
            '103': 'OK'
        },
    } 
}, {
    'firstname': 'John',
    'lastname': 'Doe',
    'zip_code': 90027,
    'colors': None,
    'ids': {
        '91': 'OK',
        '103': 'OK'
    },
    'address': {
        'state': 'CA',
        'town': 'Los Angeles'
    },
    'other': {
        'ids': {
            '91': 'OK',
            '103': 'OK'
        },
    } 
}]

I would like to get the number of unique keys that each nested dict has, combined across all items in the list. For the data above, the result would be:

address: 2 # ['state', 'town']
ids: 4 # ['44', '51', '91', '103']
other.ids: 3 # ['1', '103', '91']

I've been having trouble iterating over the objects to figure this out, especially when there is an item within a list. What I've tried so far is something like the code below. It doesn't currently work, but I'm pasting it for reference:

def count_per_key(obj, _c=None):

    if _c is None: unique_values_per_key = {}

    if isinstance(obj, list):
        return [count_per_key(l) for l in obj]

    elif not isinstance(obj, dict):
        pass

    else:
        for key, value in obj.items():
            if not isinstance(value, dict):
                continue
            elif isinstance(value, dict):
                if key not in unique_values_per_key: unique_values_per_key[key] = set()
                unique_values_per_key[key].union(set(value.keys()))
                return count_per_key(value)
            elif isinstance(value, list):
                return [count_per_key(o) for o in value]

    return unique_values_per_key
Ajax1234

2 Answers

You can use recursion with a generator:

from collections import defaultdict
d = [{'firstname': 'Jimmie', 'lastname': 'Barninger', 'zip_code': 12345, 'colors': ['2014-01-01', '2015-01-01'], 'ids': {'44': 'OK', '51': 'OK'}, 'address': {'state': 'MI', 'town': 'Dearborn'}, 'other': {'ids': {'1': 'OK', '103': 'OK'}}}, {'firstname': 'John', 'lastname': 'Doe', 'zip_code': 90027, 'colors': None, 'ids': {'91': 'OK', '103': 'OK'}, 'address': {'state': 'CA', 'town': 'Los Angeles'}, 'other': {'ids': {'91': 'OK', '103': 'OK'}}}]
def get_vals(d, _path=()):
    # Non-dict values have no .items(), so the fallback yields nothing.
    for a, b in getattr(d, 'items', lambda: {})():
        if a in {'ids', 'address'}:
            yield ['.'.join([*_path, a]), list(b.keys())]
        else:
            yield from get_vals(b, [*_path, a])

c = defaultdict(list)
results = [i for b in d for i in get_vals(b)]
for a, b in results:
    c[a].extend(b)

# Deduplicate the collected keys and attach the count for each path.
_r = [[a, set(b)] for a, b in c.items()]
new_r = [[a, b, len(b)] for a, b in _r]
print(new_r)

Output:

[
 ['ids', {'91', '44', '51', '103'}, 4], 
 ['address', {'state', 'town'}, 2], 
 ['other.ids', {'1', '91', '103'}, 3]
]
Ajax1234
  • @DavidL Please see my recent edit. Note that this solution only relies on the names of the most immediate values you are searching for, not the path to the values. – Ajax1234 Dec 30 '18 at 23:58
  • thanks for the update, but for this I don't know the keys beforehand. Is there a way to apply the above without knowing the field names? –  Dec 30 '18 at 23:59
  • @DavidL Do you mean that the solution should be able to take a list of desired names, not have them hardcoded? – Ajax1234 Dec 31 '18 at 00:01
  • would there be a way to do it without hardcoding the {'ids', 'address'} line in there? –  Dec 31 '18 at 00:36
  • @DavidL Would it be an acceptable solution to simply pass the set to the function? That way the set could be defined elsewhere. – Ajax1234 Dec 31 '18 at 00:47
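The suggestion in the last comment, passing the set of names in instead of hardcoding it, might look roughly like the sketch below. The `names` parameter and the `count_unique_keys` helper are illustrative names, not part of the answer above, and the sketch assumes all keys are strings. The sample data is trimmed to the nested dicts only.

```python
from collections import defaultdict

def get_vals(d, names, _path=()):
    """Yield (dotted_path, keys) for every dict stored under a key in `names`."""
    if not isinstance(d, dict):
        return
    for key, value in d.items():
        path = _path + (key,)
        if key in names and isinstance(value, dict):
            yield '.'.join(path), list(value)
        else:
            # Keep descending until a wanted name is found.
            yield from get_vals(value, names, path)

def count_unique_keys(records, names):
    """Merge the keys found under each path across all records."""
    unique = defaultdict(set)
    for record in records:
        for path, keys in get_vals(record, names):
            unique[path].update(keys)
    return dict(unique)

records = [
    {'ids': {'44': 'OK', '51': 'OK'},
     'address': {'state': 'MI', 'town': 'Dearborn'},
     'other': {'ids': {'1': 'OK', '103': 'OK'}}},
    {'ids': {'91': 'OK', '103': 'OK'},
     'address': {'state': 'CA', 'town': 'Los Angeles'},
     'other': {'ids': {'91': 'OK', '103': 'OK'}}},
]

wanted = {'ids', 'address'}  # defined by the caller, not inside the function
result = count_unique_keys(records, wanted)
for path, keys in result.items():
    print(path, len(keys), sorted(keys))
```

This still requires the caller to know the names in advance, so it only moves the hardcoded set out of the function.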
l= [{'firstname': 'Jimmie', 'lastname': 'Barninger', 'zip_code': 12345, 'colors': ['2014-01-01', '2015-01-01'], 'ids': {'44': 'OK', '51': 'OK'}, 'address': {'state': 'MI', 'town': 'Dearborn'}, 'other': {'ids': {'1': 'OK', '103': 'OK'}}}, {'firstname': 'John', 'lastname': 'Doe', 'zip_code': 90027, 'colors': None, 'ids': {'91': 'OK', '103': 'OK'}, 'address': {'state': 'CA', 'town': 'Los Angeles'}, 'other': {'ids': {'91': 'OK', '103': 'OK'}}}]
def find_dicts(d, parent=''):
    for k, v in d.items():
        if isinstance(v, dict):
            # Build the dotted path for this nested dict.
            if parent != '':
                identifier = str(parent) + '.' + str(k)
            else:
                identifier = str(k)
            yield {identifier: [x for x in v.keys()]}
            # Pass the full path down so deeper levels keep their prefix.
            yield from find_dicts(v, identifier)

s = [list(find_dicts(d)) for d in l]
# Seed the result with the dict names found in the first record.
dict_names = [list(y.keys())[0] for y in s[0]]
final_dict = {name: [] for name in dict_names}
for li in s:
    for di in li:
        di_key = list(di.keys())[0]
        di_values = list(di.values())[0]
        # Names that do not appear in the first record are skipped.
        if di_key in final_dict:
            for value in di_values:
                if value not in final_dict[di_key]:
                    final_dict[di_key].append(value)
for k, v in final_dict.items():
    print(k, ":", len(v), v)

Output:

ids : 4 ['44', '51', '91', '103']
address : 2 ['town', 'state']
other.ids : 3 ['103', '1', '91']
other : 1 ['ids']
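
The extra `other : 1` line appears because every nested dict is reported, including ones that only hold further dicts. One possible way to skip those, and to also descend into lists as the question mentions, is sketched below. The names `walk`, `unique_keys_per_path`, and `leaves_only` are hypothetical. The sketch assumes that list items should share the path of the list that contains them.

```python
def walk(obj, path=()):
    """Yield (dotted_path, nested_dict) for every dict found below `obj`."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            child = path + (str(key),)
            if isinstance(value, dict):
                yield '.'.join(child), value
            yield from walk(value, child)
    elif isinstance(obj, list):
        # Assumption: list items share the path of the list that holds them.
        for item in obj:
            yield from walk(item, path)

def unique_keys_per_path(records, leaves_only=True):
    """Collect the set of keys seen under each dotted path."""
    unique = {}
    for path, nested in walk(records):
        has_child_dict = any(isinstance(v, dict) for v in nested.values())
        if leaves_only and has_child_dict:
            continue  # skip containers such as 'other'
        unique.setdefault(path, set()).update(nested)
    return unique

records = [
    {'firstname': 'Jimmie', 'colors': ['2014-01-01', '2015-01-01'],
     'ids': {'44': 'OK', '51': 'OK'},
     'address': {'state': 'MI', 'town': 'Dearborn'},
     'other': {'ids': {'1': 'OK', '103': 'OK'}}},
    {'firstname': 'John', 'colors': None,
     'ids': {'91': 'OK', '103': 'OK'},
     'address': {'state': 'CA', 'town': 'Los Angeles'},
     'other': {'ids': {'91': 'OK', '103': 'OK'}}},
]

for path, keys in unique_keys_per_path(records).items():
    print(path, ":", len(keys), sorted(keys))
```

Calling `unique_keys_per_path(records, leaves_only=False)` would report `other` as well, matching the output shown above.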
Bitto