I am trying to normalize complex nested JSON in Python, but I am unable to parse all the objects out.

I am using the code from this page: https://medium.com/@amirziai/flattening-json-objects-in-python-f5343c794b10

sample_object = {'Name':'John', 'Location':{'City':'Los Angeles','State':'CA'}, 'hobbies':['Music', 'Running']}

def flatten_json(y):
    out = {}

    def flatten(x, name=''):  

        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            for a in x:
                flatten(a, name)
        else:
            out[name[:-1]] = x

    flatten(y)

    return out
flat = flatten_json(sample_object)
print json_normalize(flat)

Actual Result:

Name | Location_City | Location_State | Hobbies
-----+---------------+----------------+--------
John | Los Angeles   | CA             | Running

Expected Result:

Name | Location_City | Location_State | Hobbies
-----+---------------+----------------+--------
John | Los Angeles   | CA             | Running
John | Los Angeles   | CA             | Music
OneCricketeer
xyz

1 Answer

The problem you are having originates in the following section:

elif type(x) is list:
    for a in x:
        flatten(a, name)

Because the name stays the same for every element of the list, each element overwrites the value assigned by the previous one, so only the last element shows up in the output.

Applied to this example: when the flattening function reaches the list 'hobbies', it first assigns the name 'hobbies' to the element 'Music' and writes it to the output. The next element in the list is 'Running', which is also assigned the name 'hobbies'. When this element is written to the output, the key 'hobbies' already exists, so the value 'Music' is overwritten with the value 'Running'.
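To make the overwrite concrete, here is a minimal sketch that isolates just the assignment step. The variables `out` and `name` stand in for the ones inside `flatten_json`; the loop is a simplification for illustration, not the function itself.

```python
# Minimal reproduction of the overwrite: both list elements are written
# under the same key, so the second assignment replaces the first.
out = {}
name = 'hobbies_'  # prefix built up by the time the list is reached

for element in ['Music', 'Running']:
    out[name[:-1]] = element  # the key is 'hobbies' on every iteration
    print(out)

# After the loop, out holds only the last element.
```

The first `print` shows 'Music' under 'hobbies', and the second shows that it has been replaced by 'Running'.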

To prevent this, the script from the link you referenced uses the following piece of code to append the array's index to the name, creating a unique name for every element of the array.

elif type(x) is list:
    i = 0
    for a in x:
        flatten(a, name + str(i) + '_')
        i += 1
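For reference, here is a rough sketch of the whole function with the indexed list branch in place, run against the sample object from the question. The name `flatten_json_indexed` is made up for this example, and `enumerate` and `isinstance` are my own substitutions for the manual counter and the `type(x) is ...` checks.

```python
def flatten_json_indexed(y):
    """Flatten nested dicts/lists, adding list indexes to the key names."""
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for key in x:
                flatten(x[key], name + key + '_')
        elif isinstance(x, list):
            # The index makes the name unique for every list element.
            for i, item in enumerate(x):
                flatten(item, name + str(i) + '_')
        else:
            # Strip the trailing '_' before storing the value.
            out[name[:-1]] = x

    flatten(y)
    return out


sample_object = {'Name': 'John',
                 'Location': {'City': 'Los Angeles', 'State': 'CA'},
                 'hobbies': ['Music', 'Running']}

flat_indexed = flatten_json_indexed(sample_object)
print(flat_indexed)
```

Both hobbies survive, stored under the keys 'hobbies_0' and 'hobbies_1'.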

This would create extra columns in the data, however, rather than new rows. If the latter is what you want, you would have to change the way the function is set up. One way could be to adapt the function to return a list of JSON objects (one for each list element in the original JSON).
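One possible way to get a row per list element is sketched below. This is not from the linked article; `flatten_to_rows` is a hypothetical helper, and it assumes you want every combination of list elements when the object contains more than one list.

```python
from itertools import product


def flatten_to_rows(x, name=''):
    """Return a list of flat dicts, one per combination of list elements."""
    if isinstance(x, dict):
        # Flatten each value on its own, then combine every row of one key
        # with every row of the other keys (a cross product).
        per_key_rows = [flatten_to_rows(v, name + k + '_') for k, v in x.items()]
        rows = []
        for combo in product(*per_key_rows):
            merged = {}
            for part in combo:
                merged.update(part)
            rows.append(merged)
        return rows
    if isinstance(x, list):
        # Each list element contributes its own row(s) under the same name.
        # Note: an empty list yields no rows, which drops the whole record.
        rows = []
        for item in x:
            rows.extend(flatten_to_rows(item, name))
        return rows
    # Scalar: a single row with a single column.
    return [{name[:-1]: x}]


sample_object = {'Name': 'John',
                 'Location': {'City': 'Los Angeles', 'State': 'CA'},
                 'hobbies': ['Music', 'Running']}

rows = flatten_to_rows(sample_object)
for row in rows:
    print(row)
```

This gives two dicts that differ only in 'hobbies'. Passing `rows` to json_normalize (or to `pandas.DataFrame`) should then produce one row per hobby. Keep in mind that two independent lists in the same object would multiply the number of rows, which may or may not be what you want.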

An extra note: I would recommend being a bit more careful when copying code into a question. The indentation is a bit off in this case, and since you left out the line that imports json_normalize, it might not be clear to everyone that you are importing it from pandas:

from pandas.io.json import json_normalize
TJ24