I am relatively new to JSON
and Python
, and I am struggling to flatten JSON since last two days.
I read the example at http://pandas.pydata.org/pandas-docs/version/0.19/generated/pandas.io.json.json_normalize.html, but I didn't understand how to unlist some nested elements. I also read a few threads Flatten JSON based on an attribute - python How to normalize complex nested json in python? and https://towardsdatascience.com/flattening-json-objects-in-python-f5343c794b10. I tried all without any luck.
Here's the first record of my JSON file:
d =
{'city': {'url': 'link',
'name': ['San Francisco']},
'rank': 1,
'resident': [
{'link': ['bit.ly/0842/'], 'name': ['John A']},
{'link': ['bit.ly/5835/'], 'name': ['Tedd B']},
{'link': ['bit.ly/2011/'], 'name': ['Cobb C']},
{'link': ['bit.ly/0855/'], 'name': ['Jack N']},
{'link': ['bit.ly/1430/'], 'name': ['Jack K']},
{'link': ['bit.ly/3081/'], 'name': ['Edward']},
{'link': ['bit.ly/2001/'], 'name': ['Jack W']},
{'link': ['bit.ly/0020/'], 'name': ['Henry F']},
{'link': ['bit.ly/2137/'], 'name': ['Joseph S']},
{'link': ['bit.ly/3225/'], 'name': ['Ed B']},
{'link': ['bit.ly/3667/'], 'name': ['George Vvec']},
{'link': ['bit.ly/6434/'], 'name': ['Robert W']},
{'link': ['bit.ly/4036/'], 'name': ['Rudy B']},
{'link': ['bit.ly/6450/'], 'name': ['James K']},
{'link': ['bit.ly/5180/'], 'name': ['Billy N']},
{'link': ['bit.ly/7847/'], 'name': ['John S']}]
}
Here's the expected output:
city_url city_name rank resident_link resident_name
link San Francisco 1 'bit.ly/0842/' 'John A'
link San Francisco 1 'bit.ly/5835/' 'Tedd B'
link San Francisco 1 'bit.ly/2011/' 'Cobb C'
link San Francisco 1 'bit.ly/0855/' 'Jack N'
link San Francisco 1 'bit.ly/1430/' 'Jack K'
link San Francisco 1 'bit.ly/3081/' 'Edward'
link San Francisco 1 'bit.ly/2001/' 'Jack W'
link San Francisco 1 'bit.ly/0020/' 'Henry F'
link San Francisco 1 'bit.ly/2137/' 'Joseph S'
link San Francisco 1 'bit.ly/3225/' 'Ed B'
link San Francisco 1 'bit.ly/3667/' 'George Vvec'
link San Francisco 1 'bit.ly/6434/' 'Robert W'
link San Francisco 1 'bit.ly/4036/' 'Rudy B'
link San Francisco 1 'bit.ly/6450/' 'James K'
link San Francisco 1 'bit.ly/5180/' 'Billy N'
link San Francisco 1 'bit.ly/7847/' 'John S'
The flatten_json()
function (from Medium.com above) destroys the hierarchy. Here are first few rows:
{'city_url': 'link',
'city_name_0': 'San Francisco',
'rank': 1,
'resident_0_link_0': 'bit.ly/0842/',
'resident_0_name_0': 'John A', ...
Can someone please help me how to think about converting these datasets? Unfortunately, pandas
documentation provides no guidance for beginners. HEre's what I was playing with. Nothing worked.
from pandas.io.json import json_normalize
json_normalize(d,['city',['name','rank']])
json_normalize(d,['city','name','rank'])
json_normalize(d,['city','name'])
I'd appreciate if someone guide how to do these type of conversion and the thought process.
Also, I'm looking for a vectorized operation or O(N)
operation rather than O(N2)
because of the amount of data in the original dataset. Hence, anything slower than O(N)
won't work.