0

I've been struggled for hours to use the explode on multiple attributes with different lenghts.

Here's the sample data - json response

main = {
    'clothes': [{
            'id': 13,
            'color': 'red',
            'contributor_info': [],
            'commentor_info': []
        },
        {
            'id': 15,
            'color': 'yellow',
            'contributor_info': [{
                    'id': 35,
                    'name': 'Anthony',
                    'categories': ['Social'],
                    'first_seen_date': '2020-07-29',
                    'last_seen_date': '2020-07-29'
                },
                {
                    'id': 67,
                    'name': 'Betsy',
                    'categories': ['Networking'],
                    'first_seen_date': '2020-08-31',
                    'last_seen_date': '2020-09-29'
                }
            ],
            'commentor_info': [{
                'id': 3822,
                'name': 'Future',
                'categories': ['Work'],
                'first_seen_date': '2020-09-29',
                'last_seen_date': '2020-09-29'
            }]
        },
        {
            'id': 17,
            'color': 'blue',
            'contributor_info': [{
                    'id': 468,
                    'name': 'Cassendra',
                    'categories': ['Social'],
                    'first_seen_date': '2020-07-29',
                    'last_seen_date': '2020-07-29'
                },
                {
                    'id': 690,
                    'name': 'Dan',
                    'categories': ['Networking'],
                    'first_seen_date': '2020-08-31',
                    'last_seen_date': '2020-09-29'
                }
            ],
            'commentor_info': [{
                    'id': 34,
                    'name': 'Fischer',
                    'categories': ['Work'],
                    'first_seen_date': '2020-09-29',
                    'last_seen_date': '2020-09-29'
                },
                {
                    'id': 985,
                    'name': 'Candice',
                    'categories': ['Work'],
                    'first_seen_date': '2020-10-02',
                    'last_seen_date': '2020-10-02'
                }
            ]
        }
    ]
}   

I'd like to have the flattened out desired format (with only color and also all the attributes within contributor_info and commentor_info) ---

color  contri/comment         name    categories   first         last   
red       n/a                 n/a       n/a         n/a           n/a       
yellow  contributor_info    Anthony    Social     '2020-07-29'   '2020-07-29' 
yellow  contributor_info    Betsy      Networking '2020-08-31'   '2020-09-29' 
yellow  commentor_info      Future     Work       '2020-09-29'   '2020-09-29'
blue    contributor_info    Cassendra  Social     '2020-07-29'   '2020-07-29'
blue    contributor_info    Dan        Networking '2020-08-31'   '2020-09-29'
blue    commentor_info      Fischer    Work       '2020-09-29'   '2020-09-29'
blue    commentor_info      Candice    Work       '2020-10-02'   '2020-10-02'

After making it into a dataframe df =pd.DataFrame.from_dict(main['clothes']) I've tried .apply(pd.Series).stack().reset_index() and also explode function but it didn't turn out to flatten the nested attributes

Would really appreciate any helps/suggestions. Thanks in advance!

woblob
  • 1,349
  • 9
  • 13
martinv
  • 31
  • 3
  • I think its easier for you to flatten it yourself by a loop. It looks quite structured so shouldn't be hard. – Henry Yik Oct 12 '20 at 18:03
  • check answers for this [question](https://stackoverflow.com/questions/47242845/pandas-io-json-json-normalize-with-very-nested-json). future warning: if you want to use json_normalize, change `'` to `"` – woblob Oct 13 '20 at 07:10

0 Answers0