I've been working on some data, trying to clean some data and I've found some difficulties that I hope this community can help me with.
I have this json structure
{ "date": "04/23/2023"
,"eventName": "click"
,"UserProperties": [{ "key": "Name"
,"string_value": "Lynn Smith"
,"int_value": null
},
{ "key": "Age"
,"string_value": null
,"int_value": 13
},
{ "key": "Gender"
,"string_value": "Female"
,"int_value": null
}
]
,"EventParams": [{ "key": "Website"
,"string_value": "xxx.com"
,"int_value": null
},
{ "key": "pageTitle"
,"string_value": "zzzz"
,"int_value": null
}
]
,"Geography": {"country": "xxx", "city": "yyy"}
}
This is how I want my table to look
date | eventName | country | city | Name | Age | Gender | Website | pageTitle |
---|---|---|---|---|---|---|---|---|
04/23/2023 | click | xxx | yyy | Lynn Smith | 13 | Female | xxx.com | zzzz |
I manage to get this data into a dataframe with this structure
date | eventName | country | city | UserProperties | EventParams |
---|---|---|---|---|---|
04/23/2023 | click | xxx | yyy | [{ "key": "Name" | [{ "key": "Website" |
,"string_value": "Lynn Smith" | ,"string_value": "xxx.com" | ||||
,"int_value": null | ,"int_value": null | ||||
}, | }, | ||||
{ "key": "Age" | { "key": "pageTitle" | ||||
,"string_value": null | ,"string_value": "zzzz" | ||||
,"int_value": 13 | ,"int_value": null | ||||
}, | } | ||||
{ "key": "Gender" | ] | ||||
,"string_value": "Female" | |||||
,"int_value": null | |||||
} | |||||
] |
I also was able to use the explode function, but I had to use it twice (one for the UserProperties and other for the EventParams). This was the result, but dont know how to turn on the format that I want.
date | eventName | country | city | upKey | upValue | epKey | epValue |
---|---|---|---|---|---|---|---|
04/23/2023 | click | xxx | yyy | Name | Lynn Smith | Website | xxx.com |
04/23/2023 | click | xxx | yyy | Name | Lynn Smith | pageTitle | zzzz |
04/23/2023 | click | xxx | yyy | Age | 13 | Website | xxx.com |
04/23/2023 | click | xxx | yyy | Age | 13 | pageTitle | zzzz |
04/23/2023 | click | xxx | yyy | Gender | Female | Website | xxx.com |
04/23/2023 | click | xxx | yyy | Gender | Female | pageTitle | zzzz |
The main reason I want this kind of result is because I want to count the number of events by age for example. For this specific example the count is 1.
Also, I can have multiple row per date, eventName, country, city the difference beween events can be in any of the keys within the UserProperties and EventParams.
Thank you very much in advance.