0

I am trying to normalize this nested JSON file but the 'rewardsReceiptItemList' will not flatten no matter what I do. I tried using json.normalize and record_path= argument.

Here is an example of the JSON:

{"result":[{"_id": {"$oid": "5ff1e1eb0a720f0523000575"}, "bonusPointsEarned": 500, "bonusPointsEarnedReason": "Receipt number 2 completed, bonus point schedule DEFAULT (5cefdcacf3693e0b50e83a36)", "createDate": {"$date": 1609687531000}, "dateScanned": {"$date": 1609687531000}, "finishedDate": {"$date": 1609687531000}, "modifyDate": {"$date": 1609687536000}, "pointsAwardedDate": {"$date": 1609687531000}, "pointsEarned": "500.0", "purchaseDate": {"$date": 1609632000000}, "purchasedItemCount": 5, "rewardsReceiptItemList": [{"barcode": "4011", "description": "ITEM NOT FOUND", "finalPrice": "26.00", "itemPrice": "26.00", "needsFetchReview": false, "partnerItemId": "1", "preventTargetGapPoints": true, "quantityPurchased": 5, "userFlaggedBarcode": "4011", "userFlaggedNewItem": true, "userFlaggedPrice": "26.00", "userFlaggedQuantity": 5}], "rewardsReceiptStatus": "FINISHED", "totalSpent": "26.00", "userId": "5ff1e1eacfcf6c399c274ae6"}, {"_id": {"$oid": "5ff1e1bb0a720f052300056b"}, "bonusPointsEarned": 150, "bonusPointsEarnedReason": "Receipt number 5 completed, bonus point schedule DEFAULT (5cefdcacf3693e0b50e83a36)", "createDate": {"$date": 1609687483000}, "dateScanned": {"$date": 1609687483000}, "finishedDate": {"$date": 1609687483000}, "modifyDate": {"$date": 1609687488000}, "pointsAwardedDate": {"$date": 1609687483000}, "pointsEarned": "150.0", "purchaseDate": {"$date": 1609601083000}, "purchasedItemCount": 2, "rewardsReceiptItemList": [{"barcode": "4011", "description": "ITEM NOT FOUND", "finalPrice": "1", "itemPrice": "1", "partnerItemId": "1", "quantityPurchased": 1}, {"barcode": "028400642255", "description": "DORITOS TORTILLA CHIP SPICY SWEET CHILI REDUCED FAT BAG 1 OZ", "finalPrice": "10.00", "itemPrice": "10.00", "needsFetchReview": true, "needsFetchReviewReason": "USER_FLAGGED", "partnerItemId": "2", "pointsNotAwardedReason": "Action not allowed for user and CPG", "pointsPayerId": "5332f5fbe4b03c9a25efd0ba", "preventTargetGapPoints": true, "quantityPurchased": 1, "rewardsGroup": "DORITOS SPICY SWEET CHILI SINGLE SERVE", "rewardsProductPartnerId": "5332f5fbe4b03c9a25efd0ba", "userFlaggedBarcode": "028400642255", "userFlaggedDescription": "DORITOS TORTILLA CHIP SPICY SWEET CHILI REDUCED FAT BAG 1 OZ", "userFlaggedNewItem": true, "userFlaggedPrice": "10.00", "userFlaggedQuantity": 1}], "rewardsReceiptStatus": "FINISHED", "totalSpent": "11.00", "userId": "5ff1e194b6a9d73a3a9f1052"},{"_id": {"$oid": "5ff475820a7214ada10005cf"}, "createDate": {"$date": 1609856386000}, "dateScanned": {"$date": 1609856386000}, "modifyDate": {"$date": 1609856386000}, "rewardsReceiptStatus": "SUBMITTED", "userId": "5a43c08fe4b014fd6b6a0612"}]}

After applying json normalize, it looks like this (one column that is still nested):

There is one column that is still nested

I tried using record path, but it shows a KeyError:

enter image description here

Nico
  • 79
  • 1
  • 1
  • 9
  • 1
    [Please do not upload images of code/data/errors when asking a question.](http://meta.stackoverflow.com/q/285551) – martineau May 11 '22 at 18:58
  • What would a "flattened" list of dictionaries look like? – martineau May 11 '22 at 19:01
  • Ideally I would just like to extract all the fields within the "rewardsReceiptItemList" into a dataframe. Like how json_normalize would work with record_path and meta. – Nico May 11 '22 at 19:11

1 Answers1

0
import json
import pandas as pd

file = """<your_json_string>"""

data = json.loads(file)

# Remove the records that don't have the required Key.
data['result'] = [x for x in data['result'] if x.get('rewardsReceiptItemList')]

df = pd.json_normalize(data, ['result', 'rewardsReceiptItemList'])
print(df)

Output:

        barcode                                        description finalPrice itemPrice needsFetchReview partnerItemId  ... needsFetchReviewReason               pointsNotAwardedReason             pointsPayerId                            rewardsGroup   rewardsProductPartnerId                             userFlaggedDescription
0          4011                                     ITEM NOT FOUND      26.00     26.00            False             1  ...                    NaN                                  NaN                       NaN                                     NaN                       NaN                                                NaN
1          4011                                     ITEM NOT FOUND          1         1              NaN             1  ...                    NaN                                  NaN                       NaN                                     NaN                       NaN                                                NaN
2  028400642255  DORITOS TORTILLA CHIP SPICY SWEET CHILI REDUCE...      10.00     10.00             True             2  ...           USER_FLAGGED  Action not allowed for user and CPG  5332f5fbe4b03c9a25efd0ba  DORITOS SPICY SWEET CHILI SINGLE SERVE  5332f5fbe4b03c9a25efd0ba  DORITOS TORTILLA CHIP SPICY SWEET CHILI REDUCE...
BeRT2me
  • 12,699
  • 2
  • 13
  • 31
  • I tried this with a few lines of my JSON data and it works, but when I run it with the entire file, it throws the key error once more. Do you have any idea what causes this? Empty arrays? – Nico May 11 '22 at 19:50
  • You can always try it with the `errors='ignore'` parameter~ But yes, happens if keys listed are not always present. – BeRT2me May 11 '22 at 20:22
  • I realized the issue and `errors='ignore'` only works with `meta` not for `record_path`. – Nico May 11 '22 at 20:50
  • I found out that there are a few lines in the json that do not have "rewardsReceiptItemList". I edited the Json file above to account for that. Any ideas on how to go about this? – Nico May 11 '22 at 20:52
  • @Nico Edited my answer to address records missing `rewardsReceiptItemList` – BeRT2me May 11 '22 at 21:15