1

I would like to know if there is a way to flatten deeply nested JSON using Glue ETL job? This has nested arrays in it. I tried to run a Glue crawler on the JSON which returned a catalog with just 1 field PerPlayer with a struct data type. In the glue ETL job should I be using a catalog or just read the JSON into dynamicframe and perform some transform to flatten it? I was able to flatten using relationalize if there is only 1 record (without array), but my input has got several records in an array structure and some of the records have some nested arrays within it.

I am entirely new to Glue ETL, so any advice or suggestion would be greatly appreciated.

{
    "PerPlayer": {
        "requestNo": "REQ912",
        "Batch_Number": "1",
        "Total_No_Of_Batches": "1",
        "player": [
            {
                "username": "user1",
                "characteristics": {
                    "race": "Human",
                    "class": "Warlock",
                    "subclass": "Dawnblade",
                    "power": 300,
                    "playercountry": "USA"
                },
                "arsenal": [
                    {
                        "kinetic": {
                            "name": "Sweet Business",
                            "type": "Auto Rifle",
                            "power": 300,
                            "element": "Kinetic"
                        },
                        "energy": {
                            "name": "MIDA Mini-Tool",
                            "type": "Submachine Gun",
                            "power": 300,
                            "element": "Solar"
                        },
                        "power": {
                            "name": "Play of the Game",
                            "type": "Grenade Launcher",
                            "power": 300,
                            "element": "Arc"
                        }
                    },
                    {
                        "kinetic": {
                            "name": "Sweet Business1",
                            "type": "Auto Rifle1",
                            "power": 300,
                            "element": "Kinetic1"
                        },
                        "energy": {
                            "name": "MIDA Mini-Tool",
                            "type": "Submachine Gun",
                            "power": 300,
                            "element": "Solar1"
                        },
                        "power": {
                            "name": "Play of the Game1",
                            "type": "Grenade Launcher1",
                            "power": 300,
                            "element": "Arc1"
                        }
                    }
                ],
                "armor": {
                    "head": "Eye of Another World",
                    "arms": "Philomath Gloves",
                    "chest": "Philomath Robes",
                    "leg": "Philomath Boots",
                    "classitem": "Philomath Bond"
                },
                "location": {
                    "map": "Titan",
                    "waypoint": "The Rig"
                }
            },
            {
                "username": "user2",
                "characteristics": {
                    "race": "Alien",
                    "class": "Starwars",
                    "subclass": "Dawnblade",
                    "power": 300,
                    "playercountry": "USA"
                },
                "arsenal": {
                    "kinetic": {
                        "name": "salt Business",
                        "type": "Auto Rifle",
                        "power": 300,
                        "element": "Kinetic"
                    },
                    "energy": {
                        "name": "MIDA Mini-Tool",
                        "type": "Submachine Gun",
                        "power": 300,
                        "element": "Solar"
                    },
                    "power": {
                        "name": "Play of the Game",
                        "type": "Grenade Launcher",
                        "power": 400,
                        "element": "Arc"
                    }
                },
                "armor": {
                    "head": "Eye of Another World",
                    "arms": "Philomath Gloves",
                    "chest": "Philomath Robes",
                    "leg": "Philomath Boots",
                    "classitem": "Philomath Bond"
                },
                "location": {
                    "map": "Titan",
                    "waypoint": "The Rig"
                }
            }
        ]
    }
}
srmk
  • 85
  • 1
  • 8

1 Answers1

1

Unfortunately this is not possible with Glue Crawlers, that service will only create tables that look like the data, not change the data – and there is not Athena feature that maps a nested hierarchy to a flat structure at the serde level either.

You might be able to do it with Glue ETL by transforming the data into a new flattened data set, but in general it seems to me that people who try Glue end up with more problems than they get solved.

What you could do is to live with the table that the crawler creates for you and create a view in Athena that flattens the hierarchy. There's an operator called UNNEST that hoists array elements as rows. It could look something like this:

SELECT
  PerPlayer.requestNo,
  PerPlayer.Batch_Number,
  PerPlayer.Total_No_Of_Batches,
  player.username,
  player.characteristics.race,
  player.characteristics.class,
  -- and so on
FROM original_table, UNNEST (PerPlayer.player) AS t(player)

What happens is that the result will have one row per element in each original row's player array, and you can access columns from both the original row as well as the player element. The AS t(player) syntax just means that the virtual table containing the array elements should be called t and have a column called player.

There's lots of other questions about UNNEST here on Stack Overflow that you could look at for inspiration too.

If you want to run queries agains the flat structure you can create a view from the query above and run the queries against that view. Performance aside it would be as if your data had been flattened.

Performance will depend on a lot of details, and don't optimise unless necessary. You can use the query above to create a new flat data set using CTAS.

Theo
  • 131,503
  • 21
  • 160
  • 205
  • Don't suppose you have an example of how to do this with a glue ETL? https://aws.amazon.com/blogs/big-data/simplify-querying-nested-json-with-the-aws-glue-relationalize-transform/ Looking at how to handle nested arrays. It seems people have had success with `explode()` but yet to find some sample code. – Sigex Jan 19 '22 at 13:38