I'm struggling at the moment to work out how to parse some JSON where the nested keys are dynamic rather than fixed, so I'm not sure how to do this with the normal PySpark SQL functions. Is it possible to do the below without a UDF?
```json
{
  "key 1": "value 1",
  "key 2": {
    "key 3": {
      "key 4": {
        "key 5": "value 2",
        "key 6": "value 3"
      },
      "key 7": {
        "key 5": "value 4",
        "key 6": "value 5"
      },
      "key 8": {
        "key 5": "value 6",
        "key 6": "value 7"
      }
    }
  }
}
```
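In case it helps to reproduce: the JSON arrives as a single string column, roughly like this (the `json_str` column name is just a placeholder I'm using here):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# The raw JSON from above, held in one string column.
raw_json = """{
  "key 1": "value 1",
  "key 2": {
    "key 3": {
      "key 4": {"key 5": "value 2", "key 6": "value 3"},
      "key 7": {"key 5": "value 4", "key 6": "value 5"},
      "key 8": {"key 5": "value 6", "key 6": "value 7"}
    }
  }
}"""

df = spark.createDataFrame([(raw_json,)], ["json_str"])
```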
My goal is basically to turn it into the following table:
| A | B | C | D | E | F |
|---------|-------|-------|-------|---------|---------|
| value 1 | key 2 | key 3 | key 4 | value 2 | value 3 |
| value 1 | key 2 | key 3 | key 7 | value 4 | value 5 |
| value 1 | key 2 | key 3 | key 8 | value 6 | value 7 |
If key 4, key 7 and key 8 were all the same key I'd be able to parse this out easily. However, I can't find any examples covering the case where keys 4, 7, 8 and so on are unknown in advance and therefore can't be hard-coded into the path passed to the `get_json_object` function.
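For context, here's roughly what works when the keys *are* fixed; a minimal sketch assuming the `df`/`json_str` setup above, with every key hard-coded into the JSON path (so it only ever reaches the `key 4` branch):

```python
from pyspark.sql import functions as F

fixed = df.select(
    F.get_json_object("json_str", "$['key 1']").alias("A"),
    # With fixed keys, columns B-D are just literals.
    F.lit("key 2").alias("B"),
    F.lit("key 3").alias("C"),
    F.lit("key 4").alias("D"),
    # Works only because "key 4" is hard-coded; I can't see a way to
    # wildcard the "key 4"/"key 7"/"key 8" level with get_json_object.
    F.get_json_object("json_str", "$['key 2']['key 3']['key 4']['key 5']").alias("E"),
    F.get_json_object("json_str", "$['key 2']['key 3']['key 4']['key 6']").alias("F"),
)
```

This gives me the first row of the target table, but I'd need one row per unknown inner key, not just `key 4`.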
Thanks