
I currently have data arriving from Firehose into an Athena table. When I view the data, each record is an array of JSON objects. Is it possible to use a Glue job to split the arrays into separate rows, so that each row is its own JSON log?

For example, the data arrives as: [{"a":"test1", "b":"success"},{"a":"test2", "b":"success"}]

What the Glue job should change it to: {"a":"test1", "b":"success"} {"a":"test2", "b":"success"}

2 Answers


This can be done very easily with PySpark's explode function, which produces one output row per array element.

You just need to convert your DynamicFrame to a DataFrame first, by calling the .toDF() method on it.

Robert Kossendey

You should try Glue's relationalize method; it works wonders for nested structures. You can go through the examples in the Relationalize documentation.