1

I have an S3 bucket archiving JSON objects via Kinesis Firehose. Each bucket object can contain multiple JSON objects that can vary in the schema.

Bucket structure

bucket
└── archive
    └── 2021
        └── 04
            ├── 11
            |   ├── 10
            |   |   └── object-1
            └── 14
                └── 09
                    └── object-2

First object - s3://bucket/archive/2021/04/11/10/object-1

{"id":1,"timestamp":"2021-04-11T10:01:20Z","nestedData":{"moreNesting":{"someData":"value"}}}
{"id":2,"timestamp":"2021-04-11T12:00:31Z","someDifferentlyNestedData":{"differentNesting":{"someOtherData":"value"}}}

Second object - s3://bucket/archive/2021/04/14/09/object-2

{"id":3,"timestamp":"2021-04-14T09:02:30Z","noNestedData":"value"}

I have set up a Glue Crawler to create the Data Catalogue Table out of my data.

Data Catalogue Table

Data Catalogue Table

I am using AWS Athena to query that data. When querying, I can retrieve all the columns of the target JSON object with the following query:

All columns query

SELECT *
FROM archive
WHERE nesteddata.morenesting.somedata = 'value';

All columns query result

Query results

I can also retrieve the S3 object path of the JSON object with the following query:

S3 path query

SELECT "$path"
FROM archive
WHERE nesteddata.morenesting.somedata = 'value';

S3 path query results

Query results

Question

My question is, how can I retrieve the original JSON object via a query, i.e. how can I have a query result that returns something like the following:

Expected query results

Aki K
  • 1,222
  • 1
  • 27
  • 49
  • @HarshadVyawahare unfortunately I haven't. I had an idea of base64 encoding the JSON and storing it in the JSON as a separate property, but that felt more like a hack than anything. – Aki K Mar 04 '23 at 17:48

1 Answers1

0

If you don't mind having this representation on another table, you could simply create a table with a single string field via the Athena console using the Create table link.

Nicolas Busca
  • 1,100
  • 7
  • 14