How to retrieve original S3 data from AWS Athena

Question

I have an S3 bucket archiving JSON objects via Kinesis Firehose. Each bucket object can contain multiple JSON objects that can vary in the schema.

Bucket structure

bucket
└── archive
    └── 2021
        └── 04
            ├── 11
            |   ├── 10
            |   |   └── object-1
            └── 14
                └── 09
                    └── object-2

First object - s3://bucket/archive/2021/04/11/10/object-1

{"id":1,"timestamp":"2021-04-11T10:01:20Z","nestedData":{"moreNesting":{"someData":"value"}}}
{"id":2,"timestamp":"2021-04-11T12:00:31Z","someDifferentlyNestedData":{"differentNesting":{"someOtherData":"value"}}}

Second object - s3://bucket/archive/2021/04/14/09/object-2

{"id":3,"timestamp":"2021-04-14T09:02:30Z","noNestedData":"value"}

I have set up a Glue Crawler to create the Data Catalogue Table out of my data.

Data Catalogue Table

I am using AWS Athena to query that data. When querying, I can retrieve all the columns of the target JSON object with the following query:

All columns query

SELECT *
FROM archive
WHERE nesteddata.morenesting.somedata = 'value';

All columns query result

I can also retrieve the S3 object path of the JSON object with the following query:

S3 path query

SELECT "$path"
FROM archive
WHERE nesteddata.morenesting.somedata = 'value';

S3 path query results

Question

My question is, how can I retrieve the original JSON object via a query, i.e. how can I have a query result that returns something like the following:

@HarshadVyawahare unfortunately I haven't. I had an idea of base64 encoding the JSON and storing it in the JSON as a separate property, but that felt more like a hack than anything. — Aki K, Mar 04 '23 at 17:48

score 0 · Answer 1 · answered Apr 13 '21 at 08:34

0

If you don't mind having this representation on another table, you could simply create a table with a single string field via the Athena console using the Create table link.

answered Apr 13 '21 at 08:34

Nicolas Busca

1,100
7
14

Can you please explain with an example create table statement? – Harshad Vyawahare Feb 16 '23 at 12:24

How to retrieve original S3 data from AWS Athena

1 Answers1