2

I have follwoing setup.

  1. Application sends serialized JSON data into Firehose.
  2. Firehose is configured with Data conversion to praquet using a glue table definition for efficient query execution.
  3. I am able to run query in Athena and see the results.

Now what I need is to create another application which can query Athena using AWSSDK (C#) and read the data back in JSON format. Glue table

Is it possible to somehow use the table's input/output format and serde to read the data back in JSON format using Athena SDK? Or I need to implement custom logic to convert the data back to JSON?

Deepak Kumar
  • 426
  • 7
  • 20
  • Maybe you can do it using the following api https://docs.aws.amazon.com/athena/latest/APIReference/API_GetQueryResults.html – Hackerman Nov 16 '18 at 16:35
  • Yes, We are already using this API. The problem is that the query result rows are not in JSON format. I was thinking if the table or athena query result can be configured to return the data in JSON format. If not I'll need to write a parser to convert the query result to JSON. Which seems to be the only solution at hand right now. – Deepak Kumar Nov 18 '18 at 07:55
  • 1
    Possible duplicate of [AWS Athena export array of structs to JSON](https://stackoverflow.com/questions/49308410/aws-athena-export-array-of-structs-to-json) – Mark Rotteveel Nov 18 '18 at 10:28

4 Answers4

4

Question is old, but might help someone who trying to export results from Athena table into different output format. AWS CTAS can be used to export data into different formats (ORC, PARQUET, AVRO, JSON, or TEXTFILE) using simple Athena CTAS (https://docs.aws.amazon.com/athena/latest/ug/create-table-as.html) statement. You can also specify compression format for saving output data.

Below example would export data into JSON format on s3.
output_location='s3://s3_bucket/output.json'

CREATE TABLE output
 WITH (
 format = 'JSON',
 external_location = output_location) AS SELECT * FROM target_table     

Downside of this approach : Output is always compressed.

Brian Tompsett - 汤莱恩
  • 5,753
  • 72
  • 57
  • 129
P. Phalak
  • 457
  • 1
  • 4
  • 11
2

Yes, you can.

First, create your Athena table with SerDe.

Then create a corresponding workgroup using Athena Engine Version 3. This is important since the cast function will return an array in version 2. If you use terraform, use these codes:

resource "aws_athena_workgroup" "app" {
  name  = "logs"
  state = "ENABLED"
  configuration {
    enforce_workgroup_configuration    = true
    publish_cloudwatch_metrics_enabled = true
    engine_version {
      selected_engine_version = "Athena engine version 3"
    }

    result_configuration {
      output_location = "s3://${local.log_bucket_name}/athena-query-record/"

      encryption_configuration {
        encryption_option = "SSE_S3"
      }
    }
  }
}

Then run a query like:

SELECT CAST(mail as JSON) AS mail_json FROM database LIMIT 10;

Remember to run that query in the workgroup you just created. You will get a valid JSON in the output, not an array.

Shawn Ye
  • 36
  • 2
1

Athena bases on Presto version 0.172, so you will find the answer in Presto documentation.For instance, this chapter can be useful to you:

https://prestodb.io/docs/current/functions/json.html

jbgorski
  • 1,824
  • 9
  • 16
-1

So I get something like this from Athena:

SELECT output FROM "test"."prod" limit 10;

{score=41, avg_time_solving={double=17439.333333333332, long=null}

If I use cast for this field I get:

SELECT CAST(output AS JSON) FROM "test"."prod" limit 10;

[51, [8205.57142857143,null], [null,5159], 7, [0.8571428571428571,null], null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null]

Nowhere Man
  • 19,170
  • 9
  • 17
  • 42
DavidBiller
  • 71
  • 1
  • 9