
I have a JSON file on S3 that I want to transfer to Redshift. One catch is that the file contains entries in the following format:

{
  "user_id": 1,
  "metadata": {
    "connection_type": "WIFI",
    "device_id": "1234"
  }
}

Before saving it to Redshift, I want to flatten the file so it contains the columns:

user_id | connection_type | device_id

How can I do this using AWS Data Pipeline? Is there an activity that can transform JSON into the desired form? I do not think the transform SQL will support JSON fields.

Kamil Grabowski

1 Answer

You do not need to flatten it. You can load it with the COPY command after defining a jsonpaths config file that tells COPY how to extract the column values from each JSON object.

With your structure you'd create a file in S3 (s3://bucket/your_jsonpaths.json) like so:

{
    "jsonpaths": [
        "$.user_id",
        "$.metadata.connection_type",
        "$.metadata.device_id"
    ]
}
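
The jsonpaths entries map positionally onto the columns of the table you COPY into, so the target table should list its columns in the same order. A minimal sketch of such a table (the column types here are assumptions, adjust them to your data):

create table your_table (
    user_id         bigint,        -- filled from $.user_id
    connection_type varchar(32),   -- filled from $.metadata.connection_type
    device_id       varchar(64)    -- filled from $.metadata.device_id
);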

Then you'd run something like this in Redshift:

copy your_table
from 's3://bucket/data_objects.json'
credentials '<aws-auth-args>'
json 's3://bucket/your_jsonpaths.json';

If you run into issues, see what is in the stl_load_errors table.
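
For example, something like this (using standard STL_LOAD_ERRORS columns) shows the most recent load failures:

select starttime, filename, line_number, colname, err_reason
from stl_load_errors
order by starttime desc
limit 10;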

Check out the Redshift COPY command and examples.

systemjack
  • Would it be possible to automate the COPY command for every file put in an S3 bucket? Can the COPY be fired through the SDK? If so, it should be possible to write a Lambda function that fires the command. – AitorF Jul 13 '16 at 13:31
    You can enable SNS notifications on your S3 bucket and trigger the lambda from that. Check this out: http://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html and http://docs.aws.amazon.com/sns/latest/dg/sns-lambda.html – systemjack Jul 13 '16 at 17:10