Redshift Spectrum: how to import only certain files

Question

When using Redshift Spectrum, it seems you can only point the table location at a folder, and it then imports every file inside that folder.

Is there a way to import only one file from a folder that contains many files? When I provide the full path including the filename, I think it treats the file as a manifest file and gives errors: manifest is too large or JSON not supported.

Is there any other way?

score 2 · Answer 1 · answered Aug 04 '19 at 00:09

You inadvertently answered your own question: use a manifest file.

From CREATE EXTERNAL TABLE - Amazon Redshift:

LOCATION { 's3://bucket/folder/' | 's3://bucket/manifest_file' }

The path to the Amazon S3 bucket or folder that contains the data files or a manifest file that contains a list of Amazon S3 object paths. The buckets must be in the same AWS Region as the Amazon Redshift cluster.

If the path specifies a manifest file, the s3://bucket/manifest_file argument must explicitly reference a single file, for example, 's3://mybucket/manifest.txt'. It can't reference a key prefix.

The manifest is a text file in JSON format that lists the URL of each file that is to be loaded from Amazon S3 and the size of the file, in bytes. The URL includes the bucket name and full object path for the file. The files that are specified in the manifest can be in different buckets, but all the buckets must be in the same AWS Region as the Amazon Redshift cluster.

I'm not sure why it requires the size of each file. It might be used to distribute the workload among multiple nodes.
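
As a rough sketch of what such a manifest could look like, the Python snippet below builds the JSON text for a single file. The "entries" / "url" / "meta" / "content_length" layout follows the manifest format as I understand it from the Redshift documentation, so check it against the current docs. The function name build_spectrum_manifest, the bucket and key names, and the file size are all made-up placeholders.

```python
import json


def build_spectrum_manifest(files):
    """Return manifest JSON text for a list of (s3_url, size_in_bytes) pairs.

    Hypothetical helper: the entry layout ("url" plus "meta.content_length")
    is an assumption based on the Redshift Spectrum manifest format.
    """
    entries = [
        {"url": url, "meta": {"content_length": size}}
        for url, size in files
    ]
    return json.dumps({"entries": entries}, indent=2)


if __name__ == "__main__":
    # Placeholder object path and size; replace with the real file's values.
    print(build_spectrum_manifest([("s3://mybucket/folder/data1.csv", 1024)]))
```

The resulting text would be saved as a single object in S3 (for example 's3://mybucket/manifest.txt', as in the quoted documentation), and that object's path would be used as the LOCATION of the external table. Only the files listed in the manifest should then be read.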

Isn't there any way to import a single file without making any changes in the S3 bucket (that is, without adding a manifest file)? — Kushal Singh, Aug 04 '19 at 01:58
