29

I already know how to unload data from Redshift into S3 as a single file. I need to know how to unload it with the column headers. Can anyone please help or give me a clue?

I don't want to have to do it manually in shell or Python.

Tokunbo Hiamang

9 Answers

36

As of cluster version 1.0.3945, Redshift supports unloading data to S3 with a header row in each file, e.g.

UNLOAD('select column1, column2 from mytable;')
TO 's3://bucket/prefix/'
IAM_ROLE '<role arn>'
HEADER;

Note: you can't use the HEADER option in conjunction with FIXEDWIDTH.

https://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html
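HEADER also combines with the other UNLOAD options. As a rough sketch (the bucket path and role ARN below are placeholders), something like the following would produce a single gzipped CSV file with a header row:

-- Sketch only: combine HEADER with CSV, PARALLEL OFF and GZIP for one compressed CSV file
UNLOAD ('select column1, column2 from mytable')
TO 's3://bucket/prefix/'
IAM_ROLE '<role arn>'
CSV
HEADER
PARALLEL OFF
GZIP
ALLOWOVERWRITE;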

fez
25

If any of your columns are non-character, then you need to explicitly cast them as char or varchar because the UNION forces a cast.

Here is an example of the full statement that will create a file in S3 with the headers in the first row.

The output file will be a single CSV file with quotes.

This example assumes numeric values in column_1. If your data is laid out differently, adjust the ORDER BY clause to reference a numeric column so that the header row sorts into row 1 of the S3 file.

    /* Redshift export to S3 CSV single file with headers - limit 6.2GB */

    UNLOAD ('
        SELECT \'column_1\', \'column_2\'
        UNION
        SELECT
          CAST(column_1 AS varchar(255)) AS column_1,
          CAST(column_2 AS varchar(255)) AS column_2
        FROM source_table_for_export_to_s3
        ORDER BY 1 DESC
        ;
    ')
    TO 's3://bucket/path/file_name_for_table_export_in_s3_'
    CREDENTIALS 'aws_access_key_id=<key_with_no_<>_brackets>;aws_secret_access_key=<secret_access_key_with_no_<>_brackets>'
    PARALLEL OFF
    ESCAPE
    ADDQUOTES
    DELIMITER ','
    ALLOWOVERWRITE
    GZIP
    ;
Douglas Hackney
  • However, this does not cover boolean data types. I have yet to come across a full solution for this. – pele88 May 09 '18 at 15:41
  • See [this answer](https://stackoverflow.com/a/52611659/3622349) for a more up-to-date approach. – n8yoder Apr 24 '20 at 00:08
12

There is no direct option provided by the Redshift UNLOAD command.

But we can tweak the query to generate a file that has the header row added.

First, we use the PARALLEL OFF option so that it creates only one file.

"By default, UNLOAD writes data in parallel to multiple files, according to the number of slices in the cluster. The default option is ON or TRUE. If PARALLEL is OFF or FALSE, UNLOAD writes to one or more data files serially, sorted absolutely according to the ORDER BY clause, if one is used. The maximum size for a data file is 6.2 GB. So, for example, if you unload 13.4 GB of data, UNLOAD creates the following three files."

To get headers in the unloaded file, do the following.

Suppose you have a table like this:

create table mytable
(
    name varchar(64) default NULL,
    address varchar(512) default NULL
);

Then use a SELECT like the one below in your UNLOAD to add the headers:

(select 'name', 'address') union (select name, address from mytable)

This will add the headers name and address as the first line of your output.
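For reference, a complete UNLOAD built around that SELECT might look like the sketch below; the bucket path and role ARN are placeholders, not part of the original answer:

-- Sketch only: bucket path and IAM role are placeholders
UNLOAD ('
    (SELECT \'name\', \'address\')
    UNION
    (SELECT name, address FROM mytable)
')
TO 's3://bucket/prefix/mytable_'
IAM_ROLE '<role arn>'
DELIMITER ','
ADDQUOTES
PARALLEL OFF;

Note that without an ORDER BY, Redshift does not guarantee that the header row lands on the first line; the next answer shows an ordering trick that makes this deterministic.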

Jon Saw
Sandesh Deshmane
12

Just to complement the answer: to ensure the header row comes first, you don't have to order by a specific column of data. You can enclose the UNIONed SELECTs inside another SELECT, add an ordinal column to them, and then, in the outer SELECT, order by that column without including it in the list of selected columns.

UNLOAD ('

  SELECT column_1, column_2 FROM (

     SELECT 1 AS i, \'column_1\' AS column_1, \'column_2\' AS column_2
     UNION ALL
     SELECT 2 AS i, column_1::varchar(255), column_2::varchar(255)
     FROM source_table_for_export_to_s3

  ) t ORDER BY i

')
TO 's3://bucket/path/file_name_for_table_export_in_s3_'

CREDENTIALS
 'aws_access_key_id=...;aws_secret_access_key=...' 

DELIMITER ','
PARALLEL OFF 
ESCAPE
ADDQUOTES;
Kamran
  • I wrote a recipe with a function that generates the unload statement, given the table name and fields; see http://g14n.info/SQL92-JSON/recipes/spool-header/ – Gianluca Casati Apr 22 '17 at 19:00
  • This is the best way to make sure the header row stays on top. IMO, it's the only reliable way. – Marco Roy Dec 28 '17 at 21:29
  • @Kamran Also, `CAST(column_1 AS varchar(255))` could be shortened to `column_1::varchar(255)` (see https://docs.aws.amazon.com/redshift/latest/dg/r_CAST_function.html) – Marco Roy Jan 31 '18 at 00:41
8

Redshift now supports UNLOAD with headers (September 19–October 10, 2018 release).

The syntax for unloading with headers is:

UNLOAD ('select-statement')
TO 's3://object-path/name-prefix'
authorization
HEADER

santhosh
2

Unfortunately, the UNLOAD command doesn't natively support this feature (see other answers for how to do it with workarounds).

I've posted a feature request on the AWS forums, so hopefully it gets added someday.

Edit: The feature has now been implemented natively in Redshift!

Marco Roy
2

Try it like this:

Unload VENUE with a header:

unload ('select * from venue where venueseats > 75000')
to 's3://mybucket/unload/' 
iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole'
header
parallel off;

The following shows the contents of the output file with a header row:

venueid|venuename|venuecity|venuestate|venueseats
6|New York Giants Stadium|East Rutherford|NJ|80242
78|INVESCO Field|Denver|CO|76125
83|FedExField|Landover|MD|91704
79|Arrowhead Stadium|Kansas City|MO|79451

Brian Tompsett - 汤莱恩
karthik G
1

To make the process easier, you can use a pre-built Docker image that extracts the data and includes the header row.

https://github.com/openbridge/ob_redshift_unload

It also does a few other things, but it made sense to package this in an easy-to-use format.

0

To unload a table as CSV to S3, including the headers, you simply have to do it this way:

UNLOAD ('SELECT * FROM {schema}.{table}')
        TO 's3://{s3_bucket}/{s3_key}/{table}/'
        with credentials
        'aws_access_key_id={access_key};aws_secret_access_key={secret_key}'
        CSV HEADER ALLOWOVERWRITE PARALLEL OFF;
sambeth