29

I already know how to unload data from Redshift into S3 as a single file. I need to know how to unload it with the column headers. Can anyone please help or give me a clue?

I don't want to have to do it manually in shell or Python.

Tokunbo Hiamang

9 Answers

36

As of cluster version 1.0.3945, Redshift supports unloading data to S3 with a header row in each file, e.g.

UNLOAD('select column1, column2 from mytable;')
TO 's3://bucket/prefix/'
IAM_ROLE '<role arn>'
HEADER;

Note: you can't use the HEADER option in conjunction with FIXEDWIDTH.

https://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html
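HEADER also combines with the other UNLOAD options. As a rough sketch (the bucket path and role ARN below are placeholders), something like the following would produce a single gzipped CSV file with a header row:

-- Sketch only: combine HEADER with CSV, PARALLEL OFF and GZIP for one compressed CSV file
UNLOAD ('select column1, column2 from mytable')
TO 's3://bucket/prefix/'
IAM_ROLE '<role arn>'
CSV
HEADER
PARALLEL OFF
GZIP
ALLOWOVERWRITE;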

fez
25

If any of your columns are non-character, then you need to explicitly cast them as char or varchar because the UNION forces a cast.

Here is an example of the full statement that will create a file in S3 with the headers in the first row.

The output file will be a single CSV file with quotes.

This example assumes numeric values in column_1. If your data is laid out differently, adjust the ORDER BY clause to reference a numeric column so that the header row sorts into row 1 of the S3 file.

    /* Redshift export to S3 CSV single file with headers - limit 6.2GB */

    UNLOAD ('
        SELECT \'column_1\', \'column_2\'
        UNION
        SELECT
          CAST(column_1 AS varchar(255)) AS column_1,
          CAST(column_2 AS varchar(255)) AS column_2
        FROM source_table_for_export_to_s3
        ORDER BY 1 DESC
        ;
    ')
    TO 's3://bucket/path/file_name_for_table_export_in_s3_'
    CREDENTIALS 'aws_access_key_id=<key_with_no_<>_brackets>;aws_secret_access_key=<secret_access_key_with_no_<>_brackets>'
    PARALLEL OFF
    ESCAPE
    ADDQUOTES
    DELIMITER ','
    ALLOWOVERWRITE
    GZIP
    ;
Douglas Hackney
  • However, this does not cover boolean data types. I have yet to come across a full solution for this. – pele88 May 09 '18 at 15:41
  • See [this answer](https://stackoverflow.com/a/52611659/3622349) for a more up-to-date approach. – n8yoder Apr 24 '20 at 00:08
12

There is no direct option provided by the Redshift UNLOAD command.

But we can tweak the query to generate a file that has the header row added.

First, we use the PARALLEL OFF option so that it creates only one file.

"By default, UNLOAD writes data in parallel to multiple files, according to the number of slices in the cluster. The default option is ON or TRUE. If PARALLEL is OFF or FALSE, UNLOAD writes to one or more data files serially, sorted absolutely according to the ORDER BY clause, if one is used. The maximum size for a data file is 6.2 GB. So, for example, if you unload 13.4 GB of data, UNLOAD creates the following three files."

To get headers in the unloaded file, do the following.

Suppose you have a table like this:

create table mytable
(
    name varchar(64) default NULL,
    address varchar(512) default NULL
);

Then use a SELECT like the one below in your UNLOAD to add the headers:

(select 'name', 'address') union (select name, address from mytable)

This will add the headers name and address as the first line of your output.
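For reference, a complete UNLOAD built around that SELECT might look like the sketch below; the bucket path and role ARN are placeholders, not part of the original answer:

-- Sketch only: bucket path and IAM role are placeholders
UNLOAD ('
    (SELECT \'name\', \'address\')
    UNION
    (SELECT name, address FROM mytable)
')
TO 's3://bucket/prefix/mytable_'
IAM_ROLE '<role arn>'
DELIMITER ','
ADDQUOTES
PARALLEL OFF;

Note that without an ORDER BY, Redshift does not guarantee that the header row lands on the first line; the next answer shows an ordering trick that makes this deterministic.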

Jon Saw
Sandesh Deshmane
12

Just to complement the answer: to ensure the header row comes first, you don't have to order by a specific column of data. You can enclose the UNIONed SELECTs inside another SELECT, add an ordinal column to them, and then, in the outer SELECT, order by that column without including it in the list of selected columns.

UNLOAD ('

  SELECT column_1, column_2 FROM (

     SELECT 1 AS i, \'column_1\' AS column_1, \'column_2\' AS column_2
     UNION ALL
     SELECT 2 AS i, column_1::varchar(255), column_2::varchar(255)
     FROM source_table_for_export_to_s3

  ) t ORDER BY i

')
TO 's3://bucket/path/file_name_for_table_export_in_s3_'

CREDENTIALS
 'aws_access_key_id=...;aws_secret_access_key=...' 

DELIMITER ','
PARALLEL OFF 
ESCAPE
ADDQUOTES;
Kamran
  • I wrote a recipe with a function that generates the unload statement, given the table name and fields; see http://g14n.info/SQL92-JSON/recipes/spool-header/ – Gianluca Casati Apr 22 '17 at 19:00
  • This is the best way to make sure the header row stays on top. IMO, it's the only reliable way. – Marco Roy Dec 28 '17 at 21:29
  • @Kamran Also, `CAST(column_1 AS varchar(255))` could be shortened to `column_1::varchar(255)` (see https://docs.aws.amazon.com/redshift/latest/dg/r_CAST_function.html) – Marco Roy Jan 31 '18 at 00:41
8

Redshift now supports UNLOAD with headers (September 19–October 10, 2018 release).

The syntax for unloading with headers is:

UNLOAD ('select-statement')
TO 's3://object-path/name-prefix'
authorization
HEADER

santhosh
2

Unfortunately, the UNLOAD command doesn't natively support this feature (see other answers for how to do it with workarounds).

I've posted a feature request on the AWS forums, so hopefully it gets added someday.

Edit: The feature has now been implemented natively in Redshift!

Marco Roy
2

Try it like this:

Unload VENUE with a header:

unload ('select * from venue where venueseats > 75000')
to 's3://mybucket/unload/' 
iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole'
header
parallel off;

The following shows the contents of the output file with a header row:

venueid|venuename|venuecity|venuestate|venueseats
6|New York Giants Stadium|East Rutherford|NJ|80242
78|INVESCO Field|Denver|CO|76125
83|FedExField|Landover|MD|91704
79|Arrowhead Stadium|Kansas City|MO|79451

Brian Tompsett - 汤莱恩
karthik G
1

To make the process easier, you can use a pre-built Docker image that extracts the data and includes the header row.

https://github.com/openbridge/ob_redshift_unload

It also does a few other things, but it made sense to package this in an easy-to-use format.

0

To unload a table as CSV to S3, including the headers, you simply have to do it this way:

UNLOAD ('SELECT * FROM {schema}.{table}')
        TO 's3://{s3_bucket}/{s3_key}/{table}/'
        with credentials
        'aws_access_key_id={access_key};aws_secret_access_key={secret_key}'
        CSV HEADER ALLOWOVERWRITE PARALLEL OFF;
sambeth