12

Is there a way to change the column ordering in Amazon Redshift (or Postgres, since it is based on it)? Or maybe add a column at a specific position?

In mysql there you can do:

ALTER TABLE MY_TABLE
ADD COLUMN {NEW_COL} AFTER {EXISTING_COL}

But this does not work in Redshift. Any ideas?

Martin Taleski
  • 6,033
  • 10
  • 40
  • 78
  • Purists would argue that you should never need to, because columns should be referred to by name, not order, but in practice SQL *does* treat columns as ordered, so it sort of makes sense to want to choose that order. [The ALTER TABLE docs](http://www.postgresql.org/docs/9.4/interactive/sql-altertable.html) don't mention any way to do it, but that doesn't prove there isn't a workaround of some sort, particularly in Redshift, which uses a "column-oriented" storage model. – IMSoP Apr 23 '15 at 10:26
  • 4
    when importing data with a COPY command you need the column order in the table to match the column order of the CSV file. – Martin Taleski Apr 23 '15 at 10:28
  • Probably a better link would be to [the Redshift docs](http://docs.aws.amazon.com/redshift/latest/dg/r_ALTER_TABLE.html), since it diverged from Postgres some time ago. However, now you mention COPY, I think we may have an [X/Y Problem](http://meta.stackoverflow.com/questions/66377/what-is-the-xy-problem) here... – IMSoP Apr 23 '15 at 10:32
  • Can I ask - why do you need to change the order of the columns? The way I look at it, redshift column order should not matter. – Jon Scott Dec 07 '16 at 23:12
  • Because what Martin said. When you copy to upload a table from s3 e.g. you need columns of the csv/parquet to be in the same order as the redshift table, otherwise it won't work – Eduardo EPF Oct 27 '21 at 10:57

5 Answers5

11

From your comments, it seems that what you actually need is to be able to COPY into a table from a file which has columns in a particular order.

According to the Redshift documentation for the COPY command:

(column1 [, column2, ...])

Specifies an optional column list to load data fields into specific columns. The columns can be in any order in the COPY statement, but when loading from flat files, such as in an Amazon S3 bucket, their order must match the order of the source data. [...] If no column list is specified, the command behaves as if a complete, in-order column list was specified.

So rather than re-ordering the columns in your table, you just need to specify them in the COPY statement, as in some of the examples in the docs:

copy venue_new(venueid, venuename, venuecity, venuestate) 
from 's3://mybucket/data/venue_noseats.txt' 
credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>'
delimiter '|';
IMSoP
  • 89,526
  • 13
  • 117
  • 169
  • 2
    It would be much more difficult for me to change the copy commands, than to change the column ordering. There seems to be a way to do it in postgres, I am checking if it can work in Redshift: http://stackoverflow.com/questions/285733/how-do-i-alter-the-position-of-a-column-in-a-postgresql-database but, thanks for the effort – Martin Taleski Apr 23 '15 at 10:38
  • @MartinTaleski OK, thought it was worth the suggestion. Note that a view won't help for writing to the table, and the other options require manually deleting and reinserting existing data. – IMSoP Apr 23 '15 at 10:40
  • It does not matter what he needed, the title is "How to change column ordering in Amazon Redshift". I actually need to do this for reasons beyond the scope of this question. "Do something completely different instead" should be ADDED to the actual answer. – Mark Gerolimatos Dec 07 '16 at 19:36
6

The answer is no, redshift does not (easily) support column shuffling, which is strange, as I believe tables are stored as individual columns. There is no way to do this without unloading/loading or table copying.

It is said that the unload/load is the preferred method, as it will take advantage of any parallelism you configured into your table.

Thus, the standard methodology is necessary:

There may be a "secret way" to do this with only the one column in question (dump column, drop column, add column, reload column), but that sounds incredibly sketchy and should be avoided.

Mark Gerolimatos
  • 2,424
  • 1
  • 23
  • 33
2

Given old_table (oldcolumn1 int, oldcolumn2 int), you can do the following:

create table new_table (
  newcolumn0 int,
  oldcolumn1 int,
  oldcolumn2 int
);
alter table new_table append from old_table; --EMPTIES old_table and COMMITS!
drop table old_table;
alter table new_table rename to old_table;

This approach is not for the faint of heart, read the docs on ALTER TABLE APPEND. https://docs.aws.amazon.com/redshift/latest/dg/r_ALTER_TABLE_APPEND.html

If that's OK, the pros are: ..doesn't care about column order for columns with the same name/type ..doesn't require copying out ..should be quicker than insert..select for large tables

B M
  • 21
  • 1
1

Redshift doesn't support ordering at all. I have to solve same problem in my case, and this is how I have done it.

Best option is following unload,alter the table by drop and re-create.

1)Unload to S3

unload ('select (Column1,column2,Column3,...,Column_n) from orginal_table') to 's3://<your_bucket>/<your_file>' CREDENTIALS 'aws_access_key_id=<your_key>;aws_secret_access_key=<your_secret>' MANIFEST  DELIMITER '|'  GZIP   NULL AS 'null_string' ESCAPE ALLOWOVERWRITE;

2)Drop AND/Or re-create

Create duplicate_table(Column1,column2,Column3,...,Column_n);**with new sequence make sure your seq

3)Reload.

copy duplicate_table(Column1,column2,Column3,...,Column_n) from  's3://<your_bucket>/<your_file>manifest' CREDENTIALS 'aws_access_key_id=<your_key>;aws_secret_access_key=<your_secret>' MANIFEST  DELIMITER '|'  GZIP   NULL AS 'null_string' ESCAPE ALLOWOVERWRITE;
Red Boy
  • 5,429
  • 3
  • 28
  • 41
  • Redshift does support `alter table`. See [their docs](https://docs.aws.amazon.com/redshift/latest/dg/r_ALTER_TABLE.html) for details. It doesn't seem like they support re-ordering columns through it though. – N00b Feb 03 '18 at 02:01
  • Yes, agreed! Red-shift does support alter, but here I meant alter for ordering only. I have corrected my statement. – Red Boy Feb 25 '18 at 10:11
1

You can simply create a new table in Redshift with the required ordering

CREATE TABLE temp_table_name (column1 dtype1, column2 dtype2, column3 dtype 3 ...);

and insert data from the source table in the required order.

INSERT INTO temp_table_name (SELECT column1, column2, column3 ... FROM table_name);

Then drop the original table

DROP TABLE table_name;

and rename the temp table to the original table

ALTER TABLE temp_table_name RENAME TO table_name;
Shubham Gupta
  • 414
  • 7
  • 19