21

I am trying to create a table with an auto-increment column as below. Since Redshift psql doesn't support SERIAL, I had to use IDENTITY data type:

IDENTITY(seed, step)
Clause that specifies that the column is an IDENTITY column. An IDENTITY column contains unique auto-generated values. These values start with the value specified as seed and increment by the number specified as step. The data type for an IDENTITY column must be either INT or BIGINT.`

My create table statement looks like this:

CREATE TABLE my_table(
        id INT IDENTITY(1,1),
        name CHARACTER VARYING(255) NOT NULL,
        PRIMARY KEY( id ) 
);

However, when I tried to insert data into my_table, rows increment only on the even number, like below:

 id | name | 
----+------+
  2 | anna |
  4 | tom  |
  6 | adam |
  8 | bob  |
 10 | rob  |

My insert statements look like below:

INSERT INTO my_table ( name ) 
VALUES ( 'anna' ), ('tom') , ('adam') , ('bob') , ('rob' );

I am also having trouble with bringing the id column back to start with 1. There are solutions for SERIAL data type, but I haven't seen any documentation for IDENTITY. Any suggestions would be much appreciated!

peipei
  • 1,407
  • 3
  • 14
  • 22
  • Weird. I tried your code and don't see the problem. The ids are 1-5. – Anthony Mar 26 '15 at 06:44
  • I have the same problem while loading a csv file from s3 to redshift. – Alex Mathew Oct 28 '15 at 05:28
  • 3
    For the record, I have the same problem. Strange how it is not consistent. (I did IDENTITY(0,1) as the Jorge said below, but the second one being 1 should be all that matters. I wonder if its a regional bug? I'm on us-west-2 (Oregon) – Cognitio Jun 25 '16 at 01:21
  • I have the same issue too: `CREATE TABLE temp ( id bigint IDENTITY(0,1), val varchar(255) ); INSERT INTO temp (val) values ('a'), ('b'), ('c');` gives me ascending even values, starting with 0. – Ogaday Jun 20 '17 at 14:11

3 Answers3

18

You have to set your identity as follows:

id INT IDENTITY(0,1)

Source: http://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_TABLE_examples.html

And you can't reset the id to 0. You will have to drop the table and create it back again.

Jorge
  • 664
  • 1
  • 12
  • 33
6

Set your seed value to 1 and your step value to 1.

Create table

CREATE table my_table(
    id bigint identity(1, 1),
    name varchar(100),
    primary key(id));

Insert rows

INSERT INTO organization ( name ) 
VALUES ('anna'), ('tom') , ('adam'), ('bob'), ('rob');

Results

 id | name | 
----+------+
  1 | anna |
  2 | tom  |
  3 | adam |
  4 | bob  |
  5 | rob  |

For some reason, if you set your seed value to 0 and your step value to 1 then the integer will increase in steps of 2.

Create table

CREATE table my_table(
    id bigint identity(0, 1),
    name varchar(100),
    primary key(id));

Insert rows

INSERT INTO organization ( name ) 
VALUES ('anna'), ('tom') , ('adam'), ('bob'), ('rob');

Results

 id | name | 
----+------+
  0 | anna |
  2 | tom  |
  4 | adam |
  6 | bob  |
  8 | rob  |
Andrew Fogg
  • 645
  • 1
  • 8
  • 16
6

This issue is discussed at length in AWS forum.

https://forums.aws.amazon.com/message.jspa?messageID=623201

The answer from the AWS.

Short answer to your question is seed and step are only honored if you disable both parallelism and the COMPUPDATE option in your COPY. Parallelism is disabled if and only if you're loading your data from a single file, which is what we normally do not recommend, and hence will be an unlikely scenario for most users.

Parallelism impacts things because in order to ensure that there is no single point of contention in assigning identity values to rows, there end up being gaps in the value assignment. When parallelism is disabled, the load is happening serially, and therefore, there is no issue with assigning different id values in parallel.

The reason COMPUPDATE impacts things is when it's enabled, the COPY is actually making 2 passes over your data. During the first pass, it internally increments the identity values, and as a result, your initial value starts with a larger value than you'd expect.

We'll update the doc to reflect this.

Also multiple nodes seems to cause such effect with IDENTITY column. In essence it can only provide you with guaranteed unique IDs.