0

I have an existing postgresql database with an auto incrementing column, that I want to migrate to Vertica. In Vertica I can create a table with IDENTITY or AUTO INCREMENT constraints (or data type?) but I cannot add these constraint to a column with existing data on it. The answer to a similar question here is a workaround by using a sequence, which is not an option, as duplicate numbers are allowed and a race condition between several nodes could exist which is taken care of by the IDENTITY implementation. Creating the table with an IDENTITY column is also not an option, as inserting is not allowed.

In short I want to have an IDENTITY column on existing data, which is at the same time the primary key.

Dschoni
  • 3,714
  • 6
  • 45
  • 80
  • If you are not inserting into the table, why not just use `row_number()`? – Gordon Linoff Jun 11 '18 at 11:48
  • I want to be able to insert old data into the table. And convert it to identity to be able to have new inserts auto incrementing. Row_number is not identical to unique id from the old table. – Dschoni Jun 11 '18 at 12:25
  • 1
    Have a look at this Database variable `CopyFromVerticaWithIdentity`. You might need to change this to enable load in `IDENTITY` – minatverma Jun 11 '18 at 14:42
  • I don't understand how this should help copying data in from POSTGRESQL? – Dschoni Jun 11 '18 at 16:18

1 Answers1

2

Your problem will be that you need to have the same values for the original IDENTITY columns in your copy in Vertica - while being able to have an AUTOINCREMENT behaviour for any new rows you might now insert into your new table.

This can be achieved by having the Vertica target surrogate key not as IDENTITY, but as an INT NOT NULL, but giving it a DEFAULT from the nextvalmethod of a manually created sequence.

You have this table, and this data (can be from any source database ...), which sports an IDENTITY column.

-- source (could be from anywhere - why not on my own Vertica sandbox)                                                                                      
CREATE TABLE src (
  src_id     IDENTITY NOT NULL
, first_name VARCHAR(32)
, last_name  VARCHAR(32)
, hire_dt    DATE
) UNSEGMENTED ALL NODES
;

INSERT /*+ DIRECT */ INTO src(first_name,last_name,hire_dt)
          SELECT 'Ford','Prefect',DATE '2017-02-05'
UNION ALL SELECT 'Svlad','Cjelli',DATE '2017-02-05'
UNION ALL SELECT 'Cynthia','Fitzmelton',DATE '2017-02-05'
UNION ALL SELECT 'Stavro','Mueller',DATE '2017-02-05'
UNION ALL SELECT 'Veet','Voojagig',DATE '2017-02-05'
UNION ALL SELECT 'Trin','Tragula',DATE '2017-02-05'
UNION ALL SELECT 'Zarniwoop','Zarniwoop',DATE '2017-02-05'
UNION ALL SELECT 'Rob','McKenna',DATE '2017-02-05'
UNION ALL SELECT 'The Lajestic Vantrashell','of Lob',DATE '2017-02-05'
UNION ALL SELECT 'Paul Neil Milne','Johnston',DATE '2017-02-05'
UNION ALL SELECT 'Lunkwill','Lunkwill',DATE '2017-02-05'
UNION ALL SELECT 'Arthur','Dent',DATE '2017-02-05'
UNION ALL SELECT 'Zaphod','Beeblebrox',DATE '2017-02-05'
UNION ALL SELECT 'Tricia','McMillan',DATE '2017-02-05'
UNION ALL SELECT 'Prostetnic Vogon','Jeltz',DATE '2017-02-05'
UNION ALL SELECT 'Lionel','Prosser',DATE '2017-02-05'
UNION ALL SELECT 'Karl','Mueller',DATE '2017-02-05'
UNION ALL SELECT 'Hotblack','Desiato',DATE '2017-02-05'
UNION ALL SELECT 'Gogrilla','Mincefriend',DATE '2017-02-05'
UNION ALL SELECT 'Slartibartfast','Slartibartfast',DATE '2017-02-05'
UNION ALL SELECT 'Roosta','Roosta',DATE '2017-02-05'
UNION ALL SELECT 'Eccentrica','Gallumbitis',DATE '2017-02-05'
UNION ALL SELECT 'Pizpot','Gargravarr',DATE '2017-02-05'
UNION ALL SELECT 'Vroomfondel','Vroomfondel',DATE '2017-02-05'
UNION ALL SELECT 'Majikthise','Majikthise',DATE '2017-02-05'
UNION ALL SELECT 'Gengis Temüjin','Khan',DATE '2017-02-05'
UNION ALL SELECT 'Know-Nothing-Bozo','the Non-Wonder Dog',DATE '2017-02-05'
UNION ALL SELECT 'Lazlaar','Lyricon',DATE '2017-02-05'
UNION ALL SELECT 'Lintilla','Lintilla',DATE '2017-02-05'
UNION ALL SELECT 'Fook','Fook',DATE '2017-02-05'
UNION ALL SELECT 'Gag','Halfrunt',DATE '2017-02-05'
UNION ALL SELECT 'Benji','Mouse',DATE '2017-02-05'
UNION ALL SELECT 'Frankie','Mouse',DATE '2017-02-05'
UNION ALL SELECT 'Grunthos','the Flatulent',DATE '2017-02-05'
UNION ALL SELECT 'Wowbagger','The Infinitely Prolonged',DATE '2017-02-05'
UNION ALL SELECT 'Wonko','The Sane',DATE '2017-02-05'
UNION ALL SELECT 'Reg','Nullify',DATE '2017-02-05'
UNION ALL SELECT 'Fenchurch','of Rickmansworth',DATE '2017-02-05'
UNION ALL SELECT 'Oolon','Colluphid',DATE '2017-02-05'
UNION ALL SELECT 'Humma','Kavula',DATE '2017-02-05'
UNION ALL SELECT 'Judiciary','Pag',DATE '2017-02-05'
UNION ALL SELECT 'Max','Quordlepleen',DATE '2017-02-05'
;

From that source table, you determine the currently highest IDENTITY value issued:

SQL>select max(src_id) from src;
max
 42

In the target Vertica database, you use that value to create a brand new manual sequence ...

CREATE SEQUENCE seq_copy MINVALUE 43;

You create your target table with the surrogate ID depending on either a given input value (which you will supply in an INSERT ... SELECT or in a COPY command) or from the newly created sequence's nextval :

CREATE TABLE cpy (
  cpy_id     INT NOT NULL DEFAULT(seq_copy.nextval)
, first_name VARCHAR(32)
, last_name  VARCHAR(32)
, hire_dt    DATE
) UNSEGMENTED ALL NODES
;

Then, if you specify all columns (or none at all)

In INSERT ...

INSERT INTO cpy (
  cpy_id
, first_name
, last_name
, hire_dt
)
SELECT * FROM src;

.. or COPY ...

COPY cpy (cpy_id,first_name,last_name,hire_dt) 
  FROM LOCAL 'src.csv' DELIMITER ',' ENCLOSED BY '''' ;

... you use the values from the source.

And if you insert a new value, you don't touch the surrogate key ...

INSERT INTO cpy (
  first_name
, last_name
, hire_dt
) VALUES (
  'Thor'
, 'son of Odin'
, '2000-01-01'
;

... and the surrogate key will be populated for you.

Is this what you're after?

Marco

marcothesane
  • 6,192
  • 1
  • 11
  • 21
  • I'm not exactly sure if the sequence works as expected on a multi-node environment where IDENTITY has some cache per node. Do you have information on this? – Dschoni Jun 25 '18 at 12:11
  • 1
    Absolutely.By default, (you can configure that differently), the sequence cache begins at 0 on node 1, at 250,000 on node 2, etc. In every new session, the cache will start at a different new multiple of 250,000 per node. Does this help? – marcothesane Jun 25 '18 at 14:57
  • Yes it does. Thank you. I'm marking this as an answer. – Dschoni Jun 25 '18 at 14:58
  • A follow up question regarding the last insert id for sequences is https://stackoverflow.com/questions/51597088/how-can-i-get-the-last-issued-sequence-id-in-vertica – Dschoni Aug 22 '18 at 10:44