4

Background: I am migrating from postgreSQL to Vertica and found, that there are some issues in IDENTITY or AUTO_INCREMENT columns. One of these issues is, that vertica cannot assign values to IDENTITY columns or alter a column, that already has data into an IDENTITY column. Therefore I created a sequence and set the default value of the column to be unique doing:

SELECT MAX(id_column) FROM MY_SCHEMA.my_table; 

which is 12345

CREATE SEQUENCE MY_SCHEMA.seq_id_column MINVALUE 12346 CACHE 1; 

ALTER TABLE MY_SCHEMA.my_table 
ALTER COLUMN id_column SET DEFAULT(MY_SCHEMA.seq_id_column.nextval);

ALTER TABLE MY_SCHEMA.log ADD UNIQUE(id_column);

Which works as expected. In this case, I have the cache deactivated, as I am on a single node installation and I want my ID column to be contiguous. However, this is not an option on a cluster installation as the needed lock leads to a bottleneck.

Question: In a vertica cluster with several nodes, how can I access the ID of the last insert in a session (without an additional select)?

E.g. in postgreSQL I could do something like

INSERT INTO MY_SCHEMA.my_table RETURNING id_column;

which does not work in Vertica. Furthermore, the LAST_INSERT_ID() function of Vertica does not work for named sequences. I also feel, that querying the current_value of MY_SCHEMA.seq_id_column could be giving wrong results due to caching, but I am unsure about this.

Why no additional SELECT?

To my knowledge, the select will only give correct values after a commit. I cannot do a commit after every single insert due to performance.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Dschoni
  • 3,714
  • 6
  • 45
  • 80
  • Did you check system table [SEQUENCES](https://my.vertica.com/docs/9.0.x/HTML/index.htm#Authoring/SQLReferenceManual/SystemTables/CATALOG/SEQUENCES.htm)? Also, can you, please, describe what you try to achieve by getting last insert id? – Dmitry Shaldin Jul 31 '18 at 04:24
  • As mentioned, SEQUENCES don't support LAST_INSERT_ID and I fear hitting a race condition when querrying the SEQUENCES table. I simply want to have the primary key of the last inserted entry in a session. – Dschoni Aug 01 '18 at 10:06
  • 1
    I wonder if CURRVAL would work for this? F.e. `CURRVAL(MY_SCHEMA.seq_id_column)`. [Reference](https://my.vertica.com/docs/8.1.x/HTML/index.htm#Authoring/SQLReferenceManual/Statements/CURRVAL.htm%3FTocPath%3DSQL%2520Reference%2520Manual%7CSQL%2520Functions%7CSequence%2520Functions%7C_____2). Ah never mind, you worry about the effect of caching. *walks away* – LukStorms Aug 19 '18 at 20:14
  • @LukStorms Actually, it does. I tried this with two concurrent sessions, and each session gives me the correct (cached) value. If you add your comment as an answer, you will get the bounty. – Dschoni Aug 20 '18 at 10:00
  • Thanks, but although it's tempting, I'll pass on that. It wouldn't feel right. All I did was lookup some documentation, based on previous experience with other distributed DBMS's. I don't have access to a Vertica. And I don't like posting an answer that I can't test myself, or don't know it will work by heart. Since you've done the grunt work by testing it, and noone else put up a decent answer, then those point might as well be refunded to you. Maybe you could even post your own answer. Since the issue itself might be usefull for others who someday face the same issue. – LukStorms Aug 20 '18 at 10:42

1 Answers1

2

The comments from LukStorms pointed me in the right direction.

The NEXTVAL() function (as far as I have tested) gives contiguous values in the case, where one single session queries them. Furthermore, on concurrent access, if issued after an insert, CURRVAL retrieves the cached value, which is guaranteed to be unique but not necessarily contiguous. As I never call NEXTVAL anywhere else as in my default clause, this solves the problem for me, although there might be cases, where an additional call to NEXTVAL between inserts increments the sequence counter.

One case I can think of (and that I will test in the future) is what happens if AUTO COMMIT is set to OFF, which is ON by default for the vertica client drivers.

UPDATE:

This even seems to work with AUTOCOMMIT being OFF (shown using the vertica-python client driver, where C is the connection and cur the cursor):

cur.execute("SELECT NEXTVAL('my_schema.my_sequence');")
cur.fetchall()
--> 1
cur.execute("SELECT CURRVAL('my_schema.my_sequence');")
cur.fetchall()
--> 1
cur.execute("SET SESSION AUTOCOMMIT TO OFF")
cur.execute("SELECT NEXTVAL('my_schema.my_sequence');")
cur.execute("SELECT NEXTVAL('my_schema.my_sequence');")
cur.execute("SELECT NEXTVAL('my_schema.my_sequence');")
cur.execute("SELECT CURRVAL('my_schema.my_sequence');")
cur.fetchall()
--> 4

However, this seems to be unchanged during a rollback of the connection. So the following happens:

C.rollback()
cur.execute("SELECT CURRVAL('my_schema.my_sequence');")
cur.fetchall()
--> 4
Dschoni
  • 3,714
  • 6
  • 45
  • 80
  • Interesting. I kinda expected something like that, that the sequence value wouldn't be rolled back. And I'm guessing that your aiming for a continues number. But maybe you could keep the value before the transaction. Or get the MAX value from the table. And ALTER the sequence with a RESTART after a rollback? Reference [here](https://my.vertica.com/docs/8.1.x/HTML/index.htm#Authoring/SQLReferenceManual/Statements/ALTERSEQUENCE.htm) – LukStorms Aug 20 '18 at 15:40
  • As I understand so far, continuous numbers are leading to a bottleneck due to global locks. I am willing to sacrifice continuousity in favor of performance. – Dschoni Aug 20 '18 at 15:53