I've just read http://rob.conery.io/2014/05/29/a-better-id-generator-for-postgresql/ article which introduces following code for generating unique, sequential IDs:
CREATE SEQUENCE global_id_sequence;
CREATE OR REPLACE FUNCTION id_generator(OUT result bigint) AS $$
DECLARE
our_epoch bigint := 1314220021721;
seq_id bigint;
now_millis bigint;
shard_id int := 1;
BEGIN
SELECT nextval('global_id_sequence') % 1024 INTO seq_id;
SELECT FLOOR(EXTRACT(EPOCH FROM clock_timestamp()) * 1000) INTO now_millis;
result := (now_millis - our_epoch) << 23;
result := result | (shard_id << 10);
result := result | (seq_id);
END;
$$ LANGUAGE PLPGSQL;
In the article it states "This means we can generate 1024 IDs, per shard, per millisecond". Note that I am not using shards, just single db. Anyway this code will never work (it won't return unique ids) during bulk insert of more than 1024 rows (because timestamp is same for every insert in single transaction) For example (changed % 1024
to % 3
for this example):
CREATE SEQUENCE global_id_sequence;
CREATE OR REPLACE FUNCTION id_generator(OUT result bigint) AS $$
DECLARE
our_epoch bigint := 1314220021721;
seq_id bigint;
now_millis bigint;
shard_id int := 1;
BEGIN
SELECT nextval('global_id_sequence') % 3 INTO seq_id;
SELECT FLOOR(EXTRACT(EPOCH FROM clock_timestamp()) * 1000) INTO now_millis;
result := (now_millis - our_epoch) << 23;
result := result | (shard_id << 10);
result := result | (seq_id);
END;
$$ LANGUAGE PLPGSQL;
DROP TABLE IF EXISTS id_test;
CREATE TABLE IF NOT EXISTS id_test(id BIGINT DEFAULT id_generator());
INSERT INTO id_test
VALUES (DEFAULT), (DEFAULT), (DEFAULT), (DEFAULT), (DEFAULT), (DEFAULT);
SELECT * FROM id_test;
results with rows:
1040748500796572672 -- duplicate
1040748500796572673
1040748500796572674
1040748500796572672 -- duplicate
1040748500796572673
1040748500796572674
Is it possible to make this function work for bulk inserts? Is there any other solution?
What I need are unique and sequential values. However I can't use sequence because I want to hide informations about how many such values were generated so far.
I want to use such generated values as primary key in a table. Therefore I want values to be sequential to make it efficient (in terms of index) and make it possible to order by them. It might have gaps as long as consecutive generated values will be sequential. The trick is that I don't want to make it possible for anyone to know how many values were generated in a period of time. If I would use sequence then it would be possible to approximate it by val2 - val1 (there could be gaps though). Also I don't want value like 5 because it already tellls that not many values where generated so far