
I've just read the article http://rob.conery.io/2014/05/29/a-better-id-generator-for-postgresql/, which introduces the following code for generating unique, sequential IDs:

CREATE SEQUENCE global_id_sequence;
CREATE OR REPLACE FUNCTION id_generator(OUT result bigint) AS $$
DECLARE
    our_epoch bigint := 1314220021721;
    seq_id bigint;
    now_millis bigint;
    shard_id int := 1;
BEGIN
    SELECT nextval('global_id_sequence') % 1024 INTO seq_id;
    SELECT FLOOR(EXTRACT(EPOCH FROM clock_timestamp()) * 1000) INTO now_millis;
    result := (now_millis - our_epoch) << 23;
    result := result | (shard_id << 10);
    result := result | (seq_id);
END;
$$ LANGUAGE PLPGSQL;
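To make the bit layout concrete, here is a small Python model of the ID composition (my own sketch, not code from the article): the top bits hold milliseconds since a custom epoch, the next 13 bits the shard id, and the low 10 bits the sequence value, which is why there are at most 1024 distinct IDs per shard per millisecond.

```python
OUR_EPOCH = 1314220021721  # custom epoch from the article, in milliseconds

def make_id(now_millis, shard_id, seq_id):
    # mirrors: result := ((now_millis - our_epoch) << 23) | (shard_id << 10) | (seq_id % 1024)
    return ((now_millis - OUR_EPOCH) << 23) | (shard_id << 10) | (seq_id % 1024)

def decode_id(id_):
    # split an id back into (millis, shard, seq)
    return (id_ >> 23) + OUR_EPOCH, (id_ >> 10) & 0x1FFF, id_ & 0x3FF
```

Because only 10 bits are reserved for the sequence, sequence values 0 and 1024 produce the same ID within the same millisecond.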

In the article it states: "This means we can generate 1024 IDs, per shard, per millisecond". Note that I am not using shards, just a single database. Anyway, this code will never return unique IDs during a bulk insert of more than 1024 rows, because the timestamp is the same for every insert in a single transaction. For example (I changed % 1024 to % 3 for this example):

CREATE SEQUENCE global_id_sequence;
CREATE OR REPLACE FUNCTION id_generator(OUT result bigint) AS $$
DECLARE
    our_epoch bigint := 1314220021721;
    seq_id bigint;
    now_millis bigint;
    shard_id int := 1;
BEGIN
    SELECT nextval('global_id_sequence') % 3 INTO seq_id;
    SELECT FLOOR(EXTRACT(EPOCH FROM clock_timestamp()) * 1000) INTO now_millis;
    result := (now_millis - our_epoch) << 23;
    result := result | (shard_id << 10);
    result := result | (seq_id);
END;
$$ LANGUAGE PLPGSQL;

DROP TABLE IF EXISTS id_test;
CREATE TABLE IF NOT EXISTS id_test(id BIGINT DEFAULT id_generator());

INSERT INTO id_test
VALUES (DEFAULT), (DEFAULT), (DEFAULT), (DEFAULT), (DEFAULT), (DEFAULT);

SELECT * FROM id_test;

which results in these rows:

1040748500796572672 -- duplicate
1040748500796572673
1040748500796572674
1040748500796572672 -- duplicate
1040748500796572673
1040748500796572674

Is it possible to make this function work for bulk inserts? Is there any other solution?

What I need are unique and sequential values. However, I can't use a plain sequence because I want to hide information about how many such values have been generated so far.


I want to use the generated values as the primary key of a table. I want the values to be sequential to keep the index efficient and to make it possible to order by them. Gaps are acceptable, as long as consecutively generated values are increasing. The trick is that I don't want anyone to be able to tell how many values were generated in a given period of time. If I used a sequence, it would be possible to approximate that by val2 - val1 (even though there could be gaps). I also don't want a value like 5, because it already tells that not many values have been generated so far.
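To illustrate the concern with hypothetical numbers: with a plain sequence, any two observed IDs bound the number of rows created between them, which is exactly what should stay hidden.

```python
# two ids observed a week apart from a plain sequence (hypothetical values)
id_monday, id_friday = 1_041, 1_290

# even with gaps, the difference is an upper bound on the number of rows
# created between the two observations
max_rows_created = id_friday - id_monday  # 249
```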

user606521
  • A composite (primary) key on {timestamp, serial} would solve the "problem" elegantly. No sweat. – wildplasser Jul 30 '15 at 20:54
  • The article is a very slick way to demonstrate that the author does not know what he is talking about. The underlying problem he is trying to solve is bad database design and programming skills, but once you get to the stage where shards become a necessity, the same goes for good database engineers. – Patrick Jul 31 '15 at 00:18
  • If you need strictly consecutive values (no gaps) you can't use a sequence anyway, because concurrent transactions will give you interleaved values, rolled back transactions result in gaps, etc. It sounds like what you want might be a gapless sequence with a randomized-but-non-colliding initial offset, right? – Craig Ringer Jul 31 '15 at 00:39
  • I've updated my question – user606521 Jul 31 '15 at 06:49
  • Possible duplicate of [How to replace Django's primary key with a different integer that is unique for that table](http://stackoverflow.com/questions/37558821/how-to-replace-djangos-primary-key-with-a-different-integer-that-is-unique-for) – e4c5 Aug 24 '16 at 01:01

2 Answers


You can use the function as is: clock_timestamp() advances during a statement, as documented, so successive calls within one transaction see different values.

Changing % 1024 to % 3 only proves that you get 3 unique IDs per millisecond.

On my MacBook SSD, with % 1024 I can run millions of bulk inserts without duplicates:

INSERT INTO id_test SELECT id_generator() FROM generate_series(0, 1000000, 1);
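A quick way to see why this holds is to simulate the generator with a clock that, like clock_timestamp(), advances during the statement. This is a Python sketch with an artificial millisecond counter, not real timing:

```python
def make_id(millis, seq, shard_id=1):
    # same bit layout as the id_generator() function above
    return (millis << 23) | (shard_id << 10) | (seq % 1024)

def simulate(n_ids, ids_per_milli):
    # the artificial clock ticks once every `ids_per_milli` calls,
    # mimicking successive clock_timestamp() readings during a bulk insert
    return [make_id(i // ids_per_milli, i) for i in range(n_ids)]

unique_ids = simulate(5000, 1000)    # under 1024 ids per millisecond
clashing_ids = simulate(5000, 3000)  # over 1024 ids per millisecond
```

As long as fewer than 1024 IDs are drawn per millisecond, the results stay unique; past that rate the 10-bit sequence wraps within one tick, which is what the % 3 experiment reproduced at a much lower threshold.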
Krut

Well, maybe this function will be good enough as a random, unique ID generator for the table:

CREATE OR REPLACE FUNCTION next_random_or_free_id_from(table_name TEXT)
  RETURNS BIGINT AS
$$
DECLARE
    bigint_max BIGINT := ~((1::bigint << 63)::bigint);  -- 9223372036854775807
    next_id BIGINT := floor(random() * (bigint_max - 1) + 1)::bigint;
    id_exists BOOLEAN;
BEGIN
    -- check whether the randomly drawn id is already taken
    EXECUTE format('SELECT EXISTS (SELECT 1 FROM %I AS t WHERE t.id = $1)', table_name)
        INTO id_exists USING next_id;
    IF id_exists THEN
        RETURN next_free_id_from(table_name);
    END IF;
    RETURN next_id;
END;
$$ LANGUAGE plpgsql STRICT;

For the first function to work, the next one is also required:

CREATE OR REPLACE FUNCTION next_free_id_from(table_name TEXT)
  RETURNS BIGINT AS
$$
DECLARE
  next_id BIGINT;
BEGIN
  -- look for a gap: the first row number rn whose id (in sorted order) exceeds rn
  EXECUTE format(
    'SELECT rn FROM (SELECT id, row_number() OVER (ORDER BY id) AS rn FROM %I) AS t
     WHERE id > rn LIMIT 1', table_name)
    INTO next_id;
  IF next_id IS NULL THEN
    -- no gaps yet: continue after the current maximum (or start at 1)
    EXECUTE format('SELECT COALESCE(max(id), 0) + 1 FROM %I', table_name) INTO next_id;
  END IF;
  RETURN next_id;
END;
$$ LANGUAGE plpgsql STRICT;
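The gap-finding query can be hard to parse at first; this Python equivalent (my own sketch) shows the logic: row_number() pairs each id, in sorted order, with its 1-based position, and the first position whose id exceeds it marks a free slot.

```python
def next_free_id(existing_ids):
    # first position rn (1-based) where the sorted id has outrun rn: a gap
    for rn, id_ in enumerate(sorted(existing_ids), start=1):
        if id_ > rn:
            return rn
    # no gaps: fall through to max(id) + 1, or 1 for an empty table
    return max(existing_ids, default=0) + 1
```

For ids {1, 2, 5} it returns 3 (the gap), and for a gapless {1, 2, 3} it returns 4.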

Usage:

select next_random_or_free_id_from('test');

Where test is a table name. This function call can be placed in the DEFAULT of the primary key column.

Konard