
My idea is to implement a basic "vector clock", where timestamps are clock-based, always move forward, and are guaranteed to be unique.

For example, in a simple table:

CREATE TABLE IF NOT EXISTS timestamps (
    last_modified TIMESTAMP UNIQUE
);

I use a trigger to set the timestamp value before insertion. It basically just goes into the future when two inserts arrive at the same time:

CREATE OR REPLACE FUNCTION bump_timestamp()
RETURNS trigger AS $$
DECLARE
    previous TIMESTAMP;
    current TIMESTAMP;
BEGIN
    -- Read the latest stored timestamp (stays NULL when the table is empty).
    SELECT last_modified INTO previous
      FROM timestamps
     ORDER BY last_modified DESC
     LIMIT 1;

    -- Use the wall clock, but never go backwards or repeat a value.
    current := clock_timestamp();
    IF previous IS NOT NULL AND previous >= current THEN
        current := previous + INTERVAL '1 millisecond';
    END IF;
    NEW.last_modified := current;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

DROP TRIGGER IF EXISTS tgr_timestamps_last_modified ON timestamps;

CREATE TRIGGER tgr_timestamps_last_modified
BEFORE INSERT OR UPDATE ON timestamps
FOR EACH ROW EXECUTE PROCEDURE bump_timestamp();

I then run a massive number of insertions from two separate clients:

DO
$$
BEGIN
    FOR i IN 1..100000 LOOP
       INSERT INTO timestamps DEFAULT VALUES;
    END LOOP;
END;
$$;

As expected, I get collisions:

ERROR: duplicate key value violates unique constraint "timestamps_last_modified_key"
SQL state: 23505
Detail: Key (last_modified)=(2016-01-15 18:35:22.550367) already exists.
Context: SQL statement "INSERT INTO timestamps DEFAULT VALUES"
PL/pgSQL function inline_code_block line 4 at SQL statement

@rach suggested mixing clock_timestamp() with a SEQUENCE object, but that would probably mean giving up the TIMESTAMP type. And I can't really see how it would solve the isolation problem...
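For context, one shape I can picture for that suggestion keeps the TIMESTAMP type and pairs it with a sequence as a tiebreaker, so uniqueness comes from the pair rather than from the clock alone (a sketch only; the sequence name is mine):

```sql
-- Sketch: keep the wall-clock value, let a sequence break ties.
CREATE SEQUENCE timestamps_tiebreaker_seq;

CREATE TABLE IF NOT EXISTS timestamps (
    last_modified TIMESTAMP NOT NULL DEFAULT clock_timestamp(),
    tiebreaker BIGINT NOT NULL DEFAULT nextval('timestamps_tiebreaker_seq'),
    -- The pair stays unique even when two inserts read the same clock value.
    UNIQUE (last_modified, tiebreaker)
);
```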

Is there a common pattern to avoid this?

Thank you for your insights :)

leplatrem

  • What's wrong with just a sequence? Do you actually need the time? What about using a key on 2 columns (timestamp, sequence)? Otherwise you have V1 UUIDs. – jcaron Jan 15 '16 at 18:53
  • Why don't you just use `now()`? `INSERT INTO timestamps now();` or set that field's default value to `now()`; in the end you can't insert duplicate `now()` values as it changes with every transaction. – Solrac Jan 15 '16 at 18:54
  • @SolracRagnarockradio, any multiple inserts within the same transaction would all get the same timestamp. Also not sure that the micro-second accuracy of timestamps would guarantee different values. – jcaron Jan 15 '16 at 18:55
  • Just catch the exception at the client and retry. It really is that simple. – Clodoaldo Neto Jan 15 '16 at 18:57
  • I want it time-based because it is used for synchronization. But I could probably use two fields indeed, and use the sequence as fake decimals, like @rachbelaid suggested. – leplatrem Jan 15 '16 at 19:00
  • You mean a client is not generating the inserts? – Clodoaldo Neto Jan 15 '16 at 19:03
  • What do you synchronise things with? Why would a timestamp be any more useful than a sequence for this purpose? – jcaron Jan 15 '16 at 21:27
  • Besides what @jcaron said, the clock on any computer drifts, and when the computer synchronizes its clock, the OS time jumps. The clock may drift in either direction, so it is possible to get timestamps from the future or the past, i.e. out of order. I think a sequence is the way to guarantee uniqueness; add a timestamp to it if you need it, but don't assume it is unique. – Vladimir Baranov Jan 17 '16 at 00:23

2 Answers


My two cents (inspired by http://tapoueh.org/blog/2013/03/15-batch-update).

Try adding the following before the massive batch of insertions:

LOCK TABLE timestamps IN SHARE MODE;

Official documentation is here: http://www.postgresql.org/docs/current/static/sql-lock.html
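For example, each client would wrap its whole batch in one transaction, since the lock is only held until COMMIT. One caveat to double-check in that documentation: SHARE does not conflict with itself, so if both inserting clients take it they can deadlock on their inserts; a self-exclusive mode such as SHARE ROW EXCLUSIVE avoids that, and the sketch below uses it:

```sql
BEGIN;

-- Held until COMMIT: the second client blocks here, so the trigger
-- never races another "latest" row.
LOCK TABLE timestamps IN SHARE ROW EXCLUSIVE MODE;

DO
$$
BEGIN
    FOR i IN 1..100000 LOOP
        INSERT INTO timestamps DEFAULT VALUES;
    END LOOP;
END;
$$;

COMMIT;
```

This serializes the two writers, so the batches no longer run concurrently; that is the trade-off for letting the trigger see a consistent latest value.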

Quinn

If you have only one Postgres server, as you said, I think that using a timestamp plus a sequence can solve the problem, because sequences are non-transactional and respect insert order. If you have database shards it will be much more complex, but maybe the distributed sequences of 2ndQuadrant's BDR could help; I don't think ordering would be preserved, though. I added some code below if you have a setup to test it.

CREATE SEQUENCE "timestamps_seq";

-- Let's test first, how to generate id.
SELECT extract(epoch from now())::bigint::text || LPAD(nextval('timestamps_seq')::text, 20, '0') as unique_id ;

           unique_id
--------------------------------
 145288519200000000000000000010
(1 row)


CREATE TABLE IF NOT EXISTS timestamps (
    unique_id TEXT UNIQUE NOT NULL DEFAULT extract(epoch from now())::bigint::text || LPAD(nextval('timestamps_seq')::text, 20, '0')
);


INSERT INTO timestamps DEFAULT VALUES;
INSERT INTO timestamps DEFAULT VALUES;
INSERT INTO timestamps DEFAULT VALUES;

select * from timestamps;
           unique_id
--------------------------------
 145288556900000000000000000001
 145288557000000000000000000002
 145288557100000000000000000003
(3 rows)

Let me know if that works. I'm not a DBA, so it might be good to ask on dba.stackexchange.com as well about potential side effects.

Rachid
  • Could also have 2 columns, TIMESTAMP and BIGSERIAL, with an index on both so you can order them quickly. I'm assuming that's why you wanted them time-based. – Rachid Jan 15 '16 at 19:34
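A rough sketch of that two-column idea (column and index names are illustrative): the BIGSERIAL guarantees uniqueness and breaks ties, the timestamp carries the wall-clock time, and a composite index keeps the combined ordering fast.

```sql
CREATE TABLE IF NOT EXISTS timestamps (
    last_modified TIMESTAMP NOT NULL DEFAULT now(),
    seq BIGSERIAL NOT NULL  -- unique on its own, so no UNIQUE on the timestamp
);

-- Composite index: ORDER BY last_modified, seq can use it directly.
CREATE INDEX idx_timestamps_time_seq ON timestamps (last_modified, seq);

SELECT * FROM timestamps ORDER BY last_modified, seq;
```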