
Site A will be generating a set of records. Nightly, they will back up their database and FTP it to Site B. Site B will not modify those records at all, but will add more records, and other tables will create foreign keys referencing Site A's records.

So, essentially, I need to set up a system to take all the incremental changes from Site A's dump (mostly inserts and updates, but some deletes possible) and apply them at Site B.

At this point, we're using Postgres 8.3, but could upgrade if valuable.

I believe I can do this relatively straightforwardly with Bucardo, but I'd love to hear alternatives (or confirmation of Bucardo) before I set up a Linux box to test it out.

warren
Thomas
  • Not sure about "multimaster", but for "bucardo" I can lend a hand. – Erwin Brandstetter Oct 05 '11 at 22:36
  • Actually, a search for bucardo didn't reveal any other questions that could be tagged "bucardo", so I deleted it again. No use. – Erwin Brandstetter Oct 06 '11 at 16:19
  • Re: close vote - I posted here instead of elsewhere because I found similar questions here, and other stack exchanges had less info on the subject. Could be a chicken/egg problem, I suppose. – Thomas Oct 10 '11 at 04:56

1 Answer


Almost any replication solution would do the trick. The PostgreSQL Wiki has a chapter on the topic. But your case is simple enough; I would just use dblink.
This is generalized from a working implementation of mine:

  1. Create a view in the master db that returns updated rows.
    Let's call it myview.
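
A minimal sketch of what such a view might look like, assuming the master table carries a `last_modified` timestamp and a one-row bookkeeping table `last_sync(synced_at)` (both are hypothetical names, not from the original setup):

```sql
-- Hypothetical: expose only rows changed since the last successful sync.
-- Assumes mytbl has a last_modified column and a one-row table
-- last_sync(synced_at) is maintained on the master.
CREATE VIEW myview AS
SELECT t.col_a, t.col_b, t.col_c
FROM   mytbl t
WHERE  t.last_modified > (SELECT synced_at FROM last_sync);
```

Any predicate that identifies changed rows works; the timestamp approach is just one common option.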

  2. Create one function per table in the slave db that fetches rows via dblink:

CREATE OR REPLACE FUNCTION f_lnk_mytbl()
  RETURNS TABLE(col_a integer, col_b text, col_c text) AS
$func$
   SELECT *
   FROM   public.dblink('SELECT col_a, col_b, col_c FROM myview')
                      AS b(col_a integer, col_b text, col_c text);
$func$  LANGUAGE sql SECURITY DEFINER;

REVOKE ALL ON FUNCTION f_lnk_mytbl() FROM public;
GRANT EXECUTE ON FUNCTION f_lnk_mytbl() TO my_user;
  3. Use the above function in another function in the slave db that establishes and closes the server connection.

CREATE OR REPLACE FUNCTION f_mysync()
  RETURNS void AS
$func$
BEGIN
   PERFORM dblink_connect(
          'hostaddr=123.45.67.89 port=5432 dbname=mydb user=postgres password=secret');

   -- Fetch data into local temporary table for fast processing.
   CREATE TEMP TABLE tmp_i ON COMMIT DROP AS
   SELECT * FROM f_lnk_mytbl();

   -- *Or* read local files into temp tables with COPY so you don't need dblink.
   -- UPDATE what's already there (instead of DELETE, to keep integrity).
   UPDATE mytbl m
   SET   (  col_a,   col_b,   col_c) =
         (i.col_a, i.col_b, i.col_c)
   FROM   tmp_i i
   WHERE  m.id = i.id
   AND   (m.col_a, m.col_b, m.col_c) IS DISTINCT FROM
         (i.col_a, i.col_b, i.col_c);

   -- INSERT new rows
   INSERT INTO mytbl
   SELECT * FROM tmp_i i
   WHERE  NOT EXISTS (SELECT 1 FROM mytbl m WHERE m.id = i.id);

   -- DELETE anything? More tables?

   PERFORM dblink_disconnect();
END
$func$  LANGUAGE plpgsql SECURITY DEFINER;

REVOKE ALL ON FUNCTION f_mysync() FROM public;
GRANT EXECUTE ON FUNCTION f_mysync() TO my_user;
  4. Now, this call is all you need. Call it as superuser or as my_user. Schedule a cron job or something similar.

SELECT f_mysync();
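
To schedule it, a crontab entry invoking psql could look like this (database name, user, and timing are placeholders):

```shell
# Hypothetical crontab entry: run the sync nightly at 02:30 as my_user.
30 2 * * * psql -U my_user -d slavedb -c "SELECT f_mysync();"
```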

In PostgreSQL 9.1 or later there is also the new CREATE FOREIGN TABLE (SQL/MED). That might be more elegant.
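With the postgres_fdw foreign data wrapper (shipped with PostgreSQL 9.3 and later), the remote view could be queried directly without dblink. A sketch, where the server address, credentials, and column list are placeholders matching the example above:

```sql
-- Hypothetical sketch using postgres_fdw (PostgreSQL 9.3+);
-- host, credentials, and columns are placeholders.
CREATE EXTENSION postgres_fdw;

CREATE SERVER master_srv FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host '123.45.67.89', port '5432', dbname 'mydb');

CREATE USER MAPPING FOR my_user SERVER master_srv
  OPTIONS (user 'postgres', password 'secret');

CREATE FOREIGN TABLE mytbl_remote (
  col_a integer,
  col_b text,
  col_c text
) SERVER master_srv OPTIONS (schema_name 'public', table_name 'myview');

-- The sync function could then read from mytbl_remote instead of
-- calling dblink_connect() / dblink().
```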

Erwin Brandstetter
  • I'm going to keep tweaking, but one problem this has initially related to my issue is the 'where m.id = i.id' section. Probably worth it to add an additional master_id column to the slave. – Thomas Oct 10 '11 at 21:40
  • I've got a few more changes to make to get things happy, but this does what I asked for. I ended up changing the f_lnk_mytbl function to call 'id' 'master_id', and added a 'master_id' to the slave table. Then all id's became master_id in the different queries. Also, 'SELECT i.*' had to get expanded explicitly. – Thomas Oct 10 '11 at 22:53