0

I tried to use pglogical to replicate data between 2 PG servers and found huge disk occupation during the process. Here is the details

Env:

  • Source PG:

    • server name: pg-publisher.
    • IP address: 10.0.0.1
    • Whole disk space: 3.5T
    • Size of PG data folder: 2.2T
    • Size of biggest table:
      • table name -> my_schema.logs.
      • pg_table_size('logs') = 1.4T
    • PG is running, new data keeps coming in and inserting into the tables including logs table.
  • Destination PG:

    • server name: pg-subscriber
    • IP address: 10.0.0.2
    • Whole disk space: 3T

How did I setup the replication:

  • On pg-publisher

    • Create node:

      SELECT pglogical.create_node(
      node_name := 'publisher',
      dsn := 'host=10.0.0.1 port=5432 dbname=mydb user=pguser password=pgpassword'
      );
      
    • Create replication_set

      SELECT pglogical.create_replication_set('my_replication_set');
      SELECT pglogical.replication_set_add_all_tables('my_replication_set', ARRAY['my_schema']);
      
  • On pg-subscriber

    • Create node

      SELECT pglogical.create_node(
      node_name := 'subscriber',
      dsn := 'host=10.0.0.2 port=5432 dbname=mydb user=pguser password=pgpassword'
      );
      
    • create subscription

      SELECT pglogical.create_subscription(
      subscription_name := 'pg_subscriber',
      provider_dsn := 'host=10.0.0.1 port=5432 dbname=mydb user=pguser password=pgpassword',
      replication_sets := ARRAY['my_replication_set'],
      synchronize_structure := false,
      synchronize_data := true
      

); ```

Verification

After above action. I checked the status on both servers:

  • on pg-subscriber. The pglogical replication is in initializing state (from pglogical.show_subscription_status())
  • on pg-publisher. The replication slot is created and does not active. That's expected because there are lots of data need to be initialized. It will not be active until it's reach to replicating state

My confusion

All the actions seems work as expected. But after 4 days, I got alert and found the available disk space on pg-publisher decreased a lot (decrease over 20%). Then here is what I found:

  • pglogical replication is still in initializing state on pg-subscriber. On pg-publisher, I checked the pg_stat_activity on it and found pglogical event is using COPY command to copy the biggest table my_schema.logs.

  • Folder size of PG data on pg-publisher increased from 2.2T to 2.7T

I know the publisher server will retain the WAL files required for the replication before it's reach to replicating state, that will occupy the disk space. But I checked the pg_wal folder size on pg-publisher and found it's only less than 20GB. But the base folder increased to 2.7T. May I know why does it happen. Does COPY command occupy the disk space(I don't think so because I found it will COPY to stdout) or is there another thing will cause it?

Binary
  • 81
  • 8
  • Sorry @jjanes. I did call to pglogical.create_subscription but forgot wrote it. Already added. Yes, data is coming and insert into DB, but not too much. On the other hand. Because the disk space will be used up so I cancel the replication and found that the `base` folder return back to 2.2T. So the increased size should related to the replication. So what I want to know is why logical replication could cause such data increasing in `base` folder? – Binary Jan 18 '22 at 01:54
  • I can't reproduce this as described. It must be an interaction between the replication and something else going on on the server at the same time. Can you just go look in `base` and see where the space is going? Does your database have intense churn (either UPDATEs, or many INSERT and DELETE to the same tables)? The long running COPY might case bloat if so. – jjanes Jan 18 '22 at 18:34
  • yes @jjanes, my database has intense churn during the process (most of them are INSERTs). May I know why long running COPY might cause bloat in this scenario? – Binary Jan 19 '22 at 01:57
  • In order to have churn, you need old tuples whose space could potentially be reused. So with mostly just INSERTs, you wouldn't have churn, just regular growth. Anyway, the long-running transaction in which the COPY runs is entitled to look at old tuples, so the old tuples must be preserved, and their space can't be reused. Now the COPY might not be interested in them because they are in a different table, but PostgreSQL transaction accounting is not sophisticated enough to know that. – jjanes Jan 19 '22 at 17:23

0 Answers0