0

I need to periodically copy data from the TMP database to the remote PROD database with some data modifications in columns. When I use the postgres_fdw extension from the PROD database (with mapping foreign schema), the process of copying a million records lasts 6 minutes.

insert into prod.foreign_schema.foreign_table 
(select * from tmp.public.table limit 1000000);

However, when I use the dblink to copy the same table from the PROD database (SQL is running on the PROD database, not on the TEMP), the process lasts 20 seconds.

insert into prod.public.table 
(select * from dblink('host=192.1... port=5432 dbname=... user=… password=…. connect_timeout=2', 'select * from tmp.production.table limit 1000000') as tab (id integer…..)
);

How can I optimize and shorten the process of copying data from the TEMP database?

I have to run SQL commands on the TMP database. TMP and PROD database are in this same versions (10).

Fral
  • 1
  • Neither one of these will run. postgres_fdw allows you map foreign tables where the mapping appears locally, but it does not let you just directly reference them with the foreign db's name. – jjanes Apr 22 '20 at 11:44
  • Note: `LIMIT` without `ORDER` makes no sense. – wildplasser Apr 22 '20 at 12:15
  • This is only an example, important is difference between performance. – Fral Apr 22 '20 at 12:21

1 Answers1

2

The first statement will effectively run many small inserts, albeit with a prepared statement, so you don't have the planning overhead each time. So you'll have more round trips between the two servers, which probably is the reason for the difference.

Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263
  • Indeed, even if you write it as `\copy foreign_schema.table from ...` it translates it into individual inserts. Do you know if anyone is working on improving this? – jjanes Apr 22 '20 at 12:21
  • No, I don't know. – Fral Apr 22 '20 at 12:27
  • @jjanes I have not heard of any activities in that direction. Personally, I think that the FDW interface is not the best one for bulk DML. But if I remember right, postgres_fdw optimizes some other bulk DML operations. – Laurenz Albe Apr 22 '20 at 13:35