
I have two tables with exactly the same columns. The first one is used in production; the web application (Django) retrieves objects from it to show on the web page. I am using a Python script to add objects to the second one. When the script is done I need to replace all rows in table-1 with the rows from table-2. Right now I am using something like this:

TRUNCATE table-1;
INSERT INTO table-1 (columns) SELECT columns FROM table-2;
TRUNCATE table-2;
VACUUM FULL;

The problem is that it takes too long, and after TRUNCATE table-1 the website is useless until the INSERT is done. What would be the best way to approach this?

Peter Jung
  • Use `delete` instead of `truncate` for table_1. That will allow read access to `table_1` until you commit. –  Feb 03 '16 at 22:11
  • add `where` statement, assuming insert is too long because the tables are big – cur4so Feb 03 '16 at 22:12

2 Answers


You can try to create a view and code Django against that. When it's time to cut over from table1 to table2, just CREATE OR REPLACE VIEW to point it at the other table instead.
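
A minimal sketch of that approach, assuming the tables are named table_1 and table_2 and using a hypothetical view name items:

-- one-time setup: point the Django model at a view instead of the table
CREATE VIEW items AS SELECT * FROM table_1;

-- after the Python script has filled table_2, flip the view;
-- readers keep seeing consistent data and simply start reading table_2
CREATE OR REPLACE VIEW items AS SELECT * FROM table_2;

CREATE OR REPLACE VIEW works here because both tables have exactly the same columns.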

ykaganovich

You don't need the expensive VACUUM FULL at all. Right after your big INSERT there is nothing to clean up. No dead tuples, and all indexes are in pristine condition.

It can make sense to run ANALYZE afterwards, though, to update the completely changed table statistics right away.

For small tables DELETE can be faster than TRUNCATE, but for medium to large tables TRUNCATE is typically faster.

And do it all in a single transaction. An INSERT after TRUNCATE in the same transaction does not have to write WAL (with wal_level = minimal) and is much faster:

BEGIN;
TRUNCATE table1;
INSERT INTO table1 TABLE table2;
TRUNCATE table2;  -- do you need this?
ANALYZE table1;  -- might be useful 
COMMIT;

Details:

If you have any indexes on table1, it pays for big tables to drop them first and recreate them after the INSERT.
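
For example (index name and column are hypothetical; this is just a sketch of the pattern):

BEGIN;
TRUNCATE table1;
DROP INDEX table1_some_col_idx;  -- hypothetical index on table1
INSERT INTO table1 TABLE table2;  -- bulk load without index maintenance
CREATE INDEX table1_some_col_idx ON table1 (some_col);  -- rebuild in one pass
ANALYZE table1;
COMMIT;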

If you don't have dependent objects, you might also just drop table1 and rename table2, which would be much faster.
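
A sketch of that variant (assuming nothing depends on table1; table2 then has to be recreated empty for the next run of the script):

BEGIN;
DROP TABLE table1;  -- fails if views, foreign keys etc. depend on it
ALTER TABLE table2 RENAME TO table1;
CREATE TABLE table2 (LIKE table1 INCLUDING ALL);  -- fresh, empty table2 for the next run
COMMIT;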

Either method requires an exclusive lock on the table. But you have been using VACUUM FULL previously, which takes an exclusive lock as well:

VACUUM FULL rewrites the entire contents of the table into a new disk file with no extra space, allowing unused space to be returned to the operating system. This form is much slower and requires an exclusive lock on each table while it is being processed.

So it's safe to assume an exclusive lock is ok for you.

Erwin Brandstetter
  • But `truncate` will acquire an access exclusive lock on the table, so that during the transaction the table can't even be read (and my understanding was that that is the problem) –  Feb 03 '16 at 22:15
  • @a_horse_with_no_name: I don't think it would make any sense to allow concurrent reads while rewriting the table completely. So this is a *feature*, not a bug. – Erwin Brandstetter Feb 03 '16 at 22:17
  • I never said it's a bug. But until the whole process of replacing the data is finished, I do think it can make sense to allow clients to see the previous (consistent) state of the table. After all that's what transactions are made for. –  Feb 03 '16 at 22:20
  • @a_horse_with_no_name: transactions "are made" for many purposes. Once I decide to replace the contents of a table I probably do *not* want any more concurrent transactions to read the old contents while I'm at it. – Erwin Brandstetter Feb 03 '16 at 22:22
  • Depends on the requirement. We actually do have some tables (that store e.g. aggregated data) where we implement precisely this. We even do this with multiple tables in a single transaction (e.g. for a content management system when publishing the content from the staging to the live environment). I don't find this requirement particularly surprising ;) –  Feb 03 '16 at 22:24
  • @a_horse_with_no_name: It's also not in the question. I think my assumption is more likely when replacing the whole table. Even for `DELETE` I would take an exclusive lock on the table first to rule out intermediary states and avoid interlocking with concurrent transactions. I stated that an exclusive lock is necessary. If the OP has different requirements, (s)he has to tell us. – Erwin Brandstetter Feb 03 '16 at 22:39
  • I need to have read access to the table during the transaction, because it makes sense to have access to the previous state of the table while updating in my app. So I will try to merge @ykaganovich's solution with the single-transaction method from you to be as fast as possible. Both answers are good for me, so I will implement it tomorrow and probably post my own answer with both solutions merged. Thanks a lot! :) – Peter Jung Feb 03 '16 at 23:48