6

During the ETL we do the following operations:

    begin transaction;

    drop table if exists target_tmp;
    create table target_tmp like target;

    insert into target_tmp select * from source_a inner join source_b on ...;
    analyze table target_tmp;

    drop table target;
    alter table target_tmp rename to target;

    commit;

The SQL command is performed by AWS Data Pipeline, if this is important.

However, the pipelines sometimes fail with the following error:

    ERROR: table 111566 dropped by concurrent transaction

Redshift supports serializable isolation. Does one of the commands break isolation?

1 Answers1

4

Yes that works, but if generating the temp table takes a while you can expect to see that error for other queries while it runs. You could try generating the temp table in a separate transaction (transaction may not be needed unless you worry about updates to the source tables). Then do a quick rotation of the table names so there is much less time for contention:

-- generate target_tmp first then
begin;
alter table target rename to target_old;
alter table target_tmp rename to target;
commit;
drop table target_old;
systemjack
  • 2,815
  • 17
  • 26
  • 1
    Thaks, I followed your advice and have not seen the error in a while. I am still not sure if the behaviour that I encountered is not a bug. – Kiril Scherbach Apr 17 '17 at 11:38
  • 2
    Since only the current session will reach the target_old table. It's very interesting to know why it matters, if the drop table command is after or before the commit command? – Aviv Noy Aug 22 '17 at 08:23
  • This is great. I had a quicksight view unable to obtain data until the transaction was complete for a scheduled job, this procedure fixes the problem. – Merlin Apr 11 '19 at 08:47
  • 1
    Two things: I did `drop table if exists target_old;` and if generating `target_tmp` fails the alter table rename should fail too, which will abort the transaction, which is the desired result I think. – Merlin Apr 11 '19 at 08:54