0

I'm trying to copy a Postgres table1 to another table2 while changing the value of one of the columns. To make the transfer faster I run 10 different processes each having different offsets in the table1 to start from, e.g., 1st process: SELECT * FROM table OFFSET offset1 LIMIT x; then copy to table2, 2nd process: SELECT * FROM table OFFSET offset2 LIMIT x then copy to table2.

But even though I don't have duplicate rows in my table1 I do get duplicate rows in my table2 (x is smaller than offset2-offset1). Is it possible that the same offset value is not pointing to the same row in the table across different processes? If yes, what would be better way to copy a table while modifying a column in Postgres? Thanks!

hovnatan
  • 1,331
  • 10
  • 23
  • Near duplicate: [PostgreSQL - repeating rows from LIMIT OFFSET](http://stackoverflow.com/questions/13580826). The asker of that other question uses `order by`, but on a non-unique expression. BTW this is missing in Denis answer. Just using `ORDER BY` is not sufficient, it needs to be applied to a field or combination of fields or an expression that is unique across all rows. – Daniel Vérité Dec 18 '14 at 15:03
  • Also, you should time `CREATE TABLE table2 AS SELECT...`, it's probably faster than the 10 different processes if these sessions are playing against each other to gain access to the disks, unless your postgres data partition is spread over a large number of physical drives – Daniel Vérité Dec 18 '14 at 15:07

1 Answers1

2

Without an order by statement, limit and offset are seldom meaningful... SQL offers no guarantee on row order unless you make it explicit. So add an order by clause.

Also, if copying a table wholesale is what you want, it's better to simply:

insert into table2 select * from table1
Denis de Bernardy
  • 75,850
  • 13
  • 131
  • 154