0

I am trying to write my table to a PostgreSQL database then declare a primary key. My data is really huge (includes billions of rows and total size is approximately 150 GB), when I try to create a primary key after writing the table, it takes forever. Here is how I define the primary key;

ALTER TABLE my_huge_table ADD CONSTRAINT huge_pk PRIMARY KEY (column_x,column_y)

I am %101 sure about the uniqueness of these columns and they are not null. Waiting for nothing is really unnecessary, I am seeking to find an escape route against wasting time. I surely think there are ones who faced a situation like this. Waiting for your precious help.

Samet Sökel
  • 2,515
  • 6
  • 21
  • 4
    Checkout https://dba.stackexchange.com/questions/147255/what-is-the-best-way-to-create-primary-key-on-a-large-table-in-postgresql-9-5 – Simon Martinelli Mar 14 '21 at 15:20
  • @SimonMartinelli Thanks. just to make sure, if we would compare creating a unique index then using it as a primary key and creating a primary key without defining a unique key, which one would take less? from the topic I understand that it doesn't lock operations on the table but my main aim is to reduce the time of creating the primary key. if it takes less than my way, I can gladly apply this way. – Samet Sökel Mar 14 '21 at 15:34
  • You should check EXPLAIN and you should check what the system is busy with. It is probably busy with sequential read and then sorting. You should have enough temporary space for sorting and you should see on your OS that CPU and IO are about equally loaded. You do not want to run on 99% CPU while IO is essentially zero, nor do you want to have IO 100% exhausted while the CPU is almost idle (process in IO-wait state the entire time.) – Gunther Schadow Mar 14 '21 at 15:37
  • Since you are mostly sorting, you will probably be IO bound on the temporary space, and there may be little you can do if you have your physical RAM fully loaded. Of course, make sure you are using all your physical RAM and that you have enough WORK MEM allocated to your backend processes. – Gunther Schadow Mar 14 '21 at 15:39

1 Answers1

1

You are not waiting for nothing. You are probably waiting for an index to be built. That index will be needed to enforce the primary key in the future. If the system trusted your declaration that it was already unique, that really wouldn't get you much, it would still need to build the index. If you have faith the primary key is not violated now, and also have faith no one will try to violate it in the future, then don't add the primary key. Just add a comment saying you know this is a primary key, but for performance reasons won't formally declare it.

jjanes
  • 37,812
  • 5
  • 27
  • 34