0

All the question is in the title,

if we kill a cluster query on a 100 millions row table, will it be dangerous for database ?

the query is running for 2 hours now, and i need to access the table tomorrow morning (12h left hopefully).

I thought it would be far quicker, my database is running on raid ssd and Bi-Xeon Processor.

Thanks for your wise advice.

Sid

Sid
  • 331
  • 2
  • 10

2 Answers2

2

No, you can kill the cluster operation without any risk. Before the operation is done, nothing has changed to the original table- and indexfiles. From the manual:

When an index scan is used, a temporary copy of the table is created that contains the table data in the index order. Temporary copies of each index on the table are created as well. Therefore, you need free space on disk at least equal to the sum of the table size and the index sizes.

When a sequential scan and sort is used, a temporary sort file is also created, so that the peak temporary space requirement is as much as double the table size, plus the index sizes.

Frank Heikens
  • 117,544
  • 24
  • 142
  • 135
  • Thanks a lot, any idea of time windows before end of command on a 100 millions row table ? few hours ? few day ? few mouths ? :) – Sid Apr 24 '12 at 16:31
  • It depends on the size of the table and indexes and how fast your disk system is. And random IO is slow, that will be the limiting factor. When running Linux, check iostat to see how fast things are going. – Frank Heikens Apr 24 '12 at 17:39
0

As @Frank points out, it is perfectly fine to do so.

Assuming you want to run this query in the future and assuming you have the luxury of a service window and can afford some downtime, I'd tweak some settings to boost the performance a bit.

In your configuration:

  1. turn off fsync, for higher throughput to the file system

Fsync stands for file system sync. With fsync on, the database waits for the file system to commit on every page flush.

  1. maximize your maintenance_work_mem

It's ok to just take all memory available, as it will not be allocated during production hours. I don't know how big your table and the index you are working on are, things will run faster when they can be fully loaded in main memory.

Rens Verhage
  • 5,688
  • 4
  • 33
  • 51
  • Thanks you too. But did you think we talk about hours ? days ? – Sid Apr 24 '12 at 16:41
  • You can't turn fsync off when cluster is already running. You also might end up with a corrupt database cluster when things go wrong. – Frank Heikens Apr 24 '12 at 17:41
  • There is a lot of debate on the use of fsync. Yes it could harm your data when things go wrong. But what about the filesystem cache? What happens when the filesystem crashes? @Sid I don't know if it takes hours, could be, hard to tell / guess. – Rens Verhage Apr 24 '12 at 17:48