What is the effect on record size of reordering columns in PostgreSQL?

Question

Since Postgres can only add columns at the end of tables, I end up re-ordering by adding new columns at the end of the table, setting them equal to existing columns, and then dropping the original columns.

So, what does PostgreSQL do with the memory that's freed by dropped columns? Does it automatically re-use the memory, so a single record consumes the same amount of space as it did before? But that would require a re-write of the whole table, so to avoid that, does it just keep a bunch of blank space around in each record?

score 2 · Answer 1 · edited Apr 13 '17 at 12:42

_{The question is old, but since both answers are wrong or misleading, I'll add another one.}

When updating a row, Postgres writes a new row version and the old one is eventually removed by VACUUM after no running transaction can see it any more.

Plain VACUUM does not return disk space from the physical file that contains the table to the system, unless it finds completely dead or empty blocks at the physical end of the table. You need to run VACUUM FULL or CLUSTER to aggressively compact the table and return excess space to the system. This is not typically desirable in normal operation. Postgres can re-use dead tuples to keep new row versions on the same data page, which benefits performance.

In your case, since you update every row, the size of the table is doubled (from its minimum size). It's advisable to run VACUUM FULL or CLUSTER to return the bloat to the system.
Both take an exclusive lock on the table. If that interferes with concurrent access, consider pg_repack, which can do the same without exclusive locks.

To clarify: Running CLUSTER reclaims the space completely. No VACUUM FULL is needed after CLUSTER (and vice versa).

More details:

PostgreSQL 9.2: vacuum returning disk space to operating system

Quassnoi · Accepted Answer · 2010-04-10T18:22:33.350

1

From the docs:

The DROP COLUMN form does not physically remove the column, but simply makes it invisible to SQL operations. Subsequent insert and update operations in the table will store a null value for the column. Thus, dropping a column is quick but it will not immediately reduce the on-disk size of your table, as the space occupied by the dropped column is not reclaimed. The space will be reclaimed over time as existing rows are updated.

You'll need to do a CLUSTER followed by a VACUUM FULL to reclaim the space.

edited Apr 10 '10 at 18:22

answered Apr 08 '10 at 16:25

Quassnoi

413,100
91
616
614

2

VACUUM FULL will only reclaim the space if every row is updated, it cleans up after completely dead rows--not dead columns in otherwise good rows. Since the UPDATE that sets the new column must happen before doing the DROP, the space taken up by the now obsolete column won't be reclaimed by it. Only the old, original copy of the row will be cleaned up. Besides a total UPDATE, you could execute CLUSTER (in PostgreSQL 8.3 or later) or create a whole new copy of the table (something like CREATE TABLE AS) and shuffle the names around to actually get rid of the space taken up by dead columns. – Greg Smith Apr 10 '10 at 00:46
@Summer: `CLUSTER` rewrites the whole table (plus indexes) thereby optimizing it perfectly. `VACUUM FULL` is redundant after `CLUSTER`. You might want to run `ANALYZE`. This answer is incorrect (except for the quote). I added an answer to clarify. – Erwin Brandstetter Apr 10 '15 at 18:29

score 1 · Answer 3 · answered Apr 08 '10 at 16:26

1

Why do you "reorder" ? There is no order in SQL, it doesn't make sence. If you need a fixed order, tell your queries what order you need or use a view, that's what views are made for.

Diskspace will be used again after vacuum, auto_vacuum will do the job. Unless you disabled this process.

Your current approach will kill overall performance (table locks), indexes have to be recreated, statistics go down the toilet, etc. etc. And in the end, you end up with the same situation you allready had. So why the effort?

answered Apr 08 '10 at 16:26

Frank Heikens

117,544
24
142
135

I was expecting this to come up. I like my columns to be in an order that makes sense to humans, like me. As long as it doesn't take up too much disk space (which it won't after a vacuum full). Just personal preference. :) – Summer Apr 08 '10 at 17:19
2

You shouldn't be relying on implicit column ordering, just like you shouldn't rely on implicit row ordering in queries without ORDER BY. Does this mean your applications use SELECT * and INSERT INTO table VALUES, without field names? Because that would be just asking for trouble. If you want a specific column order, create a view with that order. – MkV Apr 09 '10 at 08:55
I need reordering because it's just easier to group certain columns ! I can't believe something basic like this is not supported in Postgres.... OF COURSE I don't rely on the order of columns in the table, but it's just easier when designing the database ! Why don't all these linux nerds understand this?! People use GUI's nowadays ! It's 2011 ! – Dylan Feb 19 '11 at 12:48
1

This answer is incorrect on several points. Physical order of columns has a number of implications. Not only on SELECT * .. There are side effects on physical disc space (type alignment) and performance - even if unimportant in most cases. The postgres team has the feature to reorder columns on their todo list, they acknowledge the usefulness. OP indicates that this is a rare operation, concerns about performance are misplaced. Just because there is a lot of *wrong* reasons to reorder columns, does not say there are no good ones. To claim that there would be "no order in SQL" is plain wrong. – Erwin Brandstetter Sep 19 '11 at 20:43

What is the effect on record size of reordering columns in PostgreSQL?

3 Answers3