Plpgsql seems to be deleting and inserting instead of updating - Why?

Question

I am using postgreSQL 7.4.

I have a large table , call it table_a:

key1 INT NOT NULL, 
key2 INT NOT NULL, 
data INT NOT NULL, 
itstamp INT NOT NULL DEFAULT (date_part('EPOCH'::text, (timeofday())::timestamp without time zone))::INTEGER

and a table that summaries the last update time for key1, call it table_b:

key1        INT NOT NULL,
max_itstamp INT NOT NULL

I created a trigger function in plpgsql to update or insert rows in table_b as necessary:

CREATE OR REPLACE FUNCTION table_b_update() RETURNS TRIGGER AS '
 DECLARE
  l_key1 INT;
  l_itstamp INT;
 BEGIN
  l_key1 := new.key1;
  l_itstamp := new.itstamp;
  PERFORM TRUE FROM table_b WHERE key1=l_key1;
  IF NOT FOUND THEN 
   INSERT INTO table_b(key1, max_itstamp) values (l_key1, l_itstamp);
  ELSE
   UPDATE table_b SET max_itstamp=l_itstamp WHERE key1=l_key1;
  END IF;
  RETURN NULL;
 END'
LANGUAGE plpgsql IMMUTABLE;

and then I attached a trigger to table_a:

CREATE TRIGGER table_a_trigger1 AFTER INSERT OR UPDATE ON table_a FOR EACH ROW
EXECUTE PROCEDURE table_b_upate();

Now, the time to insert new data into table_a grows incrementally. The file footprint of table_b grows steadily.

I have used RAISE NOTICE commands in the function to confirm that the If statement causes an UPDATE and not an INSERT after the first call per key.

Since the insert time grows for each INSERT, I tried a VACUUM FULL on table_b. The insert time changed back to the approximate time of the early inserts. The file size for table_b was reduced considerably. After the VACUUM FULL the insert time started to grow again. I don't want to do a VACUUM FULL after every INSERT though.

Is it possible that the UPDATE is actually doing a DELETE and INSERT in table_b?

score 1 · Accepted Answer · answered May 18 '11 at 03:48

Because of its concurrency philosophy, Postgresql seldom does an UPDATE in place, and that only recently. Your antique version does indeed do a DELETE/INSERT pair behind the scenes.

VACUUM and CLUSTER are the approved ways to keep the table size manageable. CLUSTER locks the table (at least, it did in 7.3). You probably want to run an ordinary VACUUM often (multiple times per day) and CLUSTER during an off-hour. These frequencies depend on your update frequency, of course.

My experience with Postgresql has been that upwards migration is easy; dump/restore has worked first time every time.

score 0 · Answer 2 · answered May 17 '11 at 11:26

0

7.4 is extremely old. You really need to upgrade to a recent version, which has good autovacuum, and will handle this automatically.

Do not use VACUUM FULL (try CLUSTER instead).

answered May 17 '11 at 11:26

bobflux

11,123
3
27
27

This is a legacy system, and I am restricted to 7.4 for the time being. – youngthing May 17 '11 at 11:28
1

Postgresql doesn't really DO updates, it does delete/inserts in their place internally. This creates dead rows in the database. Regular old vacuuming can reclaim the lost space for reuse. Schedule regular vacuums so that the space can get reclaimed without vacuum full which is blocking, while regular vacuums are not. Also vacuum full creates index bloat for many access patterns and so then you'll wind up with big indexes and need to reindex. Better to run cluster (if 7.4 supports cluster) than vacuum full. – Scott Marlowe May 17 '11 at 12:23
As far as I understand it a CLUSTER (exists on 7.4) will improve data access speeds for queries. I guess it might improve data insertion (if table has a pkey) and update speeds too. However it doesn't look like it will affect the main problem of data insertion speeds here which is too many dead rows. Can you comment? – youngthing May 17 '11 at 13:26
Also, if CLUSTER is a good option, then is it a one-off action, or something to be performed regularly, maybe in combination with a regular VACUUM? – youngthing May 17 '11 at 13:27
@Scott your comment is very useful - maybe best as a separate answer? – youngthing May 17 '11 at 16:10
cluster has to be re-run every so often to maintain ordering. It's also blocking like vacuum full, i.e. it locks the table while it runs. Since cluster re-writes the table in order, there's no need for it AND vacuum at the same time. However, regular vacuums will reclaim space over time to keep the table from getting bloated. Regular clusters will keep the table in index order. – Scott Marlowe May 18 '11 at 22:45

Plpgsql seems to be deleting and inserting instead of updating - Why?

2 Answers2