
I have a very large table, 400MM records, that normally only receives inserts. However, I recently have had to make a lot of updates to the records to complete a task, and this is creating a lot of dead tuples. I have updated the global configs to the following:

autovacuum_vacuum_scale_factor = 0
autovacuum_vacuum_threshold = 10000
autovacuum_vacuum_cost_limit = 2000
autovacuum_max_workers = 6

With these settings, my intent was that any time the dead tuple count rose past 10,000, autovacuum would clean it up (with autovacuum_vacuum_scale_factor = 0, the trigger condition is simply dead tuples > autovacuum_vacuum_threshold).

However, I'm finding that when the table is busy with inserts and other activity, the dead tuple count does not change; it stays fixed at a certain value. Only when database activity slows down at night does autovacuum appear to work well.
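For context, this is roughly how I am watching the dead tuple count (a sketch against pg_stat_user_tables; big_table is the anonymized table name used further down):

-- Sketch: watch dead tuples and autovacuum activity for the table.
-- Autovacuum triggers a VACUUM roughly when
--   n_dead_tup > autovacuum_vacuum_threshold
--               + autovacuum_vacuum_scale_factor * reltuples
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_autovacuum,
       autovacuum_count
FROM pg_stat_user_tables
WHERE relname = 'big_table';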

I need the autovacuum to aggressively deal with dead tuples throughout the day. How would I accomplish this? Do I need to increase the max_workers count?

UPDATE: user @Laurenz Albe suggested that I gather performance numbers with and without dead tuples to demonstrate the difference.

Below are the SQL query and the EXPLAIN (ANALYZE, BUFFERS) results. I have changed the names of the table and the group key for privacy.

EXPLAIN (ANALYZE, BUFFERS)
SELECT  ld.upid,
        MAX(ld.lid)
INTO _tt_test_with_dead_tuples
FROM big_table ld
GROUP BY ld.upid;

-- >>> With roughly 1% (3.648MM out of 383.2MM) dead tuples, results below.

HashAggregate  (cost=25579746.07..25584552.97 rows=480690 width=8) (actual time=5966760.520..5975279.359 rows=16238417 loops=1)
  Group Key: upid
  Buffers: shared hit=3015376 read=16753169 dirtied=1759802 written=1360458
  ->  Seq Scan on big_table ld  (cost=0.00..23642679.05 rows=387413405 width=8) (actual time=0.024..5593239.148 rows=383753513 loops=1)
        Buffers: shared hit=3015376 read=16753169 dirtied=1759802 written=1360458
Planning time: 2.677 ms
Execution time: 6012319.846 ms

-- >>> With 0 dead tuples, results below.

HashAggregate  (cost=25558409.48..25562861.52 rows=445204 width=8) (actual time=825662.640..835163.398 rows=16238417 loops=1)
  Group Key: upid
  Buffers: shared hit=15812 read=19753809
  ->  Seq Scan on big_table ld  (cost=0.00..23628813.32 rows=385919232 width=8) (actual time=0.020..533386.128 rows=383753513 loops=1)
        Buffers: shared hit=15812 read=19753809
Planning time: 10.109 ms
Execution time: 843319.731 ms
fcol

1 Answer


The dead tuples are not your problem.

Your real problem is somewhere else; I have highlighted it in the following.

Sequential scan in the slow query:

Buffers: shared hit=3015376 read=16753169 dirtied=1759802 written=1360458

Sequential scan in the fast query:

Buffers: shared hit=15812 read=19753809

It looks like about two million table blocks contain tuples that have been recently written or updated.

At the time a tuple is written, PostgreSQL does not yet know if the transaction will commit or roll back, so this information is not stored in the tuple. It is, however, recorded in the commit log which is stored in pg_xact (or pg_clog, depending on your version).

Now the first reader that comes along and reads a newly written tuple will have to consult the commit log to figure out whether the tuple “exists” or not. To save future readers that hassle, it sets the so-called hint bits in the tuple to reflect that information.
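If you are curious, you can look at those hint bits with the pageinspect extension (a sketch; HEAP_XMIN_COMMITTED is the 0x0100 bit of t_infomask, and block 0 is just an arbitrary example):

CREATE EXTENSION IF NOT EXISTS pageinspect;

-- Sketch: inspect the "xmin committed" hint bit of the tuples in block 0.
SELECT lp,
       t_xmin,
       (t_infomask & 256) <> 0 AS xmin_committed_hint  -- HEAP_XMIN_COMMITTED = 0x0100
FROM heap_page_items(get_raw_page('big_table', 0));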

Setting a hint bit changes and hence “dirties” the block containing the tuple, and if you have a small shared_buffers setting and run out of available buffers, the backend will even have to write blocks out to storage to clean them and make room.

This writing is what makes your query so slow.
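To confirm that it is the backends themselves doing that writing (rather than the checkpointer or the background writer), you can look at the cumulative statistics; a sketch, assuming a version where these counters still live in pg_stat_bgwriter:

-- Sketch: buffers written by ordinary backends vs. checkpointer/bgwriter.
SELECT buffers_checkpoint,   -- written during checkpoints
       buffers_clean,        -- written by the background writer
       buffers_backend       -- written by backends (like the slow query above)
FROM pg_stat_bgwriter;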

Vacuuming the table makes the problem go away because VACUUM not only cleans up dead tuples, it also sets the hint bits for you (it is a reader too!).

To verify that, run the same SELECT a second time without vacuuming the table, and you will observe that it will be just as fast with the 3 million dead tuples, because now the hint bits are all set.

This is one reason why it can be a good idea to run VACUUM on a table after you have loaded a lot of rows, even if there is nothing to clean up – you save the first reader a lot of work.
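For example (a sketch; VERBOSE is optional and just reports what was done):

-- Sketch: an explicit VACUUM after a bulk load or mass update sets the hint
-- bits (and updates the visibility map) before the first reader has to do it.
VACUUM (VERBOSE) big_table;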

Idea: does increasing shared_buffers improve the situation?
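If you want to try that, something like the following (a sketch; the 16GB value is purely illustrative, size it to your RAM):

SHOW shared_buffers;                        -- current setting

-- Illustrative value only; a common starting point is around 25% of RAM.
ALTER SYSTEM SET shared_buffers = '16GB';
-- shared_buffers cannot be changed with a reload; a server restart is required.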

But since vacuuming the table solves the problem, you may as well use autovacuum to set the hint bits more often.

For that, you can set autovacuum_vacuum_scale_factor to 0 and set autovacuum_vacuum_threshold to a large constant (way bigger than 10000), so that there are never too many rows without hint bits.

In addition, set autovacuum_vacuum_cost_delay to 0 so that autovacuum finishes fast.

Don't change these parameters globally; use ALTER TABLE ... SET (...) to set them for this table only, as sketched below.
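A sketch (the threshold of one million is only an illustration; pick a value that matches your change rate):

-- Sketch: per-table autovacuum settings instead of global ones.
ALTER TABLE big_table SET (
    autovacuum_vacuum_scale_factor = 0,
    autovacuum_vacuum_threshold    = 1000000,  -- illustrative value
    autovacuum_vacuum_cost_delay   = 0
);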

Laurenz Albe
  • Actually there is a significant drop in performance when there are a few million dead tuples. The default values don't trigger the autovacuum until performance has already been impacted. I'll update the additional parameter that you suggested to see how that makes a difference. Thank you for your suggestion. – fcol Sep 04 '18 at 19:28
  • Sure, I can run a select query under both conditions. Let me generate this data. If you're saying performance should not be impacted, it's possible my problem is elsewhere. Will report back. – fcol Sep 04 '18 at 20:59
  • Well, performance should be impacted in proportion to the bloat. In a table with 400 million rows, 1 million dead tuples should not cause a noticeable slowdown. Add `EXPLAIN (ANALYZE, BUFFERS)` output for the slow and the fast query to the question! – Laurenz Albe Sep 04 '18 at 21:02
  • I have updated the question with the requested information. – fcol Sep 05 '18 at 15:57
  • I have updated the answer with the desired solution. – Laurenz Albe Sep 05 '18 at 17:02
  • Thank you for the detailed response. It reconfirmed that I need to vacuum the table to avoid slow queries. My original question explains that I am already vacuuming the table, but it runs too slowly; I would like it to run more aggressively, as the updates, and hence the dead tuples, are happening at a fast pace. – fcol Sep 05 '18 at 17:18
  • The dead tuples are irrelevant. If this is a bulk load, run an explicit `VACUUM` afterwards. If it is just a high change rate, make autovacuum run faster as I indicated. You may use the threshold and set the scale factor to 0, but you definitely have to use a higher threshold than you planned to; at least a million or so. – Laurenz Albe Sep 05 '18 at 17:25
  • I will implement your original suggestion. Thank you. – fcol Sep 05 '18 at 17:27
  • I have revised the answer again to provide a comprehensive discussion and solution. Have you considered increasing `shared_buffers`? – Laurenz Albe Sep 05 '18 at 19:42
  • I have not but I certainly will. I'll make the changes over the week and report back if I notice an improvement. – fcol Sep 05 '18 at 22:27
  • @fcol, did increasing shared_buffers solve the problem for you? – qris Aug 05 '19 at 07:24