1

Say I'm inserting/updating/deleting a large number of rows (minimum 100k, at most 20M) in batches that must be atomic.

Will pausing every 10-100k operations or so to run ANALYZE within the same transaction actually inform the query planner of anything, since the changes made by this transaction haven't been committed yet?

Logically, imagine a client is doing something like this:

BEGIN;

(for i, record in records)
  INSERT ...
  DELETE ...
  UPDATE ...

  (if i % 10000 == 0)
    ANALYZE;
  (end)
(end loop)

COMMIT;

Will ANALYZE be of any benefit to the query planner for the running transaction, or would it only gather statistics for committed rows?

Brent Dillingham
  • 1,019
  • 1
  • 13
  • 22

1 Answers1

4

The ANALYZE will see changes made earlier in the same transaction. But only that same transaction will use those newly gathered statistics. Other transactions will continue to use the prior stats (until the large transaction commits, then they will pick up the new stats which include the now-committed rows).

jjanes
  • 37,812
  • 5
  • 27
  • 34
  • Thank you for the answer! Would you happen to know if this is mentioned anywhere in the docs? The best I could find was [this mailing list reply](https://www.postgresql.org/message-id/52618284.6090909%40gmx.net) from 2013. – Brent Dillingham Mar 29 '22 at 04:00
  • 1
    This is obvious from the fact that it is transactional, that is, it can be used in a transaction and can be rolled back. – Laurenz Albe Mar 29 '22 at 06:41
  • What wasn't quite so obvious to me was whether ANALYZE collects data from uncommitted rows; that is, that it uses the same visibility as the parent transaction. ANALYZE could collect data from only committed TXIDs and still follow transactional semantics by rolling back those statistics, however clearly this would be a strange implementation, but I wasn't sure if there might be a technical limitation making it so. – Brent Dillingham Mar 29 '22 at 14:57
  • 1
    For future onlookers, I verified this answer by running `ANALYZE VERBOSE` during one of my transactions. The log output shows the expected number of rows on the analyzed tables that would only be visible within the transaction. – Brent Dillingham Mar 29 '22 at 17:27