Summary
I am using Postgres UPSERTs in our ETLs and I'm experiencing issues with fragmentation and bloat on the tables I am writing to, which is slowing down all operations including reads.
Context
I have hourly batch ETLs upserting into tables (tables ~ 10s of Millions, upserts ~ 10s of thousands) and we have auto vacuums set to thresholds on AWS.
I have had to run FULL vacuums to get the space back and prevent processes from hanging. This has been exacerbated now as the frequency of one of our ETLs has increased, which populates some core tables which are the source for a number of denormalised views. It seems like what is happening is that tables don't have a chance to be vacuumed before the next ETL run, thus creating a spiral which eventually leads to a complete slow-down.
Question!
Does Upsert fundamentally have a negative impact on fragmentation and if so, what are other people using? I am keen to implement some materialised views and move most of our indexes to the new views while retaining only the PK index on the tables we are writing to, but I'm not confident that this will resolve the issue I'm seeing with bloat.
I've done a bit of reading on the issue but nothing conclusive, for example --> https://www.targeted.org/articles/databases/fragmentation.html
Thanks for your help