I have a Postgres table that is about 150 GB and holds about 1 billion rows. We have a Flask app that INSERTs into this table using SQLAlchemy, and we handle database migrations via Alembic. This has been working great for us, but we now need to ditch about 90% of the rows.
My first approach was to delete the rows via SQLAlchemy; however, according to the RDS storage metrics, the freed disk space was never reclaimed. I believe this is because DELETE only leaves dead tuples behind, and the space is not returned to the OS until I run VACUUM FULL on the table, which seems like an expensive operation.
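For reference, the only way I know to run that from our stack is something like the following (a rough sketch; the connection URL and table name are placeholders, and VACUUM has to run on an autocommit connection because it cannot run inside a transaction block):

```python
from sqlalchemy import create_engine, text

engine = create_engine("postgresql://user:pass@host/dbname")  # placeholder URL

# VACUUM cannot run inside a transaction, so switch the connection to autocommit.
with engine.connect().execution_options(isolation_level="AUTOCOMMIT") as conn:
    # VACUUM FULL rewrites the entire table under an ACCESS EXCLUSIVE lock,
    # which is why it looks so expensive for a 150 GB table.
    conn.execute(text("VACUUM FULL my_table"))  # "my_table" is a placeholder
```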
Many suggestions seem to agree that the right way to handle a mass deletion like this is to create a new table, copy the rows worth keeping into it, and then TRUNCATE the old table. Conceptually this makes sense, but I'm not sure what the right way to do it is, even after looking at the following:
1. This seems to work only for simple tables without indexes and points to #2 for more complicated tables.
2. This looks like it could work after I simplify it to just one table.
3. Use an Alembic migration to create a new table, copy over the rows to keep from the old table using a quick Python script, truncate the old table, and then rename the new table (roughly sketched below).
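For concreteness, here is roughly what I picture #3 looking like as a single migration. The table name `events`, the `keep` column, and the revision ids are made up, and I've used a plain INSERT ... SELECT inside the migration instead of a separate Python script for the copy:

```python
"""Rough sketch of approach #3 as one Alembic migration (placeholder names throughout)."""
from alembic import op

revision = "xxxx"       # placeholder revision id
down_revision = "yyyy"  # placeholder parent revision


def upgrade():
    # Create the replacement table with the same columns (no indexes yet).
    op.execute("CREATE TABLE events_new (LIKE events INCLUDING DEFAULTS)")
    # Copy only the ~10% of rows we want to keep; "keep" stands in for the real condition.
    op.execute("INSERT INTO events_new SELECT * FROM events WHERE keep = true")
    # Recreate indexes/constraints on events_new here, e.g. with op.create_index(...).
    # TRUNCATE returns the old table's disk space immediately.
    op.execute("TRUNCATE TABLE events")
    # Swap the tables so the app keeps using the original name.
    op.execute("ALTER TABLE events RENAME TO events_old")
    op.execute("ALTER TABLE events_new RENAME TO events")
    # Or keep events_old around until we're confident nothing is missing.
    op.execute("DROP TABLE events_old")


def downgrade():
    raise NotImplementedError("data-destructive migration; no downgrade")
```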
Does anyone have any advice for the stack I'm currently using? More concretely, I have the following concerns:
- For #1 and #2, does running raw SQL commands to recreate tables mess up Alembic in any way? I am worried that there is some Alembic metadata (such as its record of the current revision) that will not be preserved after running the SQL commands.
- #3 seems to me like the most reasonable approach here. Are there any problems with it?