0

I have a table with roughly 100,000,000 rows. We need to delete around 80,000 of them for a remediation.

In order to prevent downtime, I have a job set up that grabs the records that need to be deleted and then processes the deletes in chunks of 100. However, even processing the first 100 is taking forever.
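Roughly, each iteration of the job looks like this (a hypothetical sketch — the table name `big_table` and the staging table `rows_to_delete` holding the ~80,000 `tx` values are made-up names):

```sql
-- Delete one batch of 100 rows, using a data-modifying CTE so the
-- processed tx values are removed from the staging table in the same statement.
WITH batch AS (
    DELETE FROM rows_to_delete
    WHERE tx IN (SELECT tx FROM rows_to_delete LIMIT 100)
    RETURNING tx
)
DELETE FROM big_table
WHERE tx IN (SELECT tx FROM batch);
-- Repeat until rows_to_delete is empty.
```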

There is no primary key on this table, and the only way I can reliably reference each row is a unique column called `tx`, which is a `varchar(250)` (though the value is never longer than 18-20 characters). I created an index on this column, but it still takes roughly 4-6 s to select a single row.
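Something like this (names are assumed for illustration):

```sql
-- CONCURRENTLY avoids blocking writes while the index builds
CREATE UNIQUE INDEX CONCURRENTLY big_table_tx_idx ON big_table (tx);

-- Even with the index in place, a single-row lookup takes 4-6 seconds:
SELECT * FROM big_table WHERE tx = 'abc123def456ghi78';
```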

It seemed likely that the varchar was causing the problem, so I wanted to add a new `id bigint serial` column, but I was trying to figure out whether doing this would lock the table until it is able to populate all of the IDs.

I know `ALTER TABLE ... ADD COLUMN` is non-blocking as long as there is no default value. But does `serial` count as a default value?

I couldn't find an answer to this in the documentation. We're on Postgres 12.

Nicole Staline
  • Don't know postgres, but on MySQL, 4-6s having an index seems strange. Verify your index again, and check that you have allocated enough memory to hold the index... – Déjà vu Feb 23 '22 at 03:36
    Rather than flitting through random things, you should stick to one thing long enough to figure something out. It should not take 6 seconds to look up a row from an index just because it is varchar. Use `EXPLAIN (ANALYZE, BUFFERS)` to see what is actually happening. – jjanes Feb 23 '22 at 03:50
  • It will lock the table while it rewrites the whole thing. – jjanes Feb 23 '22 at 03:55
  • Unrelated, but: the use of `serial` is [discouraged](https://wiki.postgresql.org/wiki/Don't_Do_This#Don.27t_use_serial) in modern Postgres versions (>= 10) in favour of the standard compliant `identity` columns (but that won't make a difference for performance, just usability) –  Feb 23 '22 at 06:30
    If you have a unique index on the `tx` column then a select should not take seconds - only milliseconds. As jjanes mentioned, please **[edit]** your question and add your query and the [execution plan](https://www.postgresql.org/docs/current/static/using-explain.html) generated using **`explain (analyze, buffers, format text)`** (_not_ just a "simple" explain) as [formatted text](http://stackoverflow.com/help/formatting) and make sure you preserve the indentation of the plan. Paste the text, then put `\`\`\`` on the line before the plan and on a line after the plan. –  Feb 23 '22 at 06:32
  • 4 to 6 seconds for finding a single row using an index, that looks like a problem, but without the query plan it's hard to help you. It's unlikely that the index on a varchar is causing problems; we have much bigger tables and sub-millisecond response times on varchar lookups. – Frank Heikens Feb 23 '22 at 10:03

1 Answer

0

Adding a new column with a sequence-generated value will rewrite the table, which will cause downtime. With some care, it could be done without downtime, but that is complicated and not worth the effort if you already have a varchar column with a unique index on it that does not contain NULL values.
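For reference, the no-rewrite route alluded to above looks roughly like this (a hedged sketch — the table and sequence names are made up, and the backfill must be re-run until no rows remain):

```sql
-- Adding a plain nullable bigint column with no default is metadata-only: no rewrite.
ALTER TABLE big_table ADD COLUMN id bigint;

-- Create the sequence separately, then backfill in small batches to keep locks short.
CREATE SEQUENCE big_table_id_seq;

UPDATE big_table
SET id = nextval('big_table_id_seq')
WHERE ctid IN (
    SELECT ctid FROM big_table WHERE id IS NULL LIMIT 10000
);
-- ...repeat the UPDATE until it affects 0 rows...

-- Finally, attach the default so new rows get an id automatically.
ALTER TABLE big_table
    ALTER COLUMN id SET DEFAULT nextval('big_table_id_seq');
```

By contrast, `ADD COLUMN id bigserial` (or an identity column) has a volatile default (`nextval`), so Postgres must write a value into every existing row, rewriting the whole 100-million-row table under an exclusive lock.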

Searching for rows with the existing index should be a matter of milliseconds. If it isn't, that's the problem you have to solve. Can you add `EXPLAIN (ANALYZE, BUFFERS)` output for the query to the question?
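That is, run the slow lookup like this (table and column names taken from the question; the literal is a placeholder):

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM big_table WHERE tx = 'some-tx-value';
-- A healthy plan shows an "Index Scan" on the tx index with a sub-millisecond
-- execution time; a "Seq Scan" here would explain the 4-6 second lookups.
```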

Laurenz Albe