
I am trying to take a backup of a TimescaleDB database, excluding two very big hypertables. That means that while the backup is running, I would not expect to see any COPY command of the underlying chunks, but I actually do!

Let's say TestDB is my database and it has two big hypertables on schema mySchema called hyper1 and hyper2, as well as other normal tables.

I run the following command:

pg_dump -U user -F t TestDB --exclude-table "mySchema.hyper1" --exclude-table "mySchema.hyper2"  > TestDB_Backup.tar

Then I check the running queries (esp. because I did not expect it to take this long) and I find out that several COPY commands are running, for each chunk of the tables I actually excluded.

This is TimescaleDB version 1.7.4.

Did this ever happen to any of you and what is actually going on here?

ps. I am sorry I cannot really provide a repro for this and that this is more of a discussion than an actual programmatic problem, but I still hope someone has seen this before and can show me what am I missing :)

Rigerta
  • Even though it says [here](https://docs.timescale.com/latest/using-timescaledb/backup#pg_dump-pg_restore) this can be done, I'm guessing that is not the case. Further guessing it is excluding the parent table, but not the child(chunk) tables. – Adrian Klaver Nov 24 '20 at 21:11
  • Yes, I also read the documentation *many* times and yet here I am. But I think you might be on to something with the idea of it possibly excl. only the parent. Will look into it more closely, thanks for your comment! – Rigerta Nov 24 '20 at 21:16
  • Also, what version of `pg_dump` are you using? Is it the one that came with TimescaleDB? – Adrian Klaver Nov 24 '20 at 21:16
  • I am using pg_dump (PostgreSQL) 11.9 on a Linux machine – Rigerta Nov 24 '20 at 21:18
  • You can use patterns with `--exclude-table/-T` to match multiple tables. See here [Patterns](https://www.postgresql.org/docs/current/app-psql.html#APP-PSQL-PATTERNS). – Adrian Klaver Nov 24 '20 at 21:19
  • Forgot that TimescaleDB is an extension on existing Postgres install, so you will be using the Postgres core `pg_dump`. – Adrian Klaver Nov 24 '20 at 21:22
  • Yep! Also changing to using a pattern might bring me to the same place (so basically that only does the specifying of the table names easier for me, but I am afraid the background COPY will run just the same) - I could give it a try though! – Rigerta Nov 24 '20 at 21:22
  • The `COPY` will only run for those tables that `pg_dump` thinks need to be outputted. If a table name matches the pattern it will not be outputted. – Adrian Klaver Nov 24 '20 at 21:29
  • So you mean you think specifying it one by one vs specifying it via a pattern will make the difference? I will give it a try now! – Rigerta Nov 24 '20 at 21:30
  • It just starts with copying the hyper chunks. So weird. Thanks anyway! :) – Rigerta Nov 24 '20 at 21:41
  • No, I mean you can exclude multiple tables in one `--exclude-table` if they match the pattern. So assuming `hyper1` has child tables named `hyper1a, hyper1b, etc`, then `--exclude-table hyper1*` will match them all. – Adrian Klaver Nov 24 '20 at 21:41
  • Is the schema actually `mySchema` in your database? If so then you will need to do `--exclude-table '"mySchema.hyper2"' ` ( single quotes enclosing double quotes) to preserve the case and have it match. – Adrian Klaver Nov 24 '20 at 22:02
  • Yes, that was fine. The issue was what you mentioned in the beginning that even though the hypertables are excluded, the chunks are still dumped. – Rigerta Nov 26 '20 at 14:07

1 Answer


pg_dump dumps each child table separately and independently of its parent, so when you exclude a hypertable, its chunk tables are still dumped. That is exactly the COPY activity you observe.

Note that excluding both the hypertables and their chunks will not produce a dump that restores correctly into a TimescaleDB instance, because the TimescaleDB metadata would no longer match the actual state of the database. TimescaleDB maintains catalog tables with information about hypertables and chunks; to pg_dump these are just ordinary user tables, so it dumps them (which is important), but after a restore they would still reference all the hypertables and chunks that existed in the database before the dump.

So instead of excluding the hypertables or chunks themselves, you need to exclude only their data, which still reduces dump and restore time. It will then be necessary to drop the excluded hypertables after the restore. Table data is excluded with the pg_dump parameter --exclude-table-data. There is an issue in the TimescaleDB GitHub repo that discusses how to exclude hypertable data from a dump, and it suggests the following query to generate the exclude string:

SELECT string_agg(format($$--exclude-table-data='%s.%s'$$,coalesce(cc.schema_name,c.schema_name), coalesce(cc.table_name, c.table_name)), ' ')
FROM _timescaledb_catalog.hypertable h 
  INNER JOIN _timescaledb_catalog.chunk c on c.hypertable_id = h.id 
  LEFT JOIN _timescaledb_catalog.chunk cc on c.compressed_chunk_id = cc.id
WHERE h.schema_name = <foo> AND h.table_name = <bar> ;
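As a sketch of how that generated string could be wired into pg_dump (the chunk names below are made-up sample output, shown instead of a live psql call; the file name gen_excludes.sql is hypothetical):

```shell
# Sketch only: the catalog query above emits one space-separated string of
# --exclude-table-data flags. Against a live database you would capture it with
# something like (saving the query above as gen_excludes.sql first):
#   EXCLUDES=$(psql -U user -d TestDB -Atq -f gen_excludes.sql)
# Here the result is simulated with two hypothetical chunk names:
EXCLUDES="--exclude-table-data='_timescaledb_internal._hyper_2_1_chunk' --exclude-table-data='_timescaledb_internal._hyper_2_2_chunk'"
# The flags are then spliced, unquoted, into the pg_dump command line:
echo pg_dump -U user -F t TestDB $EXCLUDES
```

One caveat: if you paste the generated string directly on the command line, the shell strips the single quotes around each pattern; if you capture it in a variable and expand it as above, the quotes are passed through literally, so you may need `eval` or to strip them yourself.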

Alternatively, you can look up the hypertable_id and exclude data from all chunk tables prefixed with that id. Find hypertable_id in the catalog table _timescaledb_catalog.hypertable:

SELECT id
FROM _timescaledb_catalog.hypertable
WHERE schema_name = 'mySchema' AND table_name = 'hyper1';

Let's say that the id is 2. Then dump the database according to the instructions:

pg_dump -U user -Fc -f TestDB_Backup.bak \
  --exclude-table-data='_timescaledb_internal._hyper_2*' TestDB 
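After restoring such a dump, the excluded hypertables will still exist (their definitions and the TimescaleDB metadata were dumped) but will contain no data, so the cleanup step mentioned above is needed. A sketch, using the table names from the question:

```sql
-- After pg_restore, drop the hypertables whose data was excluded, so the
-- TimescaleDB catalog again matches the actual contents of the database.
-- DROP TABLE on a hypertable also removes its chunk tables.
DROP TABLE "mySchema".hyper1;
DROP TABLE "mySchema".hyper2;
```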
k_rus
    Yes, this is the issue. When excluding a hypertable, pg_dump still dumps the chunks! Thanks for writing this answer, I forgot to update that I found the issue. – Rigerta Nov 26 '20 at 14:06