Redshift system tables only store a few days of logging data, so periodically backing up rows from these tables is a common practice to collect and maintain proper history. To find new rows added to the system logs I need to check against my backup tables either on `query` (number) or on execution time.
According to an answer on How do I keep more than 5 day's worth of query logs? we can simply select all rows with `query > (select max(query) from log)`. The answer is unreferenced and assumes that `query` is inserted sequentially.
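For concreteness, that approach amounts to something like the following (a sketch only; `stl_query_backup` is a hypothetical backup table with the same columns as `stl_query`):

```sql
-- Copy only rows whose query id is greater than anything already backed up.
-- coalesce handles the first run, when the backup table is empty.
insert into stl_query_backup
select *
from stl_query
where query > (select coalesce(max(query), 0) from stl_query_backup);
```

This is exactly the logic that breaks if `query` ids are not inserted into `stl_query` in ascending order.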
My question, in two parts (hoping for references or code-as-proof), is:

- are `query` identifiers expected to be inserted sequentially, and
- are system tables, e.g. `stl_query`, immutable or unchanging?
Assuming that we can't verify or prove both of the above, what's the right strategy to back up the system tables?
I am wary of this because I fully expect long running queries to complete after many other queries have started and completed.
I know `query` (identifier) is generated at query submit time, because I can monitor in-progress queries. Therefore it is entirely expected that a long-running `query=1` may complete after `query=2`. If the `stl_query` table is immutable then `query=1` will be inserted after `query=2`, and the `max(query)` logic is flawed.
Alternatively, if `query=1` is inserted into `stl_query` at run time, then the row must be updated upon completion (with end time, duration, etc.). This would require me to do an upsert into the backup table.