3

My app reports rainfall and streamflow information to whitewater boaters. Postgres is my data store for the gauge readings that come in 15 minute intervals. Over time, these tables get pretty big and the availabity of range partitioning in Postgres 10 inspired me to leave my shared hosting service and build a server from scratch at Linode. My queries on these large tables became way faster after I partitioned the readings into 2 week chunks. Several months down the road, I checked out the query plan and was very surprised to see that using now() in a query caused PG to scan allof the indexes on my partitioned tables. What the heck?!?! Isn't the point of partitiong data is to avoid situations like this?

Here's my set up: my partitioned table

CREATE TABLE public.precip
(
    gauge_id smallint,
    inches numeric(8, 2),
    reading_time timestamp with time zone
) PARTITION BY RANGE (reading_time)

I've created partitions for every two weeks, so I have about 50 partition tables so far. One of my partitions:

CREATE TABLE public.precip_y2017w48 PARTITION OF public.precip
    FOR VALUES FROM ('2017-12-03 00:00:00-05') TO ('2017-12-17 00:00:00-05');

Then each partition is indexed on gauge_id and reading_time

I have a lot of queries like

WHERE gauge_id = xxx
  AND precip.reading_time > (now() - '01:00:00'::interval)
  AND precip.reading_time < now()

As I mentioned postgres scans all of the indexes on reading_time for every 'child' table rather than only querying the child table that has timestamps in the query range. If I enter literal values (e.g., precip.reading_time > '2018-03-01 01:23:00') instead of now(), it only scans the indexes of appropriate child tables(s). I've done some reading and I understand that now() is volatile and that the planner won't know what the value will be when the query executes. I've also read that query planning is expensive so postgres caches plans. I can understand why PG is programmed to do that. However, one counter argument I read was that a re-planned query is probably way less expensive than a query that end up ignoring partitions. I agree - and that's probably the case in my situation.

As a work arounds, I've created this function:

CREATE OR REPLACE FUNCTION public.hours_ago2(i integer)
RETURNS timestamp with time zone
LANGUAGE 'plpgsql'
COST 100
IMMUTABLE 
ROWS 0
AS $BODY$
    DECLARE X timestamp with time zone;

    BEGIN
            X:= now() + cast(i || ' hours' as interval);
            RETURN X;
    END;
$BODY$;

Note the IMMUTABLE statment. Now when issue queries like

select * from stream where gauge_id = 2142 and reading_time > hours_ago2(-3) and reading_time < hours_ago2(0)

PG only searches the partition table that stores data for that time frame. This is the goal I was shooting for when I set up the partitions in the first place. Booyah. But is this safe? Will the query planner ever cache the results of hours_ago2(-3) and use it over and over again for hours down the road? It's ok if it's cached for a few minutes. Again, my app reports rain and streamflow information; it doesn't deal with financial transactions or any other 'critical' types of data processing. I've tested simple statements like select hours_ago2(-3) and it returns new values every time. So it seems safe. But is it really?

Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263
Debaser
  • 427
  • 1
  • 5
  • 17

2 Answers2

2

That is not safe because at planning time you have no idea if the statement will be executed in the same transaction or not.

If you are in a situation where query plans are cached, this will return wrong results. Query plans are cached for named prepared statements and statements in PL/pgSQL functions, so you could end up with an out-of-date value for the duration of the database session.

For example:

CREATE TABLE times(id integer PRIMARY KEY, d timestamptz NOT NULL);

PREPARE x AS SELECT * FROM times WHERE d > hours_ago2(1);

The function is evaluated at planning time, and the result is a constant in the execution plan (for immutable functions that is fine).

EXPLAIN (COSTS off) EXECUTE x;
                                QUERY PLAN                                 
---------------------------------------------------------------------------
 Seq Scan on times
   Filter: (d > '2018-03-12 14:25:17.380057+01'::timestamp with time zone)
(2 rows)

SELECT pg_sleep(100);

EXPLAIN (COSTS off) EXECUTE x;
                                QUERY PLAN                                 
---------------------------------------------------------------------------
 Seq Scan on times
   Filter: (d > '2018-03-12 14:25:17.380057+01'::timestamp with time zone)
(2 rows)

The second query definitely does not return the result you want.

I think you should evaluate now() (or better an equivalent function on the client side) first, perform your date arithmetic and supply the result as parameter to the query. Inside of PL/pgSQL functions, use dynamic SQL.

Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263
  • Thanks for the response. I think the question here is how long can the returned timestamp be cached? For the length of a PL/pgSQL function? That's not a problem for my app. It'd be OK if a new precip reading came in a few seconds before this query and it wasn't captured. Even a few minutes wouldn't really cause problems in my use case. But 10 or 15 min would start to be a concern. I'll read up on dynamic SQL in PG. And I'll be sure not to use this method where small lapses in time are unacceptable. – Debaser Mar 13 '18 at 03:12
  • I didn't mention that (added it now): The plan is cached for the duration of the database session. – Laurenz Albe Mar 13 '18 at 07:58
2

Change the queries to use 'now'::timestamptz instead of now(). Also, interval math on timestamptz is not immutable.

Change your query to something like:

WHERE gauge_id = xxx
  AND precip.reading_time > ((('now'::timestamptz AT TIME ZONE 'UTC')  - '01:00:00'::interval) AT TIME ZONE 'UTC')
  AND precip.reading_time < 'now'::timestamptz
  • Thanks for your input. Subsequent versions of PG have much better 'partition pruning' abilities, so this 'spoofing' is no longer necessary. Now when I issue a query like precip.reading_time > (now() - '01:00:00'::interval), PG knows which partitions to query and which ones to ignore. – Debaser Mar 04 '21 at 16:02